Fine-tuning JSONL Validator
Catch the errors that make a fine-tuning upload fail before you spend a job on it. Paste your JSONL dataset — one example per line, in either the chat messages format or the legacy prompt/completion format — and every line is checked for valid JSON, a recognised structure, correct roles, and non-empty content. It flags examples with no assistant turn to train on, conversations that start on the wrong role, exact-duplicate lines, and bad weights, then summarises the dataset with an example count, the detected format, a rough token estimate and the role distribution. Everything runs in your browser, so your training data never leaves your machine.
How to use the Fine-tuning JSONL Validator
Paste your dataset with one JSON object per line — the JSONL format every major fine-tuning API expects. The validator auto-detects whether you're using the modern chat format (each line an object with a messages array of role/content turns) or the legacy prompt/completion format, and checks each line against the right rules. For chat examples it confirms messages is a non-empty array, every turn has a known role (system, developer, user, assistant, tool or function) and some content or tool calls, that there is at least one assistant message to learn from, and that the conversation doesn't open on an assistant turn. For prompt/completion it checks both fields are strings and warns when a completion lacks the conventional leading space.
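The chat-format rules above can be sketched in a few lines of Python. This is an illustrative approximation, not the validator's actual code: the function name check_chat_example, the message wording, the use of a tool_calls field to mean "tool calls", and the choice to treat an assistant-first conversation as a warning rather than an error are all assumptions.

```python
KNOWN_ROLES = {"system", "developer", "user", "assistant", "tool", "function"}

def check_chat_example(example):
    """Return (errors, warnings) for one parsed chat-format example (a dict)."""
    errors, warnings = [], []
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["messages must be a non-empty array"], warnings
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            errors.append(f"message {i} is not an object")
            continue
        role = msg.get("role")
        if not isinstance(role, str) or role not in KNOWN_ROLES:
            errors.append(f"message {i} has unknown role {role!r}")
        # A turn needs some content or, failing that, tool calls (assumed field name).
        if not msg.get("content") and not msg.get("tool_calls"):
            errors.append(f"message {i} has no content or tool calls")
    roles = [m.get("role") for m in messages if isinstance(m, dict)]
    if "assistant" not in roles:
        errors.append("no assistant message to train on")
    if roles and roles[0] == "assistant":
        # Assumption: reported as a warning, not a hard error.
        warnings.append("conversation opens on an assistant turn")
    return errors, warnings
```

A line with a misspelled role such as "asistant" would produce two errors under this sketch: the unknown role itself, and the absence of any recognised assistant message.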
Each problem is reported against its line number, with errors (which would break the upload) separated from warnings (things worth checking but not fatal). The summary cards give you the totals at a glance: how many examples are valid versus broken, the detected format, a rough token estimate (characters ÷ 4) for budgeting the training cost, the number of exact-duplicate lines, and how the roles are distributed across the set. Because everything is parsed and validated locally, you can safely run a real, private dataset through it — nothing is uploaded — and fix the flagged lines before sending the file to OpenAI, Together, Fireworks or whichever platform you're training on.
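As a rough sketch of how the summary figures could be computed, the Python below counts exact-duplicate lines, tallies roles, and applies the characters ÷ 4 estimate. It assumes chat-format lines that have already passed validation, and it counts only the characters of string content fields; which characters the tool itself counts is not stated, so treat that choice, along with the summarise name and the dictionary keys, as hypothetical.

```python
import json
from collections import Counter

def summarise(lines):
    """Summarise already-validated chat-format JSONL lines (a list of strings)."""
    seen = set()
    duplicates = 0
    chars = 0
    roles = Counter()
    for line in lines:
        # Exact-duplicate detection compares the raw line text.
        if line in seen:
            duplicates += 1
        seen.add(line)
        for msg in json.loads(line).get("messages", []):
            roles[msg.get("role")] += 1
            content = msg.get("content")
            if isinstance(content, str):
                chars += len(content)
    return {
        "examples": len(lines),
        "duplicates": duplicates,
        "approx_tokens": chars // 4,  # rough characters / 4 heuristic, not a tokenizer
        "roles": dict(roles),
    }
```

Because the estimate ignores the real tokenizer, it is only useful for order-of-magnitude budgeting.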
Why fine-tuning datasets fail validation
Supervised fine-tuning expects training data as JSONL: a plain-text file with one self-contained JSON example per line. The format is deliberately simple so it streams cheaply, but that simplicity means a single malformed line (a trailing comma, an unescaped quote, a stray blank line) can fail the whole upload, and the error a platform returns is often just a line number with little explanation. Validating locally first turns a slow remote round-trip into an instant local check, and surfaces the structural problems that a bare JSON parser won't even notice.
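To illustrate the per-line parsing step, here is a minimal Python sketch that reports each problem against its 1-based line number. The parse_jsonl name and the error strings are made up for this example, and treating a blank line as an error is an assumption about how strict a platform will be.

```python
import json

def parse_jsonl(text):
    """Yield (line_number, parsed_object, error) for each line of a JSONL string."""
    # Drop trailing newlines so a normal end-of-file newline is not reported as blank.
    for number, line in enumerate(text.rstrip("\n").split("\n"), start=1):
        if not line.strip():
            yield number, None, "blank line"
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            yield number, None, f"invalid JSON: {exc.msg} (column {exc.colno})"
            continue
        if not isinstance(obj, dict):
            yield number, None, "line is valid JSON but not an object"
            continue
        yield number, obj, None
```

The last check is an example of what a bare JSON parser misses: a line containing only an array or a number parses cleanly but is not a usable training example.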
The dominant format today is the chat schema: each line is an object with a messages array, and each message has a role and content, mirroring the conversation format used at inference. The most common mistakes are semantic rather than syntactic. An example with no assistant message gives the model nothing to learn to produce; it's valid JSON but useless for training. A conversation that begins with an assistant turn, a typo'd role like "asistant", an empty content string, or a stray weight value other than 0 or 1 will either error out or quietly degrade the run. The older prompt/completion format is simpler, with two string fields per line, but it has conventions of its own that are easy to forget, such as the leading space that often belongs at the start of a completion.
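Two of these checks, the weight value and the prompt/completion fields, might look like the following in Python. The function names and messages are hypothetical, and rejecting JSON true/false as weights is an assumption made here because Python would otherwise treat them as 1 and 0.

```python
def check_weights(messages):
    """Return errors for weight values other than 0 or 1 in a list of message dicts."""
    errors = []
    for i, msg in enumerate(messages):
        if "weight" in msg:
            w = msg["weight"]
            # bool is a subclass of int in Python, so reject True/False explicitly.
            if isinstance(w, bool) or w not in (0, 1):
                errors.append(f"message {i}: weight must be 0 or 1, got {w!r}")
    return errors

def check_prompt_completion(example):
    """Return (errors, warnings) for one parsed prompt/completion example."""
    errors, warnings = [], []
    for field in ("prompt", "completion"):
        if not isinstance(example.get(field), str):
            errors.append(f"{field} must be a string")
    completion = example.get("completion")
    if isinstance(completion, str) and not completion.startswith(" "):
        # A convention rather than a hard rule, so this is only a warning.
        warnings.append("completion lacks the conventional leading space")
    return errors, warnings
```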
Beyond per-line correctness, a few dataset-level properties are worth knowing before you train. Duplicates waste compute and can skew the model toward repeated examples; spotting exact-match lines is a cheap way to catch a botched export. A token estimate, even the rough characters-÷-4 approximation, lets you sanity-check the size of the job and its likely cost before committing. And the role distribution is a quick signal that the data is shaped the way you intended — that there are roughly as many assistant turns as you expect, that system prompts appear where they should. None of this replaces the platform's own validation, which applies the exact tokenizer and limits, but catching the obvious failures locally means the version you upload is far more likely to train on the first try.
Common use cases
- Pre-upload checks. Catch malformed JSON, bad roles and missing assistant turns before a fine-tuning job fails.
- Export debugging. Spot duplicate or empty lines from a botched dataset export.
- Cost estimation. Get a rough token count to budget a training run before submitting it.
- Format conversion. After converting between formats, confirm the dataset is consistently chat or prompt/completion, not a mix.
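The format-consistency check in the last use case could be sketched as below. How the validator actually detects a format is not described, so the key-based rule here (a messages key means chat; prompt and completion keys mean the legacy format) and both function names are assumptions.

```python
from collections import Counter

def detect_format(example):
    """Classify one parsed example (a dict) by the keys it contains."""
    if "messages" in example:
        return "chat"
    if "prompt" in example and "completion" in example:
        return "prompt/completion"
    return "unknown"

def dataset_format(examples):
    """Return (overall_format, per_format_counts) for a list of parsed examples."""
    counts = Counter(detect_format(e) for e in examples)
    if not counts:
        return "empty", counts
    if len(counts) == 1:
        return next(iter(counts)), counts
    return "mixed", counts
```

A result of "mixed" usually points to a partially converted file, and the per-format counts show how many lines still need converting.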