ShareGPT ↔ OpenAI Dataset Converter

Convert conversation datasets between ShareGPT format ({conversations:[{from,value}]}) and OpenAI messages format ({messages:[{role,content}]}) in both directions. Accepts a single object, a JSON array, or JSONL — one conversation per line — and preserves the input shape on output.

How to use the ShareGPT ↔ OpenAI Dataset Converter

Paste your data into the top textarea. The tool accepts three input shapes:

  • A single JSON object — one conversation.
  • A JSON array — multiple conversations wrapped in [...].
  • JSONL — one conversation per line (no wrapping brackets).

Select the conversion direction, then click Convert. The output preserves the input shape: a single object in produces a single object out; an array in produces an array out; JSONL in produces JSONL out.
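
The shape detection described above can be sketched in a few lines of Python. This is an illustration only, not the tool's actual implementation: the function name detect_shape and the labels "object", "array", and "jsonl" are assumptions made for this example.

```python
import json


def detect_shape(text: str) -> str:
    """Classify pasted text as 'object', 'array', or 'jsonl' (hypothetical labels)."""
    try:
        parsed = json.loads(text.strip())
    except json.JSONDecodeError:
        # The text is not one JSON document, so assume one record per line.
        return "jsonl"
    if isinstance(parsed, list):
        return "array"
    if isinstance(parsed, dict):
        return "object"
    raise ValueError("expected a JSON object, a JSON array, or JSONL")
```

Trying a whole-document parse first means a pretty-printed single object that spans several lines is still treated as one conversation rather than as broken JSONL.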

Role mapping: human ↔ user, gpt ↔ assistant, system ↔ system. Any unrecognized role is passed through unchanged with a warning in the output. Lines that cannot be parsed are reported as errors.
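
A minimal Python sketch of this mapping, in both directions, might look like the following. The function names and the choice to return warnings as a separate list are assumptions for illustration; the tool itself may surface warnings differently.

```python
# Role names on each side of the conversion.
SHAREGPT_TO_OPENAI = {"human": "user", "gpt": "assistant", "system": "system"}
OPENAI_TO_SHAREGPT = {v: k for k, v in SHAREGPT_TO_OPENAI.items()}


def sharegpt_to_openai(record: dict) -> tuple[dict, list[str]]:
    """Convert one ShareGPT conversation; return (converted record, warnings)."""
    messages, warnings = [], []
    for i, turn in enumerate(record["conversations"]):
        role = turn["from"]
        if role not in SHAREGPT_TO_OPENAI:
            warnings.append(f"turn {i}: unrecognized role {role!r} passed through")
        # Unknown roles fall back to their original name.
        messages.append({"role": SHAREGPT_TO_OPENAI.get(role, role),
                         "content": turn["value"]})
    return {"messages": messages}, warnings


def openai_to_sharegpt(record: dict) -> tuple[dict, list[str]]:
    """Convert one OpenAI-format conversation; return (converted record, warnings)."""
    turns, warnings = [], []
    for i, msg in enumerate(record["messages"]):
        role = msg["role"]
        if role not in OPENAI_TO_SHAREGPT:
            warnings.append(f"turn {i}: unrecognized role {role!r} passed through")
        turns.append({"from": OPENAI_TO_SHAREGPT.get(role, role),
                      "value": msg["content"]})
    return {"conversations": turns}, warnings
```

Because turns are appended in input order and unknown roles keep their names, a round trip through both functions returns the original record.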

ShareGPT and OpenAI conversation formats

ShareGPT is a community dataset format popularized by the ShareGPT Chrome extension and widely used in open-source fine-tuning datasets (Vicuna, WizardLM, etc.). A ShareGPT conversation object has a conversations array where each turn has a from field (the role: "human", "gpt", or "system") and a value field (the content). The format is commonly distributed on Hugging Face as JSON or JSONL files.

OpenAI's format uses a messages array in which each message has a role (user, assistant, or system) and a content field. The same structure is accepted by the Chat Completions API at inference time and by the OpenAI fine-tuning endpoint. Many fine-tuning frameworks (Axolotl, LLaMA-Factory, Unsloth) accept either format, but the OpenAI SDK and fine-tuning validation tools expect the OpenAI format.
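
The structural difference between the two formats is small enough to check programmatically. The sketch below guesses which format a record uses from its top-level key and per-turn fields; detect_format and its return labels are hypothetical names chosen for this example.

```python
def detect_format(record: dict) -> str:
    """Return 'sharegpt', 'openai', or 'unknown' for one conversation record."""
    turns = record.get("conversations")
    if isinstance(turns, list) and all(
        isinstance(t, dict) and {"from", "value"} <= t.keys() for t in turns
    ):
        return "sharegpt"
    messages = record.get("messages")
    if isinstance(messages, list) and all(
        isinstance(m, dict) and {"role", "content"} <= m.keys() for m in messages
    ):
        return "openai"
    return "unknown"
```

A record that mixes the conventions, for example a messages array whose items use from and value, is reported as "unknown" rather than guessed at.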

Converting between the two is mechanically simple but error-prone at scale — role name mismatches, field name typos, and structural differences (conversations vs messages) cause silent failures in training pipelines. This tool handles the conversion with explicit error reporting per invalid line, so you can identify and fix bad records before feeding a dataset to a fine-tuning job.

Common use cases

  • Fine-tuning dataset prep — convert Hugging Face ShareGPT JSONL datasets to OpenAI format for fine-tuning with the OpenAI API or Axolotl.
  • Dataset merging — standardize multiple datasets from different sources into a single format before combining them.
  • Synthetic data export — convert OpenAI conversation logs back to ShareGPT for community sharing or dataset contribution.
  • Pipeline debugging — validate that role names are mapped correctly and no turns are lost in conversion.
  • Framework compatibility — convert to whichever format a specific fine-tuning framework or evaluation library expects.

Frequently asked questions

What roles does ShareGPT format use?

ShareGPT uses "human" (maps to user), "gpt" (maps to assistant), and "system" (maps to system). Some datasets also use "bing", "bard", or custom model names in the "from" field — these are passed through unchanged with a warning.

Does the converter handle multi-turn conversations?

Yes — conversations with any number of turns are fully supported. Each turn in the input becomes the corresponding turn in the output array, in order.

What happens if a line has invalid JSON?

The converter reports the line number and the parse error in the output, then continues processing remaining lines. This lets you identify and fix bad records without losing the rest of the conversion.
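
The continue-on-error behaviour can be illustrated with a short Python sketch. It assumes blank lines are skipped and that errors are collected as plain strings; both are choices made for this example rather than documented behaviour of the tool.

```python
import json


def parse_jsonl(text: str) -> tuple[list, list[str]]:
    """Parse JSONL text; return (parsed records, per-line error messages)."""
    records, errors = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored, not reported
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Record the failure and keep going with the remaining lines.
            errors.append(f"line {line_no}: {exc.msg}")
    return records, errors
```

Line numbers are counted from 1 and include blank lines, so a reported number matches what a text editor shows.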

Can I convert a file with thousands of conversations?

Yes — JSONL mode processes each line independently so large files are handled efficiently in the browser. For very large files (hundreds of MB), performance may degrade; consider splitting before pasting.
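
One way to split a large JSONL file before pasting is a small script that writes fixed-size chunks. This is a sketch under stated assumptions: the function name split_jsonl, the ".partNNN" naming scheme, and the default chunk size are all arbitrary choices for illustration.

```python
from pathlib import Path


def split_jsonl(path: str, lines_per_chunk: int = 10_000) -> list[Path]:
    """Split a JSONL file into numbered chunk files next to the original."""
    src = Path(path)
    out_paths: list[Path] = []
    chunk = None
    try:
        with src.open("r", encoding="utf-8") as f:
            for i, line in enumerate(f):
                if i % lines_per_chunk == 0:
                    # Start a new chunk file every lines_per_chunk lines.
                    if chunk is not None:
                        chunk.close()
                    out = src.with_name(
                        f"{src.stem}.part{len(out_paths):03d}{src.suffix}"
                    )
                    chunk = out.open("w", encoding="utf-8")
                    out_paths.append(out)
                chunk.write(line)
    finally:
        if chunk is not None:
            chunk.close()
    return out_paths
```

Since JSONL keeps each conversation on its own line, splitting on line boundaries never cuts a record in half, and the chunks can be converted separately and concatenated afterwards.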