ShareGPT ↔ OpenAI Dataset Converter
Convert conversation datasets between ShareGPT format ({conversations:[{from,value}]}) and OpenAI messages format ({messages:[{role,content}]}) in both directions. Accepts a single object, a JSON array, or JSONL — one conversation per line — and preserves the input shape on output.
How to use the ShareGPT ↔ OpenAI Dataset Converter
Paste your data into the top textarea. The tool accepts three input shapes:
- A single JSON object — one conversation.
- A JSON array — multiple conversations wrapped in
[...]. - JSONL — one conversation per line (no wrapping brackets).
Select the conversion direction, then click Convert. The output preserves the input shape: a single object in produces a single object out; an array in produces an array out; JSONL in produces JSONL out.
Role mapping: human ↔ user, gpt ↔ assistant, system ↔ system. Any unrecognized role is passed through unchanged with a comment in the output. Lines that cannot be parsed are reported as errors.
ShareGPT and OpenAI conversation formats
ShareGPT is a community dataset format popularized by the ShareGPT Chrome extension and widely used in open-source fine-tuning datasets (Vicuna, Alpaca, WizardLM, etc.). A ShareGPT conversation object has a conversations array where each turn has a from field (the role: "human", "gpt", or "system") and a value field (the content). The format is widely distributed on Hugging Face as JSONL files.
OpenAI's fine-tuning format uses a messages array with role (user/assistant/system) and content. This is also the format accepted by the Chat Completions API at inference time and by the OpenAI fine-tuning endpoint. Many fine-tuning frameworks (Axolotl, LLaMA-Factory, Unsloth) accept either format, but the OpenAI SDK and fine-tuning validation tools expect the OpenAI format.
Converting between the two is mechanically simple but error-prone at scale — role name mismatches, field name typos, and structural differences (conversations vs messages) cause silent failures in training pipelines. This tool handles the conversion with explicit error reporting per invalid line, so you can identify and fix bad records before feeding a dataset to a fine-tuning job.
Common use cases
- Fine-tuning dataset prep — convert Hugging Face ShareGPT JSONL datasets to OpenAI format for fine-tuning with the OpenAI API or Axolotl.
- Dataset merging — standardize multiple datasets from different sources into a single format before combining them.
- Synthetic data export — convert OpenAI conversation logs back to ShareGPT for community sharing or dataset contribution.
- Pipeline debugging — validate that role names are mapped correctly and no turns are lost in conversion.
- Framework compatibility — convert to whichever format a specific fine-tuning framework or evaluation library expects.