ChatML Formatter
Turn a plain conversation into ChatML — the role-tagged prompt format, built from <|im_start|> and <|im_end|> tokens, that many open models expect. Write your turns with simple system:, user:, and assistant: prefixes and get correctly formatted ChatML, optionally with a trailing assistant prompt for generation. Formatted entirely in your browser.
How to use the ChatML Formatter
Write your conversation in the input box, starting each message with its role followed by a colon: system:, user:, or assistant:. A message can span multiple lines — any line that does not start with a recognized role prefix is treated as a continuation of the previous turn, so multi-line content works naturally. The ChatML output regenerates as you type.
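The parsing rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual source: the name parse_turns is hypothetical, and two behaviors are assumptions, namely that prefixes are matched case-sensitively and that lines appearing before the first role prefix are dropped.

```python
import re

# The three role prefixes named above; case-sensitive matching is an assumption.
ROLE_PREFIX = re.compile(r"^(system|user|assistant):\s?(.*)$")


def parse_turns(text: str) -> list[tuple[str, str]]:
    """Split role-prefixed plain text into (role, content) turns."""
    turns: list[tuple[str, list[str]]] = []
    for line in text.splitlines():
        match = ROLE_PREFIX.match(line)
        if match:
            # A recognized prefix starts a new turn.
            turns.append((match.group(1), [match.group(2)]))
        elif turns:
            # Any other line continues the previous turn.
            turns[-1][1].append(line)
        # Lines before the first prefix are ignored (an assumption).
    return [(role, "\n".join(lines).strip()) for role, lines in turns]


conversation = "system: Be brief.\nuser: Hi\nthere\nassistant: Hello"
print(parse_turns(conversation))
```

Here the unprefixed line "there" is folded into the user turn, so the example yields three turns rather than four.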
Leave Append assistant prompt checked when you are building a prompt to send to a model: it adds the opening <|im_start|>assistant tag at the end so the model knows to continue as the assistant. Uncheck it when you are formatting a complete, finished transcript. The trailing newline option ends the output with a newline, which is the whitespace convention most ChatML chat templates follow. Copy the result straight into a raw completion request, a local inference tool, or a tokenizer to inspect.
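As a rough sketch of how the two options might affect the output, the function below joins already-parsed turns into ChatML. The function name to_chatml and its parameter names are hypothetical stand-ins for the checkboxes, and the exact whitespace the tool emits is assumed rather than confirmed.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def to_chatml(turns, append_assistant_prompt=True, trailing_newline=True):
    """Render (role, content) pairs as a ChatML string."""
    # Each message: opening tag, role, newline, content, closing tag.
    parts = [f"{IM_START}{role}\n{content}{IM_END}" for role, content in turns]
    if append_assistant_prompt:
        # Open an assistant turn and leave it unclosed for the model to complete.
        parts.append(f"{IM_START}assistant")
    out = "\n".join(parts)
    if trailing_newline:
        out += "\n"
    return out


print(to_chatml([("system", "Be brief."), ("user", "Hi")]))
```

With both options on, the string ends in an open assistant tag followed by a newline, which is where generation would begin. With both off, it ends at the last <|im_end|>.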
What ChatML is
ChatML (Chat Markup Language) is a convention for encoding a multi-turn conversation as a single token sequence. Each message is wrapped in special tokens: it opens with <|im_start|> followed by the role name and a newline, contains the message content, and closes with <|im_end|>. The "im" is commonly read as "instant message" or "input message"; the expansion is not officially documented. The format originated with OpenAI's chat models and was adopted as the default chat template for many open-weight families, including Qwen and a number of community fine-tunes.
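Because every message follows the same open-role-content-close shape, a ChatML string can be split back into turns with a single regular expression. The sketch below works on text, not token IDs, and the name split_chatml is hypothetical. It assumes role names are plain word characters and that message content never contains a literal <|im_end|>.

```python
import re

# Opening tag, role, newline, content (non-greedy), closing tag.
MESSAGE = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)


def split_chatml(prompt: str) -> list[tuple[str, str]]:
    """Return the (role, content) pair of every closed message in the prompt."""
    return [(m.group(1), m.group(2)) for m in MESSAGE.finditer(prompt)]


prompt = (
    "<|im_start|>system\nBe brief.<|im_end|>\n"
    "<|im_start|>user\nHi\nthere<|im_end|>\n"
    "<|im_start|>assistant\n"
)
print(split_chatml(prompt))
```

The trailing open assistant tag has no closing token, so it is not reported as a message. That makes this a quick way to see which turns in a prompt are complete.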
The point of ChatML is to give the model an unambiguous structure for who said what. Without role markers, a model cannot reliably tell the system instruction from the user's question or its own previous answers. The special tokens are part of the tokenizer's vocabulary, so they are single tokens the model learns to recognize as turn boundaries — not ordinary text it might generate by accident. When you call a chat API, the server applies a template like this for you; you only need the raw form when you are using a base completion endpoint, debugging a tokenizer, or building prompts for a local model that expects ChatML directly.
Different model families use different templates — Llama 3 uses its own header tokens, Mistral uses [INST] blocks, Gemma uses <start_of_turn> — so ChatML is one convention among several, not a universal standard. Always check which template your specific model was trained with; sending ChatML to a model that expects a different format degrades quality. This formatter produces canonical ChatML, which is correct for ChatML-trained models and a useful reference for understanding how chat templates work in general.
Common use cases
- Raw completion endpoints. Format a conversation by hand when you are calling a base completion API rather than a chat endpoint.
- Local model prompting. Build correctly tagged prompts for a ChatML-trained model running in llama.cpp or similar.
- Debugging templates. See exactly what the special tokens look like to diagnose a misbehaving chat pipeline.
- Building few-shot prompts. Assemble a multi-turn example block in the precise format the model was trained on.
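For the few-shot case, one possible way to assemble the block programmatically is shown below. It is a sketch under assumptions: few_shot_prompt is a hypothetical helper, each example is taken to be a (user text, assistant text) pair, and the prompt ends with an open assistant turn so the model answers the final question.

```python
def few_shot_prompt(system, examples, question):
    """Build a ChatML prompt from a system message, example pairs, and a question."""

    def message(role, content):
        # One complete ChatML message followed by a newline separator.
        return f"<|im_start|>{role}\n{content}<|im_end|>\n"

    prompt = message("system", system)
    for user_text, assistant_text in examples:
        # Each example becomes a finished user/assistant exchange.
        prompt += message("user", user_text)
        prompt += message("assistant", assistant_text)
    prompt += message("user", question)
    # Leave the final assistant turn open for the model to complete.
    return prompt + "<|im_start|>assistant\n"


print(few_shot_prompt("Translate to French.", [("cat", "chat")], "dog"))
```

Writing the examples as closed assistant turns, rather than pasting them into the system message, keeps them in the same turn structure a ChatML-trained model saw during training.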