ChatML Formatter

Turn a plain conversation into ChatML — the role-tagged prompt format, built from <|im_start|> and <|im_end|> tokens, that many open models expect. Write your turns with simple system:, user:, and assistant: prefixes and get correctly formatted ChatML, optionally with a trailing assistant prompt for generation. Formatted entirely in your browser.

How to use the ChatML Formatter

Write your conversation in the input box, starting each message with its role followed by a colon: system:, user:, or assistant:. A message can span multiple lines — any line that does not start with a recognized role prefix is treated as a continuation of the previous turn, so multi-line content works naturally. The ChatML output regenerates as you type.
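The parsing rule described above can be sketched in a few lines of Python. This is an illustration, not the tool's actual source: the function name parse_turns, the choice to ignore text before the first role prefix, and the trimming of surrounding whitespace are all assumptions.

```python
import re

# Hypothetical parser: the role names come from the text above; the function
# name and the edge-case handling are assumptions made for illustration.
ROLE_PREFIX = re.compile(r"^(system|user|assistant):\s?(.*)$")

def parse_turns(text: str) -> list[dict[str, str]]:
    """Split role-prefixed text into a list of {"role", "content"} turns."""
    turns: list[dict[str, str]] = []
    for line in text.splitlines():
        match = ROLE_PREFIX.match(line)
        if match:
            # A recognized prefix starts a new turn.
            turns.append({"role": match.group(1), "content": match.group(2)})
        elif turns:
            # Any other line continues the previous turn.
            turns[-1]["content"] += "\n" + line
        # Assumption: lines before the first role prefix are ignored.
    for turn in turns:
        # Assumption: surrounding blank lines and spaces are trimmed.
        turn["content"] = turn["content"].strip()
    return turns

sample = "system: You are concise.\nuser: Summarize this:\nLine one.\nLine two.\nassistant: Done."
turns = parse_turns(sample)
```

One consequence of a prefix-based rule like this: a continuation line that happens to begin with user: or another role name would start a new turn, so such lines need rewording or indentation in the input.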

Leave Append assistant prompt checked when you are building a prompt to send to a model: it adds the opening <|im_start|>assistant tag at the end so the model knows to continue as the assistant. Uncheck it when you are formatting a complete, finished transcript. The trailing newline option ends the output with a newline, which matters because ChatML follows each role name with a newline before the content begins. Copy the result straight into your API call, a raw completion request, or a tokenizer to inspect.
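A minimal Python sketch of how the two options could shape the output. The function name to_chatml and its parameter names are hypothetical stand-ins for the two checkboxes, and joining messages with a single newline is an assumption based on the layout commonly used by ChatML chat templates.

```python
def to_chatml(
    turns: list[dict[str, str]],
    append_assistant_prompt: bool = True,
    trailing_newline: bool = True,
) -> str:
    """Render turns as ChatML; both flags are hypothetical mirrors of the checkboxes."""
    parts = [
        f"<|im_start|>{turn['role']}\n{turn['content']}<|im_end|>" for turn in turns
    ]
    if append_assistant_prompt:
        # Open assistant tag with no closing token: the model writes the rest.
        parts.append("<|im_start|>assistant")
    text = "\n".join(parts)
    if trailing_newline:
        text += "\n"
    return text

prompt = to_chatml([{"role": "user", "content": "Hi"}])
transcript = to_chatml(
    [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}],
    append_assistant_prompt=False,
)
```

With both options on, the prompt ends in <|im_start|>assistant followed by a newline, so the model's first generated token is the start of the reply rather than the newline itself.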

What ChatML is

ChatML (Chat Markup Language) is a convention for encoding a multi-turn conversation as a single string of tokens. Each message is wrapped in special tokens: it opens with <|im_start|> followed by the role name and a newline, contains the message content, and closes with <|im_end|>. The "im" is commonly read as "instant message". The format originated with OpenAI's chat models and was adopted as the default chat template for many open-weight families, including Qwen and a number of community fine-tunes.
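To make the message anatomy concrete, here is a small Python sketch that reads a ChatML string back into (role, content) pairs. The function name split_chatml and the sample conversation are illustrative assumptions; the pattern simply mirrors the structure described above (opening token, role, newline, content, closing token).

```python
import re

# One message: opening token, role name, newline, content, closing token.
MESSAGE = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)

def split_chatml(chatml: str) -> list[tuple[str, str]]:
    """Return (role, content) pairs for every complete message in a ChatML string."""
    return MESSAGE.findall(chatml)

sample = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nWhat is ChatML?<|im_end|>\n"
    "<|im_start|>assistant\n"  # open tag with no closing token: not a complete message
)
messages = split_chatml(sample)
```

Because the trailing assistant tag has no closing <|im_end|>, it is not returned as a message, which matches its role as an invitation for the model to continue.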

The point of ChatML is to give the model an unambiguous structure for who said what. Without role markers, a model cannot reliably tell the system instruction from the user's question or its own previous answers. The special tokens are part of the tokenizer's vocabulary, so they are single tokens the model learns to recognize as turn boundaries — not ordinary text it might generate by accident. When you call a chat API, the server applies a template like this for you; you only need the raw form when you are using a base completion endpoint, debugging a tokenizer, or building prompts for a local model that expects ChatML directly.

Different model families use different templates — Llama 3 uses its own header tokens, Mistral uses [INST] blocks, Gemma uses <start_of_turn> — so ChatML is one convention among several, not a universal standard. Always check which template your specific model was trained with; sending ChatML to a model that expects a different format degrades quality. This formatter produces canonical ChatML, which is correct for ChatML-trained models and a useful reference for understanding how chat templates work in general.

Common use cases

  • Raw completion endpoints. Format a conversation by hand when you are calling a base completion API rather than a chat endpoint.
  • Local model prompting. Build correctly tagged prompts for a ChatML-trained model running in llama.cpp or similar.
  • Debugging templates. See exactly what the special tokens look like to diagnose a misbehaving chat pipeline.
  • Building few-shot prompts. Assemble a multi-turn example block in the precise format the model was trained on.
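
As an illustration of the few-shot use case, the following Python sketch assembles worked examples and a final question into one ChatML prompt. The function name few_shot_prompt, its parameters, and the translation example are hypothetical; confirm that your model was trained on ChatML before relying on this layout.

```python
def few_shot_prompt(system: str, examples: list[tuple[str, str]], query: str) -> str:
    """Build a ChatML prompt from worked examples plus one new question."""

    def wrap(role: str, content: str) -> str:
        return f"<|im_start|>{role}\n{content}<|im_end|>\n"

    prompt = wrap("system", system)
    for question, answer in examples:
        # Each example is a completed user/assistant exchange.
        prompt += wrap("user", question) + wrap("assistant", answer)
    # The real question, then an open assistant tag for the model to continue.
    prompt += wrap("user", query) + "<|im_start|>assistant\n"
    return prompt

prompt = few_shot_prompt("Translate to French.", [("cat", "chat")], "dog")
```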

Frequently asked questions

Which models use ChatML?

ChatML is the default chat template for the Qwen family and many community fine-tunes, and it originated with OpenAI chat models. Other families use different templates (Llama 3 has its own header tokens, Mistral uses [INST] blocks, Gemma uses <start_of_turn>), so confirm your model was trained on ChatML before using it.

What does "append assistant prompt" do?

It adds a trailing <|im_start|>assistant tag with no closing tag, signaling the model to begin generating its reply. Include it when sending a prompt for completion; omit it when you are formatting a finished transcript that should not be continued.

Are <|im_start|> and <|im_end|> real tokens?

Yes. In ChatML-trained models these are special tokens in the tokenizer vocabulary, each encoded as a single token rather than as the literal characters. That is what lets the model treat them as reliable turn boundaries instead of ordinary text.

Do I need ChatML when using a chat API?

No. Chat completion endpoints apply the correct template server-side from your structured messages array. You only need raw ChatML for base completion endpoints, local inference, tokenizer debugging, or when constructing prompts manually.

Can a message contain multiple lines?

Yes. Only lines that begin with a recognized role prefix (system:, user:, assistant:) start a new turn; every other line is appended to the current message, so multi-line and multi-paragraph content is preserved correctly.