Mistral Token Counter

Estimate tokens and cost for Mistral's models. Current Mistral models use Tekken, a tiktoken-based BPE tokenizer with a vocabulary of roughly 128K tokens, which replaced the 32K SentencePiece tokenizer used by the first Mistral and Mixtral releases. Mistral reports that Tekken is roughly 30% more token-efficient on source code and several major non-English languages. Paste text, pick a model, and the token count, context-window usage, and La Plateforme cost update live, all in your browser.

Default prices are representative La Plateforme list rates; edit them to match your contract or a third-party host. Use the API's usage field for exact billing.

How to use the Mistral Token Counter

Paste your text and pick a model. The estimate, characters-per-token ratio, and context-window usage update as you type. Price and context window vary by model, so switching the model updates both. Enter your expected output length and monthly call volume to see the per-call and per-month cost, and override the price fields if your rate differs.
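The per-call and per-month figures come down to simple arithmetic. The sketch below is one way to write it, assuming prices are quoted per million tokens with separate input and output rates; the Price record, the function names, and the example rates are hypothetical and are not taken from this tool or from any price list.

```python
from dataclasses import dataclass


@dataclass
class Price:
    """Hypothetical price record, in US dollars per million tokens."""
    input_per_million: float
    output_per_million: float


def cost_per_call(input_tokens: int, output_tokens: int, price: Price) -> float:
    """Cost of one call: input and output tokens are billed at separate rates."""
    return (input_tokens * price.input_per_million
            + output_tokens * price.output_per_million) / 1_000_000


def cost_per_month(input_tokens: int, output_tokens: int,
                   calls_per_month: int, price: Price) -> float:
    """Monthly cost, assuming every call has the same token counts."""
    return cost_per_call(input_tokens, output_tokens, price) * calls_per_month


# Placeholder rates for illustration only, not a quote of any real price.
example_price = Price(input_per_million=1.0, output_per_million=3.0)
per_call = cost_per_call(2_000, 500, example_price)
per_month = cost_per_month(2_000, 500, 10_000, example_price)
print(f"per call: ${per_call:.4f}, per month: ${per_month:.2f}")
```

Real traffic rarely has identical token counts on every call, so the monthly figure is best read as an average-case projection.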

Mistral Large 2 is the flagship for hard reasoning; Mistral Small is the cost-efficient general model; Mistral Nemo is a small, inexpensive 12B model built with NVIDIA; Codestral targets code completion and fill-in-the-middle; and Ministral 8B is an edge-class model. The estimated token count is the same across them, because the estimator applies a single characters-per-token ratio; only price, context size, and quality change.

Tekken vs the old SentencePiece tokenizer

The original Mistral 7B and Mixtral models used a SentencePiece tokenizer with a 32,000-token vocabulary, similar in size to Llama 2's. Starting with Mistral Nemo, Mistral switched to Tekken, a byte-pair-encoding tokenizer built on OpenAI's tiktoken library with a vocabulary of around 128,000 tokens. The larger, code-aware vocabulary made Tekken substantially more efficient: Mistral reports about 30% better compression on source code and several major languages, and two to three times better on Korean and Arabic, compared with the previous tokenizer, while staying competitive with Llama 3's tokenizer on English.

For English prose the practical ratio is about 3.8 to 4.0 characters per token; this estimator uses about 3.9. Real BPE tokenization is deterministic but content-dependent: code, JSON, and non-Latin scripts tokenize at different rates than plain English. Treat the number as a close estimate rather than an exact bill.
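A character-ratio estimate of this kind can be sketched in a few lines. The version below assumes the 3.9 characters-per-token figure for English and rounds up; the function name and the rounding choice are illustrative assumptions, not a description of this tool's internals.

```python
import math

# Assumed average for English prose; code, JSON, and non-Latin scripts
# will tokenize at different rates.
CHARS_PER_TOKEN = 3.9


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token estimate: character count divided by an average ratio."""
    if not text:
        return 0
    # Round up so a short non-empty string never estimates to zero tokens.
    return math.ceil(len(text) / chars_per_token)


sample = "Paste text, pick a model, and read the estimate."
print(len(sample), estimate_tokens(sample))
```

Passing a different chars_per_token value is a simple way to model content that tokenizes less efficiently than English.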

Context windows vary by model: Mistral Large 2, Nemo, and Ministral 8B offer 128K tokens, while Mistral Small and Codestral are commonly deployed at 32K. The context window is shared between the prompt you send and the output the model generates. This tool shows the percentage of the selected model's window your input occupies, which is the quickest way to tell whether a long document will fit, provided you leave room for the expected output.
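The window check can be written out as follows. This is a minimal sketch: the dictionary keys are hypothetical labels, and reading "128K" and "32K" as 128,000 and 32,000 tokens is an assumption, since exact limits depend on the deployment.

```python
# Window sizes as described above. The keys are illustrative labels and the
# exact token limits are assumed round numbers.
CONTEXT_WINDOWS = {
    "large-2": 128_000,
    "nemo": 128_000,
    "ministral-8b": 128_000,
    "small": 32_000,
    "codestral": 32_000,
}


def context_usage(input_tokens: int, window: int) -> float:
    """Percentage of the window taken by the input alone."""
    return 100.0 * input_tokens / window


def fits(input_tokens: int, expected_output_tokens: int, window: int) -> bool:
    """Prompt and generated output share the window, so both must fit."""
    return input_tokens + expected_output_tokens <= window


for model, window in CONTEXT_WINDOWS.items():
    pct = context_usage(30_000, window)
    ok = fits(30_000, 4_000, window)
    print(f"{model}: {pct:.1f}% of window, room for 4,000 output tokens: {ok}")
```

An input can sit below 100% of the window and still fail once the expected output is added, which is why the two checks are kept separate.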

Common use cases

  • Costing La Plateforme calls. See the per-call and monthly cost of a prompt on each Mistral tier.
  • Choosing a model. Compare how the same prompt's cost changes between Large, Small, and Nemo.
  • Fitting the context window. Confirm a long input fits a 32K Codestral or a 128K Large window.
  • Code-token budgeting. Estimate how Tekken's code efficiency affects a code-completion prompt.
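
The model-comparison use case amounts to applying several rate pairs to the same token counts. The sketch below assumes per-million-token pricing; the tier labels and rates are placeholders for illustration and should be replaced with your own numbers.

```python
# Placeholder (input, output) rates in USD per million tokens.
# These are illustrative values, not current list prices.
PLACEHOLDER_RATES = {
    "large": (2.00, 6.00),
    "small": (0.20, 0.60),
    "nemo": (0.15, 0.15),
}


def compare_models(input_tokens: int, output_tokens: int, rates: dict) -> list:
    """Return (model, cost per call) pairs, cheapest first."""
    costs = {
        model: (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000
        for model, (rate_in, rate_out) in rates.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


for model, cost in compare_models(5_000, 1_000, PLACEHOLDER_RATES):
    print(f"{model}: ${cost:.5f} per call")
```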

Frequently asked questions

Which tokenizer does this estimate?

It targets Tekken, the tiktoken-based BPE tokenizer used by Mistral Nemo and newer models, calibrated to about 3.9 characters per token for English. The original Mistral 7B and Mixtral used a 32K SentencePiece tokenizer that encodes English at a similar rate but is much less efficient on code and non-English text.

Is the count exact?

No. It is a heuristic estimate, typically within a few percent for ordinary English prose and less accurate for code, JSON, and non-Latin scripts. Exact counts require running the real Tekken tokenizer with its full merge table. For billing, use the usage field the Mistral API returns with each response.
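
Reading exact counts from a response might look like the sketch below. It assumes the parsed response body carries a usage object with prompt_tokens and completion_tokens fields; the function name, the rates, and the example response are hypothetical.

```python
def billed_cost(response: dict, input_rate: float, output_rate: float) -> float:
    """Cost of one completed call, computed from its reported usage.

    Rates are in USD per million tokens and are supplied by the caller.
    """
    usage = response["usage"]
    return (usage["prompt_tokens"] * input_rate
            + usage["completion_tokens"] * output_rate) / 1_000_000


# Trimmed, made-up response body for illustration.
example_response = {
    "usage": {"prompt_tokens": 1200, "completion_tokens": 300, "total_tokens": 1500}
}
print(billed_cost(example_response, input_rate=1.0, output_rate=3.0))
```

Comparing these reported counts against the estimate for the same prompt is also a quick way to see how far the heuristic drifts on your own content.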

Why do different Mistral models show different context usage?

Because their context windows differ. Mistral Large 2, Nemo, and Ministral 8B support 128K tokens, while Mistral Small and Codestral are commonly served at 32K. The same token count fills a larger fraction of a smaller window, so switching models changes the percentage shown.

Are the prices official?

The defaults are representative La Plateforme list rates and are fully editable. Mistral models are also available through cloud marketplaces and third-party hosts at different prices, and the open-weight ones (like Nemo and Codestral) can be self-hosted under their respective licenses, so always confirm your actual rate.

Does Tekken handle code better?

Yes. Tekken was trained with a code-aware vocabulary, and Mistral reports markedly better compression on source code and on languages like Korean and Arabic than the old SentencePiece tokenizer. The same code therefore uses fewer tokens and, at the same per-token price, costs less.