Mistral Token Counter
Estimate tokens and cost for Mistral's models. Mistral's newer models use Tekken, a tiktoken-based BPE tokenizer with a vocabulary of about 128K tokens, which replaced the 32K SentencePiece tokenizer used by the first Mistral and Mixtral releases. Mistral reports that Tekken is roughly 30% more token-efficient on source code and several widely used non-English languages. Paste text, pick a model, and the token count, context-window usage, and La Plateforme cost update live, all in your browser.
Default prices are representative La Plateforme list rates; edit them to match your contract or a third-party host. Use the API's usage field for exact billing.
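As a rough illustration of reading exact counts from the API, the sketch below pulls the token numbers out of a chat completion response body. The sample JSON is made up for the example, and `billed_tokens` is a hypothetical helper name; the field names `prompt_tokens` and `completion_tokens` are the ones the usage object is expected to carry.

```python
import json

# Illustrative response body only; the model name and numbers are invented.
sample_response = """
{
  "model": "mistral-small-latest",
  "usage": {"prompt_tokens": 1200, "completion_tokens": 300, "total_tokens": 1500}
}
"""


def billed_tokens(response_body: str) -> tuple[int, int]:
    """Return (prompt_tokens, completion_tokens) from a response's usage object."""
    usage = json.loads(response_body)["usage"]
    return usage["prompt_tokens"], usage["completion_tokens"]


prompt_tokens, completion_tokens = billed_tokens(sample_response)
print(f"prompt={prompt_tokens} completion={completion_tokens}")
```

Comparing these figures with the estimate for the same prompt is a simple way to check how close the characters-per-token ratio is for your content.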
How to use the Mistral Token Counter
Paste your text and pick a model. The estimate, characters-per-token ratio, and context-window usage update as you type. Each Mistral model has a different price and a different context window — switching the model updates both. Enter your expected output length and monthly call volume to see the per-call and per-month cost, and override the price fields if your rate differs.
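The per-call and per-month figures come down to simple arithmetic, sketched here under the assumption that prices are quoted per million tokens with separate input and output rates. The function names and the prices in the example are placeholders, not Mistral's actual rates.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one call, with prices given per million tokens (assumed unit)."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def monthly_cost(per_call: float, calls_per_month: int) -> float:
    """Monthly cost for a fixed per-call cost and call volume."""
    return per_call * calls_per_month


# Placeholder prices: 2.00 per million input tokens, 6.00 per million output tokens.
per_call = call_cost(1_500, 500, input_price_per_m=2.00, output_price_per_m=6.00)
per_month = monthly_cost(per_call, calls_per_month=10_000)
print(f"per call: {per_call:.4f}, per month: {per_month:.2f}")
```

Because the output rate is usually the higher of the two, the expected output length can dominate the bill even for a long prompt.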
Mistral Large 2 is the flagship for hard reasoning; Mistral Small is the cost-efficient general model; Mistral Nemo is a small, inexpensive 12B model built with NVIDIA; Codestral targets code completion and fill-in-the-middle; and Ministral 8B is an edge-class model. The estimated token count is the same whichever model you pick, because the tool applies one characters-per-token ratio; only price, context size, and quality change.
Tekken vs the old SentencePiece tokenizer
The original Mistral 7B and Mixtral models used a SentencePiece tokenizer with a 32,000-token vocabulary, the same size as Llama 2's. Starting with Mistral Nemo, Mistral switched to Tekken, a byte-pair-encoding tokenizer based on OpenAI's tiktoken and trained on more than 100 languages, with a vocabulary of around 128,000 tokens. The larger, code-aware vocabulary makes Tekken substantially more efficient: Mistral reports that it compresses source code and several major languages about 30% better than the previous tokenizer, with larger gains for Korean and Arabic (roughly 2x and 3x), while staying competitive with Llama 3's tokenizer on English.
For English prose the practical ratio is about 3.8–4.0 characters per token, which is what this estimator uses. Real BPE output is deterministic, but the ratio depends on content: code, JSON, and non-Latin scripts tokenize at different rates than plain English. Treat the number as a close estimate rather than an exact bill.
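A minimal sketch of a ratio-based estimate follows. It assumes a default of 3.9 characters per token (the midpoint of the range above) and rounds up; both choices, and the name `estimate_tokens`, are illustrative rather than a description of this tool's exact implementation.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 3.9) -> int:
    """Estimate the token count from character length (not a real tokenizer)."""
    if not text:
        return 0
    # Round up so a short, non-empty input never estimates to zero tokens.
    return math.ceil(len(text) / chars_per_token)


prompt = "Summarize the attached quarterly report in three bullet points."
print(estimate_tokens(prompt))
```

For code or non-Latin text, passing a lower `chars_per_token` gives a more conservative estimate.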
Context windows vary by model: Mistral Large 2, Nemo, and Ministral 8B offer 128K tokens, while Mistral Small and Codestral are commonly deployed at 32K. The context window is shared between the prompt you send and the output the model generates. This tool shows the percentage of the selected model's window your input occupies, which is the quickest way to tell whether a long document will fit; leave headroom for the output you expect.
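The fit check can be sketched as below. The window sizes treat "128K" and "32K" as 128,000 and 32,000 tokens, which is an assumption (a deployment may define them differently), and the dictionary keys are illustrative labels rather than API model IDs.

```python
# Assumed window sizes in tokens; keys are illustrative labels, not API model IDs.
CONTEXT_WINDOWS = {
    "large-2": 128_000,
    "nemo": 128_000,
    "ministral-8b": 128_000,
    "small": 32_000,
    "codestral": 32_000,
}


def context_usage(input_tokens: int, expected_output_tokens: int,
                  window: int) -> tuple[float, bool]:
    """Return (percent of window used by the input, whether input + output fit)."""
    percent_used = 100 * input_tokens / window
    fits = input_tokens + expected_output_tokens <= window
    return percent_used, fits


percent, fits = context_usage(16_000, 1_000, CONTEXT_WINDOWS["codestral"])
print(f"{percent:.1f}% of the window used, fits: {fits}")
```

Checking the input and the expected output together matters because a prompt that occupies most of the window can still leave too little room for the reply.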
Common use cases
- Costing La Plateforme calls. See the per-call and monthly cost of a prompt on each Mistral tier.
- Choosing a model. Compare how the same prompt's cost changes between Large, Small, and Nemo.
- Fitting the context window. Confirm a long input fits a 32K Codestral or a 128K Large window.
- Code-token budgeting. Estimate how Tekken's code efficiency affects a code-completion prompt.