OpenAI Token Counter
Count tokens for any OpenAI model — GPT-5, GPT-4o, GPT-4.1, the o-series, GPT-3.5 — using the official
tiktoken tokenizer compiled to WebAssembly. Get exact counts, per-message breakdowns, and
cost estimates against current list prices. Runs entirely in your browser; your text never leaves the
page.
How to use the OpenAI Token Counter
Paste any text into the input area, pick the model you'll be calling, and press Count tokens. The tool returns the token count, the character count, the average characters per token, and (if the toggle is on) a cost estimate against current list pricing.
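The figures in that result are simple to derive once you have a token count. Here is a minimal sketch of the arithmetic; the `CountSummary` and `summarize` names are hypothetical, the price is a placeholder you supply, and the tool's own implementation may differ (for example, in how it counts characters).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CountSummary:
    tokens: int
    characters: int
    chars_per_token: float
    cost_usd: Optional[float]  # None when no price is supplied (cost toggle off)


def summarize(text: str, token_count: int,
              price_per_million_usd: Optional[float] = None) -> CountSummary:
    """Derive the reported figures from a text and its token count."""
    characters = len(text)  # Python counts Unicode code points here
    # Guard against division by zero for empty input.
    chars_per_token = characters / token_count if token_count else 0.0
    cost = None
    if price_per_million_usd is not None:
        # Prices are quoted per million tokens.
        cost = token_count / 1_000_000 * price_per_million_usd
    return CountSummary(token_count, characters, chars_per_token, cost)
```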
For a more useful workflow when you're sizing prompts, turn on Visualize tokens in the options row. The visualization colors each token a different shade so you can see exactly how the tokenizer broke up your text. This is the fastest way to learn why "tokenization" splits into one token while "tokenizer" splits into two, or why a punctuation-heavy prompt is more expensive than prose with the same character count.
If you're deciding which model to use for a workload, the Compare models tab runs
the same text through every OpenAI tokenizer variant in one go. o200k_base (used by
GPT-5, GPT-4o, GPT-4.1, o-series) usually produces 5-15% fewer tokens than the older
cl100k_base for English, and noticeably fewer for non-English languages: sometimes
2-3x fewer tokens for the same Mandarin or Japanese text.
The Bulk tab takes one input per line and returns counts for each, which is useful when you're estimating the token cost of a dataset: paste a CSV column of user messages, get back per-row counts and a total.
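The per-row bookkeeping can be sketched as follows. This is an illustration rather than the tool's code: `bulk_count` is a hypothetical helper, skipping blank lines is an assumption, and `whitespace_counter` is a stand-in so the example runs without dependencies. It is not a real tokenizer.

```python
from typing import Callable, List, Tuple


def bulk_count(raw: str,
               count_tokens: Callable[[str], int]) -> Tuple[List[Tuple[str, int]], int]:
    """Count tokens for each non-blank line and return (per-row counts, total)."""
    rows = [line for line in raw.splitlines() if line.strip()]
    counts = [(row, count_tokens(row)) for row in rows]
    total = sum(n for _, n in counts)
    return counts, total


def whitespace_counter(text: str) -> int:
    # Stand-in ONLY: splits on whitespace, which is not how BPE tokenizers work.
    # With the tiktoken Python package installed you could pass instead:
    #   enc = tiktoken.get_encoding("o200k_base")
    #   count_tokens = lambda s: len(enc.encode(s))
    return len(text.split())


rows, total = bulk_count("hello world\n\nfoo bar baz\n", whitespace_counter)
```

Passing the counter in as a function keeps the row handling independent of which encoding you end up using.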
What is a token in an LLM context?
A token is the unit of text an LLM actually sees. The tokenizer splits your input string into a sequence of integer IDs that map to pieces of words — sometimes whole words, sometimes sub-word fragments, sometimes single characters or bytes. The model never sees the raw string; it only sees the sequence of token IDs.
OpenAI uses Byte Pair Encoding (BPE). The tiktoken library is the official open-source
implementation. For modern models it ships two main encodings:
- cl100k_base: ~100,000 tokens. Used by GPT-3.5 Turbo, GPT-4, GPT-4 Turbo. The vocabulary heavily favors English.
- o200k_base: ~200,000 tokens. Used by GPT-4o, GPT-4o Mini, GPT-4.1, the o-series, and GPT-5. Roughly twice the vocabulary, including many more multilingual and code tokens. The larger vocabulary means more text fits in fewer tokens, which is one reason GPT-4o is meaningfully cheaper per "useful chunk of text" than GPT-4 Turbo despite the published per-token pricing being similar.
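To make the BPE idea concrete, here is a toy sketch of the merge loop: start from single bytes and repeatedly merge the adjacent pair with the lowest rank. The `TOY_RANKS` table is invented for illustration. Real encodings carry a vocabulary of roughly 100K or 200K entries and pre-split the text with a regular expression before merging, which this sketch omits.

```python
from typing import Dict, List


def bpe_encode(word: bytes, ranks: Dict[bytes, int]) -> List[bytes]:
    """Greedily merge the adjacent pair with the lowest rank until none remain."""
    parts = [bytes([b]) for b in word]
    while len(parts) > 1:
        # Find the adjacent pair whose merged form has the best (lowest) rank.
        best_index, best_rank = None, None
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_index, best_rank = i, rank
        if best_index is None:
            break  # no mergeable pair left
        merged = parts[best_index] + parts[best_index + 1]
        parts[best_index:best_index + 2] = [merged]
    return parts


# Hypothetical merge table: lower rank means the merge is applied earlier.
TOY_RANKS = {b"to": 0, b"en": 1, b"tok": 2, b"token": 3}

pieces = bpe_encode(b"tokens", TOY_RANKS)  # [b"token", b"s"]
```

A larger merge table gives the loop more chances to collapse frequent strings into a single piece, which is why the bigger vocabulary tends to yield fewer tokens.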
A common rule of thumb is "1 token ≈ 4 characters in English." That's roughly right but masks a lot
of variation. Code with lots of punctuation and short identifiers tokenizes around 2-3 characters per
token. Long technical English with many compound words is more like 5-6 characters per token. CJK
text under cl100k_base often runs one token per character; the same text under
o200k_base can be 2-4 characters per token because more Chinese/Japanese/Korean
sub-strings made it into the larger vocabulary.
Why token count matters
- Cost. OpenAI bills per million tokens. Knowing the count up front lets you size budgets, decide whether to compress a prompt, or pick a cheaper model.
- Context window. Every model has a hard input limit (e.g., 400K tokens for GPT-5, 128K for GPT-4o). If your prompt + expected output + tool definitions exceed it, the API rejects the request.
- Latency. Time-to-first-token scales with input length, and total response time scales with output length. For real-time UIs, every reduction in input tokens shaves perceived latency.
- Quality. Past a point, longer prompts degrade quality. The model attends less reliably to information buried 80K tokens into a 100K-token input. Keeping prompts trim isn't only about cost; it's also about behavior.
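The cost and context-window points reduce to two small checks. Here is a sketch under stated assumptions: the function names and model keys are hypothetical, the two limits are the ones quoted in the list above, and prices are placeholders you pass in rather than actual list prices.

```python
# Context limits as quoted in the text; other models need their own entries.
CONTEXT_LIMITS = {"gpt-5": 400_000, "gpt-4o": 128_000}


def fits_context(model: str, prompt_tokens: int, tool_tokens: int,
                 max_output_tokens: int) -> bool:
    """True if prompt + tool definitions + expected output fit the window."""
    needed = prompt_tokens + tool_tokens + max_output_tokens
    return needed <= CONTEXT_LIMITS[model]


def call_cost_usd(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Cost of one call, with prices quoted per million tokens."""
    return (input_tokens / 1_000_000 * input_price_per_million
            + output_tokens / 1_000_000 * output_price_per_million)


ok = fits_context("gpt-4o", 120_000, 2_000, 4_000)        # 126,000 <= 128,000
too_big = fits_context("gpt-4o", 125_000, 2_000, 4_000)   # 131,000 > 128,000
```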
Tokenization across OpenAI models
For the sentence "The OpenAI tokenizer breaks text into pieces called tokens", the counts come out approximately as follows:
| Model | Encoding | Tokens |
|---|---|---|
| GPT-5 / GPT-4o / GPT-4.1 / o-series | o200k_base | 11 |
| GPT-4 / GPT-4 Turbo | cl100k_base | 12 |
| GPT-3.5 Turbo | cl100k_base | 12 |
| Text-davinci-003 (legacy) | p50k_base | 13 |
The English-text difference is small. The difference is large for non-English, code, and structured data. For a multilingual workload, switching from GPT-3.5 Turbo to GPT-4o can mean 30-40% fewer tokens for the same text, and a correspondingly lower bill, before you even factor in the per-token price difference.
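To put a number on "small", you can compute the relative reduction directly from the table. The helper name below is hypothetical, and the counts are the approximate ones listed above.

```python
def token_reduction(old_count: int, new_count: int) -> float:
    """Fraction of tokens saved when moving from old_count to new_count."""
    return (old_count - new_count) / old_count


# cl100k_base (12 tokens) -> o200k_base (11 tokens) for the sample sentence
english_saving = token_reduction(12, 11)  # about 0.083, i.e. roughly 8%
```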
Common use cases for a token counter
- Pre-flight budgeting. Pasting a few sample prompts to estimate per-call cost, then multiplying by expected daily volume to size a monthly bill before turning on traffic.
- Prompt compression. Iterating on a long system prompt and re-counting to see whether a rewrite saved tokens. Useful when you're trying to get under a context-window threshold or below a cached-input pricing cliff.
- Cache hit sizing. Recent OpenAI models discount cached input substantially (roughly 50-90%, depending on the model). Knowing the boundary between cacheable prefix and per-request suffix tells you what fraction of each call benefits from the discount.
- Tool-schema overhead. Counting just the tool definitions JSON separately from the prompt to see how much each enabled tool costs you per call.
- RAG chunk sizing. Tuning your retrieval chunk size in tokens (rather than characters) so chunks fit the model's preferred attention window without leaving capacity wasted.
- Localization cost analysis. Comparing the same content in different languages to see where the tokenization tax is highest and whether a translation strategy is worth the cost.
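The budgeting and cache-sizing cases combine into one back-of-the-envelope calculation. The sketch below is illustrative: the function name is hypothetical, price and discount are placeholders you supply, and it assumes every call hits the cache, so it gives a best-case figure.

```python
def monthly_input_cost_usd(prefix_tokens: int, suffix_tokens: int,
                           calls_per_day: int,
                           input_price_per_million: float,
                           cached_discount: float,
                           days: int = 30) -> float:
    """Estimate monthly input spend with a cached prefix and an uncached suffix.

    cached_discount is a fraction (0.9 means cached tokens cost 10% of list).
    Assumes a cache hit on every call, which real traffic will not guarantee.
    """
    cached_price = input_price_per_million * (1 - cached_discount)
    per_call = (prefix_tokens * cached_price
                + suffix_tokens * input_price_per_million) / 1_000_000
    return per_call * calls_per_day * days


# 2,000-token cached prefix, 500-token suffix, 10,000 calls/day,
# placeholder price of $1.00 per million input tokens.
with_cache = monthly_input_cost_usd(2_000, 500, 10_000, 1.0, 0.9)
without_cache = monthly_input_cost_usd(2_000, 500, 10_000, 1.0, 0.0)
```

Comparing the two results shows how much of the bill depends on where the prefix/suffix boundary falls.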
What's actually running
Under the hood this tool loads the tiktoken WASM build on first use (about 1.2 MB,
cached after the first load). The encoding is selected per model from a lookup table:
- GPT-5, GPT-5 Mini, GPT-5 Nano, GPT-4o, GPT-4o Mini, GPT-4.1, o1, o3, o3-mini → o200k_base
- GPT-4, GPT-4 Turbo, GPT-3.5 Turbo → cl100k_base
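A lookup table of this kind might look like the sketch below. The dictionary keys are assumed API-style identifiers for the models listed, and `encoding_for` is a hypothetical helper; the tool's actual table lives in its JavaScript and may be keyed differently. The tiktoken Python package offers `tiktoken.encoding_for_model` for the same purpose.

```python
# Model identifiers here are illustrative spellings of the models listed above.
MODEL_TO_ENCODING = {
    "gpt-5": "o200k_base",
    "gpt-5-mini": "o200k_base",
    "gpt-5-nano": "o200k_base",
    "gpt-4o": "o200k_base",
    "gpt-4o-mini": "o200k_base",
    "gpt-4.1": "o200k_base",
    "o1": "o200k_base",
    "o3": "o200k_base",
    "o3-mini": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-4-turbo": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
}


def encoding_for(model: str) -> str:
    """Return the encoding name for a model, failing loudly on unknown names."""
    try:
        return MODEL_TO_ENCODING[model]
    except KeyError:
        raise ValueError(f"no encoding registered for model {model!r}") from None
```

Raising on unknown models, rather than silently defaulting, avoids reporting a count from the wrong vocabulary.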
For "Visualize tokens," the tokenizer returns the byte ranges; the UI maps each range to a colored span. The token IDs themselves aren't exposed in the UI (they're not useful for most workflows) but they're available in the underlying API if you want to fork the JS.
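The range-to-span mapping can be sketched as below, assuming you already have each token's raw bytes in order. `token_spans` and the palette size are hypothetical; the real UI code may differ, and it also has to handle tokens that split a multi-byte UTF-8 character, which this sketch ignores.

```python
from typing import List, Tuple


def token_spans(token_bytes: List[bytes],
                palette_size: int = 6) -> List[Tuple[int, int, int]]:
    """Return (start_byte, end_byte, color_index) for each token in order."""
    spans = []
    offset = 0
    for index, piece in enumerate(token_bytes):
        end = offset + len(piece)
        # Cycle through the palette so neighbouring tokens get different colors.
        spans.append((offset, end, index % palette_size))
        offset = end
    return spans


# Hand-written byte pieces for illustration, not real tokenizer output.
spans = token_spans([b"Hello", b" world", b"!"], palette_size=2)
# [(0, 5, 0), (5, 11, 1), (11, 12, 0)]
```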
Frequently asked questions
How accurate is this token count compared to OpenAI's billing?
This tool uses the official tiktoken library compiled to WebAssembly, with the exact encoding (o200k_base) that GPT-4o, GPT-4.1, and GPT-5 use. For plain text the count matches OpenAI's metered billing exactly. Tool calls and structured outputs add a small constant overhead (typically 7-20 tokens depending on the tool schema) that this counter does not model, so for tool-heavy workloads expect to be within 5% of the billed number.
Does this tool send my text to OpenAI or to your server?
No. The counter runs entirely in your browser; your text never leaves the page.
What's the difference between cl100k_base and o200k_base?
cl100k_base (about 100K tokens) is used by GPT-3.5 Turbo and GPT-4 / GPT-4 Turbo. o200k_base (about 200K tokens) is the newer vocabulary used by GPT-4o, GPT-4.1, o-series, and GPT-5. o200k_base generally produces fewer tokens for the same English text — typically 5-15% fewer — and significantly fewer for non-English text. The tool picks the right encoding automatically based on the model you select.