OpenAI Token Counter

Count tokens for any OpenAI model — GPT-5, GPT-4o, GPT-4.1, the o-series, GPT-3.5 — using the official tiktoken tokenizer compiled to WebAssembly. Get exact counts, per-message breakdowns, and cost estimates against current list prices. Runs entirely in your browser; your text never leaves the page.

How to use the OpenAI Token Counter

Paste any text into the input area, pick the model you'll be calling, and press Count tokens. The tool returns the token count, the character count, the average characters per token, and (if the toggle is on) a cost estimate against current list pricing.

When you're sizing prompts, turn on Visualize tokens in the options row. The visualization colors each token a different shade so you can see exactly how the tokenizer broke up your text. This is the fastest way to learn why "tokenization" is a single token while "tokenizer" splits into two, or why a punctuation-heavy prompt is more expensive than prose with the same character count.

If you're deciding which model to use for a workload, the Compare models tab runs the same text through every OpenAI tokenizer variant in one go. o200k_base (used by GPT-5, GPT-4o, GPT-4.1, and the o-series) usually produces 5-15% fewer tokens than the older cl100k_base for English, and noticeably fewer for non-English languages: sometimes 2-3x fewer tokens for the same Mandarin or Japanese text.

The Bulk tab takes one input per line and returns counts for each, which is useful when you're estimating the token cost of a dataset: paste a CSV column of user messages, get back per-row counts and a total.
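
The per-row-plus-total behavior can be sketched in a few lines of Python. This is an illustration, not the tool's own code: `bulk_count` and `whitespace_count` are hypothetical names, skipping blank lines is an assumption, and the whitespace splitter is only a stand-in for a real tokenizer.

```python
from typing import Callable, Iterable


def bulk_count(
    lines: Iterable[str], count_tokens: Callable[[str], int]
) -> tuple[list[int], int]:
    """Return per-line token counts and their total, skipping blank lines."""
    counts = [count_tokens(line) for line in lines if line.strip()]
    return counts, sum(counts)


def whitespace_count(text: str) -> int:
    """Stand-in tokenizer for illustration only: counts whitespace-separated words."""
    return len(text.split())


rows = ["How do I reset my password?", "", "Cancel my subscription please"]
per_row, total = bulk_count(rows, whitespace_count)
print(per_row, total)  # [6, 4] 10
```

With tiktoken installed, the stand-in could be swapped for a function that returns `len(tiktoken.get_encoding("o200k_base").encode(text))`, which should give real token counts instead of word counts.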

What is a token in an LLM context?

A token is the unit of text an LLM actually sees. The tokenizer splits your input string into a sequence of integer IDs that map to pieces of words — sometimes whole words, sometimes sub-word fragments, sometimes single characters or bytes. The model never sees the raw string; it only sees the sequence of token IDs.

OpenAI uses Byte Pair Encoding (BPE). The tiktoken library is the official open-source implementation. For modern models it ships two main encodings:

  • cl100k_base: ~100,000 tokens. Used by GPT-3.5 Turbo, GPT-4, GPT-4 Turbo. The vocabulary heavily favors English.
  • o200k_base: ~200,000 tokens. Used by GPT-4o, GPT-4o Mini, GPT-4.1, the o-series, and GPT-5. Roughly twice the vocabulary, including many more multilingual and code tokens. The larger vocabulary means the same text fits in fewer tokens, which is one reason GPT-4o is meaningfully cheaper per "useful chunk of text" than GPT-4 Turbo, over and above any difference in published per-token pricing.

A common rule of thumb is "1 token ≈ 4 characters in English." That's roughly right but masks a lot of variation. Code with lots of punctuation and short identifiers tokenizes around 2-3 characters per token. Long technical English with many compound words is more like 5-6 characters per token. CJK text under cl100k_base often runs one token per character; the same text under o200k_base can be 2-4 characters per token because more Chinese/Japanese/Korean sub-strings made it into the larger vocabulary.
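
When no tokenizer is at hand, the ratios above can serve as a rough estimator. The sketch below assumes the midpoints of the quoted ranges (2.5 for code, 5.5 for dense technical English); `estimate_tokens` and the category names are made up for illustration, and the result is a heuristic, not tokenizer output.

```python
import math

# Characters per token, taken from the ranges quoted above (midpoints).
# These are rough heuristics, not measurements from a tokenizer.
CHARS_PER_TOKEN = {
    "english": 4.0,
    "code": 2.5,
    "technical": 5.5,
}


def estimate_tokens(text: str, kind: str = "english") -> int:
    """Rough token estimate from character count, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN[kind])


sample = "x" * 1000  # any 1,000-character input
print(estimate_tokens(sample, "english"))    # 250
print(estimate_tokens(sample, "code"))       # 400
print(estimate_tokens(sample, "technical"))  # 182
```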

Why token count matters

  • Cost. OpenAI bills per million tokens. Knowing the count up front lets you size budgets, decide whether to compress a prompt, or pick a cheaper model.
  • Context window. Every model has a hard context limit (e.g., 400K tokens for GPT-5, 128K for GPT-4o). If your prompt + expected output + tool definitions exceed it, the API rejects the request.
  • Latency. Time-to-first-token scales with input length, and total response time scales with output length. For real-time UIs, every reduction in input tokens shaves perceived latency.
  • Quality. Past a point, longer prompts degrade quality. The model attends less reliably to information buried 80K tokens into a 100K-token input. Keeping prompts trim isn't only about cost; it's also about behavior.
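
The context-window check is simple arithmetic once you have the counts. A minimal sketch, using only the two limits quoted above; the function name, the model keys, and the idea of reporting headroom are assumptions made for illustration.

```python
# Context limits quoted in the list above; other models need their own entries.
CONTEXT_LIMITS = {"gpt-5": 400_000, "gpt-4o": 128_000}


def fits_context(
    model: str, prompt_tokens: int, max_output_tokens: int, tool_tokens: int = 0
) -> tuple[bool, int]:
    """Return (fits, headroom) for prompt + expected output + tool definitions."""
    needed = prompt_tokens + max_output_tokens + tool_tokens
    headroom = CONTEXT_LIMITS[model] - needed
    return headroom >= 0, headroom


print(fits_context("gpt-4o", 120_000, 4_000, 1_500))  # (True, 2500)
print(fits_context("gpt-4o", 126_000, 4_000))         # (False, -2000)
```

A negative headroom tells you how many tokens to trim before the request has a chance of being accepted.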

Tokenization across OpenAI models

For the sentence "The OpenAI tokenizer breaks text into pieces called tokens", the counts come out approximately as follows:

Model                                 Encoding      Tokens
GPT-5 / GPT-4o / GPT-4.1 / o-series   o200k_base    11
GPT-4 / GPT-4 Turbo                   cl100k_base   12
GPT-3.5 Turbo                         cl100k_base   12
Text-davinci-003 (legacy)             p50k_base     13

For English text the difference is small; for non-English text, code, and structured data it is large. For a multilingual workload, switching from GPT-3.5 Turbo to GPT-4o can cut the token count for the same text by 30-40%, before you even factor in the per-token price difference.
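
To put a number on "small", the relative reduction between two counts can be computed directly. The helper below is a hypothetical illustration that uses the approximate counts for the sample sentence above.

```python
def token_reduction(old_count: int, new_count: int) -> float:
    """Fractional reduction in tokens when going from old_count to new_count."""
    if old_count <= 0:
        raise ValueError("old_count must be positive")
    return (old_count - new_count) / old_count


# cl100k_base (12 tokens) versus o200k_base (11 tokens) for the sample sentence
print(f"{token_reduction(12, 11):.1%}")  # 8.3%
# p50k_base (13 tokens) versus o200k_base (11 tokens)
print(f"{token_reduction(13, 11):.1%}")  # 15.4%
```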

Common use cases for a token counter

  • Pre-flight budgeting. Pasting a few sample prompts to estimate per-call cost, then multiplying by expected daily volume to size a monthly bill before turning on traffic.
  • Prompt compression. Iterating on a long system prompt and re-counting to see whether a rewrite saved tokens. Useful when you're trying to get under a context-window threshold or below a cached-input pricing cliff.
  • Cache hit sizing. Recent OpenAI models discount cached input dramatically (typically 90%). Knowing the boundary between cacheable prefix and per-request suffix tells you what fraction of each call benefits from the discount.
  • Tool-schema overhead. Counting just the tool definitions JSON separately from the prompt to see how much each enabled tool costs you per call.
  • RAG chunk sizing. Tuning your retrieval chunk size in tokens (rather than characters) so chunks fit the model's preferred attention window without leaving capacity wasted.
  • Localization cost analysis. Comparing the same content in different languages to see where the tokenization tax is highest and whether a translation strategy is worth the cost.
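
As one worked case, cache hit sizing comes down to splitting the input into a cached prefix and an uncached suffix. The sketch below assumes the 90% discount mentioned above and a placeholder price of $1.00 per million input tokens; neither the function name nor the price comes from OpenAI's price list.

```python
def effective_input_cost(
    prefix_tokens: int,
    suffix_tokens: int,
    price_per_million: float,
    cached_discount: float = 0.90,
) -> float:
    """Input cost in dollars for one call whose prefix is served from cache."""
    cached = prefix_tokens * price_per_million * (1 - cached_discount)
    uncached = suffix_tokens * price_per_million
    return (cached + uncached) / 1_000_000


# Placeholder price: $1.00 per million input tokens (not a real list price).
with_cache = effective_input_cost(8_000, 2_000, 1.00)
no_cache = effective_input_cost(8_000, 2_000, 1.00, cached_discount=0.0)
print(f"{with_cache:.4f} vs {no_cache:.4f}")  # 0.0028 vs 0.0100
```

In this example 80% of the input is cacheable, so the call costs about 72% less than it would with no cache hit.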

What's actually running

Under the hood this tool loads the tiktoken WASM build on first use (about 1.2 MB, cached after the first load). The encoding is selected per model from a lookup table:

  • GPT-5, GPT-5 Mini, GPT-5 Nano, GPT-4o, GPT-4o Mini, GPT-4.1, o1, o3, o3-mini → o200k_base
  • GPT-4, GPT-4 Turbo, GPT-3.5 Turbo → cl100k_base
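
A lookup table like this is easy to picture as a dictionary. The sketch below is not the tool's source; the lowercase model keys are illustrative spellings chosen for the example, and only the model-to-encoding mapping comes from the list above.

```python
# Model-to-encoding mapping from the list above; key spellings are illustrative.
ENCODING_FOR_MODEL = {
    "gpt-5": "o200k_base",
    "gpt-5-mini": "o200k_base",
    "gpt-5-nano": "o200k_base",
    "gpt-4o": "o200k_base",
    "gpt-4o-mini": "o200k_base",
    "gpt-4.1": "o200k_base",
    "o1": "o200k_base",
    "o3": "o200k_base",
    "o3-mini": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-4-turbo": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
}


def encoding_name(model: str) -> str:
    """Return the encoding name for a model, ignoring case."""
    try:
        return ENCODING_FOR_MODEL[model.lower()]
    except KeyError:
        raise ValueError(f"unknown model: {model}") from None


print(encoding_name("gpt-4o"))  # o200k_base
print(encoding_name("GPT-4"))   # cl100k_base
```

Raising on an unknown model, rather than guessing a default, avoids silently counting with the wrong vocabulary.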

For "Visualize tokens," the tokenizer returns the byte ranges; the UI maps each range to a colored span. The token IDs themselves aren't exposed in the UI (they're not useful for most workflows) but they're available in the underlying API if you want to fork the JS.
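
The range-to-span step can be sketched as follows, assuming the tokenizer hands back each token's raw bytes in order. The function name and the sample split are hypothetical; a real tokenizer may cut the text differently.

```python
def byte_ranges(token_bytes: list[bytes]) -> list[tuple[int, int]]:
    """Half-open (start, end) byte offsets for each token, in order."""
    ranges = []
    offset = 0
    for piece in token_bytes:
        ranges.append((offset, offset + len(piece)))
        offset += len(piece)
    return ranges


# Hypothetical split, for illustration only.
pieces = [b"Hello", b",", b" world"]
print(byte_ranges(pieces))  # [(0, 5), (5, 6), (6, 12)]
```

Working in byte offsets rather than character offsets matters because a single multi-byte UTF-8 character can be split across tokens.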

Frequently asked questions

How accurate is this token count compared to OpenAI's billing?

It uses the official tiktoken library compiled to WebAssembly, with the same encoding (o200k_base) that GPT-4o, GPT-4.1, and GPT-5 use. For plain text the count matches OpenAI's metered billing exactly. Tool calls and structured outputs add a small per-request overhead (typically 7-20 tokens, depending on the tool schema) that this counter does not model, so for tool-heavy workloads expect the count to be within 5% of the billed number.

Does this tool send my text to OpenAI or to your server?

No. The tokenizer runs as a WebAssembly module inside your browser. No network requests are made when you click Count tokens. You can verify this in the DevTools Network tab: open it before clicking, then click, and confirm no requests fire. This makes the tool safe to use with sensitive prompts, customer data, or production credentials.

What's the difference between cl100k_base and o200k_base?

They're different tokenizer vocabularies. cl100k_base (about 100K tokens) is used by GPT-3.5 Turbo and GPT-4 / GPT-4 Turbo. o200k_base (about 200K tokens) is the newer vocabulary used by GPT-4o, GPT-4.1, o-series, and GPT-5. o200k_base generally produces fewer tokens for the same English text — typically 5-15% fewer — and significantly fewer for non-English text. The tool picks the right encoding automatically based on the model you select.

Can I use this for cost estimation?

Yes, that's what the "Show cost estimate" toggle in advanced options is for. Enter your expected monthly call volume and the tool multiplies tokens by current per-million-token pricing. The numbers reflect OpenAI's public list pricing as of the date shown in the model database; verify in your OpenAI billing dashboard before relying on these for budgeting.
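
The multiplication behind the estimate looks roughly like this. The prices in the example are placeholders, not OpenAI list prices, and `monthly_cost` is a hypothetical helper, not the tool's internal function.

```python
def monthly_cost(
    input_tokens: int,
    output_tokens: int,
    calls_per_month: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimated monthly spend in dollars for a fixed-size call."""
    input_total = input_tokens * calls_per_month * input_price_per_million
    output_total = output_tokens * calls_per_month * output_price_per_million
    return (input_total + output_total) / 1_000_000


# Placeholder prices: $2.00 input and $8.00 output per million tokens.
estimate = monthly_cost(1_500, 300, 100_000, 2.00, 8.00)
print(f"${estimate:.2f}")  # $540.00
```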

Why is the count different from what I see in the OpenAI playground?

The playground sometimes adds system message overhead and conversation formatting tokens that vary by API endpoint (Chat Completions vs Responses). This tool counts only the raw text you paste. To compare the two, paste just the message content without the role wrappers, then add the per-message overhead, which is usually 3-7 tokens depending on role.
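
Because the per-message overhead is a range, not a fixed number, it is safer to bracket the playground figure than to predict it exactly. A minimal sketch, assuming the 3-7 token range quoted above and content counts you have already obtained from this tool; the function name is hypothetical.

```python
def chat_token_range(
    content_token_counts: list[int], min_overhead: int = 3, max_overhead: int = 7
) -> tuple[int, int]:
    """Return (low, high) total tokens after adding per-message overhead."""
    content = sum(content_token_counts)
    messages = len(content_token_counts)
    return content + messages * min_overhead, content + messages * max_overhead


# Three messages whose content counted 12, 85, and 40 tokens
print(chat_token_range([12, 85, 40]))  # (146, 158)
```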

What about Claude, Gemini, or Llama tokens?

Each LLM family uses a different tokenizer. Use the Claude Token Counter for Anthropic models, the Gemini Token Counter for Google models, and the Multi-Model Token Comparison tool to see all of them side by side for the same text.