LLM Context Window Comparison

Compare the context windows of today's major language models and see how much text each one can hold. Enter a token or word count and the table shows which models fit it and how much of their window it fills. Useful for deciding whether a long document, codebase, or chat history will fit before you commit to a model. All calculation runs in your browser.

Conversions assume ~4 characters and ~0.75 words per token (English). Context windows are shared between prompt and output, so leave headroom for the response. Figures reflect each model's standard published window.

How to use the LLM Context Window Comparison

Enter how much text you want to send and choose the unit — tokens, words, characters, or pages. The table converts your input to tokens and shows each model's context window, the percentage your input would fill, and whether it fits with room to spare. Sort by window size or name, and tick "only models that fit" to filter the list down to viable choices.

Remember the window is shared between your prompt and the model's reply. A prompt that fills 95% of the window leaves almost no room for output, so treat anything above roughly 80% as a tight fit rather than a comfortable one.
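The fill percentage and the "tight fit" rule of thumb can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's actual code: the function name `classify_fit` and the status labels are hypothetical, and the 80% threshold is the rough guideline from the paragraph above rather than a hard limit.

```python
def classify_fit(prompt_tokens: int, window_tokens: int,
                 tight_threshold: float = 0.80) -> tuple[float, str]:
    """Return (percent of window filled, status label) for a prompt.

    Hypothetical helper: the labels and the default 0.80 threshold are
    illustrative assumptions, not values published by any provider.
    """
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    fill = prompt_tokens / window_tokens
    if fill > 1.0:
        status = "does not fit"
    elif fill > tight_threshold:
        status = "tight fit"      # fits, but leaves little room for the reply
    else:
        status = "fits"
    return round(fill * 100, 1), status


print(classify_fit(100_000, 200_000))  # (50.0, 'fits')
print(classify_fit(120_000, 128_000))  # status is 'tight fit'
print(classify_fit(40_000, 8_000))     # (500.0, 'does not fit')
```

Lowering `tight_threshold` is a reasonable choice when you expect a long reply, since the reply comes out of the same window.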

What a context window is — and why it caps your prompt

A model's context window is the maximum number of tokens it can attend to at once: the hard ceiling on prompt plus generated output combined. If your input plus the expected answer exceeds the window, the request is either rejected or, depending on the provider or client, truncated by silently dropping the oldest tokens, so knowing the limit ahead of time saves a wasted call.

Windows vary enormously across the current generation. Gemini models lead with windows of one to two million tokens; most frontier chat models (GPT-4o, Claude, Llama 3.1, Mistral, Qwen, DeepSeek, Grok) cluster around 128K–200K; and some efficient open models such as Gemma 2 stay as low as 8K. A million-token window sounds limitless, but a single large codebase or a long PDF can still consume hundreds of thousands of tokens, and cost and latency both climb with the amount of context you actually use.

Because tokenizers differ, the same text is a slightly different number of tokens on each model — this tool uses the common English approximation of about four characters per token. For an exact count on a specific model, use that model's dedicated token counter or the token count its API returns.
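As a rough illustration, the approximation can be written as a small Python function. The ratios are the ones quoted on this page (about four characters, or about 0.75 words, per token of English text); the function name `estimate_tokens` is hypothetical, and pages are left out because no per-page figure is stated here.

```python
import math

CHARS_PER_TOKEN = 4.0    # English approximation quoted on this page
WORDS_PER_TOKEN = 0.75   # English approximation quoted on this page


def estimate_tokens(amount: float, unit: str) -> int:
    """Estimate a token count from an amount given in tokens, words, or characters.

    This is only an estimate; a real tokenizer will give a different number.
    """
    if amount < 0:
        raise ValueError("amount must be non-negative")
    if unit == "tokens":
        tokens = amount
    elif unit == "words":
        tokens = amount / WORDS_PER_TOKEN
    elif unit == "characters":
        tokens = amount / CHARS_PER_TOKEN
    else:
        raise ValueError(f"unsupported unit: {unit}")
    # Round up so the estimate errs toward needing more room, not less.
    return math.ceil(tokens)


print(estimate_tokens(30_000, "words"))        # 40000
print(estimate_tokens(100_000, "characters"))  # 25000
```

Rounding up is a deliberate choice: when the question is whether something fits, an estimate that runs slightly high is safer than one that runs low.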

Common use cases

Picking a model for a long document. See at a glance which models can hold your full input.
RAG sizing. Check whether your retrieved chunks plus the question fit a target model's window.
Codebase prompts. Estimate whether a large source tree fits before paying for a long call.
Comparing upgrades. Weigh a jump from a 128K to a 1M window against the higher cost it brings.
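For the RAG sizing case, one possible approach is to subtract the question and a reserved output budget from the window, then keep retrieved chunks in ranked order until the remainder is used up. The sketch below assumes the chunks are already token-counted and sorted by relevance; `select_chunks` and its parameters are hypothetical names, and any prompt-template or system-prompt overhead is ignored.

```python
def select_chunks(chunk_tokens: list[int], question_tokens: int,
                  window_tokens: int, reserved_output_tokens: int) -> list[int]:
    """Keep chunks, in the given order, while they still fit the budget.

    chunk_tokens: token counts of retrieved chunks, most relevant first.
    Returns the token counts of the chunks that were kept.
    """
    budget = window_tokens - question_tokens - reserved_output_tokens
    kept: list[int] = []
    used = 0
    for size in chunk_tokens:
        if used + size > budget:
            break  # stop at the first chunk that would overflow the budget
        kept.append(size)
        used += size
    return kept


# An 8K window, a 200-token question, and 1,000 tokens reserved for the answer
# leave 6,800 tokens for retrieved context.
print(select_chunks([3000, 2500, 2000, 1500], 200, 8_000, 1_000))  # [3000, 2500]
```

Stopping at the first chunk that overflows preserves the ranking order. Skipping it and trying smaller chunks further down the list would pack the window more tightly, at the cost of including less relevant text.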

Frequently asked questions

Is the prompt the only thing that uses the window?

No. The context window is shared between your input and the model's generated output. If the window is 128K and your prompt is 120K tokens, only about 8K remain for the answer, so always leave headroom for the response you expect.
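The same arithmetic as a minimal Python sketch. The helper names are hypothetical, and the code models only the shared window: some providers also cap output length separately, which is not represented here.

```python
def output_headroom(window_tokens: int, prompt_tokens: int) -> int:
    """Tokens left for the reply once the prompt is in the window (never negative)."""
    return max(window_tokens - prompt_tokens, 0)


def fits_with_reply(window_tokens: int, prompt_tokens: int,
                    expected_output_tokens: int) -> bool:
    """True if the prompt plus the expected reply stay within the window."""
    return prompt_tokens + expected_output_tokens <= window_tokens


print(output_headroom(128_000, 120_000))          # 8000
print(fits_with_reply(128_000, 120_000, 4_000))   # True
print(fits_with_reply(128_000, 120_000, 10_000))  # False
```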

How accurate is the token conversion?

It uses the common English approximation of about four characters or 0.75 words per token. Real tokenizers differ between models, and the ratio drifts further for code, non-English text, or unusual formatting, so use a model-specific token counter when you need an exact number.

Why do some models have tiny windows?

Smaller windows reduce memory use and serving cost. Efficient open models such as Gemma 2 ship with an 8K window to stay light, which is fine for short prompts but overflows quickly on long documents.

Does a bigger window cost more?

You only pay for the tokens you actually send and receive, not the maximum window size. But filling a large window means sending many tokens, so long-context calls are inherently more expensive and slower than short ones.

Are these the largest available windows?

They reflect each model's standard published context window. Some providers offer larger windows in beta or enterprise tiers (for example a 1M-token beta on certain Claude models), which are not shown here.

Embed this tool on your site

Free to embed, no attribution required (but appreciated). Paste this where you want the tool to appear:

<iframe src="https://codeswap.net/llm/context-window-comparison/?embed=1" width="100%" height="520" loading="lazy" style="border:1px solid #e5e7eb;border-radius:8px" title="LLM Context Window Comparison"></iframe>
<p style="font-size:13px">Tool by <a href="https://codeswap.net/llm/context-window-comparison/">LLM Context Window Comparison — Codeswap</a></p>