LLM API Pricing Table
A single sortable table of what the major large language models actually cost to call. Every row shows the input and output price per million tokens, the context window, and any notable extras like cached-input discounts. Type your own per-call token counts and the table adds a live cost-per-call column so you can rank models by what your workload would pay, not just the sticker price. Filtering and sorting run entirely in your browser.
Prices are standard-tier list prices in USD per 1,000,000 tokens, compiled in May 2026. Providers change prices often, and open-weight models (Llama, Qwen, Mixtral, Gemma) are billed differently by each host, so those rows show a representative figure marked "hosted, varies". Always confirm on the provider's own pricing page before budgeting. Batch processing, cached input, and committed-volume tiers can cut these numbers substantially.
How to use the LLM API Pricing Table
The table loads sorted by input price, cheapest first. Click any column heading to re-sort — click Output $/1M to rank by completion cost, Context to find the longest windows, or click the same heading again to reverse the order. Use the Provider dropdown to narrow to one vendor, or type into Search model to filter by name (for example "haiku" or "flash").
The most useful control is the pair of token boxes. Enter how many input and output tokens a typical request uses and the Cost / call column recomputes for every model at once. A summarisation job might be 4,000 input and 300 output tokens; a chat reply might be 800 in and 400 out; a reasoning task can be 1,000 in and 8,000 out. Because input and output are priced separately — and output is usually three to five times dearer — the cheapest model for a read-heavy task is often not the cheapest for a generation-heavy one. Set the ratio that matches your use and sort by Cost / call to see the real ranking.
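To see why the ranking can flip, here is a minimal Python sketch of the Cost / call arithmetic applied to two of the workload mixes above. The model names and prices are hypothetical placeholders chosen to show the effect, not rows from the table, and the functions cost_per_call and rank are illustrative rather than the page's own code.

```python
# Hypothetical prices in USD per 1,000,000 tokens (input, output); not real table rows.
PRICES = {
    "model-a": (0.40, 2.00),  # cheaper input, output 5x dearer
    "model-b": (0.60, 1.80),  # dearer input, output 3x dearer
}


def cost_per_call(input_tokens: int, output_tokens: int,
                  in_price: float, out_price: float) -> float:
    """USD cost of one call, given prices quoted per million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def rank(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Return (model, cost per call) pairs sorted cheapest first for one token mix."""
    costs = [(name, cost_per_call(input_tokens, output_tokens, *prices))
             for name, prices in PRICES.items()]
    return sorted(costs, key=lambda item: item[1])


print(rank(4_000, 300))    # summarisation: read-heavy
print(rank(1_000, 8_000))  # reasoning: generation-heavy
```

With these made-up prices, model-a is cheapest for the 4,000-in / 300-out summarisation mix ($0.0022 against $0.00294), but model-b wins the 1,000-in / 8,000-out reasoning mix ($0.0150 against $0.0164), because its lower output rate outweighs its higher input rate once output dominates.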
Everything recalculates locally as you type. Nothing about your usage is sent anywhere.
How LLM API pricing works
Hosted language models are billed per token, not per word or per request. A token is a chunk of text averaging about four characters of English, or roughly three-quarters of a word, so about 750 words come to 1,000 tokens. Providers quote a price per million tokens and split it into two rates: the input (or prompt) price you pay for everything you send, and the output (or completion) price you pay for everything the model generates. Output almost always costs more, frequently three to five times more, because the model generates output tokens one at a time, each needing its own pass through the network, whereas it can process the whole input in parallel.
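As a rough illustration of that rule of thumb, this Python sketch estimates token counts from character and word counts. It is only a heuristic for English prose; the function names are hypothetical, and exact counts depend on each provider's tokenizer, which can differ noticeably for code, non-English text, or unusual formatting.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough English token count using the ~4 characters-per-token rule of thumb.

    Heuristic only: real counts come from the provider's own tokenizer.
    """
    return math.ceil(len(text) / 4)


def words_to_tokens(word_count: int) -> int:
    """Rough conversion using ~750 words per 1,000 tokens."""
    return math.ceil(word_count * 1000 / 750)


print(estimate_tokens("Hosted language models are billed per token."))
print(words_to_tokens(750))
```

Rounding up keeps the estimate on the cautious side when it feeds into a budget.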
Several factors move the effective price away from the headline number. Prompt caching lets you re-send a large fixed prefix (a long system prompt, a document, a tool schema) at a steep discount on subsequent calls; those cached-input rates appear in the Notes column. Batch APIs trade latency for roughly half-price asynchronous processing. Reasoning models such as o1 and DeepSeek-R1 push the other way: they generate large volumes of "thinking" tokens before the final answer, often hidden from the response, and you are billed for them at the output rate, so their real cost per finished answer is much higher than the per-token price suggests. And open-weight models (Llama, Qwen, Mixtral, Gemma) have no single official price: each host (Together, Fireworks, Groq, your own GPUs) sets its own, which is why those rows are flagged as varying.
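A minimal Python sketch of how these adjustments might combine for a single call. Everything here is an assumption for illustration: the function and parameter names, the flat batch multiplier (defaulting to half price), and the treatment of reasoning tokens as ordinary output. It also ignores details such as the cache-write surcharge some providers add, or whether batch and caching discounts stack, so check your provider's terms before relying on the numbers.

```python
def effective_cost(*, fresh_input: int, cached_input: int, visible_output: int,
                   reasoning_tokens: int, input_price: float, cached_price: float,
                   output_price: float, batch: bool = False,
                   batch_multiplier: float = 0.5) -> float:
    """Estimated USD cost of one call; all prices are per 1,000,000 tokens.

    Assumptions (hypothetical; check your provider):
    - cached prefix tokens are billed at cached_price instead of input_price;
    - reasoning ("thinking") tokens are billed at output_price, like visible output;
    - batch mode scales the whole call by a flat batch_multiplier.
    """
    usd = (fresh_input * input_price
           + cached_input * cached_price
           + (visible_output + reasoning_tokens) * output_price) / 1_000_000
    return usd * batch_multiplier if batch else usd


# Hypothetical prices, not taken from the table.
prices = dict(input_price=2.00, cached_price=0.20, output_price=8.00)
call = dict(fresh_input=500, cached_input=10_000, visible_output=400, reasoning_tokens=3_000)

print(f"{effective_cost(**call, **prices):.4f}")              # prefix served from cache
print(f"{effective_cost(**call, **prices, batch=True):.4f}")  # same call via batch
```

With these hypothetical prices, a call with a 10,000-token cached prefix, 500 fresh input tokens, 400 visible output tokens, and 3,000 reasoning tokens comes to about $0.0302, and about $0.0151 through the batch path. The 3,000 reasoning tokens alone account for $0.024 of the unbatched total.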
For budgeting, the number that matters is cost per task, not cost per token. Multiply your expected input tokens by the input price and your expected output tokens by the output price, add the two, divide by 1,000,000 (because prices are quoted per million tokens), and multiply by your call volume. The token boxes above do the per-call part of that arithmetic for every model at once, and they show how a model that wins on raw input price can still lose once a long, generation-heavy output is priced in.
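The same arithmetic written as a formula, where T_in and T_out are tokens per call, P_in and P_out are the listed prices per million tokens, and N is the number of calls (the symbols are our own notation):

```latex
C_{\text{call}} = \frac{T_{\text{in}} \, P_{\text{in}} + T_{\text{out}} \, P_{\text{out}}}{10^{6}},
\qquad
C_{\text{total}} = N \cdot C_{\text{call}}
```

For example, with hypothetical prices of $1.00 input and $4.00 output per million tokens, an 800-in / 400-out chat reply costs (800 × 1.00 + 400 × 4.00) / 10^6 = $0.0024, and a million such calls cost about $2,400.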
Common use cases
- Picking a model for a new feature. Set the input/output token mix your feature uses and sort by Cost / call to see which models fit the budget before you write any code.
- Cutting an existing bill. Find a cheaper model with a similar capability tier — for example moving a classification task from a flagship to a mini or Flash model can drop cost by 10× or more.
- Comparing reasoning vs standard models. Reasoning models bill their thinking tokens at the output rate; raise the output-tokens box to cover them and see the true cost difference against a standard model.
- Choosing a long-context model. Sort by Context to find windows large enough for whole-codebase or whole-document prompts, then check what that context costs to fill.
- Estimating monthly spend. Get the per-call figure here, then multiply by your expected daily request volume and by roughly 30 days for a monthly run-rate.
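For the last two use cases, a short Python sketch turns per-call figures into a monthly run-rate and compares a current model with a cheaper candidate. The per-call costs and the 20,000-calls-per-day volume are hypothetical, the function name is illustrative, and the 30-day month is a simplifying assumption.

```python
DAYS_PER_MONTH = 30  # simplifying assumption; calendar months vary


def monthly_spend(cost_per_call_usd: float, calls_per_day: int,
                  days: int = DAYS_PER_MONTH) -> float:
    """Approximate monthly cost: per-call cost x daily volume x days."""
    return cost_per_call_usd * calls_per_day * days


# Hypothetical per-call costs, e.g. read off the Cost / call column.
current = monthly_spend(0.0050, 20_000)    # flagship-tier model
candidate = monthly_spend(0.0004, 20_000)  # smaller model in a similar capability tier

print(f"current:   ${current:,.2f}/month")
print(f"candidate: ${candidate:,.2f}/month")
print(f"saving:    ${current - candidate:,.2f}/month ({current / candidate:.1f}x cheaper)")
```

With these numbers the current model runs about $3,000 a month and the candidate about $240, a saving of roughly $2,760, or a 12.5× reduction.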