LLM API Cost Calculator

Estimate per-call and monthly API costs across every major LLM provider. Enter your input and output token counts, your monthly call volume, and an optional cache-hit ratio, and the tool runs the numbers for whichever model you pick. It is built on the same model database that powers the rest of the site, so prices stay current.

How the calculation works

LLM APIs quote prices per million tokens, with separate rates for input and output. For a single call, with both prices expressed per million tokens:

cost = (input_tokens × input_price + output_tokens × output_price) / 1,000,000
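As a quick sanity check, the formula above can be sketched in a few lines of Python. The function name and the prices used here ($3.00 and $15.00 per million tokens) are illustrative assumptions, not figures from the model database.

```python
def per_call_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Cost of one call; both prices are per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical prices: $3.00 per million input tokens, $15.00 per million output tokens.
cost = per_call_cost(2_000, 500, input_price=3.00, output_price=15.00)
print(f"${cost:.4f}")  # prints $0.0135
```

Note that the 500 output tokens cost more than the 2,000 input tokens here, which is common when output is priced several times higher than input.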

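With prompt caching, the cached portion of input tokens is billed at a discounted rate (typically 90% off for OpenAI and Anthropic, 75% off for Google; the exact discount varies by model, and some providers charge a premium to write to the cache). The cache-hit ratio slider applies that discount to the share of input tokens you estimate will hit the cache, so the input side of the formula becomes input_tokens × input_price × (1 − hit_ratio × discount). Output tokens are not cached on any major provider.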
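The cache adjustment can be sketched as follows. This is a minimal illustration, not the calculator's actual code: the function name, the default 90% discount, and the example prices are assumptions, and cache-write surcharges are ignored.

```python
def per_call_cost_cached(input_tokens: int, output_tokens: int,
                         input_price: float, output_price: float,
                         cache_hit_ratio: float = 0.0,
                         cache_discount: float = 0.9) -> float:
    """Per-call cost with a share of input tokens billed at a cache discount.

    Prices are per million tokens. cache_discount=0.9 means cached input
    tokens cost 10% of the normal input price (an assumed default).
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    cached = input_tokens * cache_hit_ratio
    uncached = input_tokens - cached
    input_cost = uncached * input_price + cached * input_price * (1 - cache_discount)
    # Output tokens are never discounted by caching.
    return (input_cost + output_tokens * output_price) / 1_000_000


# Hypothetical prices; half of the input tokens are assumed to hit the cache.
no_cache = per_call_cost_cached(2_000, 500, 3.00, 15.00)
half_cached = per_call_cost_cached(2_000, 500, 3.00, 15.00, cache_hit_ratio=0.5)
print(f"no cache: ${no_cache:.4f}, 50% cache hits: ${half_cached:.4f}")
```

Because only the input side is discounted, caching saves the most on workloads where input tokens dominate the bill.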

Monthly cost is the per-call cost × calls per month; annual cost is monthly × 12. The "compare all models" button runs the same calculation across every active model in the database and lists the cheapest first, which is the fastest way to shortlist low-cost candidates for your workload.
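A rough sketch of the monthly, annual, and compare-all steps is shown below. The model names and prices are made up for illustration, and the real database schema is not described here, so the Model class is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    input_price: float   # per million input tokens
    output_price: float  # per million output tokens


# Hypothetical entries standing in for the site's model database.
MODELS = [
    Model("model-a", 3.00, 15.00),
    Model("model-b", 0.50, 1.50),
    Model("model-c", 1.00, 5.00),
]


def monthly_cost(model: Model, input_tokens: int, output_tokens: int,
                 calls_per_month: int) -> float:
    per_call = (input_tokens * model.input_price
                + output_tokens * model.output_price) / 1_000_000
    return per_call * calls_per_month


def compare_all(models, input_tokens, output_tokens, calls_per_month):
    """Return (name, monthly cost) pairs sorted cheapest first."""
    rows = [(m.name, monthly_cost(m, input_tokens, output_tokens, calls_per_month))
            for m in models]
    return sorted(rows, key=lambda row: row[1])


ranked = compare_all(MODELS, 2_000, 500, 100_000)
for name, monthly in ranked:
    print(f"{name}: ${monthly:,.2f}/month, ${monthly * 12:,.2f}/year")
```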

What the calculator doesn't model

  • Tool-call overhead. Each function/tool definition adds ~10-50 tokens to the input. The calculator counts only the prompt and the response.
  • Reasoning tokens. OpenAI o-series and Claude thinking-mode models generate internal reasoning tokens that are billed as output. To estimate, add ~2-5x the expected output token count.
  • Multimodal inputs. Images, audio, and video tokenize differently per provider. The calculator handles text only.
  • Vendor-specific discounts. Volume tiers, committed-use discounts, and batch API discounts (50% off on many models) are not included. Apply these adjustments separately.
  • Egress and storage. Negligible for most workloads but non-zero for very high-volume streaming use cases.
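Some of the gaps listed above can be approximated by hand before or after using the calculator. The sketch below folds in tool-definition overhead, reasoning tokens, and a batch discount. All parameter names and default values are assumptions chosen from the ranges mentioned in the list, and the prices are illustrative.

```python
def adjusted_per_call_cost(input_tokens: int, output_tokens: int,
                           input_price: float, output_price: float,
                           tool_count: int = 0, tokens_per_tool: int = 30,
                           reasoning_multiplier: float = 0.0,
                           batch_discount: float = 0.0) -> float:
    """Per-call cost with manual adjustments; prices are per million tokens.

    tokens_per_tool: assumed average size of one tool definition.
    reasoning_multiplier: extra output tokens as a multiple of visible output
        (3.0 means reasoning adds 3x the visible output on top of it).
    batch_discount: fraction taken off the total (0.5 for a 50% batch discount).
    """
    billed_input = input_tokens + tool_count * tokens_per_tool
    billed_output = output_tokens * (1 + reasoning_multiplier)
    cost = (billed_input * input_price + billed_output * output_price) / 1_000_000
    return cost * (1 - batch_discount)


# Hypothetical workload: 5 tools, reasoning adds 3x the visible output.
with_reasoning = adjusted_per_call_cost(2_000, 500, 3.00, 15.00,
                                        tool_count=5, reasoning_multiplier=3.0)
batched = adjusted_per_call_cost(2_000, 500, 3.00, 15.00,
                                 tool_count=5, reasoning_multiplier=3.0,
                                 batch_discount=0.5)
print(f"with reasoning: ${with_reasoning:.5f}, batched: ${batched:.6f}")
```

In this example the reasoning tokens raise the per-call cost far more than the tool definitions do, which is why the reasoning adjustment deserves the most attention for thinking-mode models.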

Common questions before running numbers

What input/output split should I assume? For chat, input typically runs ~3-5x output, because the system prompt and conversation history dominate. For generation tasks (summaries, code, translations), input and output are closer to balanced. For RAG, input dominates heavily, since retrieved passages are added to the prompt.

How accurate is the call-volume estimate likely to be? In practice the variance is large. Real workloads tend to come in 30-50% over projections once retries, debug calls, and development iteration are added. Multiply the calculator's monthly estimate by 1.3-1.5 for a more realistic budget.
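The buffer described above is simple arithmetic. A small sketch follows, with a hypothetical function name and an illustrative $1,350 monthly estimate:

```python
def budget_range(monthly_estimate: float,
                 low_factor: float = 1.3,
                 high_factor: float = 1.5) -> tuple[float, float]:
    """Scale a monthly estimate by the 1.3-1.5 overrun factors."""
    return monthly_estimate * low_factor, monthly_estimate * high_factor


# Illustrative monthly estimate taken from the calculator.
low, high = budget_range(1_350.00)
print(f"budget between ${low:,.2f} and ${high:,.2f} per month")
```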

Should I include the model's free tier? No. By the time you're calling enough to care about cost, the free tier is irrelevant. Calculate on list price and treat the free tier as a small discount on the first month.