Prompt Caching Savings Calculator
See how much prompt caching saves on your API bill. When many requests share a long, identical prefix (a system prompt, a document, a few-shot block), providers can cache it and charge a fraction of the normal input price to reuse it. Enter your cached and fresh token counts, your monthly volume, and your cache hit rate to compare input cost with and without caching and to find the break-even point. All math runs locally.
The calculator covers input-token cost only; output tokens are billed the same either way. The write and read multipliers are applied to the base input price. Prices change, so verify them against your provider's current rate card.
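To make "applied to the base input price" concrete, the sketch below turns a base rate and the two multipliers into effective per-million-token prices. The function name `effective_prices` is hypothetical, and the figures ($3.00 base, 1.25x write, 0.1x read) are illustrative values, not a quoted rate card.

```python
def effective_prices(base_per_m: float, write_mult: float, read_mult: float) -> dict[str, float]:
    """Return USD per million input tokens for uncached, cache-write and cache-read tokens."""
    return {
        "uncached": base_per_m,                  # normal input rate, also used for fresh tokens
        "cache_write": base_per_m * write_mult,  # a miss stores the prefix at this rate
        "cache_read": base_per_m * read_mult,    # a hit reuses the prefix at this rate
    }


prices = effective_prices(3.00, 1.25, 0.10)
print({name: round(usd, 2) for name, usd in prices.items()})
# {'uncached': 3.0, 'cache_write': 3.75, 'cache_read': 0.3}
```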
How to use the Prompt Caching Savings Calculator
Pick a provider scheme to load its caching multipliers, or choose Custom and enter them yourself. The base input price is your model's normal per-million-token input rate; the write and read multipliers scale it for caching a prefix and for reusing one. Then describe your workload: how many tokens are in the shared prefix (the part that repeats), how many are unique per request, your monthly request count, and the share of requests that hit a warm cache.
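One way to write the comparison, assuming every miss pays the write multiplier on the whole prefix and every hit pays the read multiplier (a simplification; the symbols are introduced here and are not taken from any provider's documentation). Let $B$ be the base price per token (a per-million rate divided by $10^6$), $P$ the shared-prefix tokens, $F$ the fresh tokens per request, $N$ the monthly requests, $h$ the hit rate, and $w$ and $r$ the write and read multipliers:

```latex
\begin{aligned}
C_{\text{without}} &= N B (P + F) \\
C_{\text{with}}    &= N B \bigl[ F + P \bigl( h r + (1 - h) w \bigr) \bigr] \\
S                  &= C_{\text{without}} - C_{\text{with}} = N B P \bigl[ 1 - h r - (1 - h) w \bigr]
\end{aligned}
```

Dividing $S$ by $C_{\text{without}}$ gives the percentage saving. Note that $F$ cancels out of $S$: the fresh tokens do not change the dollar saving, they only dilute the percentage.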
The result compares input cost with and without caching and shows the monthly saving in both dollars and percent. The bigger your shared prefix relative to the fresh part, and the higher your hit rate, the more caching wins. If your hit rate is low, caching can even cost slightly more, because each miss pays the write surcharge without earning a read discount; a small prefix keeps both the potential gain and the potential loss small. The calculator makes that trade-off visible.
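A minimal Python sketch of that comparison, assuming each miss pays the write multiplier on the whole prefix and each hit pays the read multiplier. The calculator's exact formula is not shown on this page, and the `Workload` and `compare` names and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    base_per_m: float   # base input price, USD per million tokens
    write_mult: float   # cache-write multiplier, paid on a miss
    read_mult: float    # cache-read multiplier, paid on a hit
    prefix_tokens: int  # shared, cacheable prefix per request
    fresh_tokens: int   # unique tokens per request, always billed at base
    requests: int       # requests per month
    hit_rate: float     # share of requests that hit a warm cache, 0..1


def compare(wl: Workload) -> tuple[float, float, float, float]:
    """Return (cost_without, cost_with, savings_usd, savings_pct) for one month of input tokens."""
    price = wl.base_per_m / 1_000_000  # USD per token
    without = wl.requests * (wl.prefix_tokens + wl.fresh_tokens) * price
    # Blended multiplier on the prefix: hits read it, misses write it.
    prefix_mult = wl.hit_rate * wl.read_mult + (1 - wl.hit_rate) * wl.write_mult
    with_cache = wl.requests * (wl.fresh_tokens + wl.prefix_tokens * prefix_mult) * price
    savings = without - with_cache
    return without, with_cache, savings, 100 * savings / without


for label, wl in [
    ("long prefix, warm cache", Workload(3.00, 1.25, 0.10, 10_000, 500, 100_000, 0.95)),
    ("short prefix, cold cache", Workload(3.00, 1.25, 0.10, 1_000, 4_000, 100_000, 0.10)),
]:
    without, with_cache, saved, pct = compare(wl)
    print(f"{label}: ${without:,.2f} -> ${with_cache:,.2f}, saved ${saved:,.2f} ({pct:.1f}%)")
```

With these illustrative numbers the first workload saves about 80% of its input cost, while the second comes out about 2.7% more expensive: 90% of its requests miss and pay the 1.25x write price on the prefix.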
How prompt caching pricing works
Large prompts often repeat. A coding agent resends the same system prompt and tool definitions on every turn; a document-QA app resends the whole document for each question; a few-shot classifier resends the same examples. Without caching you pay full input price for those tokens every single time. Prompt caching lets the provider store the processed prefix so subsequent requests that begin with the same tokens are billed at a steep discount — the model skips re-reading the cached part.
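The key condition is that requests must *begin* with the same tokens. As a toy illustration (hypothetical integer token IDs rather than a real tokenizer, and ignoring provider details such as minimum cacheable lengths), the reusable part of a request is at most its longest common prefix with an earlier one, so changing a single early token invalidates everything after it.

```python
def shared_prefix_len(cached: list[int], request: list[int]) -> int:
    """Count leading tokens that match exactly; only these could be served from cache."""
    n = 0
    for a, b in zip(cached, request):
        if a != b:
            break  # first mismatch ends the reusable prefix
        n += 1
    return n


system = [101, 7, 7, 42, 9, 13]               # stand-in for a system prompt plus tools
cached = system + [500, 501]                  # an earlier request
same_front = system + [600, 601, 602]         # new question, same front matter
edited_front = [101, 8] + system[2:] + [600]  # one early token changed

print(shared_prefix_len(cached, same_front))    # 6
print(shared_prefix_len(cached, edited_front))  # 1
```

This is why stable content (instructions, tool definitions, documents) should come first and per-request content last.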
The pricing has two parts. Writing to the cache (the first request, a miss) can carry a surcharge, because the provider does extra work to store the prefix: Anthropic charges about 1.25x the base input rate for a 5-minute cache write and 2x for a 1-hour one. Reading from the cache (subsequent hits) is heavily discounted: roughly 0.1x base on Anthropic, 0.5x on OpenAI's automatic caching for many models (some newer models discount further), and around 0.25x for Gemini's context caching. OpenAI applies caching automatically with no write surcharge; Anthropic requires you to mark cache breakpoints explicitly and lets you choose between the 5-minute and 1-hour lifetimes.
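The multipliers above can be captured as presets. The values below are only the approximate figures from this section (Anthropic 1.25x write for 5 minutes and 2x for 1 hour, 0.1x read; OpenAI no surcharge, 0.5x read; Gemini about 0.25x read, with a write multiplier of 1.0 assumed and any storage fees ignored). The preset names, the `monthly_input_cost` helper, the workload, and the shared $3.00 base price are hypothetical; real base prices differ by model, so this compares only the discount structure.

```python
# Approximate (write_mult, read_mult) per scheme; hypothetical preset names.
PRESETS: dict[str, tuple[float, float]] = {
    "anthropic_5m": (1.25, 0.10),
    "anthropic_1h": (2.00, 0.10),
    "openai_auto": (1.00, 0.50),
    "gemini_context": (1.00, 0.25),  # write multiplier assumed 1.0; storage fees ignored
}


def monthly_input_cost(base_per_m: float, write_mult: float, read_mult: float,
                       prefix: int, fresh: int, requests: int, hit_rate: float) -> float:
    """USD input cost per month when a fraction `hit_rate` of requests reads a cached prefix."""
    prefix_mult = hit_rate * read_mult + (1 - hit_rate) * write_mult
    return requests * (fresh + prefix * prefix_mult) * base_per_m / 1_000_000


base, prefix, fresh, requests, hit = 3.00, 8_000, 400, 50_000, 0.9
baseline = requests * (prefix + fresh) * base / 1_000_000  # no caching at all
for name, (w, r) in PRESETS.items():
    cost = monthly_input_cost(base, w, r, prefix, fresh, requests, hit)
    print(f"{name:15s} ${cost:8.2f}  ({100 * (1 - cost / baseline):.1f}% saved)")
```

At the same 90% hit rate the 1-hour Anthropic cache looks worse than the 5-minute one because of its higher write price; its purpose is to raise the hit rate when requests arrive more than five minutes apart, which a fixed-hit-rate comparison like this does not capture.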
The economics hinge on three numbers: how large the shared prefix is, how often you reuse it before it expires (the hit rate), and the spread between the write and read multipliers. A long prefix reused thousands of times is billed at close to the read multiplier, which on Anthropic means roughly a 90% saving on those tokens. A short prefix barely moves the total, and a prefix that expires before it is reused pays the write surcharge without earning the read discount, which can make caching marginally more expensive. This calculator models exactly that so you can decide whether to restructure a prompt to maximize the cacheable prefix.
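Under a simple model where each miss pays write multiplier $w$ and each hit pays read multiplier $r$ on the prefix, setting the blended multiplier $h r + (1 - h) w$ equal to 1 (the uncached price) and solving for the hit rate gives $h^{*} = (w - 1)/(w - r)$. It depends only on the two multipliers; prefix size scales how much you gain or lose on either side of it. A sketch, with a hypothetical function name:

```python
def break_even_hit_rate(write_mult: float, read_mult: float) -> float:
    """Hit rate h at which h*read + (1-h)*write == 1, i.e. caching costs the same as not caching.

    Returns 0.0 when there is no write surcharge, since any hit then saves money.
    """
    if not 0 <= read_mult < 1:
        raise ValueError("read multiplier must be in [0, 1) for caching to ever pay off")
    if write_mult <= 1:
        return 0.0
    return (write_mult - 1) / (write_mult - read_mult)


print(f"{break_even_hit_rate(1.25, 0.10):.1%}")  # 21.7%  (1.25x write, 0.1x read)
print(f"{break_even_hit_rate(1.00, 0.50):.1%}")  # 0.0%   (no write surcharge)
```

With a 1.25x write and 0.1x read, caching pays for itself once more than about 22% of requests hit a warm cache; with no write surcharge, every hit is a saving.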
Common use cases
- Justifying a caching rollout. Put a monthly dollar figure on enabling caching before you change any code.
- Structuring prompts. See how much you gain by moving stable content to the front so more of the prompt is cacheable.
- Comparing providers. Weigh Anthropic's deep read discount, which comes with a write surcharge, against OpenAI's automatic 50% cached-input discount with no surcharge.
- Setting hit-rate targets. Find the hit rate at which caching breaks even for your provider's multipliers, and see how your prefix size scales the gain or loss on either side of it.
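For the prompt-structuring case, here is a sketch of what moving stable content ahead of the per-request part does: the request's total size stays the same, but more of it becomes cacheable. The helper name, token split, prices, and 90% hit rate are hypothetical, and the model is the same simplified one (misses write, hits read).

```python
def cached_cost_per_request(base_per_m: float, write_mult: float, read_mult: float,
                            prefix: int, fresh: int, hit_rate: float) -> float:
    """Expected USD input cost of one request under a simple hit/miss model."""
    prefix_mult = hit_rate * read_mult + (1 - hit_rate) * write_mult
    return (fresh + prefix * prefix_mult) * base_per_m / 1_000_000


total = 12_000                            # tokens per request, unchanged by reordering
uncached = total * 3.00 / 1_000_000       # cost with no caching at all
for prefix in (2_000, 6_000, 11_000):     # cacheable share grows as stable text moves up front
    fresh = total - prefix
    cost = cached_cost_per_request(3.00, 1.25, 0.10, prefix, fresh, hit_rate=0.9)
    print(f"prefix {prefix:>6}: ${cost:.5f} vs ${uncached:.5f} uncached")
```

Same tokens, same hit rate: only the ordering changes, yet the expected per-request cost falls steadily as the cacheable prefix grows.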