Assistants / Thread Cost Estimator
Estimate the true cost of a multi-turn chat session, including the often-surprising effect of history accumulation. At each turn the full conversation history is re-sent as input, so costs grow superlinearly with turns. Set your model, system prompt size, message lengths, turn count, and history window to see per-turn and total costs.
How to use the Assistants / Thread Cost Estimator
Set the model, the three token-size inputs, and the number of turns, then choose a history window:
- System prompt tokens — the fixed instruction block prepended to every API call. Estimate via the LLM cost calculator.
- Avg user msg tokens — average tokens in each user turn.
- Avg assistant reply tokens — average tokens generated per turn. This is the output cost driver and also accumulates in history.
- Number of turns — total turns in the conversation (one turn = one user message + one assistant reply).
- History window (0 = all) — how many prior turns of context are sent with each new API call. 0 means the full conversation history is resent every turn. Setting N means only the last N turns are included. Reducing the window cuts input cost dramatically at the expense of the model forgetting earlier turns.
The result grid shows: total input tokens, total output tokens, total cost, per-turn average cost, and the input size of the final turn (the most expensive one).
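The result grid can be approximated with a short per-turn loop. This is a minimal sketch, not the tool's actual code. It assumes that each call sends the system prompt, the kept prior turns (each one user message plus one reply), and the new user message. It also assumes that prices are given in USD per million tokens, and it ignores tokenizer overhead and any prompt-caching discounts. The names `estimate_thread` and `ThreadEstimate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThreadEstimate:
    total_input_tokens: int
    total_output_tokens: int
    total_cost: float
    avg_cost_per_turn: float
    final_turn_input_tokens: int


def estimate_thread(sys_tokens, user_tokens, reply_tokens, turns, window,
                    input_price_per_m, output_price_per_m):
    """Hypothetical per-turn cost model; prices are USD per million tokens (assumed)."""
    total_in = 0
    last_in = 0
    for k in range(1, turns + 1):
        prior = k - 1  # completed turns before this call
        # window == 0 resends everything; otherwise keep at most `window` prior turns
        kept = prior if window == 0 else min(window, prior)
        turn_in = sys_tokens + kept * (user_tokens + reply_tokens) + user_tokens
        total_in += turn_in
        last_in = turn_in  # input of the final call, the largest one
    total_out = turns * reply_tokens  # each turn generates one reply
    cost = (total_in * input_price_per_m + total_out * output_price_per_m) / 1_000_000
    avg = cost / turns if turns else 0.0
    return ThreadEstimate(total_in, total_out, cost, avg, last_in)
```

For example, `estimate_thread(1000, 300, 300, 20, 0, 1.0, 4.0)` gives 140,000 input tokens, 6,000 output tokens, and a final-turn input of 12,700 tokens. The prices 1.0 and 4.0 are placeholders, not any real model's rates.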
Why multi-turn chat costs grow superlinearly
Stateless API design means that every call to a chat completion endpoint must include whatever conversation history the model should see, because the LLM has no memory between calls. At turn k, the input tokens are: system prompt + all prior user messages + all prior assistant replies (possibly truncated to a window) + the new user message. If you retain full history (window = 0), the input at turn k grows as O(k) tokens, so the cumulative input over an n-turn conversation is O(n²): roughly quadratic in the number of turns. For example, a 20-turn conversation with 300-token user messages, 300-token replies, and no system prompt sends 120,000 input tokens in total, versus 6,000 if each call carried only the new message. That is a 20x difference. A fixed system prompt dilutes the ratio: with a 1,000-token system prompt the totals are 140,000 versus 26,000, roughly 5.4x. See the LLM cost calculator for single-call estimates.
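Summing the full-history inputs gives a closed form, which makes the quadratic term explicit. Turn k sends sys + (k − 1)(umsg + amsg) + umsg tokens, so n turns send n(sys + umsg) + (umsg + amsg)·n(n − 1)/2 in total. The sketch below uses hypothetical function names. It compares that total against calls that carry only the system prompt and the new message, and it shows how strongly the ratio depends on the system prompt size.

```python
def full_history_input_total(sys_tokens, user_tokens, reply_tokens, turns):
    """Total input tokens when every call resends all prior turns.

    Turn k sends sys + (k - 1) * (user + reply) + user tokens, so the sum over
    k = 1..n is n * (sys + user) + (user + reply) * n * (n - 1) / 2.
    """
    n = turns
    # n * (n - 1) is always even, so integer division is exact
    return n * (sys_tokens + user_tokens) + (user_tokens + reply_tokens) * n * (n - 1) // 2


def independent_input_total(sys_tokens, user_tokens, turns):
    """Total input if each call carried only the system prompt and the new message."""
    return turns * (sys_tokens + user_tokens)


for sys_tokens in (0, 1000):
    full = full_history_input_total(sys_tokens, 300, 300, 20)
    indep = independent_input_total(sys_tokens, 300, 20)
    print(sys_tokens, full, indep, round(full / indep, 1))
# prints:
# 0 120000 6000 20.0
# 1000 140000 26000 5.4
```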
The history window parameter mirrors what many production assistants implement via "sliding window" or "message trimming" strategies. By keeping only the last N turns, you cap the input of each call at approximately sys + N*(umsg+amsg) + umsg tokens. Once the window is full, every additional turn adds this constant amount, so total cost grows as O(n) rather than O(n²): linear instead of quadratic. The trade-off is that the model loses access to earlier turns, which can cause it to forget user preferences, agreed facts, or prior tool results. Choosing the window size is a key product engineering decision for chat-based products.
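To see the linear-versus-quadratic difference in numbers, the sketch below lists the per-call input under the same assumed per-turn model, with and without a window. The 1,000-token system prompt, 300-token messages, and window of 5 turns are illustrative values only, and the helper names are hypothetical. Past the window, each extra turn adds exactly the cap sys + N*(umsg+amsg) + umsg. The full-history total keeps accelerating.

```python
def input_tokens_per_turn(sys_tokens, user_tokens, reply_tokens, turns, window):
    """Input tokens for each call; window=0 resends all prior turns."""
    per_turn = []
    for k in range(1, turns + 1):
        kept = k - 1 if window == 0 else min(window, k - 1)
        per_turn.append(sys_tokens + kept * (user_tokens + reply_tokens) + user_tokens)
    return per_turn


def windowed_cap(sys_tokens, user_tokens, reply_tokens, window):
    """Per-call input once the window is full: sys + N * (user + reply) + user."""
    return sys_tokens + window * (user_tokens + reply_tokens) + user_tokens


SYS, USER, REPLY, WINDOW = 1000, 300, 300, 5  # illustrative values only
for n in (20, 30, 50):
    full = sum(input_tokens_per_turn(SYS, USER, REPLY, n, 0))
    windowed = sum(input_tokens_per_turn(SYS, USER, REPLY, n, WINDOW))
    print(n, full, windowed)
# prints:
# 20 140000 77000
# 30 300000 120000
# 50 800000 206000
```

Going from 20 to 30 turns adds 43,000 windowed input tokens, which is ten times the 4,300-token cap. Over the same ten turns, the full-history total more than doubles.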
Common use cases
- Budget planning for chat products — estimate monthly API spend for a given average conversation length and expected user volume before launch.
- Window size optimisation — compare full-history vs windowed-history cost at 20, 30, 50 turns to find the break-even point for your quality requirements.
- Model selection — compare gpt-5-mini vs claude-haiku vs claude-sonnet across realistic conversation parameters to choose the right cost-quality tradeoff.
- Pricing model design — if you charge users per conversation, use this to calculate your cost per conversation and set a margin-positive price.
- System prompt optimisation — see the dollar impact of reducing system prompt size from 2000 to 500 tokens across 10,000 daily conversations.
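The system prompt case reduces to simple arithmetic, because the system prompt is sent on every call regardless of message sizes or the history window. The sketch below assumes 10 turns per conversation and a placeholder input price of $1.00 per million tokens; neither figure comes from a real model. It also ignores prompt-caching discounts, which would shrink the saving. The function name is hypothetical.

```python
def daily_system_prompt_savings(old_sys, new_sys, turns_per_conversation,
                                conversations_per_day, input_price_per_m):
    """USD saved per day by shrinking the system prompt.

    Every call includes the system prompt, so the saving is
    (old - new) tokens x calls per day x price per token.
    """
    tokens_saved = (old_sys - new_sys) * turns_per_conversation * conversations_per_day
    return tokens_saved * input_price_per_m / 1_000_000


# Hypothetical inputs: 10 turns per conversation, $1.00 per million input tokens.
daily = daily_system_prompt_savings(2000, 500, 10, 10_000, 1.00)
print(f"${daily:,.2f} per day, ${daily * 30:,.2f} per 30-day month")
# prints: $150.00 per day, $4,500.00 per 30-day month
```

Under these assumptions, cutting 1,500 tokens removes 150 million input tokens per day. The dollar figure then scales linearly with whatever input price your chosen model actually has.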