Agent Loop Token Budget Estimator
Calculate the true token budget for an agentic LLM loop. Unlike single-call estimates, an agent resends the full tool schema block on every step while history grows by one output + observation per step — making a 10-step agent cost far more than 10× a single call. Set your model, tool count, step count, and output sizes to see the compounding effect.
How to use the Agent Loop Token Budget Estimator
Fill in the agent loop parameters, then click Calculate:
- System prompt tokens — fixed instruction block describing the agent's role and constraints.
- Number of tools — how many tool definitions are registered with the agent. Every tool's JSON schema is re-sent with every API call.
- Avg tokens per tool schema — the token count of a single tool's JSON definition (name, description, inputSchema). Estimate 100-400 per tool depending on parameter verbosity.
- Reasoning steps (K) — total number of turns in the agent loop. Each step the model decides on an action, the environment returns an observation, and the loop continues.
- Avg observation tokens / step — tokens in the tool result returned to the model (API response, file content, search result, etc.).
- Avg model output tokens / step — tokens the model generates per step (reasoning text + tool call JSON).
The result shows per-step and cumulative totals, plus a step-by-step breakdown of how the input context grows.
Why agentic loops are expensive
An LLM agent works by running a loop: the model receives a prompt, generates an action (usually a tool call), the environment executes the action and returns an observation, and the model receives the observation as the next input. This loop repeats K times. Because language model APIs are stateless, each iteration must include the full conversation history from step 1. Unlike a simple multi-turn chat (where history is user + assistant messages), an agent accumulates two types of tokens per step: the model's output (reasoning trace + tool call) and the tool's observation (often large: a full API response, a file excerpt, or a database result). The full tool-schema block is also re-sent with every call because the model needs to know what tools are available at each step.
The compounding effect is significant. At step k, the input context is: system prompt + (number of tools × tokens per tool schema) + the sum of all prior (output + observation) pairs. For a 10-step agent with 8 tools (200 tokens per tool), an 800-token system prompt, 250-token outputs, and 400-token observations, the input at step 1 is 800 + 1,600 = 2,400 tokens, while the input at step 10 alone is 800 + 1,600 + 9 × (250 + 400) = 8,250 tokens. The total input across all 10 steps is 10 × 2,400 + (0 + 1 + … + 9) × 650 = 24,000 + 29,250 = 53,250 tokens, not the 24,000 you might expect from 10× a single call. This is why agentic applications need careful token budgeting and why long tool results dramatically inflate costs.
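The per-step formula above can be checked with a few lines of Python. This is a minimal sketch, not the estimator's actual implementation: the function names `step_input_tokens` and `total_input_tokens` are hypothetical, and it assumes every step has the same average output and observation size.

```python
def step_input_tokens(system, n_tools, tool_size, output, observation, step):
    """Input tokens sent to the model at a given 1-indexed step.

    Assumes the system prompt and the full tool-schema block are re-sent on
    every call, and that each earlier step added one output + one observation.
    """
    fixed = system + n_tools * tool_size          # re-sent on every call
    history = (step - 1) * (output + observation)  # grows by one pair per step
    return fixed + history


def total_input_tokens(system, n_tools, tool_size, output, observation, steps):
    """Cumulative input tokens across all steps of the loop."""
    return sum(
        step_input_tokens(system, n_tools, tool_size, output, observation, k)
        for k in range(1, steps + 1)
    )


# Worked example from the text: 8 tools at 200 tokens each, an 800-token
# system prompt, 250-token outputs, 400-token observations, 10 steps.
params = dict(system=800, n_tools=8, tool_size=200, output=250, observation=400)

print(step_input_tokens(**params, step=1))     # 2400
print(step_input_tokens(**params, step=10))    # 8250
print(total_input_tokens(**params, steps=10))  # 53250
```

Output tokens are not included here; under the same assumptions they add up to steps × output, or 2,500 tokens for this example.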
Common use cases
- Agent cost budgeting — estimate API spend before running an expensive multi-step research or code-generation agent in production.
- Tool schema optimisation — see the dollar impact of trimming tool descriptions from 300 tokens to 100 tokens when 10+ tools are registered.
- Step limit tuning — compare 5-step vs 20-step agent loops to find the cost-quality break-even for your use case.
- Observation truncation strategy — model the savings from truncating tool results to 200 tokens vs passing 1,000 tokens per observation.
- Model selection for agents — compare gpt-5-mini vs claude-sonnet across realistic agentic parameters where the input-heavy nature of agents makes input-rate differences critical.
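Several of these comparisons can be approximated offline before opening the estimator. The sketch below is an illustration under stated assumptions rather than the tool's own code: `loop_input_total`, `BASE`, and `compare` are hypothetical names, the baseline values are the worked example from the previous section, and only input tokens are counted (no prices, since rates vary by model and provider).

```python
def loop_input_total(system, n_tools, tool_size, output, observation, steps):
    """Cumulative input tokens over the loop, in closed form.

    The fixed block (system prompt + tool schemas) is sent `steps` times;
    the history contributes 0 + 1 + ... + (steps - 1) output/observation pairs.
    """
    fixed = system + n_tools * tool_size
    growth = output + observation
    return steps * fixed + growth * steps * (steps - 1) // 2


# Assumed baseline: the worked example used earlier on this page.
BASE = dict(system=800, n_tools=8, tool_size=200,
            output=250, observation=400, steps=10)


def compare(label, a, b):
    """Print and return cumulative input tokens for two variants of BASE."""
    total_a = loop_input_total(**{**BASE, **a})
    total_b = loop_input_total(**{**BASE, **b})
    print(f"{label}: {total_a:,} vs {total_b:,} input tokens "
          f"(difference {abs(total_a - total_b):,})")
    return total_a, total_b


# Observation truncation: 1,000-token vs 200-token tool results.
compare("Observations 1,000 vs 200", {"observation": 1000}, {"observation": 200})

# Tool schema optimisation: 10 tools at 300 vs 100 tokens each.
compare("Schemas 300 vs 100",
        {"n_tools": 10, "tool_size": 300}, {"n_tools": 10, "tool_size": 100})

# Step limit tuning: 20-step vs 5-step loops.
compare("Steps 20 vs 5", {"steps": 20}, {"steps": 5})
```

Because history grows quadratically with the step count, the step-limit comparison shows the largest gap: quadrupling the steps increases cumulative input roughly ninefold in this example.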