Agent Loop Token Budget Estimator

Calculate the true token budget for an agentic LLM loop. Unlike single-call estimates, an agent resends the full tool schema block on every step while history grows by one output + observation per step — making a 10-step agent cost far more than 10× a single call. Set your model, tool count, step count, and output sizes to see the compounding effect.

How to use the Agent Loop Token Budget Estimator

Fill in the agent loop parameters, then click Calculate:

  • System prompt tokens — fixed instruction block describing the agent's role and constraints.
  • Number of tools — how many tool definitions are registered with the agent. Every tool's JSON schema is re-sent with every API call.
  • Avg tokens per tool schema — the token count of a single tool's JSON definition (name, description, inputSchema). Estimate 100-400 per tool depending on parameter verbosity.
  • Reasoning steps (K) — total number of turns in the agent loop. Each step the model decides on an action, the environment returns an observation, and the loop continues.
  • Avg observation tokens / step — tokens in the tool result returned to the model (API response, file content, search result, etc.).
  • Avg model output tokens / step — tokens the model generates per step (reasoning text + tool call JSON).

The result shows per-step and cumulative totals, plus a step-by-step breakdown of how the input context grows.

Why agentic loops are expensive

An LLM agent works by running a loop: the model receives a prompt, generates an action (usually a tool call), the environment executes the action and returns an observation, and the model receives the observation as the next input. This loop repeats K times. Because language model APIs are stateless, each iteration must include the full conversation history from step 1. Unlike a simple multi-turn chat (where history is user + assistant messages), an agent accumulates two types of tokens per step: the model\'s output (reasoning trace + tool call) and the tool\'s observation (often large — a full API response, a file excerpt, or a database result). The full tool-schema block is also re-sent with every call because the model needs to know what tools are available at each step.

The compounding effect is significant. At step k, the input context is: system prompt + (ntools × toolsize) + sum of all prior (output + observation) pairs. For a 10-step agent with 8 tools (200 tokens/tool), a 800-token system prompt, 250-token outputs, and 400-token observations, the input at step 10 alone is 800 + 1600 + 9×(250+400) = 8,250 tokens — versus 2,400 tokens for step 1. The total input across 10 steps is roughly 50,850 tokens, not 24,000 as you might expect from 10× a single call. This is why agentic applications need careful token budgeting and why long tool results dramatically inflate costs.

Common use cases

  • Agent cost budgeting — estimate API spend before running an expensive multi-step research or code-generation agent in production.
  • Tool schema optimisation — see the dollar impact of trimming tool descriptions from 300 tokens to 100 tokens when 10+ tools are registered.
  • Step limit tuning — compare 5-step vs 20-step agent loops to find the cost-quality break-even for your use case.
  • Observation truncation strategy — model the savings from truncating tool results to 200 tokens vs passing 1,000 tokens per observation.
  • Model selection for agents — compare gpt-5-mini vs claude-sonnet across realistic agentic parameters where the input-heavy nature of agents makes input-rate differences critical.

Frequently asked questions

Why is the tool schema block sent every step?

Because the API is stateless — the model has no memory of previous calls. The list of available tools must be included in every request so the model knows what actions it can take. This is one of the key cost drivers unique to agentic loops compared to plain chat.

Does this model ReAct, Plan-and-Execute, or other agent architectures?

It models a generic single-agent loop: one model call per step, full history accumulation. ReAct (Reason + Act) follows this pattern exactly. Plan-and-Execute splits planning and execution and may have lower per-step cost but a larger initial planning call. The estimates here are a good upper bound.

How do I reduce observation token cost?

Truncate tool results aggressively before adding them to history. For web search results, extract just the relevant sentences. For API responses, pick only the fields the model needs. Reducing observations from 1,000 to 200 tokens can cut total input tokens by 40-60% in a 10-step loop.

Is prompt caching useful for agents?

Highly useful. The system prompt and tool schemas are identical across all steps — they are a perfect prefix for prompt caching (Anthropic) or prefix caching (OpenAI). If caching works, those tokens may be served at 10-50% of full price, significantly reducing per-step input cost.

What is a typical step count for production agents?

Most production agents cap at 10-30 steps to control both cost and latency. Research agents (coding, deep research) may run 20-50 steps. Agents with no step limit are dangerous — a stuck agent will rack up thousands of API calls. Always set a max_iterations guard.