TPM / RPM Rate Limit Calculator

Enter your actual or expected requests-per-minute and average token counts, then set your OpenAI or Anthropic tier limits. The calculator shows which limit — TPM or RPM — binds first, how close you are to the cap, and the maximum throughput you can safely sustain.

Rate limits vary by model and usage tier. Check your limits at platform.openai.com/account/limits or console.anthropic.com/settings/limits.

How to use the TPM / RPM Rate Limit Calculator

Fill in your current or planned traffic numbers in the top row, then set your account's rate limits in the second row. If you do not know your limits, the defaults are typical of OpenAI Tier 2 on GPT-4o-mini. Results update as soon as you change any input.

The calculator computes tokens per minute as RPM × (avg input tokens + avg output tokens), then compares that against your TPM and RPM caps. It tells you:

  • Your actual tokens-per-minute and what percentage of the TPM cap they consume.
  • Your RPM as a percentage of the RPM cap.
  • Which limit binds first: the one at the higher percentage of its cap is your effective ceiling.
  • The maximum safe RPM under the TPM cap: floor(TPM cap / (avg input tokens + avg output tokens)).
  • Headroom — how many more requests per minute you can add before hitting the first limit.
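The calculations listed above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's own source; the function `analyze` and the `RateLimitReport` fields are hypothetical names chosen for this example, and the caps in the usage call (2,000,000 TPM, 5,000 RPM) are illustrative values, not your account's limits.

```python
from dataclasses import dataclass


@dataclass
class RateLimitReport:
    tokens_per_minute: int
    tpm_utilization: float  # fraction of the TPM cap in use (1.0 = at the cap)
    rpm_utilization: float  # fraction of the RPM cap in use
    binding_limit: str      # "TPM" or "RPM": whichever is reached first
    max_safe_rpm: int       # highest RPM that stays under both caps
    headroom_rpm: int       # extra requests per minute before the first cap


def analyze(rpm: int, avg_input_tokens: int, avg_output_tokens: int,
            tpm_cap: int, rpm_cap: int) -> RateLimitReport:
    tokens_per_request = avg_input_tokens + avg_output_tokens
    if tokens_per_request <= 0 or tpm_cap <= 0 or rpm_cap <= 0:
        raise ValueError("token counts and caps must be positive")
    tokens_per_minute = rpm * tokens_per_request
    # Most requests per minute the TPM cap allows, rounded down.
    max_rpm_under_tpm = tpm_cap // tokens_per_request
    max_safe_rpm = min(rpm_cap, max_rpm_under_tpm)
    return RateLimitReport(
        tokens_per_minute=tokens_per_minute,
        tpm_utilization=tokens_per_minute / tpm_cap,
        rpm_utilization=rpm / rpm_cap,
        # TPM binds when it allows fewer requests than the RPM cap; for rpm > 0
        # this matches "the limit at the higher percentage of its cap".
        binding_limit="TPM" if max_rpm_under_tpm < rpm_cap else "RPM",
        max_safe_rpm=max_safe_rpm,
        headroom_rpm=max(0, max_safe_rpm - rpm),
    )


report = analyze(rpm=600, avg_input_tokens=1_500, avg_output_tokens=500,
                 tpm_cap=2_000_000, rpm_cap=5_000)
print(report)
```

With these example inputs, 600 RPM × 2,000 tokens is 1,200,000 TPM (60% of the TPM cap) against 12% of the RPM cap, so TPM binds, the safe ceiling is 1,000 RPM, and the headroom is 400 RPM.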

TPM, RPM, and how rate limits interact

OpenAI and Anthropic enforce two independent rate limits: Tokens Per Minute (TPM) and Requests Per Minute (RPM). Both are enforced continuously rather than reset on the minute, and you can hit either one independently. Your effective throughput ceiling is whichever limit you reach first. A request that passes the RPM check can still be rejected if it pushes you over the TPM limit; conversely, a stream of low-token requests can exhaust your RPM allowance long before it approaches the TPM cap.
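To see how a request can pass one check and fail the other, here is a minimal client-side model that tracks both limits over a trailing window. It assumes a simple 60-second sliding window; the class `DualLimitWindow` is hypothetical, and real provider enforcement may differ in granularity and in how rejected requests are counted (this sketch does not count them).

```python
from collections import deque
from typing import Optional


class DualLimitWindow:
    """Hypothetical model of an RPM cap and a TPM cap over one trailing window."""

    def __init__(self, rpm_cap: int, tpm_cap: int, window_s: float = 60.0):
        self.rpm_cap = rpm_cap
        self.tpm_cap = tpm_cap
        self.window_s = window_s
        self.events: deque = deque()  # (timestamp, tokens) of admitted requests
        self.tokens_in_window = 0

    def _evict(self, now: float) -> None:
        # Drop requests that have aged out of the trailing window.
        while self.events and now - self.events[0][0] >= self.window_s:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_admit(self, now: float, tokens: int) -> Optional[str]:
        """Return None if admitted, else "RPM" or "TPM" for the limit that rejected it."""
        self._evict(now)
        if len(self.events) >= self.rpm_cap:
            return "RPM"
        if self.tokens_in_window + tokens > self.tpm_cap:
            return "TPM"
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return None


w = DualLimitWindow(rpm_cap=10, tpm_cap=20_000)
print(w.try_admit(0.0, 15_000))   # None: admitted
print(w.try_admit(1.0, 8_000))    # TPM: 1 of 10 requests used, but 23,000 > 20,000 tokens
print(w.try_admit(61.0, 8_000))   # None: the first request has left the window
```

Timestamps are passed in explicitly so the example is deterministic; in a live client you would pass `time.monotonic()`.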

The key insight is that TPM and RPM are not interchangeable. If your prompts are large (e.g., a 10k-token RAG context), you will hit the TPM wall long before the RPM wall. If your prompts are tiny (e.g., 50-token classification tasks), you will hit RPM first. Understanding which limit binds determines where to optimize: for TPM-bound workloads, compress prompts or use a model with a higher TPM limit; for RPM-bound workloads, use the Batch API or consolidate requests.
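The two examples above follow from one ratio. Running at the RPM cap uses RPM cap × (tokens per request) tokens per minute, so TPM binds first whenever tokens per request exceeds TPM cap / RPM cap. The sketch below uses 500 RPM and 200,000 TPM as example caps; the function names are illustrative.

```python
def crossover_tokens(tpm_cap: int, rpm_cap: int) -> float:
    """Tokens per request at which both caps are reached at the same RPM."""
    return tpm_cap / rpm_cap


def binding_limit(tokens_per_request: int, tpm_cap: int, rpm_cap: int) -> str:
    # At exactly the crossover both caps are hit together; this sketch reports RPM.
    if tokens_per_request > crossover_tokens(tpm_cap, rpm_cap):
        return "TPM"
    return "RPM"


TPM_CAP, RPM_CAP = 200_000, 500  # example caps; substitute your own

print(crossover_tokens(TPM_CAP, RPM_CAP))       # 400.0
print(binding_limit(10_000, TPM_CAP, RPM_CAP))  # TPM: at most 200_000 // 10_000 = 20 RPM
print(binding_limit(50, TPM_CAP, RPM_CAP))      # RPM: 500 requests use only 25,000 tokens
```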

Tiers matter significantly. OpenAI Tier 1 (the entry-level paid tier) starts at 500 RPM and 200k TPM on GPT-4o-mini, while Tier 5 can reach 30,000 RPM and 150M TPM. Anthropic also assigns accounts to usage tiers, with higher limits at each level. Always verify your current limits in the provider console rather than relying on documented defaults, because limits can be raised on request or automatically as your usage history builds.

Common use cases

  • Pre-launch capacity check — verify your expected traffic fits within your tier before going live.
  • Rate limit upgrade justification — quantify how close to the cap you are to support a limit increase request.
  • Batch API decision — determine if your workload is RPM-bound and therefore a good fit for the async Batch API.
  • Prompt optimization ROI — see exactly how much throughput you gain per 100 tokens you trim from your prompt.
  • Multi-tenant SaaS planning — estimate how many concurrent end-users your current tier can support.
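For the prompt-optimization case, the throughput gain from trimming tokens is the difference between two applications of the max-safe-RPM formula. A minimal sketch with hypothetical helper names, assuming trimmed tokens come only from the input side and output length is unchanged:

```python
def max_rpm(tpm_cap: int, rpm_cap: int, tokens_per_request: int) -> int:
    # The ceiling is whichever cap allows fewer requests per minute.
    return min(rpm_cap, tpm_cap // tokens_per_request)


def trim_gain(tpm_cap: int, rpm_cap: int, input_tokens: int,
              output_tokens: int, trimmed: int) -> tuple:
    """Max RPM before and after removing `trimmed` tokens from the prompt."""
    if not 0 <= trimmed < input_tokens:
        raise ValueError("trimmed must be between 0 and input_tokens - 1")
    before = max_rpm(tpm_cap, rpm_cap, input_tokens + output_tokens)
    after = max_rpm(tpm_cap, rpm_cap, input_tokens - trimmed + output_tokens)
    return before, after


print(trim_gain(2_000_000, 5_000, 3_500, 500, 100))  # (500, 512)
print(trim_gain(2_000_000, 5_000, 200, 100, 100))    # (5000, 5000): RPM-bound, no gain
```

The second call shows the limit of this optimization: once the RPM cap is the binding limit, trimming tokens no longer raises throughput.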

Frequently asked questions

What does "which limit binds first" mean?

It means which limit — TPM or RPM — you will hit first given your current traffic. The binding limit is the effective ceiling on your throughput. You cannot exceed either limit, so the tighter one determines your max safe load.

How do input and output tokens count differently?

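Most providers count both input and output tokens toward your token limit, even when they price them differently. This calculator adds them together to compute total tokens per request. Anthropic is a partial exception: it sets separate input-token (ITPM) and output-token (OTPM) per-minute limits, so check each side against its own cap rather than comparing the sum to a single number.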

Why am I hitting the RPM limit when I still have TPM headroom?

RPM is a flat request count that ignores token size: a 50-token request uses one unit of RPM, exactly as a 10,000-token request does. If your use case sends many small requests, you are likely RPM-bound and should look at request consolidation or the Batch API.
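Request consolidation trades many small requests for fewer larger ones. The sketch below estimates the load for a hypothetical workload in which each request carries a shared instruction block plus a number of items; all parameter names and values are illustrative assumptions, and it assumes the model handles several items per call without changing per-item token counts.

```python
import math


def consolidated_load(items_per_minute: int, items_per_request: int,
                      shared_prompt_tokens: int, tokens_per_item_in: int,
                      tokens_per_item_out: int) -> tuple:
    """Return (requests per minute, tokens per minute) for a given batch size."""
    requests = math.ceil(items_per_minute / items_per_request)
    # The shared instructions are paid once per request; item tokens once per item.
    tokens = (requests * shared_prompt_tokens
              + items_per_minute * (tokens_per_item_in + tokens_per_item_out))
    return requests, tokens


# 3,000 short classification items per minute, 200-token shared instructions.
print(consolidated_load(3_000, 1, 200, 30, 5))   # (3000, 705000)
print(consolidated_load(3_000, 20, 200, 30, 5))  # (150, 135000)
```

Grouping 20 items per request cuts RPM twentyfold and, because the shared instructions are sent less often, lowers TPM as well.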

How do I request a rate limit increase?

For OpenAI, go to platform.openai.com → Settings → Limits and submit a limit increase request. For Anthropic, contact support through console.anthropic.com. Both providers also move accounts to higher usage tiers automatically as cumulative spend and account history reach each tier's threshold.

Do rate limits reset every minute exactly?

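No. OpenAI and Anthropic enforce limits continuously rather than resetting them at the top of each minute, and both note that a per-minute limit may be enforced over shorter intervals (for example, 60 RPM enforced as roughly 1 request per second). If you send 500 requests in the first 10 seconds against a 500 RPM cap, part of that burst can be throttled even though the minute's total never exceeds the cap. Space requests evenly with a token-bucket or leaky-bucket algorithm.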
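A client-side token bucket is one way to do that spacing. This is a minimal single-threaded sketch; the class and the `burst` parameter are hypothetical choices, not part of any provider SDK. The bucket holds at most `burst` units and refills at `rate_per_minute / 60` units per second, so a small `burst` forces requests to be spread across the minute.

```python
import time


class TokenBucket:
    """Client-side pacer: capacity `burst`, refilled at rate_per_minute / 60 per second."""

    def __init__(self, rate_per_minute: float, burst: float, clock=time.monotonic):
        if rate_per_minute <= 0 or burst <= 0:
            raise ValueError("rate and burst must be positive")
        self.rate = rate_per_minute / 60.0  # units per second
        self.capacity = float(burst)
        self.tokens = float(burst)          # start full
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def wait_time(self, cost: float = 1.0) -> float:
        """Seconds until `cost` units are available (0.0 if available now)."""
        self._refill()
        return 0.0 if self.tokens >= cost else (cost - self.tokens) / self.rate

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Consume `cost` units if available; never blocks."""
        if cost > self.capacity:
            raise ValueError("cost exceeds bucket capacity")
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def acquire(self, cost: float = 1.0) -> None:
        """Block until `cost` units are available, then consume them."""
        while not self.try_acquire(cost):
            time.sleep(self.wait_time(cost))
```

To respect both limits, one option is to keep two buckets and acquire from each before every call: `rpm_bucket.acquire()` with a cost of 1, then `tpm_bucket.acquire(estimated_tokens)` with the request's estimated total tokens. The TPM bucket's `burst` must be at least as large as your biggest single request, or `try_acquire` raises.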