TPM / RPM Rate Limit Calculator
Enter your actual or expected requests-per-minute and average token counts, then set your OpenAI or Anthropic tier limits. The calculator shows which limit — TPM or RPM — binds first, how close you are to the cap, and the maximum throughput you can safely sustain.
Rate limits vary by model and usage tier. Check your limits at platform.openai.com/account/limits or console.anthropic.com/settings/limits.
How to use the TPM / RPM Rate Limit Calculator
Fill in your current or planned traffic numbers in the top row, then set your account's rate limits in the second row. If you do not know your limits, the defaults are typical for OpenAI Tier 2 on GPT-4o-mini. Results recalculate instantly whenever you change an input.
The calculator computes tokens per minute as RPM × (avg input tokens + avg output tokens), then compares that figure against your TPM cap and your request rate against your RPM cap. It tells you:
- Your actual tokens-per-minute and what percentage of the TPM cap they consume.
- Your RPM as a percentage of the RPM cap.
- Which limit binds first — the one with the lower headroom is your effective ceiling.
- The maximum safe RPM under the TPM cap: floor(TPM cap / (tin + tout)), where tin and tout are the average input and output tokens per request.
- Headroom: how many more requests per minute you can add before hitting the first limit.
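The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration of the formulas as described, not the calculator's actual source; the function name `analyze`, the `RateLimitReport` fields, and the example caps are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class RateLimitReport:
    tokens_per_minute: int      # RPM x (tin + tout)
    tpm_utilization: float      # fraction of the TPM cap in use
    rpm_utilization: float      # fraction of the RPM cap in use
    binding_limit: str          # "TPM" or "RPM"
    max_rpm_under_tpm: int      # floor(TPM cap / (tin + tout))
    headroom_rpm: int           # requests/min left before the first limit


def analyze(rpm: int, tin: int, tout: int, tpm_cap: int, rpm_cap: int) -> RateLimitReport:
    """Hypothetical helper mirroring the calculator's described formulas."""
    tokens_per_request = tin + tout
    if tokens_per_request <= 0 or tpm_cap <= 0 or rpm_cap <= 0:
        raise ValueError("token counts and caps must be positive")

    tokens_per_minute = rpm * tokens_per_request
    max_rpm_under_tpm = tpm_cap // tokens_per_request  # integer floor

    # The limit that allows fewer requests per minute binds first.
    # On an exact tie this sketch reports RPM.
    binding = "TPM" if max_rpm_under_tpm < rpm_cap else "RPM"
    ceiling = min(max_rpm_under_tpm, rpm_cap)

    return RateLimitReport(
        tokens_per_minute=tokens_per_minute,
        tpm_utilization=tokens_per_minute / tpm_cap,
        rpm_utilization=rpm / rpm_cap,
        binding_limit=binding,
        max_rpm_under_tpm=max_rpm_under_tpm,
        headroom_rpm=ceiling - rpm,  # negative means already over the limit
    )


# Illustrative inputs only: 100 RPM, 800 input + 200 output tokens,
# against assumed caps of 2,000,000 TPM and 5,000 RPM.
report = analyze(rpm=100, tin=800, tout=200, tpm_cap=2_000_000, rpm_cap=5_000)
print(report)
```

With these example inputs the workload uses 100,000 tokens per minute (5% of the assumed TPM cap), the TPM cap allows at most 2,000 requests per minute, and the headroom is 1,900 requests per minute.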
TPM, RPM, and how rate limits interact
OpenAI and Anthropic enforce two independent rate limits: Tokens Per Minute (TPM) and Requests Per Minute (RPM). Both are sliding-window limits, and you can hit either one independently. Your effective throughput ceiling is whichever limit you reach first. A single request that passes the RPM check can still be rejected if it pushes you over the TPM limit; conversely, low-token requests can exhaust your RPM allowance before touching the TPM cap.
The key insight is that TPM and RPM are not interchangeable. If your prompts are large (e.g., a 10k-token RAG context), you will hit the TPM wall long before the RPM wall. If your prompts are tiny (e.g., 50-token classification tasks), you will hit RPM first. Understanding which limit binds determines where to optimize: for TPM-bound workloads, compress prompts or use a model with a higher TPM limit; for RPM-bound workloads, use the Batch API or consolidate requests.
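One way to see where the crossover lies: the two limits bind at the same point when tokens per request equals TPM cap / RPM cap. Requests larger than that are TPM-bound; smaller ones are RPM-bound. The sketch below checks the two examples from the paragraph above against caps of 500 RPM and 200,000 TPM, which are assumed here purely for illustration; the function names are hypothetical.

```python
def break_even_tokens(tpm_cap: int, rpm_cap: int) -> float:
    """Tokens per request at which the TPM and RPM caps bind together."""
    return tpm_cap / rpm_cap


def binding_limit(tokens_per_request: int, tpm_cap: int, rpm_cap: int) -> str:
    """Return which cap is reached first for a given request size."""
    max_rpm_under_tpm = tpm_cap // tokens_per_request
    return "TPM" if max_rpm_under_tpm < rpm_cap else "RPM"


# Assumed caps for illustration only.
TPM_CAP = 200_000
RPM_CAP = 500

print(break_even_tokens(TPM_CAP, RPM_CAP))       # 400.0 tokens per request
print(binding_limit(10_000, TPM_CAP, RPM_CAP))   # TPM (10k-token RAG context)
print(binding_limit(50, TPM_CAP, RPM_CAP))       # RPM (50-token classification)
```

Under these assumed caps, the 10k-token request can run at most 20 times per minute before the TPM cap is hit, while the 50-token request reaches the 500 RPM cap after consuming only 25,000 tokens per minute.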
Limits grow sharply from tier to tier. OpenAI Tier 1 (new accounts) starts at 500 RPM and 200k TPM on GPT-4o-mini, while Tier 5 can reach 30,000 RPM and 150M TPM. Anthropic's tiers scale similarly. Always verify your current limits in the provider console rather than relying on documentation defaults, as limits can be raised by request or automatically as usage history builds.
Common use cases
- Pre-launch capacity check — verify your expected traffic fits within your tier before going live.
- Rate limit upgrade justification — quantify how close to the cap you are to support a limit increase request.
- Batch API decision — determine if your workload is RPM-bound and therefore a good fit for the async Batch API.
- Prompt optimization ROI — see exactly how much throughput you gain per 100 tokens you trim from your prompt.
- Multi-tenant SaaS planning — estimate how many concurrent end-users your current tier can support.
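Two of the use cases above, prompt optimization ROI and multi-tenant planning, reduce to the same maximum-throughput formula. The following is a rough sketch under stated assumptions: the function names are hypothetical, the caps are illustrative, and the per-user request rate is a figure you would have to estimate from your own traffic.

```python
import math


def max_sustainable_rpm(tokens_per_request: int, tpm_cap: int, rpm_cap: int) -> int:
    """Highest request rate that stays under both caps."""
    return min(tpm_cap // tokens_per_request, rpm_cap)


def rpm_gain_from_trimming(tokens_per_request: int, trimmed_tokens: int,
                           tpm_cap: int, rpm_cap: int) -> int:
    """Extra requests per minute unlocked by shortening each request."""
    before = max_sustainable_rpm(tokens_per_request, tpm_cap, rpm_cap)
    after = max_sustainable_rpm(tokens_per_request - trimmed_tokens, tpm_cap, rpm_cap)
    return after - before


def supported_users(max_rpm: int, requests_per_user_per_minute: float) -> int:
    """Users the tier can carry, assuming a uniform per-user request rate."""
    return math.floor(max_rpm / requests_per_user_per_minute)


# Assumed caps and traffic for illustration only.
TPM_CAP, RPM_CAP = 2_000_000, 5_000

ceiling = max_sustainable_rpm(2_000, TPM_CAP, RPM_CAP)
print(ceiling)                                               # 1000
print(rpm_gain_from_trimming(2_000, 100, TPM_CAP, RPM_CAP))  # 52
print(supported_users(ceiling, 0.5))                         # 2000
```

Trimming only helps while the workload is TPM-bound: once the RPM cap becomes the ceiling, the gain drops to zero. The user estimate also assumes evenly spread traffic, so bursty usage would support fewer users than this figure suggests.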