Nginx Rate Limit Config Generator

Generate a working nginx rate-limiting configuration. Set a request rate and the key to throttle on, choose a burst size and how requests within the burst are handled (served immediately with nodelay, partly delayed with delay=N, or fully queued), optionally add a connection limit, and pick the memory zone size and the HTTP status returned when the limit trips. You get the matching http { } and server { } snippets plus a plain-English summary. It builds live in your browser.

How to use the Nginx Rate Limit Config Generator

Choose what to throttle on with the key. This is almost always the client IP via $binary_remote_addr, the compact binary form (4 bytes for IPv4, 16 bytes for IPv6) that keeps the shared-memory zone small. Set the rate (requests per second or minute) and the zone memory. Per the nginx documentation, a limit_req state occupies 64 bytes on 32-bit platforms and 128 bytes on 64-bit platforms, so one megabyte holds roughly 16,000 or 8,000 states respectively, and a 10 MB zone tracks about 160,000 or 80,000 clients. The generator emits a limit_req_zone for the http { } block and a limit_req for your location, plus the matching connection-limit directives if you enable them.
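
As a rough illustration of the two directives described above, the fragment below shows where each one goes. The zone name api_limit, the /api/ path, and the backend address 127.0.0.1:8080 are placeholders chosen for this sketch, not values the generator is known to emit.

```nginx
# Fragment of nginx.conf; a full file also needs an events { } block.
http {
    # Shared-memory zone "api_limit" (placeholder name), 10 MB,
    # keyed on the client IP in binary form, 10 requests per second.
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

    server {
        listen 80;

        location /api/ {
            # Enforce the zone here; with no burst, any request that
            # arrives less than 100 ms after the previous one from the
            # same IP is rejected.
            limit_req zone=api_limit;

            # Hypothetical backend address, for illustration only.
            proxy_pass http://127.0.0.1:8080;
        }
    }
}
```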

The subtle part is burst and nodelay. The base rate is enforced as a steady trickle, so without a burst even slightly bunched-up legitimate requests get rejected. A burst allows a queue of that many excess requests. With nodelay they are served immediately as long as the queue isn't full (the usual choice for APIs); with delay=N the first N are served immediately and the rest are paced; with neither option (queue-all) every queued request is paced at the base rate. Requests beyond rate-plus-burst receive the status you set, and 429 Too Many Requests is the correct code. Put the limit_req_zone line in the http context and the limit_req line in the relevant server or location, then validate and reload with nginx -t && nginx -s reload.
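
The three burst behaviours can be compared side by side in a sketch like the one below. The zone name per_ip, the three paths, and the numbers are assumptions made for illustration; note that the delay= parameter requires nginx 1.15.7 or later.

```nginx
# Fragment of nginx.conf; a full file also needs an events { } block.
http {
    limit_req_zone $binary_remote_addr zone=per_ip:10m rate=10r/s;

    server {
        listen 80;

        # Return 429 instead of the default 503 when the limit trips.
        limit_req_status 429;

        # nodelay: up to 20 excess requests are served immediately;
        # anything beyond that is rejected.
        location /api/ {
            limit_req zone=per_ip burst=20 nodelay;
        }

        # delay=8: the first 8 excess requests are served immediately,
        # the next 12 are paced at the base rate, the rest are rejected.
        location /search/ {
            limit_req zone=per_ip burst=20 delay=8;
        }

        # Neither option (queue-all): all 20 queued requests are paced
        # at one per 100 ms.
        location /reports/ {
            limit_req zone=per_ip burst=20;
        }
    }
}
```

The three locations share one zone here only to keep the sketch short, which means a client's requests to all three paths count against the same per-IP state. Declare a separate zone per location if each path should have its own budget.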

How nginx rate limiting works

Nginx rate limiting is built on the leaky bucket algorithm. You declare a shared-memory zone with limit_req_zone that records, per key, the timestamp of the last request; limit_req then enforces a target rate by treating requests as water draining from a bucket at a constant rate. Requests that arrive faster than the rate would overflow the bucket and are rejected — unless a burst gives the bucket extra capacity to absorb short spikes. This is why a bare rate with no burst feels surprisingly strict: real traffic is bursty, and a 10 r/s limit literally means one request every 100 ms, so two requests 50 ms apart already violate it.

The key defines what gets limited. Keying on $binary_remote_addr limits per client IP; keying on a URI or a header limits per endpoint or per token. The choice interacts with your topology: behind a CDN or load balancer every request appears to come from the proxy's IP, so you must either restore the client address from a trusted X-Forwarded-For header via the realip module or accept that limiting will be coarse. The zone memory bounds how many distinct keys can be tracked. When the zone fills, nginx removes the least recently used state, and if a new state still cannot be created the request is terminated with an error, so size the zone for your expected client population.
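
To make the choice of key concrete, here is a minimal sketch with three zones keyed in different ways. The zone names, the rates, and the X-API-Key header are hypothetical; substitute whatever header your clients actually send.

```nginx
# Fragment of nginx.conf; a full file also needs an events { } block.
http {
    # Per client IP.
    limit_req_zone $binary_remote_addr zone=per_ip:10m rate=10r/s;

    # Per API token, read from a hypothetical X-API-Key request header.
    # Requests with an empty key value are not accounted, so clients
    # that omit the header are not limited by this zone.
    limit_req_zone $http_x_api_key zone=per_token:10m rate=100r/m;

    # Per endpoint: every client hitting the same URI shares one budget.
    limit_req_zone $uri zone=per_uri:10m rate=50r/s;

    server {
        listen 80;

        location /api/ {
            # Several limit_req directives may be combined; a request
            # must pass every one of them.
            limit_req zone=per_ip burst=20 nodelay;
            limit_req zone=per_token burst=50 nodelay;
        }
    }
}
```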

The burst/nodelay combination shapes the experience. With nodelay, the burst capacity lets bunched requests through instantly while still capping the sustained rate — ideal for APIs where latency matters. Without it, excess requests are delayed to smooth traffic into a steady stream, which protects a fragile backend at the cost of added latency. Anything beyond the rate-plus-burst allowance is refused with limit_req_status (use 429). A separate mechanism, limit_conn, caps the number of simultaneous connections per key rather than their rate — useful against slow-loris-style abuse and large concurrent downloads. The two are complementary, and this generator can emit both. Always validate with nginx -t before reloading.
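
A connection cap might look like the following sketch. The zone name conn_per_ip, the /downloads/ path, and the limit of 4 are illustrative assumptions rather than recommended values.

```nginx
# Fragment of nginx.conf; a full file also needs an events { } block.
http {
    # Zone for counting simultaneous connections per client IP.
    limit_conn_zone $binary_remote_addr zone=conn_per_ip:10m;

    server {
        listen 80;

        location /downloads/ {
            # At most 4 concurrent connections per IP on this path.
            limit_conn conn_per_ip 4;

            # Like limit_req_status, this defaults to 503.
            limit_conn_status 429;
        }
    }
}
```

With HTTP/2 and HTTP/3, nginx counts each concurrent request as a separate connection, so a single browser tab can use up a small limit quickly.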

Common use cases

  • API protection. Cap requests per IP with a burst and nodelay so legitimate spikes pass but abuse is throttled.
  • Login endpoints. Apply a strict per-IP rate to slow brute-force attempts against authentication.
  • Backend shielding. Use delayed limiting to smooth bursty traffic into a steady stream a fragile service can handle.
  • Connection caps. Add limit_conn to bound simultaneous connections against slow-client and download abuse.
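
As one example of these use cases, a strict login limit could be sketched as below. The /login path, the 5 r/m rate, the burst of 5, and the backend address are assumptions for illustration; tune them to your own traffic.

```nginx
# Fragment of nginx.conf; a full file also needs an events { } block.
http {
    # 5 requests per minute per IP, i.e. one every 12 seconds.
    limit_req_zone $binary_remote_addr zone=login:10m rate=5r/m;

    server {
        listen 80;

        # Exact-match location so only the login endpoint is affected.
        location = /login {
            # A small burst lets a user mistype a password a few times
            # without waiting, while a sustained guessing attempt is
            # held to the base rate.
            limit_req zone=login burst=5 nodelay;
            limit_req_status 429;

            # Hypothetical backend address, for illustration only.
            proxy_pass http://127.0.0.1:8080;
        }
    }
}
```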

Frequently asked questions

What does burst actually do?

burst sets how many requests above the steady rate may be queued before nginx starts rejecting. Because the base rate is enforced as an even trickle (10 r/s means one every 100 ms), real bursty traffic needs a burst allowance to avoid spurious 429s. With nodelay the burst is served immediately; without it the queued requests are delayed to match the rate.

Should I use nodelay?

For most APIs, yes. nodelay serves bursted requests right away while still capping the sustained rate, which keeps latency low. Omit nodelay (or use delay=N) when your goal is to smooth traffic into a steady stream to protect a backend that cannot handle bursts, accepting the added latency.

Which status code should rate-limited requests return?

429 Too Many Requests is the standard, semantically correct response and is what clients and SDKs expect for retry/backoff logic. Nginx defaults to 503; set limit_req_status 429 (this generator does) so clients can distinguish rate limiting from a real outage.

How big should the memory zone be?

Per the nginx documentation, each limit_req state occupies 64 bytes on 32-bit platforms and 128 bytes on 64-bit platforms, so roughly 16,000 or 8,000 keys fit per megabyte. On a typical 64-bit server a 10 MB zone therefore tracks around 80,000 distinct IPs. Size it to your expected concurrent client population; when the zone is full nginx evicts the least recently used entries, which can let a very large client base slip the limit.

I am behind Cloudflare or a load balancer — does limiting still work?

By default every request appears to originate from the proxy IP, so a per-IP limit would throttle all clients together. Use the realip module to restore the true client IP from a trusted X-Forwarded-For or CF-Connecting-IP header, then key the limit on $binary_remote_addr. Only trust the header from known proxy ranges.
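
A minimal sketch of that setup follows. The address range 192.0.2.0/24 is a documentation placeholder, not a real proxy range; replace it with the ranges your CDN or load balancer publishes. The realip module is not built by default, so this assumes an nginx build that includes it (--with-http_realip_module).

```nginx
# Fragment of nginx.conf; a full file also needs an events { } block.
http {
    # Trust the client-IP header only when the connection comes from
    # these proxy addresses (placeholder range shown).
    set_real_ip_from 192.0.2.0/24;

    # Header that carries the original client address. Behind
    # Cloudflare, CF-Connecting-IP can be used here instead.
    real_ip_header X-Forwarded-For;

    # Skip trusted proxy addresses in the header and use the last
    # address that is not trusted.
    real_ip_recursive on;

    # realip rewrites the client address, so $binary_remote_addr now
    # refers to the real client rather than the proxy.
    limit_req_zone $binary_remote_addr zone=per_client:10m rate=10r/s;

    server {
        listen 80;

        location / {
            limit_req zone=per_client burst=20 nodelay;
            limit_req_status 429;
        }
    }
}
```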