Nginx Rate Limiting: How it works, how to configure it and how to test it

Rate limits help keep your service online and stable for everyone. Without them your server can be overloaded, but setting the right limits is something of an art, and you need to get it right.

Imagine running a web server. Everything is working smoothly, until one day your site suddenly goes down. A DoS (Denial of Service) attack is in progress.

Without rate limits in place, your server is flooded with requests until it becomes overloaded and unresponsive. If you had configured rate limits in Nginx, that extra traffic could have been rejected, and your site would still be online.

In this post, we’ll explore what rate limiting is, why it matters, how to configure it in Nginx, and how to test that it works.

What is rate limiting and why do we need it?

Rate limiting does exactly what it says: it limits the rate of requests your server will accept.

Without limits, a server will try to accept as many requests as it can. Under heavy load this leads to rising latency, timeouts, and eventually crashes, all of which mean a poor experience for your users.

With rate limiting, your server can manage requests at a safe, steady pace. Spikes in traffic can be queued or rejected, preventing your application from taking on more than it can handle.

For example, imagine you haven’t configured any limits. A single client could fire off hundreds of requests per second. If ten clients did this simultaneously, that could be thousands of requests per second, far more than your application can safely serve.

By setting a limit, say 10 requests per second per client, each user is kept within a fair budget. This allows the server to stay stable and responsive, and ensures more users can be served at once.

Choosing the right limits takes some tuning, but a couple of simple formulas can guide us. Let's look at those next.

Getting the numbers right

Before we touch the Nginx configuration, we should do some initial calculations to estimate our server's safe capacity.

Defining base rate (Little’s Law)

This step is crucial. Set your safe base rate too high and the server will still be overloaded; set it too low and users will see a lot of errors. You need to find the Goldilocks number in between, and this is where Little's Law comes in.

Little’s Law:
Requests Per Second = Max Concurrency / Average Service Time

  • Max Concurrency: the number of requests your service can handle simultaneously without degradation.
  • Average Service Time: the average time to complete one request.

Example: If you can handle 50 concurrent requests and each takes 0.25s, then

50 / 0.25 = 200 r/s

That’s your safe base rate.
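
If you want to sanity-check the arithmetic from the command line, a quick awk one-liner using the example figures above does the job:

awk 'BEGIN { concurrency=50; service_time=0.25; printf "base rate = %.0f r/s\n", concurrency/service_time }'
# prints: base rate = 200 r/s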

Defining a burst (queue capacity)

Real traffic is bursty; it does not arrive at a steady rate. Defining a burst allows you to absorb short spikes without errors, and there is another formula you can use to size it:

Burst = (Spike Rate − Base Rate) × Seconds To Absorb

Example: Base Rate = 200 r/s, Spike Rate = 350 r/s, absorb 5 s

Burst = (350 − 200) × 5 = 750

  • This means you can absorb 750 extra requests total over those 5 seconds.
  • Those extra requests wait in a short queue and are released at the safe 200 r/s.
  • If the queue is full (more than 750 waiting), subsequent requests are rejected.

When requests are queued, the user might experience a slight delay. One more formula gives the maximum wait time when queueing is enabled:

Max Wait = Burst / Base Rate

So in our example: 750 / 200 = 3.75 s
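
As before, a quick one-liner can check the sums, using the example figures above:

awk 'BEGIN { base=200; spike=350; secs=5; burst=(spike-base)*secs; printf "burst = %d, max wait = %.2f s\n", burst, burst/base }'
# prints: burst = 750, max wait = 3.75 s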

How to implement it in Nginx?

Let's now look at how to implement rate limiting in Nginx. We shall define two groups, known as zones, one global and one per IP, with the following rules:

  • Global protects the server overall, with a 200 r/s limit and a burst of 750, giving a max wait of 3.75s
  • Per IP ensures fairness between clients, with a 10 r/s limit and a burst of 20, giving a max wait of 2s

Define limiter zones

In the Nginx configuration, we shall define our zones inside the http block.

http {
  # Global bucket: Safe 200 r/s, can absorb 5s at 350 r/s (burst = 750)
  # Use a constant key (here: $server_name) so all requests share one bucket.
  limit_req_zone $server_name       zone=global:10m rate=200r/s;

  # Per-IP bucket: 10 r/s per client, allow short spikes (burst = 20)
  limit_req_zone $binary_remote_addr zone=perip:10m  rate=10r/s;

  # (Optional) If behind Cloudflare/another proxy, restore real client IP:
  # real_ip_header CF-Connecting-IP;
  # set_real_ip_from 173.245.48.0/20;
  # ... include all CF ranges ...
}

Apply limiters

To apply the limiters, reference them in your server blocks. In this example we use the default queueing behaviour, where burst requests are queued, trading increased latency to maintain the defined requests per second.

server {
  listen 80;
  server_name _;

  # Return 429 when limited, and log limiter events
  limit_req_status 429;
  limit_req_log_level notice;

  # Apply BOTH: fairness (per-IP) + total capacity (global)
  limit_req zone=perip  burst=20;   # ~2.0s max wait per busy client (20 / 10)
  limit_req zone=global burst=750;  # ~3.75s max wait globally (750 / 200)

  # Friendly error body for rate-limited requests
  error_page 429 = @rate_limited;
  location @rate_limited { return 429 "Rate limit hit\n"; }

  root /usr/share/nginx/html;
  location / { try_files /index.html =404; }
}

If you would rather not queue requests, you can add the nodelay option. This keeps latency lower: requests within the burst are served immediately instead of being delayed, and anything beyond the burst is rejected straight away, so clients may see more rejections under sustained load.

limit_req zone=perip  burst=20  nodelay;   # Hard per-IP cap, low latency
limit_req zone=global burst=750 nodelay;   # Hard global cap, early 429s

You should now test your Nginx configuration to ensure it is still valid and then reload it.

nginx -t
nginx -s reload

How do you tell if it is working? Well, that is what we will cover in the next section.

How to test it?

To test your rate limiter, you will need an Nginx server. For this post I'm going to run everything in Docker containers locally, but there is nothing stopping you from running it however you like; Docker is just a quick, simple way to demo this topic.

Running an Nginx Instance

Once you have defined your nginx.conf file with the previous configuration examples, run the following Docker commands to create a network and an instance of Nginx.

docker network create nginx-test
docker run --rm -it \
  --name nginx \
  --network nginx-test \
  -v "${PWD}/nginx.conf:/etc/nginx/nginx.conf:ro" \
  nginx:latest
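
If you change nginx.conf later on, you can validate and reload the configuration inside the running container rather than restarting it, for example:

docker exec nginx nginx -t          # check the configuration is still valid
docker exec nginx nginx -s reload   # reload Nginx without stopping the container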

Running a Test Instance

We now need a place to run our tests from. Again I will use a Docker container, this time with the wrk tool installed, which lets me generate requests and show the rate limiter in action. You can run as many of these instances as you like; each one gets its own IP on the network, so each can be treated as a separate client.

docker run --rm -it \
  --network nginx-test \
  ubuntu:latest bash -c 'apt update && apt install -y wrk && exec bash'

Running a test

The syntax for the wrk command is as follows.

wrk -t[NUM_OF_THREADS] -c[NUM_OF_CONNECTIONS] -d[TEST_DURATION] --latency [TARGET_URL]

For the following tests, picture it like this: the client is a company and the connections are its employees. The company in this example is one IP address, so the per-IP rate limit applies to the whole company.

For comparison, this is a test I ran before applying the rate limiting. It shows an unrestricted flow of requests: the client (aka company) has 20 employees hammering our server and is clearly taking advantage of it.

# wrk -t1 -c20 -d10s --latency http://nginx:80
Running 10s test @ http://nginx:80/
  1 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.22ms  455.57us  14.08ms   87.07%
    Req/Sec    16.42k     1.17k   19.91k    73.27%
  Latency Distribution
     50%    1.17ms
     75%    1.36ms
     90%    1.60ms
     99%    2.44ms
  164968 requests in 10.10s, 134.20MB read
Requests/sec:  16332.37
Transfer/sec:     13.29MB

They are making over 16,000 requests per second. Even though our server can handle this right now, if ten more companies did the same it would quickly get out of hand.

So, with our rate limiting configured, the following test run will effectively simulate one client with one connection.

# wrk -t1 -c1 -d10s --latency http://nginx:80
Running 10s test @ http://nginx:80/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    99.11ms    8.05ms 101.30ms   99.01%
    Req/Sec    10.10      1.00    20.00     99.00%
  Latency Distribution
     50%   99.78ms
     75%  100.44ms
     90%  100.61ms
     99%  101.11ms
  101 requests in 10.02s, 84.13KB read
Requests/sec:     10.08
Transfer/sec:      8.40KB

You can see from the latency and Req/Sec metrics that our rate limit is working as expected: requests are taking ~100ms each, which matches our limit of 10 r/s.

Now let's see the queue in action. We will run the test again, but this time simulate one client with 20 connections. With the rate limit set at 10 r/s per IP, as more requests come in, some will be delayed.

# wrk -t1 -c20 -d10s --latency http://nginx:80
Running 10s test @ http://nginx:80/
  1 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.61s   622.12ms   2.00s    81.48%
    Req/Sec    10.31      2.51    20.00     90.53%
  Latency Distribution
     50%    2.00s
     75%    2.00s
     90%    2.00s
     99%    2.00s
  101 requests in 10.02s, 84.13KB read
Requests/sec:     10.08
Transfer/sec:      8.40KB

There you have it: we still see the 10 r/s rate, but latency has increased to 2s, which is expected (burst / rate = 20 / 10 = 2). You will also notice that the total for our 10s test was again 101 requests. That is the rate limit at work; no matter how many employees from that company hit our server, the flow remains a steady 10 r/s.

Let's now see the rate limit being triggered. For this test we will run the command again, but with 50 connections.

# wrk -t1 -c50 -d10s --latency http://nginx:80
Running 10s test @ http://nginx:80/
  1 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    24.94ms  150.32ms   1.90s    96.97%
    Req/Sec    25.21k     2.59k   33.70k    66.00%
  Latency Distribution
     50%    1.15ms
     75%    1.41ms
     90%    1.77ms
     99%  931.47ms
  250800 requests in 10.01s, 42.64MB read
  Non-2xx or 3xx responses: 250699
Requests/sec:  25067.03
Transfer/sec:      4.26MB

You can clearly see that a lot of requests failed, around 250k of them, and only 101 went through; that's the rate limit. Because of how wrk reports here, the requests per second and latency figures are a bit misleading: most requests failed very quickly because they were rejected with the queue full.
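
If you would rather see the 429s directly instead of wrk's aggregated counters, a rough parallel curl sketch from the test container works too (curl is not installed in that Ubuntu image by default, so you would need apt install -y curl first):

# Fire 50 requests in parallel and count the HTTP status codes returned
seq 1 50 | xargs -P 50 -I{} curl -s -o /dev/null -w "%{http_code}\n" http://nginx:80/ | sort | uniq -c
# With the per-IP limit of 10 r/s and burst=20, expect roughly twenty 200s and the rest 429s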

To nodelay or not to nodelay

I found it hard to visualise the effects of nodelay at first. If you are in the same boat, hopefully this will help.

Without nodelay (default) → Queue mode

  • Requests above the base rate are queued (delayed) up to the burst size. 
  • They’re released at the base rate, i.e. 10 r/s. 
  • Clients may experience longer wait times, but fewer requests are dropped.

Analogy:

Think of it like a nightclub with a bouncer.

  • Only 10 people can go in per second (the base rate).
  • Extra people wait in line (up to burst).
  • If the line is full, new arrivals are turned away (429).

Effect:

  • ✅ Smooths bursts
  • ✅ Few 429s
  • ❌ Some users wait longer (extra latency)

With nodelay → Immediate accept/drop

  • Requests above the base rate are not queued.
  • If there’s still room in the burst, they’re let through immediately.
  • Once burst is full, excess is dropped instantly with 429.

Analogy:

Same nightclub, but no waiting line.

  • Up to burst people can rush in at once.
  • When the club’s full, the bouncer slams the door and rejects everyone else.

Effect:

  • ✅ Bursts get through fast
  • ✅ No extra latency for queued requests
  • ❌ More 429s under sustained load
  • ❌ Backend might see short traffic spikes (less smooth)

Further Reading

https://blog.nginx.org/blog/rate-limiting-nginx

