An introduction to load balancing with Nginx

When you start needing multiple servers to host your application, you also need some way to spread the load. Enter Nginx and its load balancing features.

The internet is getting busier by the day, and balancing your website's traffic is becoming more and more relevant.

Take the following situation: you run a single web server, but lately you have seen traffic increase and performance is starting to drop. You need more capacity, and this is where load balancing comes into play.

You add another web server and place a load balancer in front of them. The load balancer takes in requests and distributes them between the two web servers; the load is now spread and your performance improves.

Why Do We Need Load Balancing?

There are a number of reasons you might want load balancing. Here are a few:

  • Scalability
    • A single server has limits (CPU, memory, bandwidth).
    • By adding more servers behind a load balancer, you can handle more users.
  • High Availability (Fault Tolerance)
    • If one server crashes or goes offline, the load balancer can redirect traffic to another healthy server.
    • Users don't notice downtime, since other servers pick up the slack.
  • Performance Optimization
    • Distributing requests evenly reduces latency and prevents bottlenecks.
    • Algorithms like "least connections" and "IP hash" can be used to optimize resource use.
  • Flexibility & Maintainability
    • Servers can be taken down for updates or maintenance without affecting users; the load balancer simply routes around them until they're back.

How It Works (at a glance)

In short, this is how load balancing works (a minimal Nginx sketch follows the list):

  • Client sends a request to the website.
  • Load balancer receives it instead of a single backend server.
  • Load balancer chooses a server using a method like:
    • Round Robin - rotate evenly through servers
    • Least Connections - send traffic to the server with the fewest active requests
    • IP Hash - the same client always goes to the same server
  • Server responds, and the client gets the data back.
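
To make that concrete, here is a minimal sketch of what that flow looks like in Nginx. The upstream block lists the backend servers (the names app1 and app2 are just placeholders for this sketch), and proxy_pass hands each incoming request to whichever server the balancing method selects.

upstream app_servers {
  # the balancing method goes here; round robin is the default
  server app1:8000;
  server app2:8000;
}

server {
  listen 80;

  location / {
    # every request hits the load balancer, which picks a backend for it
    proxy_pass http://app_servers;
  }
}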

Nginx Examples

The following examples demonstrate the different algorithms you can use. At the end I will provide a setup you can spin up in Docker and play around with yourself, but first familiarise yourself with the available algorithms and their pros and cons.

Round Robin (default)

This is the default and simplest approach. It rotates requests in order, so if you had four servers, the first request would go to server 1, the second to server 2, the third to server 3 and the fourth to server 4; request 5 would then circle back to server 1, and so on.

✅ Good general-purpose choice.
❌ Doesn't consider server load.

upstream backend {
  server worker1:8000;
  server worker2:8000;
  server worker3:8000;
  server worker4:8000;
}

Weighted Round Robin

Sometimes you may want a server to receive more requests than others. You can do this by giving servers a weight value; in the example below, if 4 requests are sent, 3 of them should go to worker1 and worker2 would get the other.

You can also apply weights to the Least Connections algorithm.

✅ Good if some of your servers are larger than others.
❌ Again, this doesn't consider actual server load.

upstream backend {
  server worker1:8000 weight=3;
  server worker2:8000 weight=1;
}

Least Connections

This option sends the request to the server with the fewest active connections. Nginx does this by keeping count of the number of active connections each server has, and it works as follows:

  • Nginx iterates through the servers.
  • It compares the number of active connections for each.
  • The request goes to the one with the lowest number.
  • If multiple servers have the same lowest number, it falls back to round robin between them.

If you also have weights defined, Nginx divides the number of active connections by the weight before comparing (a small example of this combination follows the config below).

✅ Great for APIs or apps with uneven request times.
❌ Adds a little overhead for connection tracking.

upstream backend {
  least_conn;
  server worker1:8000;
  server worker2:8000;
  server worker3:8000;
  server worker4:8000;
}
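
As a quick illustration of combining the two, a variant along these lines (illustrative only, not part of the demo stack later on) weights worker1 more heavily while still comparing active connections:

upstream backend {
  least_conn;
  # with weights, Nginx compares active connections divided by the weight
  server worker1:8000 weight=3;
  server worker2:8000 weight=1;
}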

IP Hash (Sticky Sessions)

With this method, Nginx uses the client's IP address to always send requests from that client to the same server. If that particular server goes down, Nginx will temporarily send the requests to another server.

Note that in older Nginx versions (before 1.3.1 and 1.2.2) weights had no effect with this method; modern versions do take server weights into account.

✅ Ensures session stickiness (good for apps storing sessions in memory).
❌ If one client is heavy, it overloads one server.

upstream backend {
  ip_hash;
  server worker1:8000;
  server worker2:8000;
  server worker3:8000;
  server worker4:8000;
}

Handling Failures and Health Checks

The more servers you have, the more likely it is that some will fail, and you don't want to keep sending requests to those failed servers. Nginx provides a few options to help manage these situations.

upstream backend {
  server worker1:8000 max_fails=2 fail_timeout=10s;
  server worker2:8000 max_fails=2 fail_timeout=10s;
  server worker3:8000 max_fails=2 fail_timeout=10s;
  server worker4:8000 backup;
}

max_fails and fail_timeout

In the above example, max_fails defines the number of times a server needs to fail to respond before Nginx marks it as unavailable, and fail_timeout defines both the timespan in which those failures must occur and how long Nginx will refrain from sending requests to the server afterwards.

  • If worker1 fails 2 requests within 10 seconds, it will be marked down.
  • For the next 10 seconds, Nginx won’t send new requests to it.
  • After 10 seconds, Nginx will try again; if the server responds, it comes back into rotation.

While this stops Nginx sending requests to a failed server, it is a passive health check: Nginx only learns about the failed server after client requests hit it and fail.

For an active health check, you would need Nginx Plus since this feature is not available in the free open source version.
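
For reference, an active health check in Nginx Plus looks roughly like the sketch below (a hedged example based on the Nginx Plus health_check directive; it will not work in open source Nginx, and the upstream has to live in a shared memory zone for it).

upstream backend {
  # active health checks require the upstream state to live in shared memory
  zone backend 64k;
  server worker1:8000;
  server worker2:8000;
}

server {
  listen 80;

  location / {
    proxy_pass http://backend;
    # Nginx Plus only: probe each server every 5s, mark it down after 3 failures
    health_check interval=5s fails=3 passes=2;
  }
}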

backup

If servers 1-3 are all down, Nginx will resort to sending requests to the backup server; once the others come back, requests will resume being routed to them.

You can define as many backup servers as you like; if you have several, they will use the same routing algorithm, and you can also assign weights to them.
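
For instance, a configuration along these lines (illustrative only; the backup1 and backup2 names are placeholders) defines two weighted backup servers:

upstream backend {
  server worker1:8000;
  server worker2:8000;
  # only used when both primary servers are unavailable
  server backup1:8000 backup weight=2;
  server backup2:8000 backup weight=1;
}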

Hands on, have a go

To learn best, you need to get hands on. I have created a Docker stack which you can use to test and play around with all the load balancing algorithms we looked at above.

You need to create three files:

  • Dockerfile
# Tiny static web server used as a worker behind the load balancer
FROM busybox:stable
# Default name; overridden per worker in docker-compose.yml
ENV SERVER_NAME=server-1
WORKDIR /www
EXPOSE 8000
# Write a page identifying the worker, then serve it on port 8000
CMD sh -c 'echo "Hey from $SERVER_NAME" > index.html && httpd -f -p 8000 -h /www'
  • nginx.conf
worker_processes  auto;

events { worker_connections  1024; }

http {
  upstream backend-rb {
    server worker1:8000;
    server worker2:8000;
    server worker3:8000;
    server worker4:8000;
  }

  upstream backend-rbw {
    server worker1:8000 weight=4;
    server worker2:8000 weight=2;
    server worker3:8000 weight=1;
    server worker4:8000 weight=1;
  }

  upstream backend-lc {
    least_conn;
    server worker1:8000;
    server worker2:8000;
    server worker3:8000;
    server worker4:8000;
  }

  upstream backend-ip {
    ip_hash;
    server worker1:8000;
    server worker2:8000;
    server worker3:8000;
    server worker4:8000;
  }

  upstream backend-backup {
    # workers 1-3 deliberately point at the wrong port (8001) to simulate failed servers
    server worker1:8001 max_fails=1 fail_timeout=10s;
    server worker2:8001 max_fails=1 fail_timeout=10s;
    server worker3:8001 max_fails=1 fail_timeout=10s;
    server worker4:8000 backup;
  }

  server {
    listen 80;

    # talk HTTP/1.1 to the workers and clear the Connection header
    proxy_http_version 1.1;
    proxy_set_header Connection "";

    # pass the original host and client details through to the workers
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    # fail over quickly if a worker is slow or unreachable
    proxy_connect_timeout 2s;
    proxy_send_timeout    5s;
    proxy_read_timeout    5s;

    location /round-robin {
      proxy_pass http://backend-rb/;
    }

    location /round-robin-weighted {
      proxy_pass http://backend-rbw/;
    }

    location /least-connections {
      proxy_pass http://backend-lc/;
    }

    location /ip-hash {
      proxy_pass http://backend-ip/;
    }

    location /backup {
      proxy_pass http://backend-backup/;
    }
  }
}
  • docker-compose.yml
services:
  worker1:
    build:
      context: .
    environment:
      - SERVER_NAME=server 1
    networks:
      - appnet

  worker2:
    build:
      context: .
    environment:
      - SERVER_NAME=server 2
    networks:
      - appnet

  worker3:
    build:
      context: .
    environment:
      - SERVER_NAME=server 3
    networks:
      - appnet

  worker4:
    build:
      context: .
    environment:
      - SERVER_NAME=server 4
    networks:
      - appnet

  nginx-lb:
    image: nginx:alpine
    depends_on:
      - worker1
      - worker2
      - worker3
      - worker4
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    ports:
      - "8080:80"
    networks:
      - appnet

networks:
  appnet:
    driver: bridge

Now, all you need to do is run docker compose up and you should have 4 workers and an instance of Nginx exposed at http://localhost:8080.
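
For example, something like this should bring the whole stack up (the --build flag makes sure the worker image is built on the first run, and -d runs everything in the background):

docker compose up --build -d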

Round Robin Test

In this test we send 100 requests to our server using the command below. The results show that the requests were evenly distributed, which is as expected for the round robin approach.

for i in {1..100}; \
  do curl -s http://localhost:8080/round-robin; \
  done | sort | uniq -c
24 Hey from server 1
26 Hey from server 2
26 Hey from server 3
24 Hey from server 4

Weighted Round Robin Test

In this next test we again sent 100 requests, but to our weighted round robin endpoint. As expected, servers 1 and 2 took on more of the requests, with the rest split between servers 3 and 4.

The distribution was near spot on, with a spread of 50% : 26% : 12% : 12%, which closely matches our 4:2:1:1 weighting.

for i in {1..100}; \
  do curl -s http://localhost:8080/round-robin-weighted; \
  done | sort | uniq -c
50 Hey from server 1
26 Hey from server 2
12 Hey from server 3
12 Hey from server 4

Least Connections Test

This endpoint directs each request to the server with the fewest active connections. In our test this resulted in an even spread, because all our requests are simple and finish very quickly with consistent latency; however, if some requests took longer, the distribution would look different.

for i in {1..100}; \
  do curl -s http://localhost:8080/least-connections; \
  done | sort | uniq -c
25 Hey from server 1
25 Hey from server 2
25 Hey from server 3
25 Hey from server 4

IP Hash Test

The IP Hash algorithm picks one server for the requester's IP address and then sends all traffic from that address to the same server. Because we are sending our 100 requests from our local machine, the IP is always the same, so as expected all our requests went to a single server, in this case server 2.

for i in {1..100}; \
  do curl -s http://localhost:8080/ip-hash; \
  done | sort | uniq -c
100 Hey from server 2

Backup Test

Finally, to demo a situation where the servers are down and the backup server has to take on requests, I changed the port for servers 1-3 in the Nginx configuration, causing each of them to fail. Server 4 was designated as the backup server and, as expected, it took all the requests.

for i in {1..100}; \
  do curl -s http://localhost:8080/backup; \
  done | sort | uniq -c
100 Hey from server 4

Wrapping up

That about covers it for this introduction to load balancing in Nginx. I hope you have found it useful and that you can now decide which algorithm is best for you.

Enjoyed this post?

If so, please consider buying me a chilled pint to sip while writing the next one!
