API Design

API (Application Programming Interface) design is the discipline of defining clear, consistent, and usable interfaces through which software components communicate. A well-designed API reduces integration friction, improves developer experience, and enables systems to evolve independently. Whether you are building public-facing APIs consumed by third-party developers or internal service-to-service interfaces within a microservices architecture, the principles covered in this chapter form the foundation of robust API engineering.


RESTful API Design Principles

REST (Representational State Transfer) is an architectural style defined by Roy Fielding in his 2000 doctoral dissertation. RESTful APIs model the world as resources identified by URIs and manipulated through a uniform interface — the standard HTTP methods.

Core Constraints of REST

  1. Client-Server — Separation of concerns between UI and data storage.
  2. Stateless — Each request contains all the information the server needs; no session state is stored server-side between requests.
  3. Cacheable — Responses must declare themselves cacheable or non-cacheable.
  4. Uniform Interface — A consistent way to interact with resources (URIs + HTTP verbs + representations).
  5. Layered System — Intermediaries (proxies, gateways, CDNs) can be inserted transparently.
  6. Code on Demand (optional) — Servers can extend client functionality by transferring executable code.

Resource Naming Conventions

Resources are the central abstraction in REST. Good URI design makes APIs intuitive.

Rules of thumb:

| Guideline | Good | Bad |
| --- | --- | --- |
| Use nouns, not verbs | /users | /getUsers |
| Use plural nouns | /orders/42 | /order/42 |
| Use lowercase with hyphens | /user-profiles | /userProfiles, /User_Profiles |
| Nest for relationships | /users/7/orders | /getUserOrders?userId=7 |
| Avoid deep nesting (max 2-3 levels) | /users/7/orders | /users/7/orders/12/items/3/reviews |
| Use query params for filtering | /orders?status=shipped | /orders/shipped |

Resource hierarchy example:

/api/v1/organizations/{orgId}
/api/v1/organizations/{orgId}/teams
/api/v1/organizations/{orgId}/teams/{teamId}
/api/v1/organizations/{orgId}/teams/{teamId}/members

HTTP Methods and Their Semantics

| Method | CRUD Operation | Request Body | Idempotent | Safe | Typical Status Codes |
| --- | --- | --- | --- | --- | --- |
| GET | Read | No | Yes | Yes | 200, 304, 404 |
| POST | Create | Yes | No | No | 201, 400, 409 |
| PUT | Full Replace | Yes | Yes | No | 200, 204, 404 |
| PATCH | Partial Update | Yes | No* | No | 200, 204, 404 |
| DELETE | Delete | Rarely | Yes | No | 200, 204, 404 |

*PATCH can be made idempotent with JSON Merge Patch (RFC 7396), but is not idempotent by default when using JSON Patch (RFC 6902) operations like "add to array."
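
The difference described in the footnote can be checked directly. The sketch below is a minimal rendering of the JSON Merge Patch algorithm from RFC 7396, next to a hypothetical `append_tag` helper that stands in for a JSON Patch "add to array" operation. The function names and the sample document are illustrative assumptions, not part of any library.

```python
import copy


def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7396) and return the patched value."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target wholesale.
        return copy.deepcopy(patch)
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null means "remove this member".
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


def append_tag(document, tag):
    """Hypothetical stand-in for a JSON Patch 'add' to the end of an array."""
    patched = copy.deepcopy(document)
    patched["tags"].append(tag)
    return patched


user = {"name": "Alice", "tags": ["admin"], "nickname": "Al"}

# Merge patch: applying the same patch twice leaves the document unchanged.
once = merge_patch(user, {"name": "Alicia", "nickname": None})
twice = merge_patch(once, {"name": "Alicia", "nickname": None})
print(once == twice)

# Array append: every retry adds another element.
appended_once = append_tag(user, "beta")
appended_twice = append_tag(appended_once, "beta")
print(appended_once == appended_twice)
```

A retried merge patch converges on the same document, while a retried append keeps growing the array, which is why the second style needs an idempotency key or a precondition to be retried safely.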

Idempotency

An operation is idempotent if performing it multiple times leaves the server in the same state as performing it once. This property is critical for reliability: if a network timeout occurs after a request is sent, the client can retry an idempotent call without causing additional side effects.

Idempotent:    PUT /users/42  { "name": "Alice" }   -->  Always sets name to Alice
Idempotent:    DELETE /users/42                      -->  User 42 is gone (or already gone)
NOT idempotent: POST /orders   { "item": "widget" }  -->  Creates a NEW order each time

Idempotency keys — For non-idempotent operations (like POST), clients can pass a unique key in a header so the server can detect and deduplicate retries:

POST /payments HTTP/1.1
Idempotency-Key: 8a3b1c9e-f7d2-4e6a-b5c8-1234abcd5678
Content-Type: application/json

{ "amount": 99.99, "currency": "USD" }

The server stores the idempotency key and, if it sees the same key again, returns the original response instead of processing the payment a second time.
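
The deduplication step might look roughly like the following. This is an in-memory sketch under stated assumptions: `PaymentService`, its `charges` list, and the response shape are hypothetical, and a real service would keep the keys in a shared, durable store.

```python
import uuid


class PaymentService:
    """In-memory sketch; a real service would use a shared, durable store."""

    def __init__(self):
        self._responses = {}   # idempotency key -> stored response
        self.charges = []      # side effects actually performed

    def create_payment(self, idempotency_key, body):
        # A retry with a known key gets the original response back.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]

        # First time this key is seen: perform the side effect exactly once.
        payment_id = str(uuid.uuid4())
        self.charges.append((payment_id, body["amount"], body["currency"]))
        response = {"status": 201, "payment_id": payment_id}
        self._responses[idempotency_key] = response
        return response


service = PaymentService()
key = "8a3b1c9e-f7d2-4e6a-b5c8-1234abcd5678"
first = service.create_payment(key, {"amount": "99.99", "currency": "USD"})
retry = service.create_payment(key, {"amount": "99.99", "currency": "USD"})
print(first == retry, len(service.charges))
```

The sketch leaves out concerns a production implementation usually has to address, such as expiring old keys, rejecting a reused key that arrives with a different body, and handling two requests with the same key that arrive concurrently.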

HTTP Status Codes

Using the correct status code communicates the outcome of an operation unambiguously.

Success (2xx):

| Code | Meaning | When to Use |
| --- | --- | --- |
| 200 | OK | Successful GET, PUT, PATCH, or DELETE |
| 201 | Created | Successful POST that creates a resource |
| 202 | Accepted | Request accepted for async processing |
| 204 | No Content | Successful DELETE or PUT with no response body |

Client Error (4xx):

| Code | Meaning | When to Use |
| --- | --- | --- |
| 400 | Bad Request | Malformed syntax, invalid parameters |
| 401 | Unauthorized | Missing or invalid authentication credentials |
| 403 | Forbidden | Authenticated but lacks permission |
| 404 | Not Found | Resource does not exist |
| 405 | Method Not Allowed | HTTP method not supported on this resource |
| 409 | Conflict | Conflicting state (e.g., duplicate creation) |
| 422 | Unprocessable Entity | Syntactically valid but semantically wrong |
| 429 | Too Many Requests | Rate limit exceeded |

Server Error (5xx):

| Code | Meaning | When to Use |
| --- | --- | --- |
| 500 | Internal Server Error | Unexpected server failure |
| 502 | Bad Gateway | Upstream service returned invalid response |
| 503 | Service Unavailable | Server is temporarily overloaded or in maintenance |
| 504 | Gateway Timeout | Upstream service did not respond in time |

GraphQL

GraphQL is a query language for APIs developed by Facebook (2012, open-sourced 2015). Instead of multiple endpoints returning fixed data shapes, GraphQL exposes a single endpoint and lets the client specify exactly the data it needs.

Schema Definition Language (SDL)

The schema is the contract between client and server:

# DateTime is not a built-in GraphQL scalar, so it is declared here.
scalar DateTime

type User {
  id: ID!
  name: String!
  email: String!
  posts: [Post!]!
  createdAt: DateTime!
}

type Post {
  id: ID!
  title: String!
  body: String!
  author: User!
  comments: [Comment!]!
}

type Comment {
  id: ID!
  text: String!
  author: User!
}

type Query {
  user(id: ID!): User
  users(limit: Int, offset: Int): [User!]!
  post(id: ID!): Post
}

type Mutation {
  createUser(input: CreateUserInput!): User!
  updateUser(id: ID!, input: UpdateUserInput!): User!
  deleteUser(id: ID!): Boolean!
}

input CreateUserInput {
  name: String!
  email: String!
}

input UpdateUserInput {
  name: String
  email: String
}

type Subscription {
  postCreated: Post!
  commentAdded(postId: ID!): Comment!
}

Queries

Clients request exactly the fields they need:

query GetUserWithPosts {
  user(id: "42") {
    name
    email
    posts {
      title
      comments {
        text
        author {
          name
        }
      }
    }
  }
}

This solves the over-fetching problem (getting more data than needed) and the under-fetching problem (needing multiple REST calls to assemble a view).

Mutations

Mutations modify server-side data:

mutation CreateNewUser {
  createUser(input: { name: "Alice", email: "alice@example.com" }) {
    id
    name
    email
  }
}

Subscriptions

Subscriptions push real-time updates to the client, typically over WebSockets:

subscription OnNewComment {
  commentAdded(postId: "101") {
    text
    author {
      name
    }
  }
}

The N+1 Problem

When resolving nested relationships naively, a query for N users with their posts issues 1 query for users + N queries for posts — the classic N+1 problem.

Query: users { posts { title } }

SQL executed (naive):
  SELECT * FROM users;              -- 1 query
  SELECT * FROM posts WHERE user_id = 1;  -- +1
  SELECT * FROM posts WHERE user_id = 2;  -- +1
  SELECT * FROM posts WHERE user_id = 3;  -- +1
  ...                                     -- = N+1 total

Solution — DataLoader pattern (batching + caching):

// Using Facebook's dataloader library
const DataLoader = require('dataloader');

const postLoader = new DataLoader(async (userIds) => {
  // Single batched query instead of N individual ones
  const posts = await db.query(
    'SELECT * FROM posts WHERE user_id IN (?)', [userIds]
  );
  // Group posts by user_id and return in same order as userIds
  const postsByUser = {};
  posts.forEach(p => {
    (postsByUser[p.user_id] ||= []).push(p);
  });
  return userIds.map(id => postsByUser[id] || []);
});

// Resolver
const resolvers = {
  User: {
    posts: (user) => postLoader.load(user.id),
  },
};

GraphQL vs REST — Tradeoffs

| Aspect | REST | GraphQL |
| --- | --- | --- |
| Endpoints | Multiple (one per resource) | Single endpoint |
| Data fetching | Fixed response shape | Client specifies exact fields |
| Over-fetching | Common | Eliminated |
| Under-fetching | Multiple round trips needed | Single request |
| Caching | HTTP caching works naturally (GET + URL) | Harder; requires client-side cache (Apollo, Relay) |
| File upload | Native multipart support | Requires extensions (multipart spec) |
| Error handling | HTTP status codes | Typically 200 OK; errors in response body |
| Versioning | URL/header versioning | Schema evolution (deprecate fields) |
| Learning curve | Low | Moderate |
| Tooling | Mature (Postman, curl) | Growing (GraphiQL, Apollo Studio) |
| Best for | Simple CRUD, public APIs, caching-heavy | Complex nested data, mobile apps, BFF |

gRPC

gRPC (gRPC Remote Procedure Call) is a high-performance, open-source framework developed by Google. It uses Protocol Buffers (protobuf) as its interface definition language and serialization format, and communicates over HTTP/2.

Architecture Overview

┌──────────────┐         HTTP/2 + Protobuf         ┌──────────────┐
│   gRPC       │  ──────────────────────────────▶   │   gRPC       │
│   Client     │                                    │   Server     │
│              │  ◀──────────────────────────────   │              │
│  (stub)      │     Binary frames, multiplexed     │  (service)   │
└──────────────┘                                    └──────────────┘
       │                                                   │
       │  Generated code                     Generated code│
       ▼                                                   ▼
 ┌────────────┐                                  ┌────────────────┐
 │  .proto     │◀────── Shared contract ────────▶│   .proto       │
 │  definition │                                 │   definition   │
 └────────────┘                                  └────────────────┘

Protocol Buffers

Protocol Buffers (protobuf) are a language-neutral, platform-neutral mechanism for serializing structured data. They are smaller, faster, and more strongly typed than JSON.

Example .proto file:

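syntax = "proto3";

package ecommerce;

// Required for the google.protobuf.Timestamp fields used below
import "google/protobuf/timestamp.proto";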

// Service definition
service OrderService {
  // Unary RPC
  rpc GetOrder (GetOrderRequest) returns (Order);

  // Server streaming RPC - stream order status updates
  rpc TrackOrder (TrackOrderRequest) returns (stream OrderStatusUpdate);

  // Client streaming RPC - upload multiple items for a bulk order
  rpc CreateBulkOrder (stream OrderItem) returns (BulkOrderResponse);

  // Bidirectional streaming RPC - real-time chat with support
  rpc SupportChat (stream ChatMessage) returns (stream ChatMessage);
}

// RPC request and response types must be messages, so the OrderStatus enum
// is wrapped here rather than streamed directly.
message OrderStatusUpdate {
  OrderStatus status = 1;
}

message GetOrderRequest {
  string order_id = 1;
}

message Order {
  string id = 1;
  string customer_id = 2;
  repeated OrderItem items = 3;
  OrderStatus status = 4;
  double total_amount = 5;
  google.protobuf.Timestamp created_at = 6;
}

message OrderItem {
  string product_id = 1;
  string name = 2;
  int32 quantity = 3;
  double price = 4;
}

enum OrderStatus {
  ORDER_STATUS_UNSPECIFIED = 0;
  ORDER_STATUS_PENDING = 1;
  ORDER_STATUS_CONFIRMED = 2;
  ORDER_STATUS_SHIPPED = 3;
  ORDER_STATUS_DELIVERED = 4;
}

message TrackOrderRequest {
  string order_id = 1;
}

message BulkOrderResponse {
  string order_id = 1;
  int32 items_accepted = 2;
  int32 items_rejected = 3;
}

message ChatMessage {
  string sender = 1;
  string text = 2;
  google.protobuf.Timestamp timestamp = 3;
}

RPC Types

| Type | Client | Server | Use Case |
| --- | --- | --- | --- |
| Unary | 1 request | 1 response | Standard request/response (like REST) |
| Server streaming | 1 request | N responses | Real-time feeds, large data downloads |
| Client streaming | N requests | 1 response | File uploads, telemetry ingestion |
| Bidirectional streaming | N requests | N responses | Chat, collaborative editing |

Python gRPC Server Example

import grpc
from concurrent import futures
import order_pb2
import order_pb2_grpc

class OrderServiceServicer(order_pb2_grpc.OrderServiceServicer):
    def GetOrder(self, request, context):
        # Look up order from database
        order = db.get_order(request.order_id)
        if not order:
            context.set_code(grpc.StatusCode.NOT_FOUND)
            context.set_details(f"Order {request.order_id} not found")
            return order_pb2.Order()
        return order_pb2.Order(
            id=order.id,
            customer_id=order.customer_id,
            total_amount=order.total,
            status=order_pb2.ORDER_STATUS_CONFIRMED,
        )

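    def TrackOrder(self, request, context):
        """Server streaming: pushes status updates to the client."""
        # An RPC must return a message type, so this assumes the .proto wraps
        # the OrderStatus enum in an OrderStatusUpdate message.
        for status_update in db.stream_order_updates(request.order_id):
            yield order_pb2.OrderStatusUpdate(
                status=status_update.status,
            )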

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    order_pb2_grpc.add_OrderServiceServicer_to_server(
        OrderServiceServicer(), server
    )
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()

if __name__ == "__main__":
    serve()

gRPC vs REST

| Aspect | gRPC | REST |
| --- | --- | --- |
| Protocol | HTTP/2 | HTTP/1.1 or HTTP/2 |
| Serialization | Protobuf (binary) | JSON (text) |
| Performance | Very high (small payloads, multiplexing) | Good (larger payloads, human-readable) |
| Streaming | Native (4 types) | Requires WebSockets or SSE |
| Code generation | Built-in from .proto | Optional (OpenAPI codegen) |
| Browser support | Limited (requires gRPC-Web proxy) | Native |
| Human readability | Low (binary) | High (JSON) |
| Contract | Strict (protobuf schema) | Loose (optional OpenAPI spec) |
| Best for | Microservice-to-microservice, low latency | Public APIs, browser clients, broad compatibility |

API Versioning Strategies

APIs evolve over time. Breaking changes — removing fields, renaming endpoints, changing response shapes — require versioning so existing clients continue to function.

Strategy Comparison

| Strategy | Example | Pros | Cons |
| --- | --- | --- | --- |
| URI Path | /api/v1/users | Simple, explicit, easy to route | Pollutes URI space, duplicates controllers |
| Query Parameter | /api/users?version=1 | Easy to default to latest | Easy to forget, less visible |
| Custom Header | X-API-Version: 1 | Clean URIs, flexible | Hidden from URL, harder to test in browser |
| Accept Header (Content Negotiation) | Accept: application/vnd.myapi.v1+json | Standards-based, clean URIs | Complex, unfamiliar to many developers |
| No Versioning (Schema Evolution) | Additive changes only | Simplest, no version management | Limits what changes you can make |

URI Path Versioning (Most Common)

GET /api/v1/users/42          --> Returns v1 response shape
GET /api/v2/users/42          --> Returns v2 response shape (e.g., split name into first/last)

# Flask example with URI versioning
from flask import Flask, Blueprint

app = Flask(__name__)

v1 = Blueprint('v1', __name__, url_prefix='/api/v1')
v2 = Blueprint('v2', __name__, url_prefix='/api/v2')

@v1.route('/users/<int:user_id>')
def get_user_v1(user_id):
    user = db.get_user(user_id)
    return {"id": user.id, "name": user.full_name}

@v2.route('/users/<int:user_id>')
def get_user_v2(user_id):
    user = db.get_user(user_id)
    return {
        "id": user.id,
        "first_name": user.first_name,
        "last_name": user.last_name,
        "email": user.email,
    }

app.register_blueprint(v1)
app.register_blueprint(v2)

Header Versioning

GET /api/users/42 HTTP/1.1
Host: api.example.com
X-API-Version: 2
Accept: application/json
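
On the server side, the requested version has to be pulled out of the headers before routing. The sketch below shows one way to do that for both the custom header and the vendor media type from the comparison table. `resolve_api_version`, the supported set, and the default version are assumptions made for illustration. It also takes a plain dict, so header lookup is case-sensitive here even though HTTP header names are not.

```python
import re

SUPPORTED_VERSIONS = {1, 2}   # assumed set of active versions
DEFAULT_VERSION = 2           # assumed fallback when the client sends nothing

# Matches vendor media types such as application/vnd.myapi.v1+json
_VENDOR_TYPE = re.compile(r"application/vnd\.myapi\.v(\d+)\+json")


def resolve_api_version(headers):
    """Return the requested API version, or raise ValueError if unsupported."""
    explicit = headers.get("X-API-Version")
    if explicit is not None:
        if not explicit.isdigit():
            raise ValueError(f"Invalid X-API-Version: {explicit!r}")
        version = int(explicit)
    else:
        # Fall back to content negotiation, then to the default version.
        match = _VENDOR_TYPE.search(headers.get("Accept", ""))
        version = int(match.group(1)) if match else DEFAULT_VERSION

    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"Unsupported API version: {version}")
    return version


print(resolve_api_version({"X-API-Version": "2", "Accept": "application/json"}))
print(resolve_api_version({"Accept": "application/vnd.myapi.v1+json"}))
```

Whether a missing version should default to the latest or the oldest supported version is a policy decision. Defaulting to the latest is convenient but means unpinned clients can break when a new version ships.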

Best Practices for Versioning

  • Prefer additive, non-breaking changes whenever possible — adding new fields to a response, adding optional query parameters, or adding new endpoints does not require a new version.
  • Deprecate before removing. Announce deprecation with a Sunset header (RFC 8594) and give clients a migration window.
  • Support at most 2-3 active versions to keep maintenance manageable.
  • Document a clear deprecation policy, e.g., "Each major version is supported for 18 months after the next version is released."

HTTP/1.1 200 OK
Sunset: Sun, 01 Mar 2026 00:00:00 GMT
Deprecation: true
Link: <https://api.example.com/docs/migration-v1-to-v2>; rel="deprecation"

API Documentation

Good documentation is the difference between an API that developers love and one they avoid. The industry standard is the OpenAPI Specification (formerly Swagger).

OpenAPI / Swagger

OpenAPI is a specification format (YAML or JSON) that describes REST APIs in a machine-readable way. Tools like Swagger UI, Redoc, and Stoplight render it into interactive documentation.

┌──────────────┐        ┌───────────────────┐       ┌──────────────────┐
│  openapi.yaml│───────▶│  Swagger UI /     │──────▶│  Interactive     │
│  (spec file) │        │  Redoc / Stoplight│       │  Documentation   │
└──────────────┘        └───────────────────┘       └──────────────────┘
       │
       │  Also used for:
       ├──▶ Client SDK generation (openapi-generator)
       ├──▶ Server stub generation
       ├──▶ Contract testing
       └──▶ Mock servers

Example OpenAPI Spec Snippet

openapi: 3.0.3
info:
  title: Bookstore API
  description: API for managing a bookstore catalog
  version: 1.0.0
  contact:
    name: API Support
    email: api@bookstore.com

servers:
  - url: https://api.bookstore.com/v1
    description: Production
  - url: https://staging-api.bookstore.com/v1
    description: Staging

paths:
  /books:
    get:
      summary: List all books
      operationId: listBooks
      tags:
        - Books
      parameters:
        - name: genre
          in: query
          schema:
            type: string
            enum: [fiction, non-fiction, science, history]
        - name: limit
          in: query
          schema:
            type: integer
            default: 20
            maximum: 100
        - name: cursor
          in: query
          schema:
            type: string
          description: Cursor for pagination
      responses:
        '200':
          description: A paginated list of books
          content:
            application/json:
              schema:
                type: object
                properties:
                  data:
                    type: array
                    items:
                      $ref: '#/components/schemas/Book'
                  pagination:
                    $ref: '#/components/schemas/CursorPagination'
        '400':
          description: Invalid query parameters
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ProblemDetails'

    post:
      summary: Create a new book
      operationId: createBook
      tags:
        - Books
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateBookRequest'
      responses:
        '201':
          description: Book created successfully
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Book'
          headers:
            Location:
              schema:
                type: string
              description: URI of the newly created book
        '400':
          description: Invalid request body
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ProblemDetails'
        '409':
          description: Book with same ISBN already exists

components:
  schemas:
    Book:
      type: object
      required: [id, title, author, isbn]
      properties:
        id:
          type: string
          format: uuid
        title:
          type: string
          example: "The Pragmatic Programmer"
        author:
          type: string
          example: "David Thomas, Andrew Hunt"
        isbn:
          type: string
          pattern: '^\d{3}-\d{10}$'
          example: "978-0135957059"
        genre:
          type: string
        published_date:
          type: string
          format: date
        price:
          type: number
          format: float

    CreateBookRequest:
      type: object
      required: [title, author, isbn]
      properties:
        title:
          type: string
        author:
          type: string
        isbn:
          type: string
        genre:
          type: string
        price:
          type: number

    CursorPagination:
      type: object
      properties:
        next_cursor:
          type: string
          nullable: true
        has_more:
          type: boolean

    ProblemDetails:
      type: object
      properties:
        type:
          type: string
          format: uri
        title:
          type: string
        status:
          type: integer
        detail:
          type: string
        instance:
          type: string
          format: uri

Pagination, Filtering, and Sorting Patterns

Any API that returns collections of resources needs strategies for pagination, filtering, and sorting to keep responses manageable and performant.

Offset-Based Pagination

The simplest approach: specify a page number (or offset) and a page size.

GET /api/v1/products?page=3&page_size=25

{
  "data": [ ... ],
  "pagination": {
    "page": 3,
    "page_size": 25,
    "total_items": 1042,
    "total_pages": 42
  }
}

Pros:

  • Simple to implement and understand.
  • Allows jumping to any page directly.
  • Easy to display "Page X of Y" in UIs.

Cons:

  • Inconsistent results when data changes between requests — items can be skipped or duplicated if rows are inserted or deleted.
  • Poor performance on large offsets: OFFSET 100000 still scans and discards 100,000 rows in most databases.
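
The arithmetic behind the pagination object above is small enough to sketch. The helper below, whose name and return shape are assumptions, derives the row offset and page count from the request parameters and reproduces the numbers in the example response.

```python
import math


def offset_pagination(total_items, page, page_size):
    """Return the SQL offset plus the pagination metadata for one page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "offset": (page - 1) * page_size,   # rows the database must skip
        "page": page,
        "page_size": page_size,
        "total_items": total_items,
        "total_pages": math.ceil(total_items / page_size),
    }


meta = offset_pagination(total_items=1042, page=3, page_size=25)
print(meta["offset"], meta["total_pages"])
```

The `offset` value grows linearly with the page number, which is where the cost on deep pages comes from: page 4001 at 25 rows per page already means skipping 100,000 rows.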

Cursor-Based Pagination

Uses an opaque cursor (often a base64-encoded identifier or timestamp) to mark the position in the result set.

GET /api/v1/products?limit=25&cursor=eyJpZCI6MTUwfQ==

{
  "data": [ ... ],
  "pagination": {
    "next_cursor": "eyJpZCI6MTc1fQ==",
    "has_more": true
  }
}

Behind the scenes, the cursor decodes to {"id": 150}, and the query becomes:

SELECT * FROM products
WHERE id > 150
ORDER BY id ASC
LIMIT 25;

Pros:

  • Stable results — inserts and deletes between pages do not cause skips or duplicates.
  • Constant performance — uses indexed WHERE clause instead of OFFSET.

Cons:

  • Cannot jump to an arbitrary page.
  • Cursor must encode enough information to recreate the query position.
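
A minimal sketch of the cursor handling described above, assuming the cursor carries only the last-seen integer id. The names `encode_cursor`, `decode_cursor`, and `page_after` are illustrative, and `page_after` emulates the SQL query over an in-memory list so the example runs without a database.

```python
import base64
import json


def encode_cursor(last_id):
    """Pack the last-seen id into an opaque, URL-safe token."""
    payload = json.dumps({"id": last_id}, separators=(",", ":"))
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_cursor(cursor):
    """Recover the last-seen id; raise ValueError on a malformed cursor."""
    try:
        payload = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
        return int(payload["id"])
    except (ValueError, KeyError, TypeError) as exc:
        raise ValueError("Invalid cursor") from exc


def page_after(rows, cursor, limit):
    """Emulate WHERE id > :last_id ORDER BY id LIMIT :limit over dict rows."""
    last_id = decode_cursor(cursor) if cursor else 0  # assumes positive ids
    matching = sorted((r for r in rows if r["id"] > last_id), key=lambda r: r["id"])
    page = matching[:limit]
    has_more = len(matching) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return page, next_cursor


rows = [{"id": i} for i in range(1, 8)]
page, next_cursor = page_after(rows, None, 3)
print([r["id"] for r in page], decode_cursor(next_cursor))
```

Because clients can tamper with the token, the decoded value is validated and any malformed cursor is reported as a single `ValueError`, which a handler could map to a 400 response. Base64 only makes the cursor opaque, not secret. A cursor that must not be forged would need a signature as well.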

Comparison

Offset Pagination                    Cursor Pagination
┌──────────────────────┐            ┌──────────────────────┐
│ Page 1: items 1-25   │            │ Start ──▶ cursor_A   │
│ Page 2: items 26-50  │            │ cursor_A ──▶ cursor_B│
│ Page 3: items 51-75  │            │ cursor_B ──▶ cursor_C│
│ ...                  │            │ ...                  │
│ Can jump to Page N   │            │ Must traverse in     │
│ O(N) for large       │            │ order. O(1) per page │
│ offsets              │            │                      │
└──────────────────────┘            └──────────────────────┘

Filtering

Use query parameters to filter resource collections:

GET /api/v1/products?category=electronics&price_min=100&price_max=500&in_stock=true

For more expressive filtering, some APIs adopt a filter syntax:

GET /api/v1/products?filter[category]=electronics&filter[price][gte]=100&filter[price][lte]=500

Or use a single filter parameter with a simple expression language:

GET /api/v1/products?filter=category eq "electronics" and price gte 100
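
As a rough illustration of the bracket syntax, the sketch below turns a query string into (field, operator, value) tuples. The function name, the default `eq` operator, and the allowed operator set are assumptions. Values are left as strings, so type conversion and field whitelisting would still be the caller's job.

```python
import re
from urllib.parse import parse_qsl

# Matches filter[field]=value and filter[field][op]=value
_FILTER_KEY = re.compile(r"^filter\[(\w+)\](?:\[(\w+)\])?$")
ALLOWED_OPERATORS = {"eq", "gte", "lte"}  # assumed operator set


def parse_filters(query_string):
    """Return a list of (field, operator, value) tuples from a query string."""
    filters = []
    for key, value in parse_qsl(query_string):
        match = _FILTER_KEY.match(key)
        if not match:
            continue  # not a filter parameter (e.g. sort, limit)
        field, operator = match.group(1), match.group(2) or "eq"
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"Unsupported filter operator: {operator}")
        filters.append((field, operator, value))
    return filters


query = "filter[category]=electronics&filter[price][gte]=100&filter[price][lte]=500"
print(parse_filters(query))
```

Rejecting unknown operators up front keeps the filter language small and prevents arbitrary input from reaching the query builder.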

Sorting

Use a sort query parameter. Prefix with - for descending order:

GET /api/v1/products?sort=-price,name

This sorts by price descending, then by name ascending.

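# Python implementation sketch
from flask import Flask, request

app = Flask(__name__)

# Columns clients may sort by (an assumed list); anything else is rejected
# so that user input is never interpolated into SQL unchecked.
ALLOWED_SORT_FIELDS = {"id", "name", "price"}

@app.route('/api/v1/products')
def list_products():
    # Parse sorting
    sort_param = request.args.get('sort', 'id')
    order_clauses = []
    for field in sort_param.split(','):
        descending = field.startswith('-')
        column = field[1:] if descending else field
        if column not in ALLOWED_SORT_FIELDS:
            return {"error": f"Cannot sort by '{column}'"}, 400
        order_clauses.append(f"{column} {'DESC' if descending else 'ASC'}")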

    # Parse filtering
    category = request.args.get('category')
    price_min = request.args.get('price_min', type=float)
    price_max = request.args.get('price_max', type=float)

    # Parse pagination
    cursor = request.args.get('cursor')
    limit = min(request.args.get('limit', 25, type=int), 100)

    products = db.query_products(
        category=category,
        price_min=price_min,
        price_max=price_max,
        order_by=order_clauses,
        cursor=cursor,
        limit=limit + 1,  # Fetch one extra to determine has_more
    )

    has_more = len(products) > limit
    if has_more:
        products = products[:limit]

    return {
        "data": [p.to_dict() for p in products],
        "pagination": {
            "next_cursor": encode_cursor(products[-1]) if has_more else None,
            "has_more": has_more,
        }
    }

HATEOAS

HATEOAS (Hypermedia As The Engine Of Application State) is a REST constraint where the server includes hyperlinks in responses that tell the client what actions are available next. Instead of hardcoding URLs, clients discover them dynamically.

In practice, full HATEOAS is rarely implemented outside of enterprise APIs, but including relevant links improves discoverability and decouples clients from URL structures.

{
  "id": "order-42",
  "status": "pending",
  "total": 149.99,
  "items": [
    {
      "product_id": "prod-7",
      "name": "Mechanical Keyboard",
      "quantity": 1,
      "price": 149.99,
      "_links": {
        "product": { "href": "/api/v1/products/prod-7", "method": "GET" }
      }
    }
  ],
  "_links": {
    "self":    { "href": "/api/v1/orders/order-42", "method": "GET" },
    "cancel":  { "href": "/api/v1/orders/order-42/cancel", "method": "POST" },
    "payment": { "href": "/api/v1/orders/order-42/pay", "method": "POST" },
    "customer": { "href": "/api/v1/users/user-7", "method": "GET" }
  }
}

Once the order has shipped, the cancel and payment links might disappear and a track link might appear; the available transitions change based on the resource's current state.

{
  "id": "order-42",
  "status": "shipped",
  "total": 149.99,
  "_links": {
    "self":  { "href": "/api/v1/orders/order-42", "method": "GET" },
    "track": { "href": "/api/v1/orders/order-42/tracking", "method": "GET" }
  }
}
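
One way a server could produce these state-dependent links is to derive them from the order status at serialization time. The sketch below mirrors the two responses shown above. The function name and the status-to-links mapping are assumptions, not a prescribed design.

```python
def order_links(order_id, status):
    """Build the _links object for an order based on its current status."""
    base = f"/api/v1/orders/{order_id}"
    links = {"self": {"href": base, "method": "GET"}}
    if status == "pending":
        # A pending order can still be cancelled or paid.
        links["cancel"] = {"href": f"{base}/cancel", "method": "POST"}
        links["payment"] = {"href": f"{base}/pay", "method": "POST"}
    elif status == "shipped":
        # A shipped order can only be tracked.
        links["track"] = {"href": f"{base}/tracking", "method": "GET"}
    return links


print(sorted(order_links("order-42", "pending")))
print(sorted(order_links("order-42", "shipped")))
```

Keeping this mapping in one place means clients that follow links, rather than constructing URLs, pick up new transitions without code changes.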

Error Handling and Response Formats

Consistent, informative error responses are essential for a good developer experience. Clients should never have to guess what went wrong.

Principles

  1. Use appropriate HTTP status codes — do not return 200 for errors.
  2. Provide a consistent error response structure across all endpoints.
  3. Include enough detail for the developer to understand and fix the issue.
  4. Never leak internal details (stack traces, SQL queries, internal paths) in production.
  5. Use machine-readable error codes in addition to human-readable messages.

Consistent Error Response Format

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "The request body contains invalid fields.",
    "details": [
      {
        "field": "email",
        "issue": "Must be a valid email address.",
        "value": "not-an-email"
      },
      {
        "field": "age",
        "issue": "Must be a positive integer.",
        "value": -5
      }
    ],
    "request_id": "req-a1b2c3d4",
    "documentation_url": "https://api.example.com/docs/errors#VALIDATION_ERROR"
  }
}

RFC 7807 — Problem Details for HTTP APIs

RFC 7807 defines a standard format for error responses. It has since been obsoleted by RFC 9457, which keeps the same core fields. The format is widely adopted and supported by many frameworks.

HTTP/1.1 403 Forbidden
Content-Type: application/problem+json

{
  "type": "https://api.example.com/errors/insufficient-funds",
  "title": "Insufficient Funds",
  "status": 403,
  "detail": "Your account balance of $10.00 is insufficient for a $25.00 purchase.",
  "instance": "/api/v1/payments/txn-789",
  "balance": 10.00,
  "required": 25.00
}

Standard fields:

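| Field | Required | Description |
| --- | --- | --- |
| type | No (defaults to "about:blank") | A URI reference identifying the error type (can be a docs link) |
| title | No | Short human-readable summary |
| status | No | HTTP status code |
| detail | No | Human-readable explanation specific to this occurrence |
| instance | No | A URI identifying this specific occurrence |

The RFC makes every member optional, although many APIs choose to always send type, title, and status.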

Additional fields (like balance and required above) can be added for context.

Implementation Example

from flask import Flask, jsonify, request

app = Flask(__name__)

class APIError(Exception):
    def __init__(self, status, error_type, title, detail=None, **extra):
        self.status = status
        self.error_type = error_type
        self.title = title
        self.detail = detail
        self.extra = extra

@app.errorhandler(APIError)
def handle_api_error(error):
    response = {
        "type": f"https://api.example.com/errors/{error.error_type}",
        "title": error.title,
        "status": error.status,
    }
    if error.detail:
        response["detail"] = error.detail
    response.update(error.extra)

    return jsonify(response), error.status, {
        "Content-Type": "application/problem+json"
    }

@app.route('/api/v1/orders/<order_id>/pay', methods=['POST'])
def pay_order(order_id):
    order = db.get_order(order_id)
    account = db.get_account(request.user_id)

    if account.balance < order.total:
        raise APIError(
            status=403,
            error_type="insufficient-funds",
            title="Insufficient Funds",
            detail=f"Balance of ${account.balance:.2f} is less than "
                   f"the required ${order.total:.2f}.",
            balance=account.balance,
            required=order.total,
        )

    # Process payment ...
    return jsonify({"status": "paid"}), 200

Rate Limiting

Rate limiting protects APIs from abuse, ensures fair usage among clients, and prevents individual consumers from overwhelming backend services. It is a critical component of both API design and system design.

Cross-reference: For rate limiting in the context of system design, see Chapter 7.1 — System Design.

Common Algorithms

Token Bucket

A bucket holds up to N tokens. Each request consumes one token. Tokens are added at a fixed rate. When the bucket is empty, requests are rejected (or queued).

Token Bucket (capacity=5, refill_rate=1/sec)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  Time 0s:  [*][*][*][*][*]  5 tokens (full)
  Request → [*][*][*][*][ ]  4 tokens (1 consumed)
  Request → [*][*][*][ ][ ]  3 tokens
  Request → [*][*][ ][ ][ ]  2 tokens
  Time 1s:  [*][*][*][ ][ ]  3 tokens (1 refilled)
  Request → [*][*][ ][ ][ ]  2 tokens
  ...
  Later:    [ ][ ][ ][ ][ ]  0 tokens (bucket drained)
  Request → REJECTED (429 Too Many Requests)

Characteristics:

  • Allows short bursts up to the bucket capacity.
  • Smooths traffic to the refill rate over time.
  • Simple to implement; used by AWS, Stripe, and many others.

import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow_request(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(
            self.capacity,
            self.tokens + elapsed * self.refill_rate,
        )
        self.last_refill = now

        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

Sliding Window Log

Stores the timestamp of every request in the window. Counts requests by checking how many timestamps fall within the current window.

Sliding Window (window=60s, limit=5)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  Timeline (seconds):
  0    10    20    30    40    50    60    70
  |─────|─────|─────|─────|─────|─────|─────|
  R     R     R           R     R           R?
  1     2     3           4     5

  At t=70, window = [10, 70]
  Requests in window: 10, 20, 40, 50 = 4
  R at 70 → ALLOWED (4 < 5)

Characteristics:

  • Precise — no boundary effects.
  • Higher memory usage (stores all timestamps within the window).
  • Used when exact rate enforcement is required.
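
A minimal sketch of the sliding window log, assuming a single process and an explicit `now` argument so the behaviour is deterministic (a real limiter would more likely read `time.monotonic()`). The class name and method signature are illustrative.

```python
from collections import deque


class SlidingWindowLog:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()  # accepted requests, oldest first

    def allow_request(self, now):
        # Drop timestamps that have fallen out of [now - window, now].
        while self.timestamps and self.timestamps[0] < now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False  # rejected requests are not logged


# Replays the timeline above: requests at 0, 10, 20, 40, 50, then 70.
limiter = SlidingWindowLog(limit=5, window_seconds=60)
decisions = [limiter.allow_request(t) for t in (0, 10, 20, 40, 50, 70)]
print(decisions)
```

Memory grows with the limit, since one timestamp is kept per accepted request per client, which is the cost noted in the characteristics above.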

Sliding Window Counter

A memory-efficient approximation that combines fixed window counts with a weighted overlap.

Window counter estimation:

  Previous window       Current window
  (40 requests)         (15 requests so far)
  ┌────────────────┐┌───────┬────────┐
  │  60s window    ││ 25%   │  75%   │
  │  ended         ││elapsed│remaining│
  └────────────────┘└───────┴────────┘

  Estimated count = (prev_count × remaining%) + current_count
                  = (40 × 0.75) + 15
                  = 30 + 15 = 45

  If limit is 50 → ALLOWED (45 < 50)
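
The estimate above can be sketched in a few lines. The class below is a single-process illustration with an explicit `now` argument. Its name, fields, and window-rolling logic are assumptions rather than a reference implementation.

```python
class SlidingWindowCounter:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.current_start = 0.0   # start time of the current fixed window
        self.current_count = 0
        self.previous_count = 0

    def _roll(self, now):
        """Advance to the fixed window that contains `now`."""
        elapsed_windows = int((now - self.current_start) // self.window)
        if elapsed_windows == 1:
            self.previous_count = self.current_count
        elif elapsed_windows > 1:
            self.previous_count = 0   # the window just before this one was idle
        if elapsed_windows >= 1:
            self.current_count = 0
            self.current_start += elapsed_windows * self.window

    def estimate(self, now):
        self._roll(now)
        elapsed_fraction = (now - self.current_start) / self.window
        # Weight the previous window by the share of it still inside the sliding window.
        return self.previous_count * (1 - elapsed_fraction) + self.current_count

    def allow_request(self, now):
        if self.estimate(now) < self.limit:
            self.current_count += 1
            return True
        return False


# Reproduce the worked numbers: 40 previous, 15 current, 25% into the window.
limiter = SlidingWindowCounter(limit=50, window_seconds=60)
limiter.previous_count, limiter.current_count, limiter.current_start = 40, 15, 60.0
print(limiter.estimate(now=75.0))
```

Only two counters and a timestamp are stored per client, at the price of assuming requests in the previous window were evenly spread.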

Rate Limit Response Headers

Communicate rate limit status to clients through response headers. The X-RateLimit-* names are a widely used convention rather than a formal standard; Retry-After is defined by the HTTP specification.

HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 67
X-RateLimit-Reset: 1708732800

| Header | Description |
| --- | --- |
| X-RateLimit-Limit | Maximum requests allowed in the window |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| Retry-After | Seconds (or date) before the client should retry (sent with 429) |

When the limit is exceeded:

HTTP/1.1 429 Too Many Requests
Content-Type: application/problem+json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1708732800

{
  "type": "https://api.example.com/errors/rate-limit-exceeded",
  "title": "Rate Limit Exceeded",
  "status": 429,
  "detail": "You have exceeded 100 requests per minute. Try again in 30 seconds."
}
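
On the client side, these headers can drive the retry delay. The sketch below is one possible interpretation: it prefers Retry-After (in either its seconds or HTTP-date form), falls back to X-RateLimit-Reset, and otherwise uses an assumed one-second default. The function name and the plain-dict header lookup are illustrative assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def seconds_until_retry(headers, now=None):
    """Return how long to wait before retrying a 429 response, in seconds."""
    now = now or datetime.now(timezone.utc)
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        if retry_after.isdigit():
            return float(retry_after)                  # delta-seconds form
        retry_at = parsedate_to_datetime(retry_after)  # HTTP-date form
        return max(0.0, (retry_at - now).total_seconds())
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        # Unix timestamp of the window reset.
        return max(0.0, float(reset) - now.timestamp())
    return 1.0  # assumed default backoff when the server gives no hint


fixed_now = datetime.fromtimestamp(1708732770, tz=timezone.utc)
print(seconds_until_retry({"Retry-After": "30"}, now=fixed_now))
print(seconds_until_retry({"X-RateLimit-Reset": "1708732800"}, now=fixed_now))
```

Clients that retry in a loop would typically add jitter and a retry cap on top of this delay so that many throttled clients do not all return at the same instant.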

Rate Limiting Strategies by Scope

| Scope | Key | Example |
| --- | --- | --- |
| Per user | User ID or API key | 1000 req/hour per user |
| Per IP | Client IP address | 100 req/minute per IP |
| Per endpoint | Method + path | 10 POST /login per minute per IP |
| Global | None | 50,000 req/minute total for the service |
| Tiered | Subscription plan | Free: 100/hr, Pro: 10,000/hr, Enterprise: unlimited |

Summary — Choosing the Right API Style

| Factor | REST | GraphQL | gRPC |
| --- | --- | --- | --- |
| Client type | Browsers, third-party devs | Mobile apps, complex UIs | Internal microservices |
| Performance needs | Moderate | Moderate | High |
| Data relationships | Simple, flat | Complex, nested | Varies |
| Real-time | Polling / SSE / WebSocket | Subscriptions | Native streaming |
| Schema enforcement | Optional (OpenAPI) | Required (SDL) | Required (Protobuf) |
| Ecosystem maturity | Very mature | Mature | Mature |
| Human debuggability | High (JSON + URLs) | Medium (JSON + single endpoint) | Low (binary) |

In practice, many systems use a combination: REST for public-facing APIs, GraphQL as a Backend-for-Frontend (BFF) layer aggregating multiple services, and gRPC for internal service-to-service communication where performance matters most.