API Design¶

API (Application Programming Interface) design is the discipline of defining clear, consistent, and usable interfaces through which software components communicate. A well-designed API reduces integration friction, improves developer experience, and enables systems to evolve independently. Whether you are building public-facing APIs consumed by third-party developers or internal service-to-service interfaces within a microservices architecture, the principles covered in this chapter form the foundation of robust API engineering.

RESTful API Design Principles¶

REST (Representational State Transfer) is an architectural style defined by Roy Fielding in his 2000 doctoral dissertation. RESTful APIs model the world as resources identified by URIs and manipulated through a uniform interface — the standard HTTP methods.

Core Constraints of REST¶

Client-Server — Separation of concerns between UI and data storage.
Stateless — Each request contains all the information the server needs; no session state is stored server-side between requests.
Cacheable — Responses must declare themselves cacheable or non-cacheable.
Uniform Interface — A consistent way to interact with resources (URIs + HTTP verbs + representations).
Layered System — Intermediaries (proxies, gateways, CDNs) can be inserted transparently.
Code on Demand (optional) — Servers can extend client functionality by transferring executable code.

Resource Naming Conventions¶

Resources are the central abstraction in REST. Good URI design makes APIs intuitive.

Rules of thumb:

Guideline	Good	Bad
Use nouns, not verbs	`/users`	`/getUsers`
Use plural nouns	`/orders/42`	`/order/42`
Use lowercase with hyphens	`/user-profiles`	`/userProfiles`, `/User_Profiles`
Nest for relationships	`/users/7/orders`	`/getUserOrders?userId=7`
Avoid deep nesting (max 2-3 levels)	`/users/7/orders`	`/users/7/orders/12/items/3/reviews`
Use query params for filtering	`/orders?status=shipped`	`/orders/shipped`

Resource hierarchy example:

/api/v1/organizations/{orgId}
/api/v1/organizations/{orgId}/teams
/api/v1/organizations/{orgId}/teams/{teamId}
/api/v1/organizations/{orgId}/teams/{teamId}/members

HTTP Methods and Their Semantics¶

Method	CRUD Operation	Request Body	Idempotent	Safe	Typical Status Codes
`GET`	Read	No	Yes	Yes	200, 304, 404
`POST`	Create	Yes	No	No	201, 400, 409
`PUT`	Full Replace	Yes	Yes	No	200, 204, 404
`PATCH`	Partial Update	Yes	No*	No	200, 204, 404
`DELETE`	Delete	Rarely	Yes	No	200, 204, 404

*PATCH can be made idempotent with JSON Merge Patch (RFC 7396), but is not idempotent by default when using JSON Patch (RFC 6902) operations like "add to array."
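
The footnote can be illustrated with a short sketch. It applies a merge patch with RFC 7396 semantics twice, then compares that with an append-style change. The names `merge_patch` and `append_tag` are assumptions made for this example and do not come from any particular library.

```python
import copy


def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7396 semantics) and return the result."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target entirely.
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "remove this member"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


def append_tag(doc, tag):
    """Hypothetical append-style change, similar to a JSON Patch 'add' at /tags/-."""
    updated = copy.deepcopy(doc)
    updated.setdefault("tags", []).append(tag)
    return updated


user = {"name": "Alice", "tags": ["admin"]}

# Applying the same merge patch twice leaves the document unchanged the second time.
once = merge_patch(user, {"name": "Bob"})
twice = merge_patch(once, {"name": "Bob"})

# Applying the same append twice adds the tag twice.
appended_once = append_tag(user, "beta")
appended_twice = append_tag(appended_once, "beta")
```

Here `once == twice` holds, while `appended_once` and `appended_twice` differ, which is why a retried append-style PATCH can corrupt state.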

Idempotency¶

An operation is idempotent if performing it multiple times leaves the server in the same state as performing it once. This property is critical for reliability: if a network timeout occurs after sending a request, the client can retry an idempotent call without causing additional side effects.

Idempotent:    PUT /users/42  { "name": "Alice" }   -->  Always sets name to Alice
Idempotent:    DELETE /users/42                      -->  User 42 is gone (or already gone)
NOT idempotent: POST /orders   { "item": "widget" }  -->  Creates a NEW order each time

Idempotency keys — For non-idempotent operations (like POST), clients can pass a unique key in a header so the server can detect and deduplicate retries:

POST /payments HTTP/1.1
Idempotency-Key: 8a3b1c9e-f7d2-4e6a-b5c8-1234abcd5678
Content-Type: application/json

{ "amount": 99.99, "currency": "USD" }

The server stores the idempotency key and, if it sees the same key again, returns the original response instead of processing the payment a second time.
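
One way the server side of this could look is sketched below. It is a minimal in-memory illustration; `IdempotencyStore`, `charge_card`, and the response shape are hypothetical names for this example. A production service would more likely keep keys in a shared store with an expiry and would also handle concurrent requests for the same key more carefully.

```python
import threading


class IdempotencyStore:
    """In-memory sketch: remembers the first response produced for each key."""

    def __init__(self):
        self._responses = {}
        self._lock = threading.Lock()

    def execute(self, key, handler):
        """Run handler once per key; replay the stored response on retries."""
        # Holding the lock while the handler runs keeps the sketch simple,
        # at the cost of serializing all requests.
        with self._lock:
            if key in self._responses:
                return self._responses[key]
            response = handler()
            self._responses[key] = response
            return response


charges = []


def charge_card():
    """Hypothetical side effect: record a charge and return a response."""
    charges.append({"amount": "99.99", "currency": "USD"})
    return {"status": 201, "payment_id": f"pay_{len(charges)}"}


store = IdempotencyStore()
key = "8a3b1c9e-f7d2-4e6a-b5c8-1234abcd5678"
first = store.execute(key, charge_card)
retry = store.execute(key, charge_card)  # replayed, no second charge
```

After both calls, `charges` contains a single entry and `retry` is the same response as `first`.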

HTTP Status Codes¶

Using the correct status code communicates the outcome of an operation unambiguously.

Success (2xx):

Code	Meaning	When to Use
200	OK	Successful GET, PUT, PATCH, or DELETE
201	Created	Successful POST that creates a resource
202	Accepted	Request accepted for async processing
204	No Content	Successful DELETE or PUT with no response body

Client Error (4xx):

Code	Meaning	When to Use
400	Bad Request	Malformed syntax, invalid parameters
401	Unauthorized	Missing or invalid authentication credentials
403	Forbidden	Authenticated but lacks permission
404	Not Found	Resource does not exist
405	Method Not Allowed	HTTP method not supported on this resource
409	Conflict	Conflicting state (e.g., duplicate creation)
422	Unprocessable Entity	Syntactically valid but semantically wrong
429	Too Many Requests	Rate limit exceeded

Server Error (5xx):

Code	Meaning	When to Use
500	Internal Server Error	Unexpected server failure
502	Bad Gateway	Upstream service returned invalid response
503	Service Unavailable	Server is temporarily overloaded or in maintenance
504	Gateway Timeout	Upstream service did not respond in time

GraphQL¶

GraphQL is a query language for APIs developed by Facebook (2012, open-sourced 2015). Instead of multiple endpoints returning fixed data shapes, GraphQL exposes a single endpoint and lets the client specify exactly the data it needs.

Schema Definition Language (SDL)¶

The schema is the contract between client and server:

# DateTime is not a built-in GraphQL scalar, so it must be declared
scalar DateTime

type User {
  id: ID!
  name: String!
  email: String!
  posts: [Post!]!
  createdAt: DateTime!
}

type Post {
  id: ID!
  title: String!
  body: String!
  author: User!
  comments: [Comment!]!
}

type Comment {
  id: ID!
  text: String!
  author: User!
}

type Query {
  user(id: ID!): User
  users(limit: Int, offset: Int): [User!]!
  post(id: ID!): Post
}

type Mutation {
  createUser(input: CreateUserInput!): User!
  updateUser(id: ID!, input: UpdateUserInput!): User!
  deleteUser(id: ID!): Boolean!
}

input CreateUserInput {
  name: String!
  email: String!
}

input UpdateUserInput {
  name: String
  email: String
}

type Subscription {
  postCreated: Post!
  commentAdded(postId: ID!): Comment!
}

Queries¶

Clients request exactly the fields they need:

query GetUserWithPosts {
  user(id: "42") {
    name
    email
    posts {
      title
      comments {
        text
        author {
          name
        }
      }
    }
  }
}

This solves the over-fetching problem (getting more data than needed) and the under-fetching problem (needing multiple REST calls to assemble a view).

Mutations¶

Mutations modify server-side data:

mutation CreateNewUser {
  createUser(input: { name: "Alice", email: "alice@example.com" }) {
    id
    name
    email
  }
}

Subscriptions¶

Subscriptions push real-time updates to clients, typically over WebSockets:

subscription OnNewComment {
  commentAdded(postId: "101") {
    text
    author {
      name
    }
  }
}

The N+1 Problem¶

When resolving nested relationships naively, a query for N users with their posts issues 1 query for users + N queries for posts — the classic N+1 problem.

Query: users { posts { title } }

SQL executed (naive):
  SELECT * FROM users;              -- 1 query
  SELECT * FROM posts WHERE user_id = 1;  -- +1
  SELECT * FROM posts WHERE user_id = 2;  -- +1
  SELECT * FROM posts WHERE user_id = 3;  -- +1
  ...                                     -- = N+1 total

Solution — DataLoader pattern (batching + caching):

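// Using the dataloader library (originally developed at Facebook).
// In a real server, create loaders per request so cached rows are not shared across users.
const DataLoader = require('dataloader');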

const postLoader = new DataLoader(async (userIds) => {
  // Single batched query instead of N individual ones
  const posts = await db.query(
    'SELECT * FROM posts WHERE user_id IN (?)', [userIds]
  );
  // Group posts by user_id and return in same order as userIds
  const postsByUser = {};
  posts.forEach(p => {
    (postsByUser[p.user_id] ||= []).push(p);
  });
  return userIds.map(id => postsByUser[id] || []);
});

// Resolver
const resolvers = {
  User: {
    posts: (user) => postLoader.load(user.id),
  },
};
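
The effect of batching can also be measured directly. The following sketch uses Python's built-in `sqlite3` module with an in-memory database and counts the queries issued by each strategy. The table contents and the helper names (`run`, `posts_naive`, `posts_batched`) are assumptions made for illustration.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO posts VALUES (1, 1, 'Hello'), (2, 1, 'REST'), (3, 2, 'gRPC');
""")

query_count = 0


def run(sql, params=()):
    """Execute a query and count it so both strategies can be compared."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def posts_naive(user_ids):
    # One query per user: the "+N" part of N+1.
    return [
        run("SELECT title FROM posts WHERE user_id = ? ORDER BY id", (uid,))
        for uid in user_ids
    ]


def posts_batched(user_ids):
    # One IN (...) query for all users, then group the rows in memory.
    placeholders = ",".join("?" for _ in user_ids)
    rows = run(
        f"SELECT user_id, title FROM posts "
        f"WHERE user_id IN ({placeholders}) ORDER BY id",
        tuple(user_ids),
    )
    grouped = defaultdict(list)
    for user_id, title in rows:
        grouped[user_id].append(title)
    # Return results in the same order as the requested keys.
    return [grouped[uid] for uid in user_ids]


user_ids = [row[0] for row in run("SELECT id FROM users ORDER BY id")]

naive_start = query_count
posts_naive(user_ids)
naive_queries = query_count - naive_start      # one query per user

batched_start = query_count
titles = posts_batched(user_ids)
batched_queries = query_count - batched_start  # one query in total
```

With three users, the naive resolver issues three post queries while the batched version issues one, and the gap grows linearly with the number of users.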

GraphQL vs REST — Tradeoffs¶

Aspect	REST	GraphQL
Endpoints	Multiple (one per resource)	Single endpoint
Data fetching	Fixed response shape	Client specifies exact fields
Over-fetching	Common	Largely avoided (clients select fields)
Under-fetching	Multiple round trips needed	Single request
Caching	HTTP caching works naturally (GET + URL)	Harder — requires client-side cache (Apollo, Relay)
File upload	Native multipart support	Requires extensions (multipart spec)
Error handling	HTTP status codes	Typically 200 over HTTP; errors in the response body
Versioning	URL/header versioning	Schema evolution (deprecate fields)
Learning curve	Low	Moderate
Tooling	Mature (Postman, curl)	Growing (GraphiQL, Apollo Studio)
Best for	Simple CRUD, public APIs, caching-heavy	Complex nested data, mobile apps, BFF

gRPC¶

gRPC (gRPC Remote Procedure Call) is a high-performance, open-source framework developed by Google. It uses Protocol Buffers (protobuf) as its interface definition language and serialization format, and communicates over HTTP/2.

Architecture Overview¶

┌──────────────┐         HTTP/2 + Protobuf         ┌──────────────┐
│   gRPC       │  ──────────────────────────────▶   │   gRPC       │
│   Client     │                                    │   Server     │
│              │  ◀──────────────────────────────   │              │
│  (stub)      │     Binary frames, multiplexed     │  (service)   │
└──────────────┘                                    └──────────────┘
       │                                                   │
       │  Generated code                     Generated code│
       ▼                                                   ▼
 ┌────────────┐                                  ┌────────────────┐
 │  .proto     │◀────── Shared contract ────────▶│   .proto       │
 │  definition │                                 │   definition   │
 └────────────┘                                  └────────────────┘

Protocol Buffers¶

Protocol Buffers (protobuf) are a language-neutral, platform-neutral mechanism for serializing structured data. Encoded messages are typically smaller and faster to parse than equivalent JSON, and the schema enforces field types.

Example .proto file:

syntax = "proto3";

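package ecommerce;

// Required for the google.protobuf.Timestamp fields used below
import "google/protobuf/timestamp.proto";

// RPC request and response types must be messages, not enums,
// so status updates are wrapped in a message
message OrderStatusUpdate {
  string order_id = 1;
  OrderStatus status = 2;
}

// Service definition
service OrderService {
  // Unary RPC
  rpc GetOrder (GetOrderRequest) returns (Order);

  // Server streaming RPC - stream order status updates
  rpc TrackOrder (TrackOrderRequest) returns (stream OrderStatusUpdate);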

  // Client streaming RPC - upload multiple items for a bulk order
  rpc CreateBulkOrder (stream OrderItem) returns (BulkOrderResponse);

  // Bidirectional streaming RPC - real-time chat with support
  rpc SupportChat (stream ChatMessage) returns (stream ChatMessage);
}

message GetOrderRequest {
  string order_id = 1;
}

message Order {
  string id = 1;
  string customer_id = 2;
  repeated OrderItem items = 3;
  OrderStatus status = 4;
  double total_amount = 5;
  google.protobuf.Timestamp created_at = 6;
}

message OrderItem {
  string product_id = 1;
  string name = 2;
  int32 quantity = 3;
  double price = 4;
}

enum OrderStatus {
  ORDER_STATUS_UNSPECIFIED = 0;
  ORDER_STATUS_PENDING = 1;
  ORDER_STATUS_CONFIRMED = 2;
  ORDER_STATUS_SHIPPED = 3;
  ORDER_STATUS_DELIVERED = 4;
}

message TrackOrderRequest {
  string order_id = 1;
}

message BulkOrderResponse {
  string order_id = 1;
  int32 items_accepted = 2;
  int32 items_rejected = 3;
}

message ChatMessage {
  string sender = 1;
  string text = 2;
  google.protobuf.Timestamp timestamp = 3;
}

RPC Types¶

Type	Client	Server	Use Case
Unary	1 request	1 response	Standard request/response (like REST)
Server streaming	1 request	N responses	Real-time feeds, large data downloads
Client streaming	N requests	1 response	File uploads, telemetry ingestion
Bidirectional streaming	N requests	N responses	Chat, collaborative editing

Python gRPC Server Example¶

import grpc
from concurrent import futures
import order_pb2
import order_pb2_grpc

class OrderServiceServicer(order_pb2_grpc.OrderServiceServicer):
    def GetOrder(self, request, context):
        # Look up order from database
        order = db.get_order(request.order_id)
        if not order:
            context.set_code(grpc.StatusCode.NOT_FOUND)
            context.set_details(f"Order {request.order_id} not found")
            return order_pb2.Order()
        return order_pb2.Order(
            id=order.id,
            customer_id=order.customer_id,
            total_amount=order.total,
            status=order_pb2.ORDER_STATUS_CONFIRMED,
        )

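    def TrackOrder(self, request, context):
        """Server streaming: pushes status updates to the client."""
        # RPC response types must be messages, so this assumes the .proto wraps
        # the OrderStatus enum in an OrderStatusUpdate message with a `status` field.
        for status_update in db.stream_order_updates(request.order_id):
            yield order_pb2.OrderStatusUpdate(
                status=status_update.status,
            )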

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    order_pb2_grpc.add_OrderServiceServicer_to_server(
        OrderServiceServicer(), server
    )
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()

if __name__ == "__main__":
    serve()

gRPC vs REST¶

Aspect	gRPC	REST
Protocol	HTTP/2	HTTP/1.1 or HTTP/2
Serialization	Protobuf (binary)	JSON (text)
Performance	Very high (small payloads, multiplexing)	Good (larger payloads, human-readable)
Streaming	Native (4 types)	Requires WebSockets or SSE
Code generation	Built-in from `.proto`	Optional (OpenAPI codegen)
Browser support	Limited (requires gRPC-Web proxy)	Native
Human readability	Low (binary)	High (JSON)
Contract	Strict (protobuf schema)	Loose (optional OpenAPI spec)
Best for	Microservice-to-microservice, low latency	Public APIs, browser clients, broad compatibility

API Versioning Strategies¶

APIs evolve over time. Breaking changes — removing fields, renaming endpoints, changing response shapes — require versioning so existing clients continue to function.

Strategy Comparison¶

Strategy	Example	Pros	Cons
URI Path	`/api/v1/users`	Simple, explicit, easy to route	Pollutes URI space, duplicates controllers
Query Parameter	`/api/users?version=1`	Easy to default to latest	Easy to forget, less visible
Custom Header	`X-API-Version: 1`	Clean URIs, flexible	Hidden from URL, harder to test in browser
Accept Header (Content Negotiation)	`Accept: application/vnd.myapi.v1+json`	Standards-based, clean URIs	Complex, unfamiliar to many developers
No Versioning (Schema Evolution)	Additive changes only	Simplest, no version management	Limits what changes you can make

URI Path Versioning (Most Common)¶

GET /api/v1/users/42          --> Returns v1 response shape
GET /api/v2/users/42          --> Returns v2 response shape (e.g., split name into first/last)

# Flask example with URI versioning
from flask import Flask, Blueprint

app = Flask(__name__)

v1 = Blueprint('v1', __name__, url_prefix='/api/v1')
v2 = Blueprint('v2', __name__, url_prefix='/api/v2')

@v1.route('/users/<int:user_id>')
def get_user_v1(user_id):
    user = db.get_user(user_id)
    return {"id": user.id, "name": user.full_name}

@v2.route('/users/<int:user_id>')
def get_user_v2(user_id):
    user = db.get_user(user_id)
    return {
        "id": user.id,
        "first_name": user.first_name,
        "last_name": user.last_name,
        "email": user.email,
    }

app.register_blueprint(v1)
app.register_blueprint(v2)

Header Versioning¶

GET /api/users/42 HTTP/1.1
Host: api.example.com
X-API-Version: 2
Accept: application/json
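
On the server, header-based versioning needs a small resolution step before routing. The sketch below is one possible approach, assuming a plain dictionary of headers and the vendor media type `application/vnd.myapi.v1+json` from the comparison table. `SUPPORTED_VERSIONS`, `DEFAULT_VERSION`, and `resolve_version` are hypothetical names, and a real framework would supply case-insensitive header lookup.

```python
import re

SUPPORTED_VERSIONS = {1, 2}  # assumption for this sketch
DEFAULT_VERSION = 2          # assumption: unversioned requests get the latest

_VENDOR_TYPE = re.compile(r"application/vnd\.myapi\.v(\d+)\+json")


def resolve_version(headers):
    """Pick an API version from X-API-Version, then Accept, then the default."""
    raw = headers.get("X-API-Version")
    if raw is None:
        match = _VENDOR_TYPE.search(headers.get("Accept", ""))
        raw = match.group(1) if match else None
    if raw is None:
        return DEFAULT_VERSION
    raw = raw.strip()
    if not raw.isdecimal() or int(raw) not in SUPPORTED_VERSIONS:
        raise ValueError(f"Unsupported API version: {raw!r}")
    return int(raw)


explicit = resolve_version({"X-API-Version": "2", "Accept": "application/json"})
negotiated = resolve_version({"Accept": "application/vnd.myapi.v1+json"})
fallback = resolve_version({"Accept": "application/json"})
```

Rejecting unknown versions explicitly, rather than silently falling back, tends to surface client misconfiguration earlier. Whether an unversioned request should default to the latest or the oldest version is a policy decision this sketch does not settle.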

Best Practices for Versioning¶

Prefer additive, non-breaking changes whenever possible — adding new fields to a response, adding optional query parameters, or adding new endpoints does not require a new version.
Deprecate before removing. Announce deprecation with a Sunset header (RFC 8594) and give clients a migration window.
Support at most 2-3 active versions to keep maintenance manageable.
Document a clear deprecation policy — e.g., "Each major version is supported for 18 months after the next version is released."

HTTP/1.1 200 OK
Sunset: Sun, 01 Mar 2026 00:00:00 GMT
Deprecation: true
Link: <https://api.example.com/docs/migration-v1-to-v2>; rel="deprecation"
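
Because the Sunset value is an HTTP-date, it is safer to generate it from a datetime than to type it by hand, since the weekday has to agree with the date. A minimal sketch using only the standard library follows. The function name `sunset_headers` is an assumption for this example, and it only covers the `Sunset` and `Link` headers.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def sunset_headers(retire_at, migration_url):
    """Build Sunset and Link response headers for a retiring endpoint."""
    if retire_at.tzinfo is None:
        raise ValueError("retire_at must be timezone-aware")
    return {
        # usegmt=True renders the HTTP-date form ending in "GMT".
        "Sunset": format_datetime(retire_at.astimezone(timezone.utc), usegmt=True),
        "Link": f'<{migration_url}>; rel="deprecation"',
    }


headers = sunset_headers(
    datetime(2026, 3, 1, tzinfo=timezone.utc),
    "https://api.example.com/docs/migration-v1-to-v2",
)
```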

API Documentation¶

Good documentation is the difference between an API that developers love and one they avoid. For describing HTTP APIs, the most widely used standard is the OpenAPI Specification (formerly the Swagger Specification).

OpenAPI / Swagger¶

OpenAPI is a specification format (YAML or JSON) that describes REST APIs in a machine-readable way. Tools like Swagger UI, Redoc, and Stoplight render it into interactive documentation.

┌──────────────┐        ┌───────────────────┐       ┌──────────────────┐
│  openapi.yaml│───────▶│  Swagger UI /     │──────▶│  Interactive     │
│  (spec file) │        │  Redoc / Stoplight│       │  Documentation   │
└──────────────┘        └───────────────────┘       └──────────────────┘
       │
       │  Also used for:
       ├──▶ Client SDK generation (openapi-generator)
       ├──▶ Server stub generation
       ├──▶ Contract testing
       └──▶ Mock servers

Example OpenAPI Spec Snippet¶

openapi: 3.0.3
info:
  title: Bookstore API
  description: API for managing a bookstore catalog
  version: 1.0.0
  contact:
    name: API Support
    email: api@bookstore.com

servers:
  - url: https://api.bookstore.com/v1
    description: Production
  - url: https://staging-api.bookstore.com/v1
    description: Staging

paths:
  /books:
    get:
      summary: List all books
      operationId: listBooks
      tags:
        - Books
      parameters:
        - name: genre
          in: query
          schema:
            type: string
            enum: [fiction, non-fiction, science, history]
        - name: limit
          in: query
          schema:
            type: integer
            default: 20
            maximum: 100
        - name: cursor
          in: query
          schema:
            type: string
          description: Cursor for pagination
      responses:
        '200':
          description: A paginated list of books
          content:
            application/json:
              schema:
                type: object
                properties:
                  data:
                    type: array
                    items:
                      $ref: '#/components/schemas/Book'
                  pagination:
                    $ref: '#/components/schemas/CursorPagination'
        '400':
          description: Invalid query parameters
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ProblemDetails'

    post:
      summary: Create a new book
      operationId: createBook
      tags:
        - Books
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateBookRequest'
      responses:
        '201':
          description: Book created successfully
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Book'
          headers:
            Location:
              schema:
                type: string
              description: URI of the newly created book
        '400':
          description: Invalid request body
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ProblemDetails'
        '409':
          description: Book with same ISBN already exists

components:
  schemas:
    Book:
      type: object
      required: [id, title, author, isbn]
      properties:
        id:
          type: string
          format: uuid
        title:
          type: string
          example: "The Pragmatic Programmer"
        author:
          type: string
          example: "David Thomas, Andrew Hunt"
        isbn:
          type: string
          pattern: '^\d{3}-\d{10}$'
          example: "978-0135957059"
        genre:
          type: string
        published_date:
          type: string
          format: date
        price:
          type: number
          format: float

    CreateBookRequest:
      type: object
      required: [title, author, isbn]
      properties:
        title:
          type: string
        author:
          type: string
        isbn:
          type: string
        genre:
          type: string
        price:
          type: number

    CursorPagination:
      type: object
      properties:
        next_cursor:
          type: string
          nullable: true
        has_more:
          type: boolean

    ProblemDetails:
      type: object
      properties:
        type:
          type: string
          format: uri
        title:
          type: string
        status:
          type: integer
        detail:
          type: string
        instance:
          type: string
          format: uri

Pagination, Filtering, and Sorting Patterns¶

Any API that returns collections of resources needs strategies for pagination, filtering, and sorting to keep responses manageable and performant.

Offset-Based Pagination¶

The simplest approach: specify a page number (or offset) and a page size.

GET /api/v1/products?page=3&page_size=25

{
  "data": [ ... ],
  "pagination": {
    "page": 3,
    "page_size": 25,
    "total_items": 1042,
    "total_pages": 42
  }
}

Pros:

Simple to implement and understand.
Allows jumping to any page directly.
Easy to display "Page X of Y" in UIs.

Cons:

Inconsistent results when data changes between requests — items can be skipped or duplicated if rows are inserted or deleted.
Poor performance on large offsets — OFFSET 100000 still scans and discards 100,000 rows in most databases.
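
The arithmetic behind the response above is small enough to show in full. This sketch, with the hypothetical helper name `offset_page`, converts a page number and page size into the OFFSET and LIMIT values for the query and derives the page count.

```python
import math


def offset_page(page, page_size, total_items):
    """Translate page/page_size into OFFSET/LIMIT plus pagination metadata."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "offset": (page - 1) * page_size,  # rows to skip
        "limit": page_size,
        "page": page,
        "page_size": page_size,
        "total_items": total_items,
        "total_pages": math.ceil(total_items / page_size),
    }


info = offset_page(page=3, page_size=25, total_items=1042)
```

For page 3 with 25 items per page, the query skips 50 rows, and 1042 items round up to 42 pages, matching the example response.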

Cursor-Based Pagination¶

Uses an opaque cursor (often a base64-encoded identifier or timestamp) to mark the position in the result set.

GET /api/v1/products?limit=25&cursor=eyJpZCI6MTUwfQ==

{
  "data": [ ... ],
  "pagination": {
    "next_cursor": "eyJpZCI6MTc1fQ==",
    "has_more": true
  }
}

Behind the scenes, the cursor decodes to {"id": 150}, and the query becomes:

SELECT * FROM products
WHERE id > 150
ORDER BY id ASC
LIMIT 25;

Pros:

Stable results — inserts and deletes between pages do not cause skips or duplicates.
Consistent performance at any depth: an indexed WHERE clause seeks directly to the cursor position instead of scanning past skipped rows as OFFSET does.

Cons:

Cannot jump to an arbitrary page.
Cursor must encode enough information to recreate the query position.
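
The cursor values in the example can be produced with compact JSON and URL-safe base64. The helpers below are a sketch; `encode_cursor` and `decode_cursor` are illustrative names that take and return a plain dictionary. Note that base64 only makes the cursor opaque by convention. Clients can decode it, so it should not carry secrets, and the server should validate whatever it decodes.

```python
import base64
import json


def encode_cursor(position):
    """Serialize a position dict to compact JSON, then URL-safe base64."""
    raw = json.dumps(position, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor):
    """Reverse encode_cursor; raise ValueError on malformed input."""
    try:
        return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    except (ValueError, UnicodeError) as exc:
        raise ValueError("Invalid cursor") from exc


cursor = encode_cursor({"id": 150})
position = decode_cursor("eyJpZCI6MTc1fQ==")
```

Encoding `{"id": 150}` yields the request cursor shown above, and decoding the `next_cursor` value yields `{"id": 175}`, the last row of the returned page.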

Comparison¶

Offset Pagination                    Cursor Pagination
┌──────────────────────┐            ┌──────────────────────┐
│ Page 1: items 1-25   │            │ Start ──▶ cursor_A   │
│ Page 2: items 26-50  │            │ cursor_A ──▶ cursor_B│
│ Page 3: items 51-75  │            │ cursor_B ──▶ cursor_C│
│ ...                  │            │ ...                  │
│ Can jump to Page N   │            │ Must traverse in     │
│ O(N) for large       │            │ order. O(1) per page │
│ offsets              │            │                      │
└──────────────────────┘            └──────────────────────┘

Filtering¶

Use query parameters to filter resource collections:

GET /api/v1/products?category=electronics&price_min=100&price_max=500&in_stock=true

For more expressive filtering, some APIs adopt a filter syntax:

GET /api/v1/products?filter[category]=electronics&filter[price][gte]=100&filter[price][lte]=500

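Or use a single filter parameter with a simple expression language (shown unencoded here for readability; spaces and quotes must be percent-encoded in a real request):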

GET /api/v1/products?filter=category eq "electronics" and price gte 100
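
Parsing the bracket syntax shown above takes only a few lines. The following sketch turns `filter[field][op]=value` pairs into a nested dictionary. `parse_filters` and the `ALLOWED_OPERATORS` set are assumptions for this example; values are left as strings, so type conversion and field validation would still be needed before building a query.

```python
import re
from urllib.parse import parse_qsl

_FILTER_KEY = re.compile(r"^filter\[(\w+)\](?:\[(\w+)\])?$")
ALLOWED_OPERATORS = {"eq", "gte", "lte"}  # assumption for this sketch


def parse_filters(query_string):
    """Turn filter[field][op]=value pairs into {field: {op: value}}."""
    filters = {}
    for key, value in parse_qsl(query_string):
        match = _FILTER_KEY.match(key)
        if not match:
            continue  # not a filter parameter
        field = match.group(1)
        operator = match.group(2) or "eq"  # bare filter[field] means equality
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"Unsupported operator: {operator}")
        filters.setdefault(field, {})[operator] = value
    return filters


parsed = parse_filters(
    "filter[category]=electronics&filter[price][gte]=100&filter[price][lte]=500"
)
```

The result groups both price bounds under one field: `{"category": {"eq": "electronics"}, "price": {"gte": "100", "lte": "500"}}`.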

Sorting¶

Use a sort query parameter. Prefix with - for descending order:

GET /api/v1/products?sort=-price,name

This sorts by price descending, then by name ascending.

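# Python implementation sketch
from flask import abort, request

# Hypothetical whitelist: only these columns may appear in ORDER BY
ALLOWED_SORT_FIELDS = {"id", "name", "price"}

@app.route('/api/v1/products')
def list_products():
    # Parse sorting; validate against the whitelist so raw user input
    # is never interpolated into SQL
    sort_param = request.args.get('sort', 'id')
    order_clauses = []
    for field in sort_param.split(','):
        descending = field.startswith('-')
        column = field[1:] if descending else field
        if column not in ALLOWED_SORT_FIELDS:
            abort(400, description=f"Unsupported sort field: {column}")
        order_clauses.append(f"{column} {'DESC' if descending else 'ASC'}")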

    # Parse filtering
    category = request.args.get('category')
    price_min = request.args.get('price_min', type=float)
    price_max = request.args.get('price_max', type=float)

    # Parse pagination
    cursor = request.args.get('cursor')
    limit = min(request.args.get('limit', 25, type=int), 100)

    products = db.query_products(
        category=category,
        price_min=price_min,
        price_max=price_max,
        order_by=order_clauses,
        cursor=cursor,
        limit=limit + 1,  # Fetch one extra to determine has_more
    )

    has_more = len(products) > limit
    if has_more:
        products = products[:limit]

    return {
        "data": [p.to_dict() for p in products],
        "pagination": {
            "next_cursor": encode_cursor(products[-1]) if has_more else None,
            "has_more": has_more,
        }
    }

HATEOAS¶

HATEOAS (Hypermedia As The Engine Of Application State) is a REST constraint where the server includes hyperlinks in responses that tell the client what actions are available next. Instead of hardcoding URLs, clients discover them dynamically.

In practice, full HATEOAS is rarely implemented outside of enterprise APIs, but including relevant links improves discoverability and decouples clients from URL structures.

Example Response with HATEOAS Links¶

{
  "id": "order-42",
  "status": "pending",
  "total": 149.99,
  "items": [
    {
      "product_id": "prod-7",
      "name": "Mechanical Keyboard",
      "quantity": 1,
      "price": 149.99,
      "_links": {
        "product": { "href": "/api/v1/products/prod-7", "method": "GET" }
      }
    }
  ],
  "_links": {
    "self":    { "href": "/api/v1/orders/order-42", "method": "GET" },
    "cancel":  { "href": "/api/v1/orders/order-42/cancel", "method": "POST" },
    "payment": { "href": "/api/v1/orders/order-42/pay", "method": "POST" },
    "customer": { "href": "/api/v1/users/user-7", "method": "GET" }
  }
}

Once the order has shipped, the cancel and payment links disappear and a track link appears, because the available transitions change with the resource's current state.

{
  "id": "order-42",
  "status": "shipped",
  "total": 149.99,
  "_links": {
    "self":  { "href": "/api/v1/orders/order-42", "method": "GET" },
    "track": { "href": "/api/v1/orders/order-42/tracking", "method": "GET" }
  }
}
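
On the server, state-dependent links are often driven by a small transition table. The sketch below shows one way to derive the `_links` object from an order's status. The `TRANSITIONS` mapping and `order_links` function are hypothetical and cover only the two states shown in the examples.

```python
# Hypothetical state machine: link relation -> (URL suffix, HTTP method)
TRANSITIONS = {
    "pending": {"cancel": ("cancel", "POST"), "payment": ("pay", "POST")},
    "shipped": {"track": ("tracking", "GET")},
}


def order_links(order_id, status):
    """Return the _links object for an order in the given status."""
    base = f"/api/v1/orders/{order_id}"
    links = {"self": {"href": base, "method": "GET"}}
    # Unknown statuses fall back to just the self link.
    for rel, (suffix, method) in TRANSITIONS.get(status, {}).items():
        links[rel] = {"href": f"{base}/{suffix}", "method": method}
    return links


pending_links = order_links("order-42", "pending")
shipped_links = order_links("order-42", "shipped")
```

Keeping the allowed transitions in one table means the links in responses and the server's own authorization checks can be derived from the same source.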

Error Handling and Response Formats¶

Consistent, informative error responses are essential for a good developer experience. Clients should never have to guess what went wrong.

Principles¶

Use appropriate HTTP status codes — do not return 200 for errors.
Provide a consistent error response structure across all endpoints.
Include enough detail for the developer to understand and fix the issue.
Never leak internal details (stack traces, SQL queries, internal paths) in production.
Use machine-readable error codes in addition to human-readable messages.

Consistent Error Response Format¶

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "The request body contains invalid fields.",
    "details": [
      {
        "field": "email",
        "issue": "Must be a valid email address.",
        "value": "not-an-email"
      },
      {
        "field": "age",
        "issue": "Must be a positive integer.",
        "value": -5
      }
    ],
    "request_id": "req-a1b2c3d4",
    "documentation_url": "https://api.example.com/docs/errors#VALIDATION_ERROR"
  }
}

RFC 7807 — Problem Details for HTTP APIs¶

RFC 7807 defines a standard format for error responses. It has since been obsoleted by RFC 9457, which keeps the same core fields. The format is widely adopted and supported by many frameworks.

HTTP/1.1 403 Forbidden
Content-Type: application/problem+json

{
  "type": "https://api.example.com/errors/insufficient-funds",
  "title": "Insufficient Funds",
  "status": 403,
  "detail": "Your account balance of $10.00 is insufficient for a $25.00 purchase.",
  "instance": "/api/v1/payments/txn-789",
  "balance": 10.00,
  "required": 25.00
}

Standard fields:

Field	Required	Description
`type`	No	A URI reference identifying the problem type (can be a docs link); defaults to `about:blank` when omitted
`title`	No	Short human-readable summary of the problem type
`status`	No	HTTP status code, repeated in the body for convenience
`detail`	No	Human-readable explanation specific to this occurrence
`instance`	No	A URI reference identifying this specific occurrence

Additional fields (like balance and required above) can be added for context.

Implementation Example¶

from flask import Flask, jsonify, request

app = Flask(__name__)

class APIError(Exception):
    def __init__(self, status, error_type, title, detail=None, **extra):
        self.status = status
        self.error_type = error_type
        self.title = title
        self.detail = detail
        self.extra = extra

@app.errorhandler(APIError)
def handle_api_error(error):
    response = {
        "type": f"https://api.example.com/errors/{error.error_type}",
        "title": error.title,
        "status": error.status,
    }
    if error.detail:
        response["detail"] = error.detail
    response.update(error.extra)

    return jsonify(response), error.status, {
        "Content-Type": "application/problem+json"
    }

@app.route('/api/v1/orders/<order_id>/pay', methods=['POST'])
def pay_order(order_id):
    order = db.get_order(order_id)
    # Assumes authentication middleware has attached user_id to the request
    account = db.get_account(request.user_id)

    if account.balance < order.total:
        raise APIError(
            status=403,
            error_type="insufficient-funds",
            title="Insufficient Funds",
            detail=f"Balance of ${account.balance:.2f} is less than "
                   f"the required ${order.total:.2f}.",
            balance=account.balance,
            required=order.total,
        )

    # Process payment ...
    return jsonify({"status": "paid"}), 200

Rate Limiting¶

Rate limiting protects APIs from abuse, ensures fair usage among clients, and prevents individual consumers from overwhelming backend services. It is a critical component of both API design and system design.

Cross-reference: For rate limiting in the context of system design, see Chapter 7.1 — System Design.

Common Algorithms¶

Token Bucket¶

A bucket holds up to N tokens. Each request consumes one token. Tokens are added at a fixed rate. When the bucket is empty, requests are rejected (or queued).

Token Bucket (capacity=5, refill_rate=1/sec)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  Time 0s:  [*][*][*][*][*]  5 tokens (full)
  Request → [*][*][*][*][ ]  4 tokens (1 consumed)
  Request → [*][*][*][ ][ ]  3 tokens
  Request → [*][*][ ][ ][ ]  2 tokens
  Time 1s:  [*][*][*][ ][ ]  3 tokens (1 refilled)
  Request → [*][*][ ][ ][ ]  2 tokens
  ...
  Later:    [ ][ ][ ][ ][ ]  0 tokens (bucket drained)
  Request → REJECTED (429 Too Many Requests)

Characteristics:

Allows short bursts up to the bucket capacity.
Smooths traffic to the refill rate over time.
Simple to implement; used by AWS, Stripe, and many others.

import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow_request(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(
            self.capacity,
            self.tokens + elapsed * self.refill_rate,
        )
        self.last_refill = now

        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

Sliding Window Log¶

Stores the timestamp of every request in the window. Counts requests by checking how many timestamps fall within the current window.

Sliding Window (window=60s, limit=5)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  Timeline (seconds):
  0    10    20    30    40    50    60    70
  |─────|─────|─────|─────|─────|─────|─────|
  R     R     R           R     R           R?
  1     2     3           4     5
                                         ───┤
                                    At t=70: │
                              Window = [10,70]│
                          Requests in window: │
                          10, 20, 40, 50 = 4  │
                          R at 70 → ALLOWED   │
                                         ────┘

Characteristics:

Precise — no boundary effects.
Higher memory usage (stores all timestamps within the window).
Used when exact rate enforcement is required.
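
A sliding window log fits in a few lines when built on a deque. The sketch below is a single-process illustration with an injectable clock so the timeline from the diagram can be replayed. The class name `SlidingWindowLog` is an assumption, and the extra request at t=55 is added here only to show a rejection. Rejected requests are not logged in this version, which is one of several reasonable policies.

```python
import time
from collections import deque


class SlidingWindowLog:
    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock          # injectable so tests can control time
        self.timestamps = deque()   # accepted request times, oldest first

    def allow_request(self):
        now = self.clock()
        # Drop timestamps that have fallen out of the window [now - window, now].
        while self.timestamps and self.timestamps[0] < now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


# Replay the timeline from the diagram, plus one extra request at t=55.
current = [0]
limiter = SlidingWindowLog(limit=5, window_seconds=60, clock=lambda: current[0])
results = []
for t in (0, 10, 20, 40, 50, 55, 70):
    current[0] = t
    results.append(limiter.allow_request())
```

The request at t=55 is rejected because five requests already sit in the window. By t=70 the request from t=0 has expired, leaving four in the window, so the request is allowed.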

Sliding Window Counter¶

A memory-efficient approximation that combines fixed window counts with a weighted overlap.

Window counter estimation:

  Previous window       Current window
  (40 requests)         (15 requests so far)
  ┌────────────────┐┌───────┬────────┐
  │  60s window    ││ 25%   │  75%   │
  │  ended         ││elapsed│remaining│
  └────────────────┘└───────┴────────┘

  Estimated count = (prev_count × remaining%) + current_count
                  = (40 × 0.75) + 15
                  = 30 + 15 = 45

  If limit is 50 → ALLOWED (45 < 50)
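
The estimate above reduces to a one-line formula. The sketch below expresses it as a function, with `elapsed_fraction` being the share of the current window that has already passed. The function names are assumptions for this example.

```python
def estimated_count(prev_count, current_count, elapsed_fraction):
    """Weight the previous window by the share of it still inside the sliding window."""
    if not 0.0 <= elapsed_fraction <= 1.0:
        raise ValueError("elapsed_fraction must be between 0 and 1")
    return prev_count * (1.0 - elapsed_fraction) + current_count


def allow_request(prev_count, current_count, elapsed_fraction, limit):
    """Allow the request while the weighted estimate is below the limit."""
    return estimated_count(prev_count, current_count, elapsed_fraction) < limit


estimate = estimated_count(prev_count=40, current_count=15, elapsed_fraction=0.25)
allowed = allow_request(40, 15, 0.25, limit=50)
```

Only two counters per client are stored, which is the memory saving over the log approach. The trade-off is that the estimate assumes requests in the previous window were evenly spread.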

Rate Limit Response Headers¶

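Communicate rate limit status to clients using response headers. The `X-RateLimit-*` names are a widely used convention rather than a formal standard, while `Retry-After` is defined by the HTTP specification:

HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 67
X-RateLimit-Reset: 1708732800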

Header	Description
`X-RateLimit-Limit`	Maximum requests allowed in the window
`X-RateLimit-Remaining`	Requests remaining in the current window
`X-RateLimit-Reset`	Unix timestamp when the window resets
`Retry-After`	Seconds (or date) before the client should retry (sent with 429)

When the limit is exceeded:

HTTP/1.1 429 Too Many Requests
Content-Type: application/problem+json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1708732800

{
  "type": "https://api.example.com/errors/rate-limit-exceeded",
  "title": "Rate Limit Exceeded",
  "status": 429,
  "detail": "You have exceeded 100 requests per minute. Try again in 30 seconds."
}
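
On the client side, a well-behaved consumer reads `Retry-After` before retrying. The header may carry either a number of seconds or an HTTP-date, so both forms need handling. The helper below is a sketch using only the standard library; `retry_delay_seconds` is a hypothetical name, and the `now` parameter exists so the date branch can be tested deterministically.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(retry_after, now=None):
    """Convert a Retry-After value (delta-seconds or HTTP-date) to a wait in seconds."""
    value = retry_after.strip()
    if value.isdecimal():
        return float(value)
    now = now or datetime.now(timezone.utc)
    target = parsedate_to_datetime(value)
    if target.tzinfo is None:
        # Treat a date without zone information as UTC.
        target = target.replace(tzinfo=timezone.utc)
    # A date in the past means the client may retry immediately.
    return max(0.0, (target - now).total_seconds())


reference = datetime(2024, 2, 24, 0, 0, 0, tzinfo=timezone.utc)
from_seconds = retry_delay_seconds("30")
from_date = retry_delay_seconds("Sat, 24 Feb 2024 00:00:30 GMT", now=reference)
```

Both calls yield a 30-second wait. In practice a client would usually add jitter and cap the delay so that many clients do not retry at the same instant.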

Rate Limiting Strategies by Scope¶

Scope	Key	Example
Per user	User ID or API key	1000 req/hour per user
Per IP	Client IP address	100 req/minute per IP
Per endpoint	Method + path	10 POST /login per minute per IP
Global	None	50,000 req/minute total for the service
Tiered	Subscription plan	Free: 100/hr, Pro: 10,000/hr, Enterprise: unlimited

Summary — Choosing the Right API Style¶

Factor	REST	GraphQL	gRPC
Client type	Browsers, third-party devs	Mobile apps, complex UIs	Internal microservices
Performance needs	Moderate	Moderate	High
Data relationships	Simple, flat	Complex, nested	Varies
Real-time	Polling / SSE / WebSocket	Subscriptions	Native streaming
Schema enforcement	Optional (OpenAPI)	Required (SDL)	Required (Protobuf)
Ecosystem maturity	Very mature	Mature	Mature
Human debuggability	High (JSON + URLs)	Medium (JSON + single endpoint)	Low (binary)

In practice, many systems use a combination: REST for public-facing APIs, GraphQL as a Backend-for-Frontend (BFF) layer aggregating multiple services, and gRPC for internal service-to-service communication where performance matters most.