API Design¶
API (Application Programming Interface) design is the discipline of defining clear, consistent, and usable interfaces through which software components communicate. A well-designed API reduces integration friction, improves developer experience, and enables systems to evolve independently. Whether you are building public-facing APIs consumed by third-party developers or internal service-to-service interfaces within a microservices architecture, the principles covered in this chapter form the foundation of robust API engineering.
RESTful API Design Principles¶
REST (Representational State Transfer) is an architectural style defined by Roy Fielding in his 2000 doctoral dissertation. RESTful APIs model the world as resources identified by URIs and manipulated through a uniform interface — the standard HTTP methods.
Core Constraints of REST¶
- Client-Server — Separation of concerns between UI and data storage.
- Stateless — Each request contains all the information the server needs; no session state is stored server-side between requests.
- Cacheable — Responses must declare themselves cacheable or non-cacheable.
- Uniform Interface — A consistent way to interact with resources (URIs + HTTP verbs + representations).
- Layered System — Intermediaries (proxies, gateways, CDNs) can be inserted transparently.
- Code on Demand (optional) — Servers can extend client functionality by transferring executable code.
Resource Naming Conventions¶
Resources are the central abstraction in REST. Good URI design makes APIs intuitive.
Rules of thumb:
| Guideline | Good | Bad |
|---|---|---|
| Use nouns, not verbs | /users | /getUsers |
| Use plural nouns | /orders/42 | /order/42 |
| Use lowercase with hyphens | /user-profiles | /userProfiles, /User_Profiles |
| Nest for relationships | /users/7/orders | /getUserOrders?userId=7 |
| Avoid deep nesting (max 2-3 levels) | /users/7/orders | /users/7/orders/12/items/3/reviews |
| Use query params for filtering | /orders?status=shipped | /orders/shipped |
Resource hierarchy example:
/api/v1/organizations/{orgId}
/api/v1/organizations/{orgId}/teams
/api/v1/organizations/{orgId}/teams/{teamId}
/api/v1/organizations/{orgId}/teams/{teamId}/members
HTTP Methods and Their Semantics¶
| Method | CRUD Operation | Request Body | Idempotent | Safe | Typical Status Codes |
|---|---|---|---|---|---|
| GET | Read | No | Yes | Yes | 200, 304, 404 |
| POST | Create | Yes | No | No | 201, 400, 409 |
| PUT | Full Replace | Yes | Yes | No | 200, 204, 404 |
| PATCH | Partial Update | Yes | No* | No | 200, 204, 404 |
| DELETE | Delete | Rarely | Yes | No | 200, 204, 404 |
*PATCH can be made idempotent with JSON Merge Patch (RFC 7396), but is not idempotent by default when using JSON Patch (RFC 6902) operations like "add to array."
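The footnote can be made concrete with a minimal sketch. Here `merge_patch` is a simplified reading of RFC 7396 and `append_op` stands in for a JSON Patch `add` to the end of an array; both names are illustrative and not taken from any library.

```python
import copy


def merge_patch(target, patch):
    """Apply a JSON Merge Patch (simplified reading of RFC 7396)."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)  # a non-object patch replaces the target
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "remove this member"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


def append_op(document, array_key, value):
    """Stand-in for a JSON Patch 'add' that appends to an array."""
    result = copy.deepcopy(document)
    result[array_key].append(value)
    return result


user = {"name": "Al", "tags": ["admin"]}

# Merge patch: applying the same patch again changes nothing.
once = merge_patch(user, {"name": "Alice"})
twice = merge_patch(once, {"name": "Alice"})
print(once == twice)

# Array append: every retry adds another element.
added_once = append_op(user, "tags", "beta")
added_twice = append_op(added_once, "tags", "beta")
print(added_once == added_twice)
```

A retried merge patch converges on the same document, whereas a retried append keeps growing the array, which is why the latter needs an idempotency key or a precondition to be retried safely.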
Idempotency¶
An operation is idempotent if performing it multiple times leaves the server in the same state as performing it once. This property is critical for reliability: if a network timeout occurs after sending a request, the client can safely retry an idempotent call without causing additional side effects.
Idempotent: PUT /users/42 { "name": "Alice" } --> Always sets name to Alice
Idempotent: DELETE /users/42 --> User 42 is gone (or already gone)
NOT idempotent: POST /orders { "item": "widget" } --> Creates a NEW order each time
Idempotency keys — For non-idempotent operations (like POST), clients can pass a unique key in a header so the server can detect and deduplicate retries:
POST /payments HTTP/1.1
Idempotency-Key: 8a3b1c9e-f7d2-4e6a-b5c8-1234abcd5678
Content-Type: application/json
{ "amount": 99.99, "currency": "USD" }
The server stores the idempotency key and, if it sees the same key again, returns the original response instead of processing the payment a second time.
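A minimal sketch of that server-side behaviour, assuming an in-memory dictionary as the key store. `PaymentService` and its fields are hypothetical names; a real service would more likely use a shared store with expiry and an atomic check-and-set.

```python
import uuid


class PaymentService:
    """Toy in-memory service that deduplicates by idempotency key."""

    def __init__(self):
        self._responses = {}  # idempotency key -> response already returned
        self.charges = []     # payments that were actually processed

    def create_payment(self, idempotency_key, body):
        if idempotency_key in self._responses:
            # Retry of a request we have already handled: replay the response.
            return self._responses[idempotency_key]
        self.charges.append(body)
        response = {"status": 201, "payment_id": str(uuid.uuid4()), **body}
        self._responses[idempotency_key] = response
        return response


service = PaymentService()
key = "8a3b1c9e-f7d2-4e6a-b5c8-1234abcd5678"
first = service.create_payment(key, {"amount": 99.99, "currency": "USD"})
retry = service.create_payment(key, {"amount": 99.99, "currency": "USD"})
print(first == retry, len(service.charges))
```

The retry returns the stored response and no second charge is recorded. One design question this sketch leaves open is what to do when the same key arrives with a different body; rejecting such requests is a common choice.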
HTTP Status Codes¶
Using the correct status code communicates the outcome of an operation unambiguously.
Success (2xx):
| Code | Meaning | When to Use |
|---|---|---|
| 200 | OK | Successful GET, PUT, PATCH, or DELETE |
| 201 | Created | Successful POST that creates a resource |
| 202 | Accepted | Request accepted for async processing |
| 204 | No Content | Successful DELETE or PUT with no response body |
Client Error (4xx):
| Code | Meaning | When to Use |
|---|---|---|
| 400 | Bad Request | Malformed syntax, invalid parameters |
| 401 | Unauthorized | Missing or invalid authentication credentials |
| 403 | Forbidden | Authenticated but lacks permission |
| 404 | Not Found | Resource does not exist |
| 405 | Method Not Allowed | HTTP method not supported on this resource |
| 409 | Conflict | Conflicting state (e.g., duplicate creation) |
| 422 | Unprocessable Entity | Syntactically valid but semantically wrong |
| 429 | Too Many Requests | Rate limit exceeded |
Server Error (5xx):
| Code | Meaning | When to Use |
|---|---|---|
| 500 | Internal Server Error | Unexpected server failure |
| 502 | Bad Gateway | Upstream service returned invalid response |
| 503 | Service Unavailable | Server is temporarily overloaded or in maintenance |
| 504 | Gateway Timeout | Upstream service did not respond in time |
GraphQL¶
GraphQL is a query language for APIs developed by Facebook (2012, open-sourced 2015). Instead of multiple endpoints returning fixed data shapes, GraphQL exposes a single endpoint and lets the client specify exactly the data it needs.
Schema Definition Language (SDL)¶
The schema is the contract between client and server:
type User {
id: ID!
name: String!
email: String!
posts: [Post!]!
createdAt: DateTime!
}
type Post {
id: ID!
title: String!
body: String!
author: User!
comments: [Comment!]!
}
type Comment {
id: ID!
text: String!
author: User!
}
type Query {
user(id: ID!): User
users(limit: Int, offset: Int): [User!]!
post(id: ID!): Post
}
type Mutation {
createUser(input: CreateUserInput!): User!
updateUser(id: ID!, input: UpdateUserInput!): User!
deleteUser(id: ID!): Boolean!
}
input CreateUserInput {
name: String!
email: String!
}
input UpdateUserInput {
name: String
email: String
}
type Subscription {
postCreated: Post!
commentAdded(postId: ID!): Comment!
}
Queries¶
Clients request exactly the fields they need:
query GetUserWithPosts {
user(id: "42") {
name
email
posts {
title
comments {
text
author {
name
}
}
}
}
}
This solves the over-fetching problem (getting more data than needed) and the under-fetching problem (needing multiple REST calls to assemble a view).
Mutations¶
Mutations modify server-side data:
mutation CreateNewUser {
createUser(input: { name: "Alice", email: "alice@example.com" }) {
id
name
email
}
}
Subscriptions¶
Subscriptions push real-time updates to the client over a long-lived connection, most commonly WebSockets:
subscription OnNewComment {
commentAdded(postId: "101") {
text
author {
name
}
}
}
The N+1 Problem¶
When resolving nested relationships naively, a query for N users with their posts issues 1 query for users + N queries for posts — the classic N+1 problem.
Query: users { posts { title } }
SQL executed (naive):
SELECT * FROM users; -- 1 query
SELECT * FROM posts WHERE user_id = 1; -- +1
SELECT * FROM posts WHERE user_id = 2; -- +1
SELECT * FROM posts WHERE user_id = 3; -- +1
... -- = N+1 total
Solution — DataLoader pattern (batching + caching):
// Using Facebook's dataloader library
const DataLoader = require('dataloader');
const postLoader = new DataLoader(async (userIds) => {
// Single batched query instead of N individual ones
const posts = await db.query(
'SELECT * FROM posts WHERE user_id IN (?)', [userIds]
);
// Group posts by user_id and return in same order as userIds
const postsByUser = {};
posts.forEach(p => {
(postsByUser[p.user_id] ||= []).push(p);
});
return userIds.map(id => postsByUser[id] || []);
});
// Resolver
const resolvers = {
User: {
posts: (user) => postLoader.load(user.id),
},
};
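The same effect can be observed outside GraphQL. The sketch below uses an in-memory SQLite database with a made-up three-user data set and counts the queries issued by a naive loop versus a single batched `IN (...)` lookup; the table contents and function names are assumptions for illustration only.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO posts VALUES (1, 1, 'a1'), (2, 1, 'a2'), (3, 2, 'b1');
""")


def load_naive(conn):
    """One query for users, then one query per user: N+1 in total."""
    user_ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]
    queries = 1
    posts = {}
    for user_id in user_ids:
        rows = conn.execute(
            "SELECT title FROM posts WHERE user_id = ? ORDER BY id", (user_id,)
        )
        posts[user_id] = [row[0] for row in rows]
        queries += 1
    return posts, queries


def load_batched(conn):
    """One query for users, one batched query for all their posts."""
    user_ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]
    placeholders = ",".join("?" * len(user_ids))
    rows = conn.execute(
        f"SELECT user_id, title FROM posts WHERE user_id IN ({placeholders}) ORDER BY id",
        user_ids,
    )
    grouped = defaultdict(list)
    for user_id, title in rows:
        grouped[user_id].append(title)
    # Return results in the same order as the requested ids.
    return {user_id: grouped[user_id] for user_id in user_ids}, 2


naive_posts, naive_queries = load_naive(conn)
batched_posts, batched_queries = load_batched(conn)
print(naive_queries, batched_queries, naive_posts == batched_posts)
```

Both functions return the same mapping, but the batched version issues two queries regardless of how many users are loaded.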
GraphQL vs REST — Tradeoffs¶
| Aspect | REST | GraphQL |
|---|---|---|
| Endpoints | Multiple (one per resource) | Single endpoint |
| Data fetching | Fixed response shape | Client specifies exact fields |
| Over-fetching | Common | Largely avoided (clients select fields) |
| Under-fetching | Multiple round trips needed | Single request |
| Caching | HTTP caching works naturally (GET + URL) | Harder; usually relies on a client-side cache (Apollo, Relay) |
| File upload | Native multipart support | Requires extensions (multipart spec) |
| Error handling | HTTP status codes | Typically 200 over HTTP; errors reported in the response body |
| Versioning | URL/header versioning | Schema evolution (deprecate fields) |
| Learning curve | Low | Moderate |
| Tooling | Mature (Postman, curl) | Growing (GraphiQL, Apollo Studio) |
| Best for | Simple CRUD, public APIs, caching-heavy | Complex nested data, mobile apps, BFF |
gRPC¶
gRPC (gRPC Remote Procedure Call) is a high-performance, open-source framework developed by Google. It uses Protocol Buffers (protobuf) as its interface definition language and serialization format, and communicates over HTTP/2.
Architecture Overview¶
┌──────────────┐ HTTP/2 + Protobuf ┌──────────────┐
│ gRPC │ ──────────────────────────────▶ │ gRPC │
│ Client │ │ Server │
│ │ ◀────────────────────────────── │ │
│ (stub) │ Binary frames, multiplexed │ (service) │
└──────────────┘ └──────────────┘
│ │
│ Generated code Generated code│
▼ ▼
┌────────────┐ ┌────────────────┐
│ .proto │◀────── Shared contract ────────▶│ .proto │
│ definition │ │ definition │
└────────────┘ └────────────────┘
Protocol Buffers¶
Protocol Buffers (protobuf) are a language-neutral, platform-neutral mechanism for serializing structured data. Encoded messages are typically smaller and faster to parse than equivalent JSON, and the .proto schema gives every field an explicit type.
Example .proto file:
syntax = "proto3";
package ecommerce;
// Required for the google.protobuf.Timestamp fields used below
import "google/protobuf/timestamp.proto";
// Service definition
service OrderService {
// Unary RPC
rpc GetOrder (GetOrderRequest) returns (Order);
// Server streaming RPC - stream order status updates
// (an RPC must return a message type, so the enum is wrapped in a message)
rpc TrackOrder (TrackOrderRequest) returns (stream OrderStatusUpdate);
// Client streaming RPC - upload multiple items for a bulk order
rpc CreateBulkOrder (stream OrderItem) returns (BulkOrderResponse);
// Bidirectional streaming RPC - real-time chat with support
rpc SupportChat (stream ChatMessage) returns (stream ChatMessage);
}
message OrderStatusUpdate {
OrderStatus status = 1;
}
message GetOrderRequest {
string order_id = 1;
}
message Order {
string id = 1;
string customer_id = 2;
repeated OrderItem items = 3;
OrderStatus status = 4;
double total_amount = 5;
google.protobuf.Timestamp created_at = 6;
}
message OrderItem {
string product_id = 1;
string name = 2;
int32 quantity = 3;
double price = 4;
}
enum OrderStatus {
ORDER_STATUS_UNSPECIFIED = 0;
ORDER_STATUS_PENDING = 1;
ORDER_STATUS_CONFIRMED = 2;
ORDER_STATUS_SHIPPED = 3;
ORDER_STATUS_DELIVERED = 4;
}
message TrackOrderRequest {
string order_id = 1;
}
message BulkOrderResponse {
string order_id = 1;
int32 items_accepted = 2;
int32 items_rejected = 3;
}
message ChatMessage {
string sender = 1;
string text = 2;
google.protobuf.Timestamp timestamp = 3;
}
RPC Types¶
| Type | Client | Server | Use Case |
|---|---|---|---|
| Unary | 1 request | 1 response | Standard request/response (like REST) |
| Server streaming | 1 request | N responses | Real-time feeds, large data downloads |
| Client streaming | N requests | 1 response | File uploads, telemetry ingestion |
| Bidirectional streaming | N requests | N responses | Chat, collaborative editing |
Python gRPC Server Example¶
import grpc
from concurrent import futures

import order_pb2
import order_pb2_grpc

# NOTE: `db` is a placeholder for the application's data-access layer.


class OrderServiceServicer(order_pb2_grpc.OrderServiceServicer):
    def GetOrder(self, request, context):
        # Look up order from database
        order = db.get_order(request.order_id)
        if not order:
            context.set_code(grpc.StatusCode.NOT_FOUND)
            context.set_details(f"Order {request.order_id} not found")
            return order_pb2.Order()
        return order_pb2.Order(
            id=order.id,
            customer_id=order.customer_id,
            total_amount=order.total,
            status=order_pb2.ORDER_STATUS_CONFIRMED,
        )

    def TrackOrder(self, request, context):
        """Server streaming: pushes status updates to the client."""
        # An RPC can only stream message types, so this assumes the .proto wraps
        # the OrderStatus enum in a message such as
        #   message OrderStatusUpdate { OrderStatus status = 1; }
        for status_update in db.stream_order_updates(request.order_id):
            yield order_pb2.OrderStatusUpdate(status=status_update.status)


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    order_pb2_grpc.add_OrderServiceServicer_to_server(
        OrderServiceServicer(), server
    )
    # Insecure port for local development; use add_secure_port with TLS in production.
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    serve()
gRPC vs REST¶
| Aspect | gRPC | REST |
|---|---|---|
| Protocol | HTTP/2 | HTTP/1.1 or HTTP/2 |
| Serialization | Protobuf (binary) | JSON (text) |
| Performance | Very high (small payloads, multiplexing) | Good (larger payloads, human-readable) |
| Streaming | Native (4 types) | Requires WebSockets or SSE |
| Code generation | Built-in from .proto | Optional (OpenAPI codegen) |
| Browser support | Limited (requires gRPC-Web proxy) | Native |
| Human readability | Low (binary) | High (JSON) |
| Contract | Strict (protobuf schema) | Loose (optional OpenAPI spec) |
| Best for | Microservice-to-microservice, low latency | Public APIs, browser clients, broad compatibility |
API Versioning Strategies¶
APIs evolve over time. Breaking changes — removing fields, renaming endpoints, changing response shapes — require versioning so existing clients continue to function.
Strategy Comparison¶
| Strategy | Example | Pros | Cons |
|---|---|---|---|
| URI Path | /api/v1/users | Simple, explicit, easy to route | Pollutes URI space, duplicates controllers |
| Query Parameter | /api/users?version=1 | Easy to default to latest | Easy to forget, less visible |
| Custom Header | X-API-Version: 1 | Clean URIs, flexible | Hidden from URL, harder to test in browser |
| Accept Header (Content Negotiation) | Accept: application/vnd.myapi.v1+json | Standards-based, clean URIs | Complex, unfamiliar to many developers |
| No Versioning (Schema Evolution) | Additive changes only | Simplest, no version management | Limits what changes you can make |
URI Path Versioning (Most Common)¶
GET /api/v1/users/42 --> Returns v1 response shape
GET /api/v2/users/42 --> Returns v2 response shape (e.g., split name into first/last)
# Flask example with URI versioning
# (`db` is a placeholder for the application's data-access layer)
from flask import Flask, Blueprint

app = Flask(__name__)
v1 = Blueprint('v1', __name__, url_prefix='/api/v1')
v2 = Blueprint('v2', __name__, url_prefix='/api/v2')


@v1.route('/users/<int:user_id>')
def get_user_v1(user_id):
    user = db.get_user(user_id)
    return {"id": user.id, "name": user.full_name}


@v2.route('/users/<int:user_id>')
def get_user_v2(user_id):
    user = db.get_user(user_id)
    return {
        "id": user.id,
        "first_name": user.first_name,
        "last_name": user.last_name,
        "email": user.email,
    }


app.register_blueprint(v1)
app.register_blueprint(v2)
Header Versioning¶
GET /api/users/42 HTTP/1.1
Host: api.example.com
X-API-Version: 2
Accept: application/json
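On the server, header-based versioning comes down to reading the version from the request and falling back to a default. The sketch below is one possible resolver that accepts either the custom header or the vendor media type from the comparison table; `SUPPORTED_VERSIONS`, `DEFAULT_VERSION`, and `resolve_version` are hypothetical names, and header lookup is treated as case-sensitive for simplicity.

```python
import re

SUPPORTED_VERSIONS = {1, 2}  # assumption: versions this service still serves
DEFAULT_VERSION = 2          # assumption: used when the client sends no version
_MEDIA_TYPE = re.compile(r"application/vnd\.myapi\.v(\d+)\+json")


def resolve_version(headers):
    """Return the requested API version, or raise ValueError if unsupported."""
    raw = headers.get("X-API-Version")
    if raw is None:
        # Fall back to content negotiation via the Accept header.
        match = _MEDIA_TYPE.search(headers.get("Accept", ""))
        raw = match.group(1) if match else None
    if raw is None:
        return DEFAULT_VERSION
    if not raw.isdigit() or int(raw) not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported API version: {raw!r}")
    return int(raw)


print(resolve_version({"X-API-Version": "2", "Accept": "application/json"}))
print(resolve_version({"Accept": "application/vnd.myapi.v1+json"}))
print(resolve_version({}))
```

Whether a missing version should default to the latest or to the oldest supported version is a policy decision; defaulting to the latest is simpler but can break clients that never pinned a version.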
Best Practices for Versioning¶
- Prefer additive, non-breaking changes whenever possible — adding new fields to a response, adding optional query parameters, or adding new endpoints does not require a new version.
- Deprecate before removing. Announce deprecation with a Sunset header (RFC 8594) and give clients a migration window.
- Support at most 2-3 active versions to keep maintenance manageable.
- Document a clear deprecation policy — e.g., "Each major version is supported for 18 months after the next version is released."
HTTP/1.1 200 OK
Sunset: Sun, 01 Mar 2026 00:00:00 GMT
Deprecation: true
Link: <https://api.example.com/docs/migration-v1-to-v2>; rel="deprecation"
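Hand-written HTTP dates are easy to get wrong (the weekday has to match the date), so it can help to generate them. A small sketch using only the standard library; `deprecation_headers` is a hypothetical helper and expects a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def deprecation_headers(sunset_at, migration_url):
    """Build Sunset and Link headers for a deprecated endpoint."""
    if sunset_at.tzinfo is None:
        raise ValueError("sunset_at must be timezone-aware")
    return {
        # format_datetime(..., usegmt=True) emits the HTTP date format.
        "Sunset": format_datetime(sunset_at.astimezone(timezone.utc), usegmt=True),
        "Link": f'<{migration_url}>; rel="deprecation"',
    }


headers = deprecation_headers(
    datetime(2026, 3, 1, tzinfo=timezone.utc),
    "https://api.example.com/docs/migration-v1-to-v2",
)
for name, value in headers.items():
    print(f"{name}: {value}")
```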
API Documentation¶
Good documentation is the difference between an API that developers love and one they avoid. The industry standard is the OpenAPI Specification (formerly Swagger).
OpenAPI / Swagger¶
OpenAPI is a specification format (YAML or JSON) that describes REST APIs in a machine-readable way. Tools like Swagger UI, Redoc, and Stoplight render it into interactive documentation.
┌──────────────┐ ┌───────────────────┐ ┌──────────────────┐
│ openapi.yaml│───────▶│ Swagger UI / │──────▶│ Interactive │
│ (spec file) │ │ Redoc / Stoplight│ │ Documentation │
└──────────────┘ └───────────────────┘ └──────────────────┘
│
│ Also used for:
├──▶ Client SDK generation (openapi-generator)
├──▶ Server stub generation
├──▶ Contract testing
└──▶ Mock servers
Example OpenAPI Spec Snippet¶
openapi: 3.0.3
info:
  title: Bookstore API
  description: API for managing a bookstore catalog
  version: 1.0.0
  contact:
    name: API Support
    email: api@bookstore.com
servers:
  - url: https://api.bookstore.com/v1
    description: Production
  - url: https://staging-api.bookstore.com/v1
    description: Staging
paths:
  /books:
    get:
      summary: List all books
      operationId: listBooks
      tags:
        - Books
      parameters:
        - name: genre
          in: query
          schema:
            type: string
            enum: [fiction, non-fiction, science, history]
        - name: limit
          in: query
          schema:
            type: integer
            default: 20
            maximum: 100
        - name: cursor
          in: query
          schema:
            type: string
          description: Cursor for pagination
      responses:
        '200':
          description: A paginated list of books
          content:
            application/json:
              schema:
                type: object
                properties:
                  data:
                    type: array
                    items:
                      $ref: '#/components/schemas/Book'
                  pagination:
                    $ref: '#/components/schemas/CursorPagination'
        '400':
          description: Invalid query parameters
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ProblemDetails'
    post:
      summary: Create a new book
      operationId: createBook
      tags:
        - Books
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateBookRequest'
      responses:
        '201':
          description: Book created successfully
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Book'
          headers:
            Location:
              schema:
                type: string
              description: URI of the newly created book
        '400':
          $ref: '#/components/responses/BadRequest'
        '409':
          description: Book with same ISBN already exists
components:
  responses:
    # Referenced by the 400 response of POST /books
    BadRequest:
      description: Invalid request body
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/ProblemDetails'
  schemas:
    Book:
      type: object
      required: [id, title, author, isbn]
      properties:
        id:
          type: string
          format: uuid
        title:
          type: string
          example: "The Pragmatic Programmer"
        author:
          type: string
          example: "David Thomas, Andrew Hunt"
        isbn:
          type: string
          pattern: '^\d{3}-\d{10}$'
          example: "978-0135957059"
        genre:
          type: string
        published_date:
          type: string
          format: date
        price:
          type: number
          format: float
    CreateBookRequest:
      type: object
      required: [title, author, isbn]
      properties:
        title:
          type: string
        author:
          type: string
        isbn:
          type: string
        genre:
          type: string
        price:
          type: number
    CursorPagination:
      type: object
      properties:
        next_cursor:
          type: string
          nullable: true
        has_more:
          type: boolean
    ProblemDetails:
      type: object
      properties:
        type:
          type: string
          format: uri
        title:
          type: string
        status:
          type: integer
        detail:
          type: string
        instance:
          type: string
          format: uri
Pagination, Filtering, and Sorting Patterns¶
Any API that returns collections of resources needs strategies for pagination, filtering, and sorting to keep responses manageable and performant.
Offset-Based Pagination¶
The simplest approach: specify a page number (or offset) and a page size.
GET /api/v1/products?page=3&page_size=25
{
"data": [ ... ],
"pagination": {
"page": 3,
"page_size": 25,
"total_items": 1042,
"total_pages": 42
}
}
Pros:
- Simple to implement and understand.
- Allows jumping to any page directly.
- Easy to display "Page X of Y" in UIs.
Cons:
- Inconsistent results when data changes between requests — items can be skipped or duplicated if rows are inserted or deleted.
- Poor performance on large offsets: OFFSET 100000 still scans and discards 100,000 rows in most databases.
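The arithmetic behind offset pagination, and the drift problem listed under the cons, can be illustrated with a short sketch. It emulates LIMIT/OFFSET on an in-memory list; the function names are illustrative only.

```python
import math


def page_window(page, page_size, total_items):
    """Translate a 1-based page number into an offset and a page count."""
    offset = (page - 1) * page_size
    total_pages = math.ceil(total_items / page_size)
    return offset, total_pages


def fetch_page(rows, page, page_size):
    """Emulate LIMIT page_size OFFSET offset on an in-memory list."""
    offset, _ = page_window(page, page_size, len(rows))
    return rows[offset:offset + page_size]


# Matches the example response: page 3 of 1042 items at 25 per page.
offset, total_pages = page_window(page=3, page_size=25, total_items=1042)
print(offset, total_pages)

# Drift: a row deleted between two requests shifts every later row up.
rows = list(range(1, 11))        # ids 1..10
first = fetch_page(rows, 1, 3)   # ids 1, 2, 3
rows.remove(1)                   # id 1 is deleted before the next request
second = fetch_page(rows, 2, 3)  # ids 5, 6, 7, so id 4 is never returned
print(first, second)
```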
Cursor-Based Pagination¶
Uses an opaque cursor (often a base64-encoded identifier or timestamp) to mark the position in the result set.
GET /api/v1/products?limit=25&cursor=eyJpZCI6MTUwfQ==
{
"data": [ ... ],
"pagination": {
"next_cursor": "eyJpZCI6MTc1fQ==",
"has_more": true
}
}
Behind the scenes, the cursor decodes to {"id": 150}, and the query becomes:
SELECT * FROM products
WHERE id > 150
ORDER BY id ASC
LIMIT 25;
Pros:
- Stable results — inserts and deletes between pages do not cause skips or duplicates.
- Consistent performance: an indexed WHERE clause replaces OFFSET, so deep pages cost about the same as the first page.
Cons:
- Cannot jump to an arbitrary page.
- Cursor must encode enough information to recreate the query position.
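A minimal keyset-pagination sketch against an in-memory SQLite table, assuming the cursor is URL-safe base64 over compact JSON (which reproduces the cursor values shown above). The `products` contents and the helper names `encode_cursor`, `decode_cursor`, and `fetch_page` are assumptions for illustration.

```python
import base64
import json
import sqlite3


def encode_cursor(position):
    """Serialize a position such as {"id": 150} into an opaque string."""
    raw = json.dumps(position, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))


def fetch_page(conn, cursor=None, limit=25):
    # Assumes positive integer ids, so 0 means "start from the beginning".
    last_id = decode_cursor(cursor)["id"] if cursor else 0
    rows = conn.execute(
        "SELECT id, name FROM products WHERE id > ? ORDER BY id ASC LIMIT ?",
        (last_id, limit + 1),  # fetch one extra row to detect a next page
    ).fetchall()
    has_more = len(rows) > limit
    rows = rows[:limit]
    next_cursor = encode_cursor({"id": rows[-1][0]}) if has_more else None
    return rows, next_cursor


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 201)],
)

page, next_cursor = fetch_page(conn, cursor="eyJpZCI6MTUwfQ==")
print(page[0][0], page[-1][0], next_cursor)
```

In practice a cursor that clients can decode invites tampering, so some APIs sign or encrypt it; that is outside the scope of this sketch.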
Comparison¶
Offset Pagination Cursor Pagination
┌──────────────────────┐ ┌──────────────────────┐
│ Page 1: items 1-25 │ │ Start ──▶ cursor_A │
│ Page 2: items 26-50 │ │ cursor_A ──▶ cursor_B│
│ Page 3: items 51-75 │ │ cursor_B ──▶ cursor_C│
│ ... │ │ ... │
│ Can jump to Page N │ │ Must traverse in │
│ O(N) for large │ │ order. O(1) per page │
│ offsets │ │ │
└──────────────────────┘ └──────────────────────┘
Filtering¶
Use query parameters to filter resource collections:
GET /api/v1/products?category=electronics&price_min=100&price_max=500&in_stock=true
For more expressive filtering, some APIs adopt a filter syntax:
GET /api/v1/products?filter[category]=electronics&filter[price][gte]=100&filter[price][lte]=500
Or use a single filter parameter with a simple expression language:
GET /api/v1/products?filter=category eq "electronics" and price gte 100
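Parsing the bracket syntax on the server is mostly string handling. The sketch below turns the `filter[field][op]=value` form into `(field, operator, value)` tuples, treating a missing operator as `eq`; `ALLOWED_OPERATORS` and `parse_filters` are hypothetical, and values are left as strings for a later validation step.

```python
import re
from urllib.parse import parse_qsl

_FILTER_KEY = re.compile(r"^filter\[(\w+)\](?:\[(\w+)\])?$")
ALLOWED_OPERATORS = {"eq", "gte", "lte"}  # assumption: operators this API supports


def parse_filters(query_string):
    """Parse filter[field][op]=value pairs into (field, op, value) tuples."""
    filters = []
    for key, value in parse_qsl(query_string):
        match = _FILTER_KEY.match(key)
        if not match:
            continue  # not a filter parameter (e.g. sort, limit)
        field, operator = match.group(1), match.group(2) or "eq"
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported filter operator: {operator}")
        filters.append((field, operator, value))
    return filters


query = "filter[category]=electronics&filter[price][gte]=100&filter[price][lte]=500"
print(parse_filters(query))
```

Field names should still be checked against a whitelist before they reach a SQL query, for the same reason sort fields are.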
Sorting¶
Use a sort query parameter. Prefix with - for descending order:
GET /api/v1/products?sort=-price,name
This sorts by price descending, then by name ascending.
# Python implementation sketch
# (`db` and `encode_cursor` are placeholders for application code)
from flask import Flask, abort, request

app = Flask(__name__)

# Assumed whitelist: only these columns may appear in ORDER BY. Interpolating
# raw user input into SQL would allow injection.
SORTABLE_FIELDS = {"id", "name", "price"}


@app.route('/api/v1/products')
def list_products():
    # Parse sorting
    sort_param = request.args.get('sort', 'id')
    order_clauses = []
    for field in sort_param.split(','):
        descending = field.startswith('-')
        column = field[1:] if descending else field
        if column not in SORTABLE_FIELDS:
            abort(400, description=f"Cannot sort by '{column}'")
        order_clauses.append(f"{column} {'DESC' if descending else 'ASC'}")

    # Parse filtering
    category = request.args.get('category')
    price_min = request.args.get('price_min', type=float)
    price_max = request.args.get('price_max', type=float)

    # Parse pagination
    cursor = request.args.get('cursor')
    limit = min(request.args.get('limit', 25, type=int), 100)

    products = db.query_products(
        category=category,
        price_min=price_min,
        price_max=price_max,
        order_by=order_clauses,
        cursor=cursor,
        limit=limit + 1,  # Fetch one extra to determine has_more
    )
    has_more = len(products) > limit
    if has_more:
        products = products[:limit]

    return {
        "data": [p.to_dict() for p in products],
        "pagination": {
            "next_cursor": encode_cursor(products[-1]) if has_more else None,
            "has_more": has_more,
        }
    }
HATEOAS¶
HATEOAS (Hypermedia As The Engine Of Application State) is a REST constraint where the server includes hyperlinks in responses that tell the client what actions are available next. Instead of hardcoding URLs, clients discover them dynamically.
In practice, full HATEOAS is rarely implemented outside of enterprise APIs, but including relevant links improves discoverability and decouples clients from URL structures.
Example Response with HATEOAS Links¶
{
"id": "order-42",
"status": "pending",
"total": 149.99,
"items": [
{
"product_id": "prod-7",
"name": "Mechanical Keyboard",
"quantity": 1,
"price": 149.99,
"_links": {
"product": { "href": "/api/v1/products/prod-7", "method": "GET" }
}
}
],
"_links": {
"self": { "href": "/api/v1/orders/order-42", "method": "GET" },
"cancel": { "href": "/api/v1/orders/order-42/cancel", "method": "POST" },
"payment": { "href": "/api/v1/orders/order-42/pay", "method": "POST" },
"customer": { "href": "/api/v1/users/user-7", "method": "GET" }
}
}
Once the order has shipped, the cancel and payment links disappear and a track link appears: the available transitions change based on the resource's current state.
{
"id": "order-42",
"status": "shipped",
"total": 149.99,
"_links": {
"self": { "href": "/api/v1/orders/order-42", "method": "GET" },
"track": { "href": "/api/v1/orders/order-42/tracking", "method": "GET" }
}
}
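One way to produce such state-dependent links is a lookup table from status to allowed transitions. The sketch below covers only the two states shown in the examples; `TRANSITIONS` and `order_links` are illustrative names, and a real service would also check the caller's permissions before advertising a link.

```python
# Assumed state machine: which link relations each order status exposes.
TRANSITIONS = {
    "pending": {"cancel": ("POST", "/cancel"), "payment": ("POST", "/pay")},
    "shipped": {"track": ("GET", "/tracking")},
}


def order_links(order_id, status):
    """Build the _links object for an order in the given status."""
    base = f"/api/v1/orders/{order_id}"
    links = {"self": {"href": base, "method": "GET"}}
    for rel, (method, suffix) in TRANSITIONS.get(status, {}).items():
        links[rel] = {"href": base + suffix, "method": method}
    return links


print(order_links("order-42", "pending"))
print(order_links("order-42", "shipped"))
```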
Error Handling and Response Formats¶
Consistent, informative error responses are essential for a good developer experience. Clients should never have to guess what went wrong.
Principles¶
- Use appropriate HTTP status codes — do not return 200 for errors.
- Provide a consistent error response structure across all endpoints.
- Include enough detail for the developer to understand and fix the issue.
- Never leak internal details (stack traces, SQL queries, internal paths) in production.
- Use machine-readable error codes in addition to human-readable messages.
Consistent Error Response Format¶
{
"error": {
"code": "VALIDATION_ERROR",
"message": "The request body contains invalid fields.",
"details": [
{
"field": "email",
"issue": "Must be a valid email address.",
"value": "not-an-email"
},
{
"field": "age",
"issue": "Must be a positive integer.",
"value": -5
}
],
"request_id": "req-a1b2c3d4",
"documentation_url": "https://api.example.com/docs/errors#VALIDATION_ERROR"
}
}
RFC 7807 — Problem Details for HTTP APIs¶
RFC 7807 defines a standard format for error responses. It is widely adopted and supported by many frameworks. RFC 9457 obsoleted it in 2023 while keeping the same core members.
HTTP/1.1 403 Forbidden
Content-Type: application/problem+json
{
"type": "https://api.example.com/errors/insufficient-funds",
"title": "Insufficient Funds",
"status": 403,
"detail": "Your account balance of $10.00 is insufficient for a $25.00 purchase.",
"instance": "/api/v1/payments/txn-789",
"balance": 10.00,
"required": 25.00
}
Standard fields:
| Field | Required | Description |
|---|---|---|
| type | No (defaults to "about:blank") | A URI reference identifying the error type (can be a docs link) |
| title | No (recommended) | Short human-readable summary |
| status | No (recommended) | HTTP status code |
| detail | No | Human-readable explanation specific to this occurrence |
| instance | No | A URI identifying this specific occurrence |
Additional fields (like balance and required above) can be added for context.
Implementation Example¶
from flask import Flask, jsonify, request

app = Flask(__name__)

# NOTE: `db` is a placeholder for the application's data-access layer.


class APIError(Exception):
    def __init__(self, status, error_type, title, detail=None, **extra):
        super().__init__(title)
        self.status = status
        self.error_type = error_type
        self.title = title
        self.detail = detail
        self.extra = extra  # extension members, e.g. balance / required


@app.errorhandler(APIError)
def handle_api_error(error):
    response = {
        "type": f"https://api.example.com/errors/{error.error_type}",
        "title": error.title,
        "status": error.status,
    }
    if error.detail:
        response["detail"] = error.detail
    response.update(error.extra)
    return jsonify(response), error.status, {
        "Content-Type": "application/problem+json"
    }


@app.route('/api/v1/orders/<order_id>/pay', methods=['POST'])
def pay_order(order_id):
    order = db.get_order(order_id)
    # Assumes authentication middleware has attached user_id to the request.
    account = db.get_account(request.user_id)
    if account.balance < order.total:
        raise APIError(
            status=403,
            error_type="insufficient-funds",
            title="Insufficient Funds",
            detail=f"Balance of ${account.balance:.2f} is less than "
                   f"the required ${order.total:.2f}.",
            balance=account.balance,
            required=order.total,
        )
    # Process payment ...
    return jsonify({"status": "paid"}), 200
Rate Limiting¶
Rate limiting protects APIs from abuse, ensures fair usage among clients, and prevents individual consumers from overwhelming backend services. It is a critical component of both API design and system design.
Cross-reference: For rate limiting in the context of system design, see Chapter 7.1 — System Design.
Common Algorithms¶
Token Bucket¶
A bucket holds up to N tokens. Each request consumes one token. Tokens are added at a fixed rate. When the bucket is empty, requests are rejected (or queued).
Token Bucket (capacity=5, refill_rate=1/sec)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Time 0s: [*][*][*][*][*] 5 tokens (full)
Request → [*][*][*][*][ ] 4 tokens (1 consumed)
Request → [*][*][*][ ][ ] 3 tokens
Request → [*][*][ ][ ][ ] 2 tokens
Time 1s: [*][*][*][ ][ ] 3 tokens (1 refilled)
Request → [*][*][ ][ ][ ] 2 tokens
...
Later: [ ][ ][ ][ ][ ] 0 tokens (bucket drained)
Request → REJECTED (429 Too Many Requests)
Characteristics:
- Allows short bursts up to the bucket capacity.
- Smooths traffic to the refill rate over time.
- Simple to implement; used by AWS, Stripe, and many others.
import time


class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow_request(self) -> bool:
        # Refill lazily based on the time elapsed since the last call.
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(
            self.capacity,
            self.tokens + elapsed * self.refill_rate,
        )
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
Sliding Window Log¶
Stores the timestamp of every request in the window. Counts requests by checking how many timestamps fall within the current window.
Sliding Window (window=60s, limit=5)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Timeline (seconds):
0     10    20    30    40    50    60    70
|─────|─────|─────|─────|─────|─────|─────|
R     R     R           R     R           R?
1     2     3           4     5

At t=70:
  Window = [10, 70]
  Requests in window: 10, 20, 40, 50 = 4
  R at 70 → ALLOWED (4 < 5)
Characteristics:
- Precise — no boundary effects.
- Higher memory usage (stores all timestamps within the window).
- Used when exact rate enforcement is required.
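A compact sketch of the log-based approach, using a deque of timestamps. The class name is illustrative, the current time is passed in explicitly so the behaviour is deterministic, and rejected requests are not logged (a design choice; some implementations count them). The usage assumes requests at t = 0, 10, 20, 40, and 50 seconds.

```python
from collections import deque


class SlidingWindowLog:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.timestamps = deque()  # accepted request times, oldest first

    def allow_request(self, now):
        # Drop timestamps that fell out of the window [now - window, now].
        cutoff = now - self.window_seconds
        while self.timestamps and self.timestamps[0] < cutoff:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


limiter = SlidingWindowLog(limit=5, window_seconds=60)
accepted = [limiter.allow_request(t) for t in (0, 10, 20, 40, 50)]
at_60 = limiter.allow_request(60)  # window [0, 60] already holds 5 requests
at_70 = limiter.allow_request(70)  # window [10, 70] holds only 4
print(accepted, at_60, at_70)
```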
Sliding Window Counter¶
A memory-efficient approximation that combines fixed window counts with a weighted overlap.
Window counter estimation:
Previous window Current window
(40 requests) (15 requests so far)
┌────────────────┐┌───────┬────────┐
│ 60s window ││ 25% │ 75% │
│ ended ││elapsed│remaining│
└────────────────┘└───────┴────────┘
Estimated count = (prev_count × remaining%) + current_count
= (40 × 0.75) + 15
= 30 + 15 = 45
If limit is 50 → ALLOWED (45 < 50)
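The estimate is a one-line weighted sum. The sketch below reproduces the worked numbers; the function names are illustrative, and `elapsed_fraction` is the share of the current window that has already passed.

```python
def estimated_count(prev_count, current_count, elapsed_fraction):
    """Weight the previous window by the part of it still inside the sliding window."""
    remaining_fraction = 1.0 - elapsed_fraction
    return prev_count * remaining_fraction + current_count


def allow_request(prev_count, current_count, elapsed_fraction, limit):
    return estimated_count(prev_count, current_count, elapsed_fraction) < limit


estimate = estimated_count(prev_count=40, current_count=15, elapsed_fraction=0.25)
print(estimate, allow_request(40, 15, 0.25, limit=50))
```

The approximation assumes requests in the previous window were evenly spread, which is why it needs only two counters per client instead of a full timestamp log.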
Rate Limit Response Headers¶
Communicate rate limit status to clients using response headers. The X-RateLimit-* names below are a widely used convention rather than a formal standard:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 67
X-RateLimit-Reset: 1708732800
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed in the window |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| Retry-After | Seconds (or date) before the client should retry (sent with 429) |
When the limit is exceeded:
HTTP/1.1 429 Too Many Requests
Content-Type: application/problem+json
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1708732800
{
"type": "https://api.example.com/errors/rate-limit-exceeded",
"title": "Rate Limit Exceeded",
"status": 429,
"detail": "You have exceeded 100 requests per minute. Try again in 30 seconds."
}
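Building these headers on the server is straightforward once the limiter exposes its counters. A minimal sketch, assuming the limiter can report the limit, the remaining quota, and the reset time as Unix seconds; `rate_limit_headers` is a hypothetical helper.

```python
def rate_limit_headers(limit, remaining, reset_epoch, now_epoch):
    """Return rate limit headers; Retry-After is added only when exhausted."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(remaining, 0)),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    if remaining <= 0:
        # Seconds until the window resets, never negative.
        headers["Retry-After"] = str(max(reset_epoch - now_epoch, 0))
    return headers


ok = rate_limit_headers(100, 67, reset_epoch=1708732800, now_epoch=1708732770)
blocked = rate_limit_headers(100, 0, reset_epoch=1708732800, now_epoch=1708732770)
print(ok)
print(blocked)
```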
Rate Limiting Strategies by Scope¶
| Scope | Key | Example |
|---|---|---|
| Per user | User ID or API key | 1000 req/hour per user |
| Per IP | Client IP address | 100 req/minute per IP |
| Per endpoint | Method + path | 10 POST /login per minute per IP |
| Global | None | 50,000 req/minute total for the service |
| Tiered | Subscription plan | Free: 100/hr, Pro: 10,000/hr, Enterprise: unlimited |
Summary — Choosing the Right API Style¶
| Factor | REST | GraphQL | gRPC |
|---|---|---|---|
| Client type | Browsers, third-party devs | Mobile apps, complex UIs | Internal microservices |
| Performance needs | Moderate | Moderate | High |
| Data relationships | Simple, flat | Complex, nested | Varies |
| Real-time | Polling / SSE / WebSocket | Subscriptions | Native streaming |
| Schema enforcement | Optional (OpenAPI) | Required (SDL) | Required (Protobuf) |
| Ecosystem maturity | Very mature | Mature | Mature |
| Human debuggability | High (JSON + URLs) | Medium (JSON + single endpoint) | Low (binary) |
In practice, many systems use a combination: REST for public-facing APIs, GraphQL as a Backend-for-Frontend (BFF) layer aggregating multiple services, and gRPC for internal service-to-service communication where performance matters most.