Designing APIs That Last — Principles From 10 Years of Breaking Things
by Eric Hanson, Backend Developer at Clean Systems Consulting
The cost of a breaking change
A breaking change to a public API is a support incident. Clients that integrated against the original contract — mobile apps in app stores, third-party tools, automated workflows — break silently or noisily. The engineering cost of maintaining a backward-compatible migration path is always higher than it seemed when the original decision was made.
Every API design decision is a contract negotiation. The decisions made at launch are the ones that will be maintained for years. Making them deliberately, with the understanding that reverting them is expensive, is the discipline that distinguishes APIs that age well from those that don't.
Model resources, not operations
The most common API design mistake: designing around operations rather than resources. An operation-centric API has endpoints named after verbs:
POST /getUser ← not REST, not predictable
POST /createOrder
POST /cancelOrder
POST /updateShipping
POST /checkInventory
A resource-centric API has endpoints named after nouns, with HTTP methods expressing the operation:
GET /users/{id} ← get a user
POST /orders ← create an order
PATCH /orders/{id} ← update an order (partial)
POST /orders/{id}/cancellation ← cancel an order (explicit sub-resource)
GET /products/{id}/inventory ← check inventory for a product
The resource model is predictable — if you know the resource exists, you can guess the endpoint. /orders/{id} for any order operation, /users/{id} for any user operation. New engineers on the team understand the API structure without reading documentation.
The tricky cases are operations that don't map cleanly to CRUD: "publish an article," "activate an account," "merge two records." Model these as sub-resources or state transitions:
POST /articles/{id}/publications ← publish the article
POST /accounts/{id}/activation ← activate the account
POST /contacts/{id}/merges ← merge with another contact (body contains target ID)
The alternative — PATCH /articles/{id} with {"status": "published"} — also works but is less explicit. Either is defensible; what matters is consistency across the API.
Design for the response, not the database
API responses should reflect the client's needs, not the database schema. The mistake: serializing the database entity directly.
// Exposing the database schema as the API contract
@GetMapping("/orders/{id}")
public Order getOrder(@PathVariable Long id) {
    return orderRepository.findById(id).orElseThrow();
    // Returns: id, userId, addressId, paymentMethodId, createdAt, updatedAt, version, deletedAt...
    // Foreign keys, internal columns, and all fields regardless of relevance
}
When the database schema changes — adding an internal column, renaming a field, normalizing a relationship — the API contract breaks automatically because they're the same object.
Explicit response objects decouple the API contract from the storage layer:
public record OrderResponse(
    String id,
    OrderStatus status,
    MoneyAmount total,
    CustomerSummary customer,    // embedded, not a foreign key
    List<LineItemSummary> items,
    Instant createdAt
) {
    public static OrderResponse from(Order order, User user, List<LineItem> items) {
        return new OrderResponse(
            order.getPublicId(),
            order.getStatus(),
            MoneyAmount.from(order.getTotal()),
            CustomerSummary.from(user),
            items.stream().map(LineItemSummary::from).toList(),
            order.getCreatedAt()
        );
    }
}
The API response includes a customer object rather than a userId foreign key — clients don't have to make a second request to get the customer's name. This is the resource embedding principle: include related data that clients predictably need in the same response, rather than requiring multiple round trips.
Consistent error shape — always
Inconsistent error responses are the most frustrating API quality issue for integrators. When some endpoints return {"error": "not found"}, others return {"message": "User not found"}, and others return a 500 with an HTML stack trace, clients must handle each case differently.
A single error format, used everywhere:
{
  "errors": [
    {
      "code": "validation_failed",
      "message": "The request body contains invalid data",
      "field": "email",
      "detail": "Must be a valid email address"
    }
  ],
  "requestId": "abc123-def456"
}
Design decisions:
- errors is always an array — batch operations can fail with multiple errors; single operations return one-element arrays. Clients always handle an array, never special-case single vs multiple.
- code is a stable string — machine-readable, never changes. Clients switch on code.
- message is human-readable — may change wording, not machine-reliable. For display only.
- requestId — present in every response (success and error). Clients include it in support requests. You find it in logs immediately.
The code values form an implicit contract — once you use "order_not_found" in a 404 response, changing it to "not_found" breaks clients that switch on the code. Document them in the OpenAPI spec and treat them as part of the versioned contract.
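One way to keep the shape from diverging is to funnel every failure through a single response type. A minimal sketch in plain Java, where the ApiError and ErrorResponse names are illustrative rather than part of any framework:

```java
import java.util.List;

// One error entry: stable machine-readable code, human-readable message,
// plus an optional field pointer and detail for validation failures.
record ApiError(String code, String message, String field, String detail) {}

// The single envelope every endpoint returns on failure. errors is always
// a list, even for one error; requestId ties the response to server logs.
record ErrorResponse(List<ApiError> errors, String requestId) {
    static ErrorResponse single(String code, String message, String requestId) {
        return new ErrorResponse(List.of(new ApiError(code, message, null, null)), requestId);
    }
}
```

In Spring, a single @RestControllerAdvice that maps every exception type onto this envelope keeps stack traces and ad-hoc error bodies from leaking out of individual endpoints.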
Pagination that doesn't surprise people
Two pagination patterns dominate in practice: offset-based and cursor-based (cursor pagination is usually backed by a keyset query under the hood). Each has different trade-offs.
Offset pagination (?page=2&limit=20) is familiar and easy to implement:
{
  "data": [...],
  "pagination": {
    "total": 1547,
    "page": 2,
    "limit": 20,
    "hasNext": true
  }
}
The problems: performance degrades as offset grows (the database skips rows), and results shift when items are added or deleted between page requests (page 2 may show items already seen on page 1). Appropriate for small-to-medium datasets where total count is displayed.
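The arithmetic behind that response is small enough to pin down exactly. A sketch (the OffsetPage name is mine) showing where the cost lives:

```java
// Offset pagination: page numbers map to row offsets; hasNext is derived
// from the total count, which the database must compute on every request.
record OffsetPage(long total, int page, int limit) {
    long offset() { return (long) (page - 1) * limit; }      // rows the database skips
    boolean hasNext() { return offset() + limit < total; }   // more rows beyond this page
}
```

Page 2 of 1547 items at 20 per page skips 20 rows and has 76 pages after it; that offset() value is exactly the work the database repeats, and which grows, on every deeper page.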
Cursor pagination uses an opaque cursor pointing to a position in the result set:
{
  "data": [...],
  "pagination": {
    "nextCursor": "eyJpZCI6IDEyMywgImNyZWF0ZWRBdCI6ICIyMDI2LTAxLTAxIn0=",
    "hasNext": true
  }
}
The cursor encodes the last item's identifying values (base64 encoded JSON is common). The next page query uses the cursor to resume from exactly that position — no skipping rows, stable results even as data changes. Appropriate for feeds, activity streams, and any dataset where total count doesn't matter.
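A dependency-free sketch of cursor construction, with the JSON assembled by hand (a real service would use its JSON library; the Cursor name is mine, and the URL-safe unpadded variant is chosen here so cursors paste cleanly into query strings):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// A cursor is just the last item's identifying values, serialized and
// base64-encoded so clients treat it as an opaque token.
final class Cursor {
    static String encode(long lastId, String lastCreatedAt) {
        String json = "{\"id\": " + lastId + ", \"createdAt\": \"" + lastCreatedAt + "\"}";
        return Base64.getUrlEncoder().withoutPadding()
                     .encodeToString(json.getBytes(StandardCharsets.UTF_8));
    }
    static String decode(String cursor) {
        return new String(Base64.getUrlDecoder().decode(cursor), StandardCharsets.UTF_8);
    }
}
```

On the next request the server decodes the cursor and resumes with a keyset query such as WHERE (created_at, id) < (:createdAt, :id) ORDER BY created_at DESC, id DESC, which starts exactly where the previous page ended regardless of inserts or deletes in between.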
Which to choose: start with cursor pagination for any feed-like resource (orders, transactions, activity). Use offset only when clients genuinely need to jump to a specific page (search results with page numbers displayed).
Whatever you choose: always include hasNext — clients need to know whether to show "load more." Never make clients detect the end of the collection by receiving an empty response.
Timestamps in ISO 8601 UTC, IDs as strings
Two decisions that cause migrations when they go wrong:
Timestamps. Return all timestamps as ISO 8601 strings in UTC: "2026-04-17T14:30:00Z". Unix timestamps as integers are compact but require conversion, are ambiguous (milliseconds or seconds?), and are unreadable in logs and debugging tools. ISO 8601 is unambiguous, universally parseable, and self-documenting.
{
  "createdAt": "2026-04-17T14:30:00.123Z", ← always UTC, always ISO 8601
  "updatedAt": "2026-04-17T15:45:22.456Z"
}
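On the JVM this format costs nothing to adopt: java.time.Instant is always UTC, and its string form is ISO 8601. A small sketch (the helper names are mine):

```java
import java.time.Instant;

// Instant carries no time zone (it is always UTC), and its toString()
// emits ISO 8601 with a trailing Z, so round-tripping through strings
// is lossless and unambiguous.
final class Timestamps {
    static String toApi(Instant instant) { return instant.toString(); }
    static Instant fromApi(String value) { return Instant.parse(value); }
}
```

Serializing Instant fields directly therefore yields the recommended format with no custom formatting code.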
IDs as strings. Return IDs as strings, even if the underlying type is a 64-bit integer:
{
  "id": "1234567890123456789" ← string, not integer
}
JavaScript's Number type has 53 bits of integer precision. A 64-bit integer ID like 1234567890123456789 loses precision when parsed as a JavaScript Number — it becomes 1234567890123456800. Clients storing IDs as numbers silently corrupt them. Returning IDs as strings sidesteps this entirely. The database uses BIGINT internally; the API returns strings. No migration required later when IDs grow large enough to overflow JavaScript numbers.
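The precision loss is easy to reproduce from Java by forcing an ID through a double, the representation JavaScript uses for every number (class and method names are mine):

```java
// A 64-bit ID routed through a double lands on the nearest representable
// value, 21 below the original here; JavaScript then prints that double
// as 1234567890123456800. A string survives the trip untouched.
final class Ids {
    static long throughDouble(long id) { return (long) (double) id; }
    static String toApi(long id) { return Long.toString(id); }
}
```

Jackson users can apply this globally by serializing long ID fields with a to-string serializer instead of converting by hand at every call site.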
Idempotency keys — safe retry for non-idempotent operations
POST /orders creates an order. If the client's request times out before receiving a response, they don't know whether the order was created. Retrying creates a duplicate.
Idempotency keys allow clients to safely retry:
POST /orders
Idempotency-Key: 7f8c9d2e-4a1b-4e3f-9c8d-2a1b3c4d5e6f
The server stores the key and the response. If the same key appears again, return the stored response — no second order is created.
@PostMapping("/orders")
public ResponseEntity<OrderResponse> createOrder(
        @RequestBody @Valid CreateOrderRequest request,
        @RequestHeader(value = "Idempotency-Key", required = false) String idempotencyKey) {
    if (idempotencyKey != null) {
        Optional<IdempotentResponse> cached = idempotencyCache.get(idempotencyKey);
        if (cached.isPresent()) {
            return ResponseEntity
                .status(cached.get().status())
                .body(cached.get().body());
        }
    }
    Order order = orderService.createOrder(request);
    OrderResponse response = OrderResponse.from(order);
    if (idempotencyKey != null) {
        idempotencyCache.store(idempotencyKey, HttpStatus.CREATED, response);
    }
    return ResponseEntity.status(HttpStatus.CREATED).body(response);
}
Document idempotency key support in the API spec. For financial operations, make it mandatory. For clients using your API in unreliable network conditions (mobile, IoT), it's the difference between a good integration and a frustrating one.
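The idempotencyCache itself is the interesting part. A minimal in-memory sketch, with a plain int status standing in for Spring's HttpStatus to stay dependency-free; a production version lives in a shared store (Redis, a database table) with a TTL, and claims the key atomically before executing so two concurrent retries cannot both create an order:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// In-memory idempotency cache: stores the first response recorded for a
// key and replays it for any retry carrying the same key.
final class IdempotencyCache {
    record IdempotentResponse(int status, Object body) {}

    private final Map<String, IdempotentResponse> responses = new ConcurrentHashMap<>();

    Optional<IdempotentResponse> get(String key) {
        return Optional.ofNullable(responses.get(key));
    }

    void store(String key, int status, Object body) {
        // putIfAbsent keeps the first stored response if two writers race.
        responses.putIfAbsent(key, new IdempotentResponse(status, body));
    }
}
```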
Null vs absent — be explicit
The ambiguity among null, empty, and absent fields:
{"couponCode": null} ← the coupon code field exists and has no value
{"couponCode": ""} ← the coupon code field exists and is an empty string
{} ← the coupon code field wasn't provided
These mean different things semantically. Define the semantics explicitly in your API and document them:
- Absent field: the client isn't providing a value for this field — don't change it (for PATCH semantics)
- Explicit null: the client is explicitly clearing this field
- Empty string: the client is providing an empty string value (usually semantically equivalent to null — define which)
For PATCH endpoints where you need to distinguish "don't change this field" from "set this field to null," JSON Merge Patch (application/merge-patch+json) handles this correctly:
PATCH /users/123
Content-Type: application/merge-patch+json
{
  "displayName": "Alice",
  "couponCode": null ← explicitly clearing the coupon
}
// bio is absent — don't change it
Define your PATCH semantics in documentation. Whatever you choose, be consistent across the API — inconsistency is what confuses integrators.
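The three cases are only distinguishable if the deserialized form preserves key presence: a map does, a POJO with null fields does not. A sketch of flat merge-patch application (the MergePatch name is mine):

```java
import java.util.HashMap;
import java.util.Map;

// Merge-patch semantics over a flat resource: absent keys are untouched,
// explicit nulls clear the field, any other value overwrites it.
final class MergePatch {
    static Map<String, Object> apply(Map<String, Object> resource, Map<String, Object> patch) {
        Map<String, Object> result = new HashMap<>(resource);
        patch.forEach((key, value) -> {
            if (value == null) result.remove(key);  // explicit null clears the field
            else result.put(key, value);            // present value overwrites
        });
        return result;                              // absent keys stay as they were
    }
}
```

RFC 7386 additionally recurses into nested objects and replaces arrays wholesale; this sketch covers only the flat case.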
Rate limit headers on every response
Clients can't throttle themselves without knowing their current rate limit state. Include rate limit headers on every response:
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 743
X-RateLimit-Reset: 1713360000
Retry-After: 30 ← only on 429 responses
X-RateLimit-Reset as a Unix timestamp (seconds since epoch) tells clients exactly when their limit resets — they can calculate the appropriate backoff without guessing.
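Client-side, the headers reduce to one decision: proceed now, or wait until the reset. A sketch (the names are mine):

```java
// Turns rate limit header values into a wait time: if requests remain,
// proceed immediately; otherwise sleep until the reset timestamp
// (both expressed in epoch seconds, matching X-RateLimit-Reset).
final class RateLimitState {
    static long secondsUntilRetry(long remaining, long resetEpochSecond, long nowEpochSecond) {
        if (remaining > 0) return 0;
        return Math.max(0, resetEpochSecond - nowEpochSecond);
    }
}
```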
The backward-compatibility rules that prevent version bumps
Most breaking changes are avoidable with these rules:
Add new fields to responses freely — clients that ignore unknown fields (and good clients always do) are unaffected. New response fields are non-breaking.
Never remove or rename fields without a version bump. A client that reads order.customerName breaks when the field becomes order.customer.name.
Never change a field's type — "total": 100 (integer) becoming "total": "100.00" (string) breaks clients that parse it as a number.
Never change a field's semantics — "status": "active" meaning something different in v2 while keeping the same field name is a semantic breaking change that unit tests won't catch.
Make new required fields optional on the way in — if adding a required field to a request body, make it optional with a default for a transition period before making it required.
Keep enum values additive — add new values, never remove existing ones. Clients should handle unknown enum values gracefully (default case in switch statements), but removing a value they're currently sending or receiving is a breaking change.
These rules aren't difficult. They require discipline to follow consistently when making changes under deadline pressure. The cost of an undisciplined change is paid by integrators, not the team that made it.
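The "handle unknown enum values gracefully" half of the enum rule looks like this on the client, with an explicit fallback constant (the status values here are illustrative):

```java
// A client-side enum with an UNKNOWN fallback: values the server adds
// after this client ships degrade gracefully instead of throwing
// IllegalArgumentException from Enum.valueOf.
enum OrderStatus {
    PENDING, SHIPPED, DELIVERED, UNKNOWN;

    static OrderStatus fromApi(String value) {
        for (OrderStatus s : values()) {
            if (s.name().equalsIgnoreCase(value)) return s;
        }
        return UNKNOWN; // a value this client has never heard of
    }
}
```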
The API review before launch
Before an API endpoint goes live, the questions worth asking:
- Is the URL a resource noun, not an operation verb?
- Does the response reflect client needs, not the database schema?
- Does the error response use the standard error shape?
- Are timestamps in ISO 8601 UTC?
- Are IDs returned as strings?
- Does the endpoint include rate limit headers?
- For POST operations: is idempotency supported?
- Are all field names consistent with the existing API vocabulary?
- Is pagination consistent with the rest of the API?
- Are the error codes documented in the spec?
These aren't rules for their own sake. Each one exists because violating it has caused a migration, a support escalation, or an integration incident — in some API, for some team, at some point. The discipline is cheaper than the cleanup.