Designing APIs That Last — Principles From 10 Years of Breaking Things

by Eric Hanson, Backend Developer at Clean Systems Consulting

The cost of a breaking change

A breaking change to a public API is a support incident. Clients that integrated against the original contract — mobile apps in app stores, third-party tools, automated workflows — break silently or noisily. The engineering cost of maintaining a backward-compatible migration path is always higher than it seemed when the original decision was made.

Every API design decision is a contract negotiation. The decisions made at launch are the ones that will be maintained for years. Making them deliberately, with the understanding that reverting them is expensive, is the discipline that distinguishes APIs that age well from those that don't.

Model resources, not operations

The most common API design mistake: designing around operations rather than resources. An operation-centric API has endpoints named after verbs:

POST /getUser          ← not REST, not predictable
POST /createOrder
POST /cancelOrder
POST /updateShipping
POST /checkInventory

A resource-centric API has endpoints named after nouns, with HTTP methods expressing the operation:

GET    /users/{id}              ← get a user
POST   /orders                  ← create an order
PATCH  /orders/{id}             ← update an order (partial)
POST   /orders/{id}/cancellation ← cancel an order (explicit sub-resource)
GET    /products/{id}/inventory  ← check inventory for a product

The resource model is predictable — if you know the resource exists, you can guess the endpoint. /orders/{id} for any order operation, /users/{id} for any user operation. New engineers on the team understand the API structure without reading documentation.

The tricky cases are operations that don't map cleanly to CRUD: "publish an article," "activate an account," "merge two records." Model these as sub-resources or state transitions:

POST /articles/{id}/publications   ← publish the article
POST /accounts/{id}/activation     ← activate the account
POST /contacts/{id}/merges         ← merge with another contact (body contains target ID)

The alternative — PATCH /articles/{id} with {"status": "published"} — also works but is less explicit. Either is defensible; what matters is consistency across the API.

Design for the response, not the database

API responses should reflect the client's needs, not the database schema. The mistake: serializing the database entity directly.

// Exposing the database schema as the API contract
@GetMapping("/orders/{id}")
public Order getOrder(@PathVariable Long id) {
    return orderRepository.findById(id).orElseThrow();
    // Returns: id, userId, addressId, paymentMethodId, createdAt, updatedAt, version, deletedAt...
    // Foreign keys, internal columns, and all fields regardless of relevance
}

When the database schema changes — adding an internal column, renaming a field, normalizing a relationship — the API contract breaks automatically because they're the same object.

Explicit response objects decouple the API contract from the storage layer:

public record OrderResponse(
    String id,
    OrderStatus status,
    MoneyAmount total,
    CustomerSummary customer,    // embedded, not a foreign key
    List<LineItemSummary> items,
    Instant createdAt
) {
    public static OrderResponse from(Order order, User user, List<LineItem> items) {
        return new OrderResponse(
            order.getPublicId(),
            order.getStatus(),
            MoneyAmount.from(order.getTotal()),
            CustomerSummary.from(user),
            items.stream().map(LineItemSummary::from).toList(),
            order.getCreatedAt()
        );
    }
}

The API response includes a customer object rather than a userId foreign key — clients don't have to make a second request to get the customer's name. This is the resource embedding principle: include related data that clients predictably need in the same response, rather than requiring multiple round trips.

Consistent error shape — always

Inconsistent error responses are the most frustrating API quality issue for integrators. When some endpoints return {"error": "not found"}, others return {"message": "User not found"}, and others return a 500 with an HTML stack trace, clients must handle each case differently.

A single error format, used everywhere:

{
  "errors": [
    {
      "code": "validation_failed",
      "message": "The request body contains invalid data",
      "field": "email",
      "detail": "Must be a valid email address"
    }
  ],
  "requestId": "abc123-def456"
}

Design decisions:

  • errors is always an array — batch operations can fail with multiple errors; single operations return one-element arrays. Clients always handle an array, never special-case single vs multiple.
  • code is a stable string — machine-readable, never changes. Clients switch on code.
  • message is human-readable — may change wording, not machine-reliable. For display only.
  • requestId — present in every response (success and error). Clients include it in support requests. You find it in logs immediately.

The code values form an implicit contract — once you use "order_not_found" in a 404 response, changing it to "not_found" breaks clients that switch on the code. Document them in the OpenAPI spec and treat them as part of the versioned contract.
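As a sketch in plain Java records (names illustrative, serialization left to your JSON library), the envelope can be centralized so every error path produces the same shape:

```java
import java.util.List;

// Illustrative sketch of the error envelope; field names match the JSON
// shape above. Every handler builds errors through the one factory, so
// "errors" is always an array, even for a single error.
record ApiError(String code, String message, String field, String detail) {}

record ErrorResponse(List<ApiError> errors, String requestId) {
    static ErrorResponse of(String requestId, ApiError... errors) {
        return new ErrorResponse(List.of(errors), requestId);
    }
}
```

In a Spring application the natural home for this is a single @RestControllerAdvice that maps every exception type into this record, so no individual endpoint can invent its own shape.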

Pagination that doesn't surprise people

Two pagination patterns dominate in practice: offset-based and cursor-based (cursors are usually implemented as keyset queries under the hood). Each has different trade-offs.

Offset pagination (?page=2&limit=20) is familiar and easy to implement:

{
  "data": [...],
  "pagination": {
    "total": 1547,
    "page": 2,
    "limit": 20,
    "hasNext": true
  }
}

The problems: performance degrades as offset grows (the database skips rows), and results shift when items are added or deleted between page requests (page 2 may show items already seen on page 1). Appropriate for small-to-medium datasets where total count is displayed.

Cursor pagination uses an opaque cursor pointing to a position in the result set:

{
  "data": [...],
  "pagination": {
    "nextCursor": "eyJpZCI6IDEyMywgImNyZWF0ZWRBdCI6ICIyMDI2LTAxLTAxIn0=",
    "hasNext": true
  }
}

The cursor encodes the last item's identifying values (base64 encoded JSON is common). The next page query uses the cursor to resume from exactly that position — no skipping rows, stable results even as data changes. Appropriate for feeds, activity streams, and any dataset where total count doesn't matter.

Which to choose: start with cursor pagination for any feed-like resource (orders, transactions, activity). Use offset only when clients genuinely need to jump to a specific page (search results with page numbers displayed).

Whatever you choose: always include hasNext — clients need to know whether to show "load more." Never make clients detect the end by receiving an empty response.
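A minimal cursor sketch: the cursor carries the last row's sort keys, base64url-encoded so clients treat it as opaque. (The example above encodes JSON; the delimited string used here works the same way.)

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Base64;

// The cursor encodes the last item's (createdAt, id) pair so the next
// page can resume from exactly that position.
record Cursor(long lastId, Instant lastCreatedAt) {

    String encode() {
        String raw = lastId + "|" + lastCreatedAt;
        return Base64.getUrlEncoder().withoutPadding()
                     .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    static Cursor decode(String cursor) {
        String raw = new String(Base64.getUrlDecoder().decode(cursor),
                                StandardCharsets.UTF_8);
        String[] parts = raw.split("\\|", 2);
        return new Cursor(Long.parseLong(parts[0]), Instant.parse(parts[1]));
    }
}
```

The next-page query then becomes a keyset query, e.g. `WHERE (created_at, id) < (:lastCreatedAt, :lastId) ORDER BY created_at DESC, id DESC LIMIT :limit` (row-value comparison as written works in PostgreSQL; other databases need the expanded boolean form).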

Timestamps in ISO 8601 UTC, IDs as strings

Two decisions that cause migrations when gotten wrong:

Timestamps. Return all timestamps as ISO 8601 strings in UTC: "2026-04-17T14:30:00Z". Unix timestamps as integers are compact but require conversion, are ambiguous (milliseconds or seconds?), and are unreadable in logs and debugging tools. ISO 8601 is unambiguous, universally parseable, and self-documenting.

{
  "createdAt": "2026-04-17T14:30:00.123Z",   ← always UTC, always ISO 8601
  "updatedAt": "2026-04-17T15:45:22.456Z"
}
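In Java this costs nothing: java.time.Instant is UTC-only and its toString() is already ISO 8601, so no custom formatting code is needed:

```java
import java.time.Instant;

// Instant is always UTC and prints ISO 8601 out of the box.
public class TimestampDemo {
    public static String toWire(Instant instant) {
        return instant.toString();   // e.g. "2026-04-17T14:30:00.123Z"
    }
}
```

If you serialize with Jackson, the usual configuration is registering the JavaTimeModule and disabling SerializationFeature.WRITE_DATES_AS_TIMESTAMPS, which emits this same format for Instant fields.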

IDs as strings. Return IDs as strings, even if the underlying type is a 64-bit integer:

{
  "id": "1234567890123456789"   ← string, not integer
}

JavaScript's Number type has 53 bits of integer precision. A 64-bit integer ID like 1234567890123456789 loses precision when parsed as a JavaScript Number — it becomes 1234567890123456800. Clients storing IDs as numbers silently corrupt them. Returning IDs as strings sidesteps this entirely. The database uses BIGINT internally; the API returns strings. No migration required later when IDs grow large enough to overflow JavaScript numbers.
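The precision loss is easy to demonstrate from the server side: round-tripping the ID through a 64-bit double (which is what a JSON number becomes in JavaScript) changes its value.

```java
// Why 64-bit IDs can't travel as JSON numbers: a double (JavaScript's
// only number type) has 53 bits of integer precision.
public class IdPrecisionDemo {
    public static long throughDouble(long id) {
        return (long) (double) id;   // what survives a JS Number round trip
    }
}
```

The stored double is 1234567890123456768; JavaScript displays it as 1234567890123456800, the shortest decimal that maps back to the same double. If you use Jackson, annotating the field with @JsonSerialize(using = ToStringSerializer.class) is a common way to emit long IDs as strings without changing the entity type.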

Idempotency keys — safe retry for non-idempotent operations

POST /orders creates an order. If the client's request times out before receiving a response, they don't know whether the order was created. Retrying creates a duplicate.

Idempotency keys allow clients to safely retry:

POST /orders
Idempotency-Key: 7f8c9d2e-4a1b-4e3f-9c8d-2a1b3c4d5e6f

The server stores the key and the response. If the same key appears again, return the stored response — no second order is created.

@PostMapping("/orders")
public ResponseEntity<OrderResponse> createOrder(
        @RequestBody @Valid CreateOrderRequest request,
        @RequestHeader(value = "Idempotency-Key", required = false) String idempotencyKey) {

    if (idempotencyKey != null) {
        Optional<IdempotentResponse> cached = idempotencyCache.get(idempotencyKey);
        if (cached.isPresent()) {
            return ResponseEntity
                .status(cached.get().status())
                .body(cached.get().body());
        }
    }

    Order order = orderService.createOrder(request);
    OrderResponse response = OrderResponse.from(order);

    if (idempotencyKey != null) {
        idempotencyCache.store(idempotencyKey, HttpStatus.CREATED, response);
    }

    return ResponseEntity.status(HttpStatus.CREATED).body(response);
}

Document idempotency key support in the API spec. For financial operations, make it mandatory. For clients using your API in unreliable network conditions (mobile, IoT), it's the difference between a good integration and a frustrating one.
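The snippet above leaves idempotencyCache undefined. A minimal in-memory sketch could look like the following (illustrative only: production needs shared storage such as Redis, a TTL, and a guard against concurrent first requests; status is a plain int here to keep the example dependency-free):

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// In-memory idempotency cache: first response stored under a key wins.
record IdempotentResponse(int status, Object body) {}

class IdempotencyCache {
    private final Map<String, IdempotentResponse> entries = new ConcurrentHashMap<>();

    Optional<IdempotentResponse> get(String key) {
        return Optional.ofNullable(entries.get(key));
    }

    void store(String key, int status, Object body) {
        // putIfAbsent: if two retries race, the first stored response wins
        entries.putIfAbsent(key, new IdempotentResponse(status, body));
    }
}
```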

Null vs absent — be explicit

The ambiguity between null and an absent field:

{"couponCode": null}   ← the coupon code field exists and has no value
{"couponCode": ""}     ← the coupon code field exists and is an empty string
{}                     ← the coupon code field wasn't provided

These mean different things semantically. Define the semantics explicitly in your API and document them:

  • Absent field: the client isn't providing a value for this field — don't change it (for PATCH semantics)
  • Explicit null: the client is explicitly clearing this field
  • Empty string: the client is providing an empty string value (usually semantically equivalent to null — define which)

For PATCH endpoints where you need to distinguish "don't change this field" from "set this field to null," JSON Merge Patch (application/merge-patch+json) handles this correctly:

PATCH /users/123
Content-Type: application/merge-patch+json

{
  "displayName": "Alice",
  "couponCode": null    ← explicitly clearing the coupon
}
// bio is absent — don't change it

Define your PATCH semantics in documentation. Whatever you choose, be consistent across the API — inconsistency is what confuses integrators.
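The merge-patch rules are small enough to sketch over plain maps (flat fields only here; RFC 7386 additionally recurses into nested objects):

```java
import java.util.HashMap;
import java.util.Map;

// JSON Merge Patch semantics: absent key = leave unchanged,
// explicit null = clear the field, any other value = replace.
public class MergePatch {
    public static Map<String, Object> apply(Map<String, Object> target,
                                            Map<String, Object> patch) {
        Map<String, Object> result = new HashMap<>(target);
        for (Map.Entry<String, Object> entry : patch.entrySet()) {
            if (entry.getValue() == null) {
                result.remove(entry.getKey());                // null clears
            } else {
                result.put(entry.getKey(), entry.getValue()); // value replaces
            }
        }
        return result;                                        // absent keys untouched
    }
}
```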

Rate limit headers on every response

Clients can't throttle themselves without knowing their current rate limit state. Include rate limit headers on every response:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 743
X-RateLimit-Reset: 1713360000
Retry-After: 30        ← only on 429 responses

X-RateLimit-Reset as a Unix timestamp (seconds since epoch) tells clients exactly when their limit resets — they can calculate the appropriate backoff without guessing.
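On the client side, the Unix-timestamp Reset header makes backoff a subtraction (sketch; header names as above):

```java
// Backoff from X-RateLimit-Reset: reset minus "now", floored at zero
// in case the window has already rolled over.
public class RateLimitBackoff {
    public static long backoffSeconds(long resetEpochSeconds, long nowEpochSeconds) {
        return Math.max(0, resetEpochSeconds - nowEpochSeconds);
    }
}
```

On a 429, prefer the Retry-After header when present; otherwise fall back to this calculation.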

The backward-compatibility rules that prevent version bumps

Most breaking changes are avoidable with these rules:

Add new fields freely — clients that ignore unknown fields (and well-behaved clients do) are unaffected. New response fields are non-breaking.

Never remove or rename fields without a version bump. A client that reads order.customerName breaks when the field becomes order.customer.name.

Never change a field's type — "total": 100 (integer) becoming "total": "100.00" (string) breaks clients that parse it as a number.

Never change a field's semantics — "status": "active" meaning something different in v2 while keeping the same field name is a semantic breaking change that unit tests won't catch.

Make new required fields optional on the way in — if adding a required field to a request body, make it optional with a default for a transition period before making it required.

Keep enum values additive — add new values, never remove existing ones. Clients should handle unknown enum values gracefully (default case in switch statements), but removing a value they're currently sending or receiving is a breaking change.
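Client-side, "handle unknown enum values gracefully" can be as simple as a defensive parse (sketch; OrderStatus and its values are illustrative):

```java
import java.util.Locale;

// Defensive enum parsing: a value added server-side after this client
// shipped degrades to UNKNOWN instead of throwing.
enum OrderStatus {
    PENDING, SHIPPED, DELIVERED, UNKNOWN;

    static OrderStatus fromWire(String raw) {
        try {
            return valueOf(raw.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return UNKNOWN;
        }
    }
}
```

With Jackson, annotating UNKNOWN with @JsonEnumDefaultValue and enabling READ_UNKNOWN_ENUM_VALUES_USING_DEFAULT_VALUE achieves the same thing during deserialization.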

These rules aren't difficult. They require discipline to follow consistently when making changes under deadline pressure. The cost of an undisciplined change is paid by integrators, not the team that made it.

The API review before launch

Before an API endpoint goes live, the questions worth asking:

  1. Is the URL a resource noun, not an operation verb?
  2. Does the response reflect client needs, not the database schema?
  3. Does the error response use the standard error shape?
  4. Are timestamps in ISO 8601 UTC?
  5. Are IDs returned as strings?
  6. Does the endpoint include rate limit headers?
  7. For POST operations: is idempotency supported?
  8. Are all field names consistent with the existing API vocabulary?
  9. Is pagination consistent with the rest of the API?
  10. Are the error codes documented in the spec?

These aren't rules for their own sake. Each one exists because violating it has caused a migration, a support escalation, or an integration incident — in some API, for some team, at some point. The discipline is cheaper than the cleanup.
