Stop Returning Everything When the Client Only Needs a Few Fields
by Eric Hanson, Backend Developer at Clean Systems Consulting
The list endpoint that returns full objects
A product list page needs to display 50 items, each showing a name, thumbnail URL, and price. Your API returns the full product object: name, description, price, SKU, weight, dimensions, inventory count, supplier ID, cost basis, tax class, 12 image URLs, and 40 other fields.
The server serializes 50 of these and transmits them over the network; the client deserializes them, uses three fields, and discards the rest. The work multiplies across every mobile client, every request, every page load.
This is over-fetching. It wastes bandwidth, increases latency on slow connections, increases memory pressure on mobile clients, and leaks fields the client probably should not have (cost basis, supplier ID, inventory count).
The REST approach: sparse fieldsets
The JSON:API spec defines a fields query parameter for requesting specific fields:
GET /products?fields[products]=name,price,thumbnail_url
A more common REST convention is a single flat query parameter, usually named fields (some APIs call it include):
GET /products?fields=name,price,thumbnail_url
The response includes only the requested fields:
{
  "data": [
    { "id": "prod_01HZ", "name": "Widget Pro", "price": 29.99, "thumbnail_url": "https://..." },
    ...
  ]
}
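Both query styles above can be parsed with the standard library. The following is a minimal sketch, not a full JSON:API implementation: the function name parse_sparse_fieldsets and the "*" key used for the flat form are assumptions made for illustration.

```python
from urllib.parse import parse_qs


def parse_sparse_fieldsets(query_string):
    """Map each resource type to the list of fields requested for it.

    Handles fields[type]=a,b (JSON:API style) and a flat fields=a,b,
    which is stored under the hypothetical key "*".
    """
    result = {}
    for key, values in parse_qs(query_string).items():
        if key == "fields":
            resource_type = "*"
        elif key.startswith("fields[") and key.endswith("]"):
            resource_type = key[len("fields["):-1]
        else:
            continue  # unrelated parameter such as page or sort
        # A parameter may repeat; join the values before splitting on commas.
        names = [name.strip() for name in ",".join(values).split(",")]
        result[resource_type] = [name for name in names if name]
    return result
```

The result is only a parsed request. It still has to pass through the permitted-field check before anything is serialized.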
Implementation in a typical REST framework:
from typing import Optional

@app.get("/products")
def list_products(fields: Optional[str] = None):
    # Fall back to the default field set when the client names no fields
    field_list = [f.strip() for f in fields.split(",")] if fields else DEFAULT_FIELDS
    allowed = set(PUBLIC_PRODUCT_FIELDS)
    requested = set(field_list) & allowed  # never allow fields outside whitelist
    products = db.query(Product).limit(50).all()
    return [serialize(p, fields=requested) for p in products]
The whitelist is non-negotiable. Do not allow clients to request any field they name — enforce a set of fields the client is permitted to see. This prevents the sparse fieldset mechanism from becoming a data exposure vector.
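To make the whitelist concrete, here is a self-contained sketch of the two pieces the handler depends on. The field sets, the Product dataclass, and the helpers resolve_fields and serialize are hypothetical stand-ins, and always including id is an assumption taken from the sample response, not a rule.

```python
from dataclasses import asdict, dataclass

# Hypothetical field sets; a real API would define its own.
PUBLIC_PRODUCT_FIELDS = frozenset(
    {"id", "name", "price", "thumbnail_url", "description"}
)
DEFAULT_FIELDS = ("id", "name", "price", "thumbnail_url")


@dataclass
class Product:
    id: str
    name: str
    price: float
    thumbnail_url: str
    description: str
    cost_basis: float  # internal only, deliberately absent from the whitelist


def resolve_fields(fields_param):
    """Return the permitted fields to serialize, in request order."""
    if not fields_param:
        return list(DEFAULT_FIELDS)
    requested = [f.strip() for f in fields_param.split(",") if f.strip()]
    # dict.fromkeys removes duplicates while preserving order; id always leads.
    ordered = dict.fromkeys(["id", *requested])
    return [f for f in ordered if f in PUBLIC_PRODUCT_FIELDS]


def serialize(product, fields):
    """Emit only the named fields of a product."""
    record = asdict(product)
    return {name: record[name] for name in fields}
```

This version silently drops fields outside the whitelist, as the handler above does. Rejecting the request with a 400 instead is an equally reasonable choice and makes client mistakes easier to spot.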
GraphQL as structural projection
GraphQL solves the same problem at the protocol level. Clients specify exactly the shape of the data they need:
query {
  products(first: 50) {
    nodes {
      name
      price
      thumbnailUrl
    }
  }
}
The server runs resolvers only for the requested fields, so related objects that were not requested are never fetched. Combined with a dataloader pattern, which batches the lookups that are requested, this also reduces the number of database queries.
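The idea can be shown with a toy model in plain Python. This is not a GraphQL server and uses no GraphQL library; PRODUCTS, SUPPLIERS, load_suppliers, and execute are hypothetical names, and the supplier relation is an assumed example of a related object.

```python
# Toy model of field-level resolution with a batched loader.
PRODUCTS = [
    {"id": 1, "name": "Widget Pro", "price": 29.99, "supplier_id": 10},
    {"id": 2, "name": "Widget Lite", "price": 9.99, "supplier_id": 10},
    {"id": 3, "name": "Gadget", "price": 49.00, "supplier_id": 11},
]
SUPPLIERS = {10: {"name": "Acme"}, 11: {"name": "Globex"}}

supplier_queries = []  # records each simulated database round trip


def load_suppliers(supplier_ids):
    """Batched lookup: one simulated query for all requested keys."""
    unique_ids = sorted(set(supplier_ids))
    supplier_queries.append(unique_ids)
    return {sid: SUPPLIERS[sid] for sid in unique_ids}


def execute(selection, products):
    """Resolve only the selected fields for each product."""
    suppliers = {}
    if "supplier" in selection:
        # Runs only when the client asked for the related object.
        suppliers = load_suppliers(p["supplier_id"] for p in products)
    rows = []
    for product in products:
        row = {}
        for field in selection:
            if field == "supplier":
                row[field] = suppliers[product["supplier_id"]]
            else:
                row[field] = product[field]
        rows.append(row)
    return rows
```

Selecting name and price leaves supplier_queries empty. Adding supplier to the selection triggers a single batched lookup for all three products, not one per row.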
GraphQL makes sense when: clients have highly variable data needs, you have multiple clients (mobile, web, partner) with different field requirements, and you are willing to invest in the schema definition and resolver architecture.
It does not make sense when: your API has a small number of well-defined resources with relatively stable shapes, you do not control the clients (public API), or you need HTTP-level caching (GraphQL queries are conventionally sent as POST requests to a single endpoint, which bypasses standard HTTP caching).
The database query implication
Returning fewer fields is most valuable when it also reduces the database query. If you run SELECT * and then filter in application code, you save network bandwidth but not database I/O.
Use projection at the query level:
# Before: fetches all columns
products = db.query(Product).all()

# After: only fetches requested columns
products = db.query(
    Product.id, Product.name, Product.price, Product.thumbnail_url
).all()
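In PostgreSQL, the biggest savings come from wide rows whose unrequested columns hold large text or JSONB values. Values that large are typically stored out of line (TOAST) and are not read from disk unless the query references the column. For narrow rows the same heap pages are read either way, and the savings are mostly in the bytes sent from the database to the application.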
For list endpoints with many rows, the effects compound: fewer bytes from the database, fewer bytes over the network, and less serialization work. Take a product table with a description column averaging 2KB per row. A list of 50 products that requests everything reads roughly 100KB for descriptions alone (50 × 2KB); a projection that excludes description drops that to near zero.
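Tying the projection to the requested fields might look like the sketch below. It uses the standard library's sqlite3 module in place of the ORM shown above so that it runs on its own; the table layout, the PUBLIC_COLUMNS mapping, and the helper names are assumptions for illustration.

```python
import sqlite3

# Hypothetical mapping from API field names to column names.
PUBLIC_COLUMNS = {
    "id": "id",
    "name": "name",
    "price": "price",
    "thumbnail_url": "thumbnail_url",
    "description": "description",
}


def build_projection(fields):
    """Build a SELECT that reads only permitted columns."""
    columns = [PUBLIC_COLUMNS[f] for f in fields if f in PUBLIC_COLUMNS]
    if not columns:
        raise ValueError("no permitted fields requested")
    # Column names come from the mapping above, never from raw client
    # input, so interpolating them does not open an injection path.
    return f"SELECT {', '.join(columns)} FROM products LIMIT ?"


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE products (id TEXT, name TEXT, price REAL, "
    "thumbnail_url TEXT, description TEXT, cost_basis REAL)"
)
conn.execute(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?, ?)",
    ("prod_01HZ", "Widget Pro", 29.99, "https://example.com/t.png", "x" * 2048, 12.5),
)


def list_products(fields, limit=50):
    """Fetch only the projected columns for up to `limit` products."""
    sql = build_projection(fields)
    return [dict(row) for row in conn.execute(sql, (limit,))]
```

Calling list_products(["name", "price"]) never reads the 2KB description value into the application, and a request for cost_basis is dropped before any SQL is built.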
ETags and conditional requests for change detection
When a client polls to check whether data has changed, returning the full object on every poll is wasteful whenever nothing has changed.
Use ETag and If-None-Match for conditional responses:
# First request
GET /products/42
→ 200 OK
ETag: "v1-abc123"
{full product object}

# Subsequent request
GET /products/42
If-None-Match: "v1-abc123"
→ 304 Not Modified
(empty body)
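The client's cached copy is still valid, and no response body is transferred. This requires computing a hash of the response content (or using a version field). Hashing the content still costs the server the work of building the response, while a version field lets it skip that as well. Either way, the bandwidth savings for polling-heavy clients are significant.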
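A framework-agnostic sketch of the hash-based variant follows. The function names compute_etag and conditional_response are hypothetical, and the choice of SHA-256 over a canonical JSON encoding is an assumption; any stable digest of the representation would do.

```python
import hashlib
import json


def compute_etag(payload):
    """Derive an ETag from a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return '"' + hashlib.sha256(body.encode("utf-8")).hexdigest() + '"'


def conditional_response(payload, if_none_match=None):
    """Return (status, headers, body), honoring If-None-Match."""
    etag = compute_etag(payload)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        # If-None-Match uses weak comparison, so a W/ prefix is ignored.
        candidates = [t[2:] if t.startswith("W/") else t for t in candidates]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""
    return 200, headers, json.dumps(payload).encode("utf-8")
```

The first call returns 200 with the ETag header. Replaying that value as if_none_match returns 304 with an empty body until the payload changes.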
What the defaults should be
Every list endpoint should have a defined default field set — the fields that are returned when no fields parameter is specified. Make the default minimal and useful rather than comprehensive. Add fields on explicit request, not by default.
This is a contract. Once you ship a default field set, removing fields from it is a breaking change. Be conservative about what goes in the default from the start.
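One way to keep that contract honest is a small regression check in the test suite. This is a sketch under assumptions: SHIPPED_DEFAULT_FIELDS is a hypothetical snapshot of the default set as first released, and check_default_fields is an illustrative helper.

```python
# Hypothetical snapshot of the default field set as first shipped.
SHIPPED_DEFAULT_FIELDS = frozenset({"id", "name", "price", "thumbnail_url"})


def check_default_fields(current_defaults):
    """Return the shipped fields that are missing from the current defaults."""
    return sorted(SHIPPED_DEFAULT_FIELDS - set(current_defaults))
```

An empty result means the change is additive and safe. A non-empty result names the fields whose removal would break existing clients.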