Writing Efficient JPA Queries — Fetch Strategies, Projections, and Native Queries
by Eric Hanson, Backend Developer at Clean Systems Consulting
Fetch strategies — the decision that determines query count
Every association in a JPA entity has a fetch type: EAGER (loaded with the parent) or LAZY (loaded on first access). The default differs by association type:
@ManyToOne,@OneToOne— defaultEAGER@OneToMany,@ManyToMany— defaultLAZY
The @ManyToOne eager default is the primary source of unintended JOINs. Every query for Order automatically JOINs User because of the eager user association — even when the user is never accessed in the response.
Set all associations to LAZY globally, then load eagerly per query:
@Entity
public class Order {
@ManyToOne(fetch = FetchType.LAZY) // override the default
private User user;
@OneToMany(mappedBy = "order", fetch = FetchType.LAZY) // already default
private List<LineItem> lineItems;
@OneToOne(fetch = FetchType.LAZY) // override the default
private ShippingAddress shippingAddress;
}
With all associations lazy, a query for orders never automatically joins related tables. Load what you need explicitly per query.
JOIN FETCH — eager loading for a specific query
JOIN FETCH in JPQL tells Hibernate to include the association in the same SQL query:
// Loads order + user + line items in one SQL statement
@Query("SELECT o FROM Order o " +
"JOIN FETCH o.user " +
"LEFT JOIN FETCH o.lineItems " +
"WHERE o.status = :status")
List<Order> findByStatusWithUserAndItems(@Param("status") OrderStatus status);
Generated SQL:
SELECT o.*, u.*, li.*
FROM orders o
JOIN users u ON o.user_id = u.id
LEFT JOIN line_items li ON li.order_id = o.id
WHERE o.status = ?
One query. No N+1. The tradeoff: JOIN FETCH with @OneToMany produces a Cartesian product — each order row is duplicated for each line item. An order with 10 line items returns 10 rows, all with identical order and user columns.
For small, bounded collections this is acceptable. For large collections (hundreds of items per parent), the row duplication is wasteful. Use DISTINCT in JPQL to deduplicate at the Hibernate level:
@Query("SELECT DISTINCT o FROM Order o LEFT JOIN FETCH o.lineItems WHERE o.status = :status")
List<Order> findByStatusWithItems(@Param("status") OrderStatus status);
DISTINCT in JPQL filters duplicates in memory after the query — it does not add DISTINCT to the SQL (Hibernate adds it to the SQL only in specific scenarios). The result set is deduplicated by entity identity.
@EntityGraph — declarative fetch paths
@EntityGraph achieves the same result as JOIN FETCH without modifying the JPQL:
@EntityGraph(attributePaths = {"user", "lineItems"})
List<Order> findByStatus(OrderStatus status);
// Generates the same JOIN FETCH query as above
@EntityGraph is cleaner for repository-level control without explicit JPQL. Its limitation: it always generates a JOIN, which is not appropriate for large @OneToMany collections where batch fetching is more efficient.
For named entity graphs defined on the entity:
@Entity
@NamedEntityGraph(
name = "Order.withDetails",
attributeNodes = {
@NamedAttributeNode("user"),
@NamedAttributeNode(value = "lineItems", subgraph = "lineItems.product")
},
subgraphs = {
@NamedSubgraph(name = "lineItems.product",
attributeNodes = @NamedAttributeNode("product"))
}
)
public class Order { ... }
// Repository
@EntityGraph("Order.withDetails")
Optional<Order> findById(Long id);
Named entity graphs express complex fetch paths including subgraphs. The definition lives on the entity and can be reused across multiple repository methods.
Batch fetching — the alternative for large collections
JOIN FETCH for @OneToMany collections causes row duplication. Batch fetching avoids this by loading collections in IN clause batches after the parent query:
// On the entity
@OneToMany(mappedBy = "order")
@BatchSize(size = 50)
private List<LineItem> lineItems;
With @BatchSize(50), loading 200 orders and accessing their line items generates:
- 1 query for orders
- 4 queries for line items:
WHERE order_id IN (1..50),IN (51..100), etc.
Total: 5 queries instead of 201. No row duplication. For large collections accessed across many parents, batch fetching is significantly more efficient than JOIN FETCH.
Configure batch fetching globally for all associations:
spring:
jpa:
properties:
hibernate:
default_batch_fetch_size: 50
The global setting applies to all lazy associations unless overridden per-entity. This is often the single configuration change with the most impact on N+1 query reduction across an entire application.
Projections — selecting only what you need
Full entity loading returns all mapped columns. For read-only views, summary lists, and API responses that need a subset of fields, projections reduce data transfer and entity tracking overhead.
Interface projections — Spring Data generates a SELECT for only the projected columns:
public interface OrderSummary {
Long getId();
OrderStatus getStatus();
BigDecimal getTotal();
// Nested projection — joins to user table
UserEmail getUser();
interface UserEmail {
String getEmail();
}
}
List<OrderSummary> findByStatus(OrderStatus status);
// SELECT o.id, o.status, o.total, u.email FROM orders o JOIN users u ON ...
Interface projections create proxy objects — Spring generates an implementation at runtime. They're convenient but carry the overhead of proxy creation for each result row.
Record projections — no proxy, direct object construction:
public record OrderRow(Long id, OrderStatus status, BigDecimal total, String userEmail) {}
@Query("SELECT new com.example.OrderRow(o.id, o.status, o.total, u.email) " +
"FROM Order o JOIN o.user u WHERE o.status = :status")
List<OrderRow> findOrderRows(@Param("status") OrderStatus status);
Constructor expressions in JPQL create value objects directly — no entity lifecycle, no dirty checking, no persistence context tracking. Faster for high-volume read operations.
The projection vs entity tradeoff. Projections are faster for reads but can't be used for writes — they're not managed entities. The pattern: use projections for queries that feed API responses or reports; use full entities for queries that lead to updates.
When JPQL is not enough
JPQL covers most query patterns. Three cases where it falls short:
Database-specific functions and operators. Window functions (ROW_NUMBER(), LAG(), LEAD()), LATERAL joins, RETURNING, PostgreSQL's tsvector full-text search — JPQL has no syntax for these.
Complex aggregations. GROUPING SETS, CUBE, ROLLUP, conditional aggregation across multiple columns with complex filter conditions.
Performance-critical queries. When you need specific index hints, join order control, or query plan determinism.
For these cases, native SQL:
@Query(value = """
WITH monthly_totals AS (
SELECT user_id,
DATE_TRUNC('month', created_at) AS month,
SUM(total) AS monthly_total,
COUNT(*) AS order_count
FROM orders
WHERE status = 'COMPLETED'
GROUP BY user_id, DATE_TRUNC('month', created_at)
)
SELECT u.email,
mt.month,
mt.monthly_total,
mt.order_count,
RANK() OVER (PARTITION BY mt.month ORDER BY mt.monthly_total DESC) AS rank
FROM monthly_totals mt
JOIN users u ON mt.user_id = u.id
WHERE :minTotal IS NULL OR mt.monthly_total > :minTotal
""", nativeQuery = true)
List<Object[]> findMonthlyTopSpenders(@Param("minTotal") BigDecimal minTotal);
Native queries return Object[] by default. Map to a projection interface for cleaner access:
public interface MonthlySpenderSummary {
String getEmail();
Instant getMonth();
BigDecimal getMonthlyTotal();
Long getOrderCount();
Long getRank();
}
@Query(value = "...", nativeQuery = true)
List<MonthlySpenderSummary> findMonthlyTopSpenders(@Param("minTotal") BigDecimal minTotal);
Spring Data maps native query column names to interface getter names (case-insensitive, underscore-to-camelCase conversion). monthly_total maps to getMonthlyTotal().
Pagination — controlling result set size
Unbounded queries on large tables are a reliability risk. Always apply pagination or explicit limits:
// Pageable — adds LIMIT and OFFSET
Page<Order> findByStatus(OrderStatus status, Pageable pageable);
// Slice — adds LIMIT only, no COUNT query
Slice<Order> findByStatus(OrderStatus status, Pageable pageable);
// Top N — implicit limit
List<Order> findTop10ByStatusOrderByCreatedAtDesc(OrderStatus status);
Page<T> executes two queries: the data query with LIMIT/OFFSET and a COUNT(*) for total element count. For large tables, the count query is expensive — avoid it when clients don't need the total.
Slice<T> executes only the data query, fetching pageSize + 1 rows to determine if more pages exist. Use Slice for infinite scroll and cursor-based pagination where total count is irrelevant.
For the data query itself, OFFSET-based pagination degrades with large offsets — OFFSET 100000 requires the database to read and discard 100,000 rows before returning results. For large datasets, cursor-based pagination (keyset pagination) is more efficient:
@Query("SELECT o FROM Order o WHERE o.createdAt < :cursor ORDER BY o.createdAt DESC")
List<Order> findBeforeCursor(@Param("cursor") Instant cursor, Pageable pageable);
The cursor is the createdAt value of the last item on the previous page. The database uses the index on created_at to find the starting position directly — no row skipping.
The query audit before production
Before any endpoint goes to production:
Enable SQL logging and exercise the endpoint with a realistic data volume:
logging:
level:
org.hibernate.SQL: DEBUG
org.hibernate.type.descriptor.sql: TRACE
Check for:
- Query count per request — more than 3–5 queries for a single request warrants investigation
- Full entity loads where projections would suffice — look for
SELECT *where only some columns are used - Missing WHERE clause — unbounded queries that grow with table size
OFFSETon queries where cursor pagination would be more appropriate
The SQL log shows exactly what Hibernate generates. The gap between what you intended and what was generated is where most JPA performance problems live.