JPA Query Optimization — What Hibernate Generates and How to Control It
by Eric Hanson, Backend Developer at Clean Systems Consulting
See what Hibernate actually generates
Before optimizing, log the SQL. Two configuration levels:
spring:
jpa:
show-sql: true # logs SQL without parameters — limited value
properties:
hibernate:
format_sql: true
logging:
level:
org.hibernate.SQL: DEBUG # logs SQL statements
org.hibernate.type.descriptor.sql: TRACE # logs bind parameters
show-sql: true logs unformatted SQL without bind values. The logging.level configuration produces formatted SQL with parameters — much more useful for diagnosing generated queries. In production, keep SQL logging at WARN or ERROR to avoid log volume; enable DEBUG temporarily when investigating.
hibernate.generate_statistics: true produces aggregate statistics at session close — query count, execution time, cache hit/miss. Useful for integration tests:
@Test
void processOrder_executesExpectedQueryCount() {
SessionFactory sf = entityManager.getEntityManagerFactory().unwrap(SessionFactory.class);
sf.getStatistics().setStatisticsEnabled(true);
sf.getStatistics().clear();
orderService.processOrder(orderId);
assertThat(sf.getStatistics().getQueryExecutionCount()).isLessThanOrEqualTo(3);
}
A test that asserts on query count is a regression detector for N+1 introduction. It's not a substitute for profiling, but it catches query count regressions automatically.
N+1 — the fetch strategy decisions that matter
Hibernate's default fetch type for @ManyToOne and @OneToOne is EAGER — the associated entity is loaded with the parent in a JOIN. For @OneToMany and @ManyToMany, the default is LAZY — the collection is loaded only when accessed.
EAGER loading sounds efficient but creates problems at scale: every query for the entity automatically joins related entities, even when they're not needed. A list of 1,000 Order entities with @ManyToOne(fetch = EAGER) to User executes a JOIN to users on every order query — even when the user data is never accessed.
The practical rule: always use LAZY for all associations. Load eagerly only when specifically needed, at the query level:
// Entity — all associations lazy
@Entity
public class Order {
@ManyToOne(fetch = FetchType.LAZY)
private User user;
@OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
private List<LineItem> lineItems;
}
For a specific query that needs the user and line items:
// JOIN FETCH — loads order + user + line items in one SQL query
@Query("SELECT o FROM Order o " +
"JOIN FETCH o.user " +
"LEFT JOIN FETCH o.lineItems " +
"WHERE o.status = :status")
List<Order> findByStatusWithDetails(@Param("status") OrderStatus status);
The generated SQL:
SELECT o.*, u.*, li.*
FROM orders o
JOIN users u ON o.user_id = u.id
LEFT JOIN line_items li ON li.order_id = o.id
WHERE o.status = ?
One query. JOIN FETCH is the JPQL syntax for eager loading at query time without modifying the entity mapping.
@EntityGraph — declarative fetch path without modifying JPQL:
@EntityGraph(attributePaths = {"user", "lineItems", "lineItems.product"})
List<Order> findByStatus(OrderStatus status);
@EntityGraph is equivalent to JOIN FETCH but expressed as a repository annotation. It's cleaner for simple fetch paths; JOIN FETCH is more flexible for complex queries.
Batch fetching — the alternative to JOIN FETCH for collections:
JOIN FETCH for @OneToMany collections produces a result set that duplicates the parent row for each child. An order with 100 line items returns 100 rows — all with identical order columns. For large collections, this is wasteful.
Batch fetching loads collections in bulk using IN clauses:
@Entity
@BatchSize(size = 20) // load collections in batches of 20
public class Order {
@OneToMany
@BatchSize(size = 20) // or on the collection
private List<LineItem> lineItems;
}
With @BatchSize(20), if 100 orders are loaded and their lineItems are accessed, Hibernate executes 5 queries instead of 100: SELECT * FROM line_items WHERE order_id IN (1, 2, ..., 20), repeated 5 times. This is significantly better than 100 individual queries but avoids the row duplication of JOIN FETCH.
Use JOIN FETCH when the collection is always needed and is small. Use @BatchSize when the collection is sometimes needed and may be large.
Projections — stop loading what you don't need
findAll() and findById() return fully hydrated entities — all columns, all mapped fields. When you only need a subset of columns, projections avoid loading unnecessary data:
Interface projections:
public interface OrderSummary {
Long getId();
OrderStatus getStatus();
BigDecimal getTotal();
String getCustomerEmail(); // from joined table
}
// Spring Data generates: SELECT id, status, total, u.email FROM orders o JOIN users u ON ...
List<OrderSummary> findByStatus(OrderStatus status);
Spring Data JPA generates a query that selects only the projected columns. Hibernate doesn't hydrate an Order entity — it creates proxy instances implementing the projection interface.
Class/record projections (constructor expressions):
public record OrderRow(Long id, OrderStatus status, BigDecimal total) {}
@Query("SELECT new com.example.OrderRow(o.id, o.status, o.total) FROM Order o WHERE o.status = :status")
List<OrderRow> findOrderRows(@Param("status") OrderStatus status);
Constructor expressions in JPQL produce DTO instances directly — no entity lifecycle, no dirty checking, no persistence context tracking. Faster for read-only bulk operations.
The Tuple projection for ad hoc queries:
@Query("SELECT o.id, o.status, SUM(li.amount) as total " +
"FROM Order o JOIN o.lineItems li " +
"GROUP BY o.id, o.status")
List<Tuple> findOrderTotals();
// Usage
orderTotals.forEach(t -> {
Long id = t.get(0, Long.class);
String status = t.get(1, String.class);
BigDecimal total = t.get("total", BigDecimal.class);
});
Tuple is useful for aggregation queries that don't map cleanly to an entity or DTO. Access by index or by named alias.
Pagination — Slice vs Page
Spring Data's findAll(Pageable pageable) returns Page<T> — it executes both the data query and a SELECT COUNT(*) query to populate Page.getTotalElements(). The count query runs against the full unfiltered result set:
-- Data query
SELECT * FROM orders WHERE status = 'PENDING' LIMIT 25 OFFSET 50
-- Count query (always executed with Page)
SELECT COUNT(*) FROM orders WHERE status = 'PENDING'
-- On a table with 10M orders, this scans the index for every page request
If clients don't need the total count, use Slice<T> — it executes only the data query:
// Returns Slice — no count query
Slice<Order> findByStatus(OrderStatus status, Pageable pageable);
Slice provides hasNext() (fetches one extra row to determine if more pages exist) but no total count. For infinite scroll, cursor pagination, and most mobile patterns, Slice is correct and avoids the count query overhead.
For queries where the count is expensive — complex joins, large tables — consider caching the count separately with a longer TTL than the data query.
The IN clause explosion problem
Spring Data's derived query methods with collections generate IN clauses:
List<Order> findByIdIn(List<Long> ids);
// Generates: SELECT * FROM orders WHERE id IN (?, ?, ?, ...)
This is efficient for small lists. For lists of thousands of IDs, the IN clause becomes a performance problem: the query plan can't use an index efficiently for very large IN lists, and the parameter binding overhead grows linearly.
Chunking for large ID lists:
public List<Order> findByIds(List<Long> ids) {
if (ids.size() <= 1000) {
return orderRepository.findByIdIn(ids);
}
// Split into chunks of 1000
return Lists.partition(ids, 1000).stream()
.flatMap(chunk -> orderRepository.findByIdIn(chunk).stream())
.collect(toList());
}
Temporary table for very large sets — insert IDs into a temp table and join against it:
@Query(value = """
SELECT o.* FROM orders o
JOIN (VALUES :ids) AS t(id) ON o.id = t.id
""", nativeQuery = true)
List<Order> findByIdList(@Param("ids") List<Long> ids);
PostgreSQL's VALUES constructor is efficient for this pattern — the query planner treats it as a table and can use an index join. Better than a large IN clause for hundreds of IDs.
When to write native SQL
JPQL is sufficient for most queries. The cases where native SQL is the right call:
Database-specific features. Window functions (ROW_NUMBER() OVER), LATERAL joins, full-text search (tsvector, @@), RETURNING clauses — JPQL has no syntax for these. Write them in native SQL:
@Query(value = """
SELECT o.*, u.email,
ROW_NUMBER() OVER (PARTITION BY o.user_id ORDER BY o.created_at DESC) as rn
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = :status
""", nativeQuery = true)
List<Object[]> findOrdersWithRanking(@Param("status") String status);
Complex aggregations. Multi-level grouping, conditional aggregation (SUM(CASE WHEN...)), CUBE and ROLLUP — JPQL can't express these cleanly.
Performance-critical queries where you need exact SQL. When the query planner behavior matters and you need specific index hints, join order, or table statistics.
The nativeQuery = true mapping considerations. Native queries return Object[] by default unless mapped to a DTO projection or an entity. Returning Object[] and casting at the call site is fragile — use SqlResultSetMapping or Spring Data's interface projections for native queries:
@Query(value = "SELECT id, status, total FROM orders WHERE status = :status",
nativeQuery = true)
List<OrderSummary> findByStatusNative(@Param("status") String status);
// OrderSummary is an interface projection — Spring Data maps columns to getters
The query execution plan
All of the above produces SQL. Whether that SQL is efficient depends on the query execution plan — which indexes are used, whether sequential scans occur, join order. The tools:
EXPLAIN ANALYZE in PostgreSQL shows the actual execution plan with row estimates and timing. Run it against production-scale data; staging data with different cardinality produces different plans.
org.hibernate.engine.jdbc.internal.LogicalConnectionManagedImpl logging at DEBUG shows each SQL statement. Enable temporarily, run the feature, look for unexpectedly high query counts or large IN clauses.
Spring Boot Actuator's /actuator/metrics/hibernate.query.executions and /actuator/metrics/hibernate.query.plan.cache.hit.ratio reveal query count and plan cache utilization. A low plan cache hit ratio indicates many unique queries — check for queries built with string concatenation instead of bind parameters.
Efficient JPA is not about avoiding JPA — it's about understanding what SQL Hibernate generates and when to guide it explicitly. The entity model defines the default; @Query, @EntityGraph, projections, and @BatchSize are the tools for overriding the default when the generated SQL isn't good enough.