JPA Query Optimization — What Hibernate Generates and How to Control It

by Eric Hanson, Backend Developer at Clean Systems Consulting

See what Hibernate actually generates

Before optimizing, log the SQL. Two configuration levels:

spring:
  jpa:
    show-sql: true  # logs SQL without parameters — limited value
    properties:
      hibernate:
        format_sql: true
logging:
  level:
    org.hibernate.SQL: DEBUG          # logs SQL statements
    org.hibernate.type.descriptor.sql: TRACE  # logs bind parameters

show-sql: true logs unformatted SQL without bind values. The logging.level configuration produces formatted SQL with parameters — much more useful for diagnosing generated queries. In production, keep SQL logging at WARN or ERROR to avoid log volume; enable DEBUG temporarily when investigating.

hibernate.generate_statistics: true produces aggregate statistics at session close — query count, execution time, cache hit/miss. Useful for integration tests:

@Test
void processOrder_executesExpectedQueryCount() {
    SessionFactory sf = entityManager.getEntityManagerFactory().unwrap(SessionFactory.class);
    sf.getStatistics().setStatisticsEnabled(true);
    sf.getStatistics().clear();

    orderService.processOrder(orderId);

    assertThat(sf.getStatistics().getQueryExecutionCount()).isLessThanOrEqualTo(3);
}

A test that asserts on query count is a regression detector for N+1 introduction. It's not a substitute for profiling, but it catches query count regressions automatically.

N+1 — the fetch strategy decisions that matter

Hibernate's default fetch type for @ManyToOne and @OneToOne is EAGER — the associated entity is loaded with the parent in a JOIN. For @OneToMany and @ManyToMany, the default is LAZY — the collection is loaded only when accessed.

EAGER loading sounds efficient but creates problems at scale: every query for the entity automatically joins related entities, even when they're not needed. A list of 1,000 Order entities with @ManyToOne(fetch = EAGER) to User executes a JOIN to users on every order query — even when the user data is never accessed.

The practical rule: always use LAZY for all associations. Load eagerly only when specifically needed, at the query level:

// Entity — all associations lazy
@Entity
public class Order {
    @ManyToOne(fetch = FetchType.LAZY)
    private User user;

    @OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
    private List<LineItem> lineItems;
}

For a specific query that needs the user and line items:

// JOIN FETCH — loads order + user + line items in one SQL query
@Query("SELECT o FROM Order o " +
       "JOIN FETCH o.user " +
       "LEFT JOIN FETCH o.lineItems " +
       "WHERE o.status = :status")
List<Order> findByStatusWithDetails(@Param("status") OrderStatus status);

The generated SQL:

SELECT o.*, u.*, li.*
FROM orders o
JOIN users u ON o.user_id = u.id
LEFT JOIN line_items li ON li.order_id = o.id
WHERE o.status = ?

One query. JOIN FETCH is the JPQL syntax for eager loading at query time without modifying the entity mapping.

@EntityGraph — declarative fetch path without modifying JPQL:

@EntityGraph(attributePaths = {"user", "lineItems", "lineItems.product"})
List<Order> findByStatus(OrderStatus status);

@EntityGraph is equivalent to JOIN FETCH but expressed as a repository annotation. It's cleaner for simple fetch paths; JOIN FETCH is more flexible for complex queries.

Batch fetching — the alternative to JOIN FETCH for collections:

JOIN FETCH for @OneToMany collections produces a result set that duplicates the parent row for each child. An order with 100 line items returns 100 rows — all with identical order columns. For large collections, this is wasteful.

Batch fetching loads collections in bulk using IN clauses:

@Entity
@BatchSize(size = 20)  // load collections in batches of 20
public class Order {
    @OneToMany
    @BatchSize(size = 20)  // or on the collection
    private List<LineItem> lineItems;
}

With @BatchSize(20), if 100 orders are loaded and their lineItems are accessed, Hibernate executes 5 queries instead of 100: SELECT * FROM line_items WHERE order_id IN (1, 2, ..., 20), repeated 5 times. This is significantly better than 100 individual queries but avoids the row duplication of JOIN FETCH.

Use JOIN FETCH when the collection is always needed and is small. Use @BatchSize when the collection is sometimes needed and may be large.

Projections — stop loading what you don't need

findAll() and findById() return fully hydrated entities — all columns, all mapped fields. When you only need a subset of columns, projections avoid loading unnecessary data:

Interface projections:

public interface OrderSummary {
    Long getId();
    OrderStatus getStatus();
    BigDecimal getTotal();
    String getCustomerEmail();  // from joined table
}

// Spring Data generates: SELECT id, status, total, u.email FROM orders o JOIN users u ON ...
List<OrderSummary> findByStatus(OrderStatus status);

Spring Data JPA generates a query that selects only the projected columns. Hibernate doesn't hydrate an Order entity — it creates proxy instances implementing the projection interface.

Class/record projections (constructor expressions):

public record OrderRow(Long id, OrderStatus status, BigDecimal total) {}

@Query("SELECT new com.example.OrderRow(o.id, o.status, o.total) FROM Order o WHERE o.status = :status")
List<OrderRow> findOrderRows(@Param("status") OrderStatus status);

Constructor expressions in JPQL produce DTO instances directly — no entity lifecycle, no dirty checking, no persistence context tracking. Faster for read-only bulk operations.

The Tuple projection for ad hoc queries:

@Query("SELECT o.id, o.status, SUM(li.amount) as total " +
       "FROM Order o JOIN o.lineItems li " +
       "GROUP BY o.id, o.status")
List<Tuple> findOrderTotals();

// Usage
orderTotals.forEach(t -> {
    Long id     = t.get(0, Long.class);
    String status = t.get(1, String.class);
    BigDecimal total = t.get("total", BigDecimal.class);
});

Tuple is useful for aggregation queries that don't map cleanly to an entity or DTO. Access by index or by named alias.

Pagination — Slice vs Page

Spring Data's findAll(Pageable pageable) returns Page<T> — it executes both the data query and a SELECT COUNT(*) query to populate Page.getTotalElements(). The count query runs against the full unfiltered result set:

-- Data query
SELECT * FROM orders WHERE status = 'PENDING' LIMIT 25 OFFSET 50

-- Count query (always executed with Page)
SELECT COUNT(*) FROM orders WHERE status = 'PENDING'
-- On a table with 10M orders, this scans the index for every page request

If clients don't need the total count, use Slice<T> — it executes only the data query:

// Returns Slice — no count query
Slice<Order> findByStatus(OrderStatus status, Pageable pageable);

Slice provides hasNext() (fetches one extra row to determine if more pages exist) but no total count. For infinite scroll, cursor pagination, and most mobile patterns, Slice is correct and avoids the count query overhead.

For queries where the count is expensive — complex joins, large tables — consider caching the count separately with a longer TTL than the data query.

The IN clause explosion problem

Spring Data's derived query methods with collections generate IN clauses:

List<Order> findByIdIn(List<Long> ids);
// Generates: SELECT * FROM orders WHERE id IN (?, ?, ?, ...)

This is efficient for small lists. For lists of thousands of IDs, the IN clause becomes a performance problem: the query plan can't use an index efficiently for very large IN lists, and the parameter binding overhead grows linearly.

Chunking for large ID lists:

public List<Order> findByIds(List<Long> ids) {
    if (ids.size() <= 1000) {
        return orderRepository.findByIdIn(ids);
    }
    // Split into chunks of 1000
    return Lists.partition(ids, 1000).stream()
        .flatMap(chunk -> orderRepository.findByIdIn(chunk).stream())
        .collect(toList());
}

Temporary table for very large sets — insert IDs into a temp table and join against it:

@Query(value = """
    SELECT o.* FROM orders o
    JOIN (VALUES :ids) AS t(id) ON o.id = t.id
    """, nativeQuery = true)
List<Order> findByIdList(@Param("ids") List<Long> ids);

PostgreSQL's VALUES constructor is efficient for this pattern — the query planner treats it as a table and can use an index join. Better than a large IN clause for hundreds of IDs.

When to write native SQL

JPQL is sufficient for most queries. The cases where native SQL is the right call:

Database-specific features. Window functions (ROW_NUMBER() OVER), LATERAL joins, full-text search (tsvector, @@), RETURNING clauses — JPQL has no syntax for these. Write them in native SQL:

@Query(value = """
    SELECT o.*, u.email,
           ROW_NUMBER() OVER (PARTITION BY o.user_id ORDER BY o.created_at DESC) as rn
    FROM orders o
    JOIN users u ON o.user_id = u.id
    WHERE o.status = :status
    """, nativeQuery = true)
List<Object[]> findOrdersWithRanking(@Param("status") String status);

Complex aggregations. Multi-level grouping, conditional aggregation (SUM(CASE WHEN...)), CUBE and ROLLUP — JPQL can't express these cleanly.

Performance-critical queries where you need exact SQL. When the query planner behavior matters and you need specific index hints, join order, or table statistics.

The nativeQuery = true mapping considerations. Native queries return Object[] by default unless mapped to a DTO projection or an entity. Returning Object[] and casting at the call site is fragile — use SqlResultSetMapping or Spring Data's interface projections for native queries:

@Query(value = "SELECT id, status, total FROM orders WHERE status = :status",
       nativeQuery = true)
List<OrderSummary> findByStatusNative(@Param("status") String status);
// OrderSummary is an interface projection — Spring Data maps columns to getters

The query execution plan

All of the above produces SQL. Whether that SQL is efficient depends on the query execution plan — which indexes are used, whether sequential scans occur, join order. The tools:

EXPLAIN ANALYZE in PostgreSQL shows the actual execution plan with row estimates and timing. Run it against production-scale data; staging data with different cardinality produces different plans.

org.hibernate.engine.jdbc.internal.LogicalConnectionManagedImpl logging at DEBUG shows each SQL statement. Enable temporarily, run the feature, look for unexpectedly high query counts or large IN clauses.

Spring Boot Actuator's /actuator/metrics/hibernate.query.executions and /actuator/metrics/hibernate.query.plan.cache.hit.ratio reveal query count and plan cache utilization. A low plan cache hit ratio indicates many unique queries — check for queries built with string concatenation instead of bind parameters.

Efficient JPA is not about avoiding JPA — it's about understanding what SQL Hibernate generates and when to guide it explicitly. The entity model defines the default; @Query, @EntityGraph, projections, and @BatchSize are the tools for overriding the default when the generated SQL isn't good enough.

Scale Your Backend - Need an Experienced Backend Developer?

We provide backend engineers who join your team as contractors to help build, improve, and scale your backend systems.

We focus on clean backend design, clear documentation, and systems that remain reliable as products grow. Our goal is to strengthen your team and deliver backend systems that are easy to operate and maintain.

We work from our own development environments and support teams across US, EU, and APAC timezones. Our workflow emphasizes documentation and asynchronous collaboration to keep development efficient and focused.

  • Production Backend Experience. Experience building and maintaining backend systems, APIs, and databases used in production.
  • Scalable Architecture. Design backend systems that stay reliable as your product and traffic grow.
  • Contractor Friendly. Flexible engagement for short projects, long-term support, or extra help during releases.
  • Focus on Backend Reliability. Improve API performance, database stability, and overall backend reliability.
  • Documentation-Driven Development. Development guided by clear documentation so teams stay aligned and work efficiently.
  • Domain-Driven Design. Design backend systems around real business processes and product needs.

Tell us about your project

Our offices

  • Copenhagen
    1 Carlsberg Gate
    1260, København, Denmark
  • Magelang
    12 Jalan Bligo
    56485, Magelang, Indonesia

More articles

Consistent Error Handling Across Your API Is Not a Nice to Have

Inconsistent error shapes across endpoints force developers to write defensive code for every route they touch. Consistency is not polish — it is correctness.

Read more

What It Actually Costs to Hire a Senior Backend Developer in Sydney

You budgeted $160K for a senior backend hire. Then you saw what they actually cost once super, recruiter fees, and three months of low output were factored in.

Read more

Why Your API Feels Inconsistent and How to Fix It

Inconsistent APIs aren’t just annoying—they slow teams down and introduce subtle bugs. Most inconsistency comes from a lack of enforced patterns, not lack of skill, and fixing it requires deliberate constraints.

Read more

When Asynchronous Developers Are the Right Choice for Your Team

Not every team needs daily meetings and instant replies. Sometimes, the best work happens when people aren’t online at the same time.

Read more