Hibernate Bulk Operations — Bulk Updates, Bulk Deletes, and Bypassing the Entity Lifecycle

by Eric Hanson, Backend Developer at Clean Systems Consulting

The cost of the entity-by-entity approach

JPA's default workflow — load an entity, modify it, let Hibernate flush the change — is correct for single-record updates and small batches. For bulk operations, it's prohibitively expensive:

// Loads 10,000 entities into memory, updates each individually
List<Order> expiredOrders = orderRepository.findByExpiresAtBefore(LocalDateTime.now());
expiredOrders.forEach(order -> {
    order.setStatus(OrderStatus.EXPIRED);
    order.setExpiredAt(LocalDateTime.now());
});
// Generates 10,000 UPDATE statements at flush time

Three costs compound here: one SELECT to load all entities, 10,000 individual UPDATE statements at flush time (Hibernate dirty-checks and flushes each entity separately), and 10,000 managed instances sitting in the persistence context, consuming heap and slowing every subsequent dirty check.

The bulk operation does this in one statement:

UPDATE orders SET status = 'EXPIRED', expired_at = NOW()
WHERE expires_at < NOW() AND status != 'EXPIRED'

@Modifying queries — JPQL and native bulk operations

Spring Data's @Modifying annotation marks a @Query as a write operation. Without it, Spring Data treats all repository queries as reads:

@Modifying
@Query("UPDATE Order o SET o.status = :newStatus WHERE o.status = :currentStatus " +
       "AND o.expiresAt < :now")
int updateExpiredOrders(
    @Param("newStatus") OrderStatus newStatus,
    @Param("currentStatus") OrderStatus currentStatus,
    @Param("now") LocalDateTime now);

Returns the number of affected rows. The method must be called within a transaction — either annotate the repository method with @Transactional or call it from a @Transactional service method.

The persistence context synchronization problem. After a @Modifying query executes, the persistence context may be out of sync with the database. If entities modified by the bulk operation are already loaded in the persistence context, those in-memory instances still reflect the old state — subsequent reads within the same transaction may return stale data:

@Transactional
public void expireOrders() {
    // Load some orders — they're in the persistence context
    Order order = orderRepository.findById(42L).orElseThrow();
    System.out.println(order.getStatus()); // ACTIVE

    // Bulk update — modifies order 42 in the database
    orderRepository.updateExpiredOrders(EXPIRED, ACTIVE, LocalDateTime.now());

    // Stale read — persistence context still has the old state
    order = orderRepository.findById(42L).orElseThrow();
    System.out.println(order.getStatus()); // Still ACTIVE — loaded from persistence context
}

The fix: clear the persistence context after a @Modifying query, or ensure the bulk operation runs before loading entities:

@Modifying(clearAutomatically = true)  // clears persistence context after execution
@Query("UPDATE Order o SET o.status = :status WHERE o.expiresAt < :now")
int expireOrders(@Param("status") OrderStatus status, @Param("now") LocalDateTime now);

clearAutomatically = true clears the entire persistence context after the bulk operation. Subsequent entity loads go to the database. Use it when the bulk operation affects entities that might be loaded later in the same transaction.

flushAutomatically = true flushes pending changes before executing the bulk query — ensures the database reflects any in-memory entity changes before the bulk operation runs. Use it when you've modified entities in the same transaction and need those changes visible to the bulk query.
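A sketch combining both flags on one method — hypothetical, since whether you need each flag depends on where the bulk statement falls in the transaction:

```java
// Hypothetical repository method combining both flags: pending in-memory
// entity changes are flushed before the UPDATE runs, and the persistence
// context is cleared afterwards so later reads go to the database.
@Modifying(flushAutomatically = true, clearAutomatically = true)
@Query("UPDATE Order o SET o.status = :status WHERE o.expiresAt < :now")
int expireOrdersSafely(@Param("status") OrderStatus status,
                       @Param("now") LocalDateTime now);
```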

Bulk deletes

deleteAllByStatus in Spring Data generates a query that loads the entities then deletes them individually, firing @PreRemove and @PostRemove lifecycle callbacks:

// DO NOT USE for bulk deletes — loads all matching entities
void deleteAllByStatus(OrderStatus status);
// Equivalent to: findAllByStatus(status).forEach(repository::delete)
// 1 SELECT to load matching orders, then one DELETE statement per entity

For a bulk delete without lifecycle callbacks:

@Modifying(clearAutomatically = true)
@Query("DELETE FROM Order o WHERE o.status = :status")
int deleteByStatus(@Param("status") OrderStatus status);

This generates a single DELETE FROM orders WHERE status = ?. No entity loading, no lifecycle callbacks, no individual delete statements.

The cascade and orphan removal implications. Bulk deletes in JPQL bypass cascade operations. If Order has @OneToMany(cascade = CascadeType.REMOVE) to LineItem, the JPQL bulk delete removes orders but not their line items — foreign key constraint violations will occur unless line items are deleted first or the constraint has ON DELETE CASCADE at the database level.

Correct order for bulk deletes with cascade dependencies:

@Transactional
public int deleteExpiredOrders() {
    // Delete children first
    lineItemRepository.deleteByOrderStatus(OrderStatus.EXPIRED);
    // Then delete parents
    return orderRepository.deleteByStatus(OrderStatus.EXPIRED);
}
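The child delete used above must itself be a bulk JPQL statement (a derived deleteBy… method would load and delete entities one by one). JPQL bulk deletes cannot use joins, so the parent rows are matched through a subquery — a sketch, assuming LineItem has an order association:

```java
// Hypothetical LineItemRepository method. Bulk DELETE cannot join,
// so a subquery selects the parent orders to match against.
@Modifying(clearAutomatically = true)
@Query("DELETE FROM LineItem li WHERE li.order IN " +
       "(SELECT o FROM Order o WHERE o.status = :status)")
int deleteByOrderStatus(@Param("status") OrderStatus status);
```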

Or use a native query with CASCADE if the database schema defines it:

@Modifying
@Query(value = "DELETE FROM orders WHERE status = 'EXPIRED'", nativeQuery = true)
int deleteExpiredOrdersNative();
// Works if line_items has ON DELETE CASCADE on orders.id

Bulk inserts with saveAll and JDBC batch

saveAll(Iterable<T>) in Spring Data calls save() for each entity — it doesn't batch inserts by default. Each save() may execute a SELECT to determine whether to insert or update (merge semantics), followed by an INSERT.

For true batch inserts, configure Hibernate's JDBC batching:

spring:
  jpa:
    properties:
      hibernate:
        jdbc:
          batch_size: 50       # batch 50 inserts per JDBC batch
          order_inserts: true  # sort inserts by entity type for better batching
          order_updates: true  # same for updates
        id:
          new_generator_mappings: true  # default since Hibernate 5 (setting removed in Hibernate 6)

With batch_size = 50 and order_inserts = true, saveAll(1000 entities) generates 20 batched INSERT statements instead of 1000 individual ones.

Identity columns break batching. If your entity uses @GeneratedValue(strategy = GenerationType.IDENTITY) (auto-increment), Hibernate disables JDBC batching for that entity. The reason: identity columns return the generated ID only after the INSERT executes — Hibernate must execute each INSERT individually to get the ID back.

Switch to sequence-based ID generation for entities where batch insert performance matters:

@Entity
public class Order {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "order_seq")
    @SequenceGenerator(name = "order_seq", sequenceName = "order_seq",
                       allocationSize = 50)  // reserves 50 IDs per sequence call;
                                             // the DB sequence should use INCREMENT BY 50 to match
    private Long id;
}

allocationSize = 50 fetches 50 IDs from the database sequence in one call. Inserting 50 entities requires one sequence call and one batched INSERT — dramatically faster than 50 individual inserts.
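The arithmetic behind that claim can be made concrete. This is illustrative only — not Hibernate's actual optimizer implementation — but it shows why database round trips scale with N / allocationSize rather than N:

```java
// Sketch of block-based ID allocation: one simulated "sequence call"
// reserves allocationSize identifiers. Not Hibernate internals — just
// the arithmetic that makes batched inserts cheap.
public class PooledIdAllocator {
    private final int allocationSize;
    private long sequenceCalls = 0; // simulated database round trips
    private long nextId = 0;        // next ID to hand out
    private long ceiling = 0;       // exclusive upper bound of the reserved block

    public PooledIdAllocator(int allocationSize) {
        this.allocationSize = allocationSize;
    }

    public long nextId() {
        if (nextId >= ceiling) {    // block exhausted: hit the "sequence" again
            sequenceCalls++;
            ceiling = nextId + allocationSize;
        }
        return nextId++;
    }

    public long sequenceCalls() {
        return sequenceCalls;
    }
}
```

With allocationSize = 50, handing out 50 IDs costs one simulated round trip; the 51st ID triggers a second.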

When to use JDBC directly

For the highest-throughput bulk operations, bypass Hibernate entirely and use JdbcTemplate or NamedParameterJdbcTemplate:

@Repository
public class OrderBulkRepository {

    private final NamedParameterJdbcTemplate jdbcTemplate;

    public OrderBulkRepository(NamedParameterJdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }
    public int bulkInsert(List<CreateOrderRequest> requests) {
        String sql = """
            INSERT INTO orders (customer_id, status, total, created_at)
            VALUES (:customerId, :status, :total, :createdAt)
            """;

        SqlParameterSource[] params = requests.stream()
            .map(r -> new MapSqlParameterSource()
                .addValue("customerId", r.customerId())
                .addValue("status", OrderStatus.PENDING.name())
                .addValue("total", r.total())
                .addValue("createdAt", Instant.now()))
            .toArray(SqlParameterSource[]::new);

        return Arrays.stream(jdbcTemplate.batchUpdate(sql, params)).sum();
    }
}

jdbcTemplate.batchUpdate() uses JDBC batch semantics — the driver sends the statements to the database in batches rather than one round trip per row (exact behavior is driver-dependent; MySQL, for instance, needs rewriteBatchedStatements=true to rewrite the batch into a multi-row INSERT). No entity lifecycle, no persistence context, no dirty checking, no ID generation overhead.
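One wrinkle with summing the result of batchUpdate(): some JDBC drivers report java.sql.Statement.SUCCESS_NO_INFO (-2) per statement instead of a real row count, which would silently corrupt a naive sum. A small helper (illustrative, not part of Spring) that treats such entries as one affected row:

```java
import java.sql.Statement;
import java.util.Arrays;

public class BatchCounts {
    /**
     * Sums per-statement update counts from JDBC batchUpdate(). Some drivers
     * return Statement.SUCCESS_NO_INFO (-2) instead of a real count when a
     * statement succeeded; count those as one affected row rather than
     * letting the negative sentinel distort the total.
     */
    public static int affectedRows(int[] updateCounts) {
        return Arrays.stream(updateCounts)
            .map(c -> c == Statement.SUCCESS_NO_INFO ? 1 : c)
            .sum();
    }
}
```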

The tradeoff: JDBC operations bypass the persistence context entirely. Entities inserted via JDBC are not in the L1 cache — subsequent findById calls within the same transaction will query the database. L2 cache is also bypassed — manually evict if necessary.

Use JDBC for: importing large datasets (CSV import, data migration), high-frequency event recording (click tracking, audit logs), and any operation where the entity lifecycle (callbacks, cascade, L1 cache) adds overhead without value.

Bulk operations and audit fields

@PrePersist, @PreUpdate, @EntityListeners (JPA Auditing) — these fire only when entities are managed through the JPA lifecycle. Bulk @Modifying queries bypass them.

If created_at and updated_at fields are managed by JPA Auditing, a bulk update via @Modifying won't update updated_at. Two approaches:

Include the timestamp in the bulk query explicitly:

@Modifying
@Query("UPDATE Order o SET o.status = :status, o.updatedAt = :now WHERE o.id IN :ids")
int updateStatuses(@Param("status") OrderStatus status,
                   @Param("ids") List<Long> ids,
                   @Param("now") Instant now);

Let the database handle it with a trigger or default: DEFAULT now() in PostgreSQL or a BEFORE UPDATE trigger maintains updated_at automatically regardless of how the row is modified. This is the most reliable approach for audit fields in a codebase that mixes JPA and bulk SQL operations.
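A sketch of the trigger approach, assuming PostgreSQL 11+ and an updated_at column on orders (names are illustrative):

```sql
-- Maintains orders.updated_at on every UPDATE, no matter which code
-- path (JPA lifecycle, @Modifying query, or raw SQL) modified the row.
CREATE OR REPLACE FUNCTION touch_updated_at() RETURNS trigger AS $$
BEGIN
  NEW.updated_at := now();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_touch_updated_at
  BEFORE UPDATE ON orders
  FOR EACH ROW EXECUTE FUNCTION touch_updated_at();
```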

The decision hierarchy

Single record or small batches (< 100 records): standard JPA entity lifecycle. save(), delete(), entity mutation. Lifecycle callbacks, cascade, auditing all work correctly.

Medium batches (100–10,000 records): @Modifying @Query with JPQL. One SQL statement, no entity loading overhead. Add clearAutomatically = true if entities might be in the persistence context.

Large batches (10,000+ records) or high-frequency operations: JDBC batch operations via JdbcTemplate.batchUpdate(). Maximum throughput, minimum overhead. Handle audit fields and cache invalidation explicitly.

Bulk reads with streaming: @Query with a Stream<T> return type — Hibernate pulls rows incrementally from an open JDBC cursor instead of materializing the whole result list up front:

@Query("SELECT o FROM Order o WHERE o.status = :status")
Stream<Order> streamByStatus(@Param("status") OrderStatus status);

// Usage — call within @Transactional, close the stream
@Transactional(readOnly = true)
public void processExpiredOrders(OrderStatus status) {
    try (Stream<Order> orders = orderRepository.streamByStatus(status)) {
        orders.forEach(this::processOrder);
    }
}

Streaming avoids materializing the entire result set as a list, but it does not cap the persistence context: every streamed entity still becomes managed, so the context grows as you iterate. For very large tables, clear or detach periodically (for example, EntityManager.clear() every few thousand rows) and set a fetch-size query hint ("org.hibernate.fetchSize") — some JDBC drivers otherwise buffer the full result set client-side.
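The close requirement in the usage example above is plain java.util.stream behavior: Hibernate registers resource cleanup via Stream.onClose, and try-with-resources guarantees it runs. A minimal stand-in with no Hibernate involved (names are illustrative):

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

public class StreamCloseDemo {
    /** Simulates a repository stream: cleanup runs only when the stream is closed. */
    public static Stream<String> openOrders(AtomicBoolean cursorClosed) {
        return Stream.of("order-1", "order-2", "order-3")
                     .onClose(() -> cursorClosed.set(true)); // stands in for releasing the JDBC cursor
    }

    public static long process(AtomicBoolean cursorClosed) {
        // try-with-resources guarantees onClose runs, even if processing throws
        try (Stream<String> orders = openOrders(cursorClosed)) {
            return orders.count();
        }
    }
}
```

Iterating the stream without try-with-resources would leave the simulated cursor open — the same leak that skipping the close causes with a real repository stream.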
