Hibernate Bulk Operations — @Modifying Queries, JDBC Batching, and Bypassing the Entity Lifecycle
by Eric Hanson, Backend Developer at Clean Systems Consulting
The cost of the entity-by-entity approach
JPA's default workflow — load an entity, modify it, let Hibernate flush the change — is correct for single-record updates and small batches. For bulk operations, it's prohibitively expensive:
// Loads 10,000 entities into memory, updates each individually
List<Order> expiredOrders = orderRepository.findByExpiresAtBefore(LocalDateTime.now());
expiredOrders.forEach(order -> {
    order.setStatus(OrderStatus.EXPIRED);
    order.setExpiredAt(LocalDateTime.now());
});
// Generates 10,000 UPDATE statements at flush time
Three costs compound here: one SELECT to load all entities, 10,000 UPDATE statements at transaction commit (Hibernate flushes each entity's dirty state individually), and 10,000 entity instances in the persistence context accumulating memory and increasing dirty-check overhead.
The bulk operation does this in one statement:
UPDATE orders SET status = 'EXPIRED', expired_at = NOW()
WHERE expires_at < NOW() AND status != 'EXPIRED'
@Modifying queries — JPQL and native bulk operations
Spring Data's @Modifying annotation marks a @Query as a write operation. Without it, Spring Data treats all repository queries as reads:
@Modifying
@Query("UPDATE Order o SET o.status = :newStatus WHERE o.status = :currentStatus " +
       "AND o.expiresAt < :now")
int updateExpiredOrders(
        @Param("newStatus") OrderStatus newStatus,
        @Param("currentStatus") OrderStatus currentStatus,
        @Param("now") LocalDateTime now);
Returns the number of affected rows. The method must be called within a transaction — either annotate the repository method with @Transactional or call it from a @Transactional service method.
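A minimal sketch of the service-layer transactional boundary (the service class name is an assumption; the repository method is the one above):

```java
@Service
public class OrderExpirationService {

    private final OrderRepository orderRepository;

    public OrderExpirationService(OrderRepository orderRepository) {
        this.orderRepository = orderRepository;
    }

    @Transactional
    public int expireActiveOrders() {
        // Executes as a single UPDATE statement; returns the affected row count
        return orderRepository.updateExpiredOrders(
                OrderStatus.EXPIRED, OrderStatus.ACTIVE, LocalDateTime.now());
    }
}
```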
The persistence context synchronization problem. After a @Modifying query executes, the persistence context may be out of sync with the database. If entities modified by the bulk operation are already loaded in the persistence context, those in-memory instances still reflect the old state — subsequent reads within the same transaction may return stale data:
@Transactional
public void expireOrders() {
    // Load an order; it's now in the persistence context
    Order order = orderRepository.findById(42L).orElseThrow();
    System.out.println(order.getStatus()); // ACTIVE

    // Bulk update — modifies order 42 in the database
    orderRepository.updateExpiredOrders(EXPIRED, ACTIVE, LocalDateTime.now());

    // Stale read — findById returns the already-managed instance
    order = orderRepository.findById(42L).orElseThrow();
    System.out.println(order.getStatus()); // Still ACTIVE — served from the persistence context
}
The fix: clear the persistence context after a @Modifying query, or ensure the bulk operation runs before loading entities:
@Modifying(clearAutomatically = true) // clears persistence context after execution
@Query("UPDATE Order o SET o.status = 'EXPIRED' WHERE o.expiresAt < :now")
int expireOrders(@Param("now") LocalDateTime now);
clearAutomatically = true clears the entire persistence context after the bulk operation. Subsequent entity loads go to the database. Use it when the bulk operation affects entities that might be loaded later in the same transaction.
flushAutomatically = true flushes pending changes before executing the bulk query — ensures the database reflects any in-memory entity changes before the bulk operation runs. Use it when you've modified entities in the same transaction and need those changes visible to the bulk query.
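When a transaction both mutates loaded entities and runs a bulk query, the two flags can be combined; a sketch against the same Order repository:

```java
// Flush pending in-memory changes before the bulk UPDATE runs,
// then clear the persistence context so later reads hit the database
@Modifying(flushAutomatically = true, clearAutomatically = true)
@Query("UPDATE Order o SET o.status = 'EXPIRED' WHERE o.expiresAt < :now")
int expireOrdersSafely(@Param("now") LocalDateTime now);
```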
Bulk deletes
A derived deleteAllByStatus method in Spring Data loads the matching entities and then deletes them individually, firing @PreRemove and @PostRemove lifecycle callbacks:
// DO NOT USE for bulk deletes — loads all matching entities
void deleteAllByStatus(OrderStatus status);
// Equivalent to: findAllByStatus(status).forEach(repository::delete)
// One SELECT to load the matches, then one DELETE per entity
For a bulk delete without lifecycle callbacks:
@Modifying(clearAutomatically = true)
@Query("DELETE FROM Order o WHERE o.status = :status")
int deleteByStatus(@Param("status") OrderStatus status);
This generates a single DELETE FROM orders WHERE status = ?. No entity loading, no lifecycle callbacks, no individual delete statements.
The cascade and orphan removal implications. Bulk deletes in JPQL bypass cascade operations. If Order has @OneToMany(cascade = CascadeType.REMOVE) to LineItem, the JPQL bulk delete removes orders but not their line items — foreign key constraint violations will occur unless line items are deleted first or the constraint has ON DELETE CASCADE at the database level.
Correct order for bulk deletes with cascade dependencies:
@Transactional
public int deleteExpiredOrders() {
    // Delete children first
    lineItemRepository.deleteByOrderStatus(OrderStatus.EXPIRED);
    // Then delete parents
    return orderRepository.deleteByStatus(OrderStatus.EXPIRED);
}
Or use a native query with CASCADE if the database schema defines it:
@Modifying
@Query(value = "DELETE FROM orders WHERE status = 'EXPIRED'", nativeQuery = true)
int deleteExpiredOrdersNative();
// Works if line_items has ON DELETE CASCADE on orders.id
Bulk inserts with saveAll and JDBC batch
saveAll(Iterable<T>) in Spring Data calls save() for each entity — it doesn't batch inserts by default. Each save() may execute a SELECT to determine whether to insert or update (merge semantics), followed by an INSERT.
For true batch inserts, configure Hibernate's JDBC batching:
spring:
  jpa:
    properties:
      hibernate:
        jdbc:
          batch_size: 50        # batch 50 inserts per JDBC batch
        order_inserts: true     # sort inserts by entity type for better batching
        order_updates: true     # same for updates
        id:
          new_generator_mappings: true  # default since Hibernate 5 (setting removed in 6); enables the sequence-style generators that support ID pre-allocation
With batch_size = 50 and order_inserts = true, saveAll(1000 entities) generates 20 batched INSERT statements instead of 1000 individual ones.
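Even with batching enabled, every saved entity stays managed in the persistence context until commit. A common companion pattern, sketched here with an injected EntityManager (the method and chunk size are assumptions), is to flush and clear in chunks aligned with batch_size:

```java
@PersistenceContext
private EntityManager entityManager;

private static final int BATCH_SIZE = 50; // keep in sync with hibernate.jdbc.batch_size

@Transactional
public void importOrders(List<Order> orders) {
    for (int i = 0; i < orders.size(); i++) {
        entityManager.persist(orders.get(i));
        if ((i + 1) % BATCH_SIZE == 0) {
            entityManager.flush(); // push the current batch of INSERTs to the database
            entityManager.clear(); // detach managed instances to bound memory use
        }
    }
    // any remainder is flushed at commit
}
```

Clearing also keeps the flush-time dirty check from scanning tens of thousands of managed instances.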
Identity columns break batching. If your entity uses @GeneratedValue(strategy = GenerationType.IDENTITY) (auto-increment), Hibernate disables JDBC batching for that entity. The reason: identity columns return the generated ID only after the INSERT executes — Hibernate must execute each INSERT individually to get the ID back.
Switch to sequence-based ID generation for entities where batch insert performance matters:
@Entity
public class Order {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "order_seq")
    @SequenceGenerator(name = "order_seq", sequenceName = "order_seq",
                       allocationSize = 50) // pre-fetches 50 IDs per sequence call
    private Long id;
}
allocationSize = 50 fetches 50 IDs from the database sequence in one call. Inserting 50 entities requires one sequence call and one batched INSERT — dramatically faster than 50 individual inserts.
When to use JDBC directly
For the highest-throughput bulk operations, bypass Hibernate entirely and use JdbcTemplate or NamedParameterJdbcTemplate:
@Repository
public class OrderBulkRepository {

    private final NamedParameterJdbcTemplate jdbcTemplate;

    public OrderBulkRepository(NamedParameterJdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public int bulkInsert(List<CreateOrderRequest> requests) {
        String sql = """
                INSERT INTO orders (customer_id, status, total, created_at)
                VALUES (:customerId, :status, :total, :createdAt)
                """;
        SqlParameterSource[] params = requests.stream()
                .map(r -> new MapSqlParameterSource()
                        .addValue("customerId", r.customerId())
                        .addValue("status", OrderStatus.PENDING.name())
                        .addValue("total", r.total())
                        .addValue("createdAt", Instant.now()))
                .toArray(SqlParameterSource[]::new);
        // Note: some drivers report Statement.SUCCESS_NO_INFO (-2) per row in
        // batch mode, in which case this sum is not a reliable row count
        return Arrays.stream(jdbcTemplate.batchUpdate(sql, params)).sum();
    }
}
jdbcTemplate.batchUpdate() uses JDBC batch semantics: the driver sends the statements to the database in batches, typically one round trip per batch rather than one per row. No entity lifecycle, no persistence context, no dirty checking, no ID generation overhead.
The tradeoff: JDBC operations bypass the persistence context entirely. Entities inserted via JDBC are not in the L1 cache — subsequent findById calls within the same transaction will query the database. L2 cache is also bypassed — manually evict if necessary.
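A sketch of the explicit cleanup after bulk JDBC writes in a mixed transaction (the injected EntityManager and EntityManagerFactory are assumptions; Cache.evict is the standard JPA API):

```java
// Drop stale JPA-cached state after bulk JDBC writes
entityManager.clear();                               // persistence context (L1): detach everything loaded so far
entityManagerFactory.getCache().evict(Order.class);  // shared cache (L2): evict all cached Order entries
```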
Use JDBC for: importing large datasets (CSV import, data migration), high-frequency event recording (click tracking, audit logs), and any operation where the entity lifecycle (callbacks, cascade, L1 cache) adds overhead without value.
Bulk operations and audit fields
@PrePersist, @PreUpdate, @EntityListeners (JPA Auditing) — these fire only when entities are managed through the JPA lifecycle. Bulk @Modifying queries bypass them.
If created_at and updated_at fields are managed by JPA Auditing, a bulk update via @Modifying won't update updated_at. Two approaches:
Include the timestamp in the bulk query explicitly:
@Modifying
@Query("UPDATE Order o SET o.status = :status, o.updatedAt = :now WHERE o.id IN :ids")
int updateStatuses(@Param("status") OrderStatus status,
                   @Param("ids") List<Long> ids,
                   @Param("now") Instant now);
Let the database handle it with a trigger or default: DEFAULT now() in PostgreSQL or a BEFORE UPDATE trigger maintains updated_at automatically regardless of how the row is modified. This is the most reliable approach for audit fields in a codebase that mixes JPA and bulk SQL operations.
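A sketch of the trigger approach in PostgreSQL (function and trigger names are illustrative):

```sql
-- Maintains updated_at on every UPDATE, whether the write comes from
-- the JPA lifecycle, a @Modifying query, or raw JDBC
CREATE OR REPLACE FUNCTION touch_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_touch_updated_at
BEFORE UPDATE ON orders
FOR EACH ROW EXECUTE FUNCTION touch_updated_at();
```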
The decision hierarchy
Single record or small batches (< 100 records): standard JPA entity lifecycle. save(), delete(), entity mutation. Lifecycle callbacks, cascade, auditing all work correctly.
Medium batches (100–10,000 records): @Modifying @Query with JPQL. One SQL statement, no entity loading overhead. Add clearAutomatically = true if entities might be in the persistence context.
Large batches (10,000+ records) or high-frequency operations: JDBC batch operations via JdbcTemplate.batchUpdate(). Maximum throughput, minimum overhead. Handle audit fields and cache invalidation explicitly.
Bulk reads with streaming: @Query with Stream<T> return type — Hibernate streams entities through the persistence context without loading all at once:
@Query("SELECT o FROM Order o WHERE o.status = :status")
Stream<Order> streamByStatus(@Param("status") OrderStatus status);

// Usage — call within @Transactional, close the stream
@Transactional(readOnly = true)
public void processExpiredOrders(OrderStatus status) {
    try (Stream<Order> orders = orderRepository.streamByStatus(status)) {
        orders.forEach(this::processOrder);
    }
}
```
Streaming avoids materializing the entire result set in memory at once. Note, though, that streamed entities still accumulate in the persistence context as they are iterated; for very large scans, periodically detach or clear processed entities, and configure a JDBC fetch size so the driver doesn't buffer the full result set. Essential for processing large tables without heap exhaustion.
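The JDBC fetch size can be set on the streaming query with Hibernate's "org.hibernate.fetchSize" query hint; a sketch:

```java
// Ask the driver to fetch rows in chunks of 50 instead of buffering the whole result set
@QueryHints(@QueryHint(name = "org.hibernate.fetchSize", value = "50"))
@Query("SELECT o FROM Order o WHERE o.status = :status")
Stream<Order> streamByStatus(@Param("status") OrderStatus status);
```

Driver behavior varies: MySQL's Connector/J only streams when the fetch size is Integer.MIN_VALUE, and PostgreSQL honors the fetch size only inside a transaction.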