Your Transactions Are Bigger Than They Need to Be
by Eric Hanson, Backend Developer at Clean Systems Consulting
The transaction that does too much
You're reviewing a checkout flow and find one database transaction that: validates the cart, locks inventory rows, creates an order, creates order items, charges a payment (via an internal stored procedure that calls an external service), updates inventory, and fires a confirmation-email trigger. The payment step can take 1–3 seconds, and the transaction holds row locks on inventory for that entire time.
Every other user trying to buy those same items queues behind the locks for up to 3 seconds per checkout. With 50 concurrent checkouts contending for one popular item, the last buyer in line can wait on the order of minutes. The fix is not more database capacity; it's narrowing the transaction scope.
What belongs inside a transaction
A transaction should contain only the operations that must be atomic together. Ask: if these two operations execute and the third one fails, is the data in an invalid state that can't be recovered? If yes, they belong together. If not, they probably don't need to be in the same transaction.
Correct: debit account A + credit account B must be atomic
Incorrect reason: "they're related, so they should be in one transaction"
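A sketch of the correct case, with a hypothetical Account model standing in for whatever ORM you use:

# Debit and credit must commit or roll back together: a committed debit
# with a failed credit would make money vanish.
with db.transaction():
    Account.debit(account_a, amount)
    Account.credit(account_b, amount)
# Writing an audit log line afterwards is "related", but a missing log
# entry doesn't corrupt balances, so it doesn't belong in the transaction.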
The inventory example, restructured:
Transaction 1 (fast, ~10ms):
- Validate cart items exist and have sufficient stock
- Reserve inventory (UPDATE stock SET reserved = reserved + qty WHERE ...)
- Create order record in 'pending' state
- Create order items
External call (outside any transaction):
- Charge payment (~1–3 seconds)
Transaction 2 (fast, ~5ms):
- If payment succeeded: update order to 'confirmed', decrement actual stock
- If payment failed: update order to 'failed', release inventory reservation
Now inventory row locks are held for ~10ms per transaction instead of up to 3 seconds, so checkout throughput is bounded by the fast transactions rather than by the payment call.
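A minimal sketch of this flow, reusing the illustrative db/Order API from the snippets below; reserve_stock, commit_reservations, release_reservations, and OutOfStock are hypothetical helpers wrapping the reservation UPDATEs:

# Transaction 1 (fast): validate, reserve, create the pending order
with db.transaction():
    for item in cart.items:
        # Guarded UPDATE: succeeds only if enough unreserved stock remains, e.g.
        # UPDATE stock SET reserved = reserved + qty
        #   WHERE product_id = %s AND quantity - reserved >= qty
        if not reserve_stock(item.product_id, item.qty):
            raise OutOfStock(item.product_id)
    order = Order.create(user_id=user.id, total=cart.total, status='pending')
    order.add_items(cart.items)

# External call, outside any transaction: no row locks held for 1–3 seconds
payment_result = payment_service.charge(user.card, cart.total)

# Transaction 2 (fast): record the outcome
with db.transaction():
    if payment_result.success:
        order.update(status='confirmed')
        commit_reservations(order)   # decrement actual stock, clear reservations
    else:
        order.update(status='failed')
        release_reservations(order)  # reserved -= qty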
The anti-patterns that inflate transaction scope
Including network calls inside transactions
# Wrong: HTTP call inside an open transaction
with db.transaction():
    order = Order.create(user_id=user.id, total=cart.total)
    payment_result = payment_service.charge(user.card, cart.total)  # 1–3 seconds
    if payment_result.success:
        order.update(status='confirmed')
The database transaction holds its locks for the duration of the HTTP call. If the payment service is slow, the transaction stays open until the call returns, a query timeout fires, or the connection drops.
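The corrected shape keeps only the writes inside short transactions (same illustrative API as above):

# Right: commit the pending order, call the service with no transaction
# open, then record the outcome in a second short transaction
with db.transaction():
    order = Order.create(user_id=user.id, total=cart.total, status='pending')

payment_result = payment_service.charge(user.card, cart.total)  # no locks held

with db.transaction():
    order.update(status='confirmed' if payment_result.success else 'failed')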
Fetching data inside a write transaction when it's not needed
# Wrong: read-only lookups inside the write transaction
with db.transaction():
    product = Product.find(product_id)          # just a read
    price = PricingRule.calculate(product)      # just a read, may be slow
    Order.create(product=product, price=price)  # the actual write
The reads don't need to be inside the transaction unless you need to lock the rows for consistency (e.g., you need to ensure the price doesn't change between reading it and using it — in which case use SELECT FOR UPDATE explicitly). In most cases, reads can happen before the transaction opens.
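Restructured, the reads run first and the transaction wraps only the write (sketch, same illustrative API):

# Right: slow reads happen before the transaction opens
product = Product.find(product_id)
price = PricingRule.calculate(product)  # may be slow; no locks held

with db.transaction():
    # If the price must not change between read and use, re-read it here
    # with SELECT FOR UPDATE instead of relying on the earlier read
    Order.create(product=product, price=price)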
Batch operations in a single transaction
# Wrong: one transaction for all 100,000 records
with db.transaction():
    for record in large_dataset:
        Record.create(record)
If this fails at record 80,000, everything inserted so far rolls back. The transaction is open for the full duration of the batch, which might be minutes: the batch generates one huge stretch of WAL with nothing committed, replication lag accumulates, and autovacuum cannot reclaim dead rows anywhere in the database while the transaction stays open.
Batch in chunks:
BATCH_SIZE = 500

def chunks(items, size):
    # Yield successive fixed-size slices of the dataset
    for i in range(0, len(items), size):
        yield items[i:i + size]

for chunk in chunks(large_dataset, BATCH_SIZE):
    with db.transaction():
        for record in chunk:
            Record.create(record)
    # Checkpoint: if a chunk fails, rerun from the last committed chunk
500-row transactions commit every few hundred milliseconds. WAL is flushed incrementally. Replication stays current. A failure loses at most one batch.
Long-running transactions and VACUUM interference
In PostgreSQL, VACUUM cannot remove row versions that might still be visible to any open transaction. A transaction open for 30 minutes holds back the cleanup horizon for the entire database: 30 minutes' worth of dead rows accumulate in every table, not just the ones that transaction touches.
Check for long-running transactions:
SELECT
    pid,
    now() - pg_stat_activity.xact_start AS duration,
    query,
    state
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - pg_stat_activity.xact_start > INTERVAL '5 minutes'
ORDER BY duration DESC;
Any transaction open for more than a few minutes in an OLTP system is suspicious. Either it's doing something it shouldn't (external calls, user interaction, large batch), or it's a connection sitting idle in a transaction because the application forgot to commit.
Set idle_in_transaction_session_timeout in PostgreSQL to automatically terminate sessions that start a transaction and then go idle:
-- Terminate sessions that sit idle in a transaction for more than 60 seconds
SET idle_in_transaction_session_timeout = '60s';  -- applies to this session only
-- Or set it globally in postgresql.conf, or with ALTER SYSTEM:
ALTER SYSTEM SET idle_in_transaction_session_timeout = '60s';
The design rule
Before writing any transaction, identify the minimum set of operations that must be atomic. Put only those inside the transaction. Do reads, external calls, and non-critical notifications outside of it. If atomicity requires including something slow, find an architectural pattern (two-phase commit, saga, outbox pattern) rather than holding database locks across that slow operation.
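As one example, a minimal sketch of the outbox pattern, continuing the illustrative API from above; Outbox and email_service are hypothetical:

# The notification is recorded atomically with the order, but delivered later
with db.transaction():
    order = Order.create(user_id=user.id, total=cart.total)
    Outbox.create(event='order_created', payload={'order_id': order.id})

# A separate worker drains the outbox; the slow external call happens
# with no user-facing transaction open
def drain_outbox():
    for event in Outbox.unsent(limit=100):
        email_service.send_confirmation(event.payload)  # slow, retryable
        event.mark_sent()

If the worker crashes between the send and mark_sent, the email may go out twice, so the consumer side should be idempotent; that's the trade the outbox pattern makes in exchange for never holding locks across the slow call.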