Why Your System Is Slower Than Expected Even Under Normal Load

by Arif Ikhsanudin, Backend Developer

Normal Load, Abnormal Latency

The system is handling 200 requests per minute. Your servers are at 30% CPU. The database connection pool is mostly idle. But p95 response time is 2.4 seconds, and users are complaining. Adding more instances does not help — the problem is not throughput, it is latency per request.

This pattern has a consistent set of root causes. Most of them are not visible at the infrastructure level. They are visible in query plans, profiler output, and application code.

The Most Common Causes

N+1 queries. A single endpoint issues one query to fetch a list, then one query per item for related data. With 50 items, that is 51 queries where 2 would suffice. In development, with a database on localhost and a dataset of 20 rows, this takes 5ms total. In production, with 30ms average query latency and 200 items, the 201 sequential queries take about 6 seconds.
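
The arithmetic behind those numbers can be sketched as follows. This is a rough model that assumes queries run one after another and each costs the same average latency; the function names are illustrative, not taken from any library.

```python
def n_plus_one_latency_ms(items: int, query_latency_ms: float) -> float:
    """One list query plus one query per item, executed sequentially."""
    return (1 + items) * query_latency_ms


def batched_latency_ms(query_latency_ms: float, queries: int = 2) -> float:
    """A fixed number of queries, regardless of how many items come back."""
    return queries * query_latency_ms


print(n_plus_one_latency_ms(200, 30))  # 6030 (ms), about 6 seconds
print(batched_latency_ms(30))          # 60 (ms)
```

The model ignores the fact that a batched query returning more rows takes somewhat longer than a single-row lookup, but the gap between the two shapes is usually large enough that this does not change the conclusion.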

Finding N+1 queries requires query logging or an APM that counts queries per request. In Rails, the Bullet gem flags them. In Django, Django Debug Toolbar counts them. In any system, a spike in query count per request is the signal.
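
If no APM is available, counting queries per request can be approximated in application code. The sketch below is one possible approach, using Python's built-in sqlite3 module as a stand-in database; `QueryCounter` and the table layout are assumptions made for illustration, and a real service would hook into its own driver or ORM instead.

```python
import sqlite3


class QueryCounter:
    """Counts SELECT statements executed on a sqlite3 connection."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.count = 0
        # sqlite3 invokes the callback once per statement it actually executes.
        conn.set_trace_callback(self._on_statement)

    def _on_statement(self, statement: str) -> None:
        if statement.lstrip().upper().startswith("SELECT"):
            self.count += 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.execute("CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT)")
conn.executemany("INSERT INTO orders (id, user_id) VALUES (?, ?)", [(i, 1) for i in range(1, 51)])
conn.executemany(
    "INSERT INTO order_items (order_id, sku) VALUES (?, ?)",
    [(i, f"sku-{i}") for i in range(1, 51)],
)
conn.commit()

# Simulate one request that uses the N+1 pattern.
counter = QueryCounter(conn)
orders = conn.execute("SELECT id FROM orders WHERE user_id = ?", (1,)).fetchall()
for (order_id,) in orders:
    conn.execute("SELECT sku FROM order_items WHERE order_id = ?", (order_id,)).fetchall()

queries_per_request = counter.count
print(queries_per_request)  # 51 for 50 orders
```

Logging this number per endpoint makes the signal visible: a count that grows with the size of the result set points at an N+1 pattern.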

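-- N+1 pattern (pseudocode):
-- Query 1: SELECT * FROM orders WHERE user_id = $1
-- For each of the N orders returned:
--   Queries 2..N+1: SELECT * FROM order_items WHERE order_id = <that order's id>

-- Fix, option 1: a single JOIN (LEFT JOIN keeps orders that have no items)
SELECT o.*, oi.*
FROM orders o
LEFT JOIN order_items oi ON oi.order_id = o.id
WHERE o.user_id = $1;

-- Fix, option 2: a second query with an IN clause
-- SELECT * FROM order_items WHERE order_id IN (<ids from query 1>);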
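
In application code, the "second query with IN clause" variant might look like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for the production database; the function name, table layout, and sample data are assumptions for illustration only.

```python
import sqlite3
from collections import defaultdict


def fetch_orders_with_items(conn: sqlite3.Connection, user_id: int) -> dict:
    """Return {order_id: [sku, ...]} using two queries, however many orders exist."""
    orders = conn.execute(
        "SELECT id FROM orders WHERE user_id = ? ORDER BY id", (user_id,)
    ).fetchall()
    order_ids = [row[0] for row in orders]

    items_by_order = defaultdict(list)
    if order_ids:
        # One placeholder per id keeps the query parameterized.
        placeholders = ", ".join("?" for _ in order_ids)
        rows = conn.execute(
            f"SELECT order_id, sku FROM order_items "
            f"WHERE order_id IN ({placeholders}) ORDER BY id",
            order_ids,
        )
        for order_id, sku in rows:
            items_by_order[order_id].append(sku)

    # Orders without items still appear, with an empty list.
    return {order_id: items_by_order.get(order_id, []) for order_id in order_ids}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.execute("CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT)")
conn.executemany("INSERT INTO orders (id, user_id) VALUES (?, ?)", [(1, 7), (2, 7), (3, 8)])
conn.executemany(
    "INSERT INTO order_items (order_id, sku) VALUES (?, ?)",
    [(1, "a"), (1, "b"), (3, "c")],
)
conn.commit()

print(fetch_orders_with_items(conn, 7))  # {1: ['a', 'b'], 2: []}
```

Databases limit how many bound parameters a statement may carry, so very long id lists would need to be split into chunks.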

Missing indexes on filter and join columns. A Seq Scan node on a large table, for a query that returns only a small fraction of its rows, is almost always the problem. PostgreSQL's EXPLAIN ANALYZE makes this explicit: it shows the chosen plan, the estimated and actual row counts, and the actual time spent at each node.

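-- Find slow queries in PostgreSQL (requires the pg_stat_statements extension;
-- before PostgreSQL 13 the columns are named mean_time and total_time):
SELECT query, mean_exec_time, calls, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Check if an index would help (note: EXPLAIN ANALYZE actually runs the query):
EXPLAIN ANALYZE SELECT * FROM events WHERE user_id = 123 AND created_at > now() - interval '7 days';
-- Look for "Seq Scan" on a large table with many "Rows Removed by Filter": that's your index gap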
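
The before-and-after effect of an index on a query plan can be reproduced locally. The sketch below uses SQLite through Python's built-in sqlite3 module, not PostgreSQL, so the plan wording differs (SQLite reports SCAN and SEARCH rather than Seq Scan and Index Scan); the `payload` column and the index name are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "created_at TEXT, payload TEXT)"
)

QUERY = "SELECT * FROM events WHERE user_id = ? AND created_at > ?"


def plan(conn: sqlite3.Connection) -> str:
    """Return SQLite's plan description for QUERY (the 'detail' column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (123, "2024-01-01")).fetchall()
    return " | ".join(row[3] for row in rows)


before = plan(conn)
# Equality column first, range column second.
conn.execute("CREATE INDEX idx_events_user_created ON events (user_id, created_at)")
after = plan(conn)

print("before:", before)  # a full table SCAN
print("after: ", after)   # a SEARCH using idx_events_user_created
```

For the PostgreSQL query above, the analogous candidate would be a composite index on (user_id, created_at); whether the planner uses it depends on table statistics, so re-running EXPLAIN ANALYZE after creating it is the real check.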

Synchronous calls to slow dependencies. An endpoint that makes a synchronous HTTP call to a third-party API — an address validation service, a fraud scoring API, a mapping provider — is bounded by that API's latency. If the API averages 800ms, the endpoint averages at least 800ms. This is not your infrastructure problem. It is a coupling problem.

Solutions: cache the third-party response for appropriate TTLs, move the call to a background job if the result is not needed immediately, or add a circuit breaker with a fallback response if the service is non-critical.
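
Two of these options, a TTL cache and a circuit breaker with a fallback, can be sketched in a few lines. This is a minimal single-threaded illustration, not a production implementation: the class names, thresholds, and injected `clock` parameter are assumptions, and a real service would add locking and metrics or use an established library.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Caches a dependency's response per key for ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, object]] = {}

    def get_or_call(self, key: str, fetch: Callable[[], object]) -> object:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the slow call
        value = fetch()
        self._entries[key] = (now + self._ttl, value)
        return value


class CircuitBreaker:
    """Returns a fallback instead of calling a dependency that keeps failing."""

    def __init__(self, max_failures: int, reset_after_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._max_failures = max_failures
        self._reset_after = reset_after_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, func: Callable[[], object], fallback: object) -> object:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                return fallback  # open: do not touch the dependency
            self._opened_at = None  # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self._max_failures:
                self._opened_at = self._clock()
            return fallback
        self._failures = 0
        return result


def flaky_fraud_score() -> float:
    """Hypothetical third-party call that always times out."""
    raise TimeoutError("fraud API timed out")


breaker = CircuitBreaker(max_failures=2, reset_after_seconds=30)
scores = [breaker.call(flaky_fraud_score, fallback=0.5) for _ in range(3)]
print(scores)  # [0.5, 0.5, 0.5]; the third call never reaches the API
```

The breaker only makes sense where a fallback is acceptable, which is why it fits the non-critical case; for a critical dependency, caching or moving the call to a background job is the safer route.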

Connection acquisition time. If your database connection pool is too small relative to concurrent request volume, threads spend time waiting for a connection before the query even starts. This does not show in database metrics — the database is idle, waiting for queries. It shows in application-level profiling as time spent before the query.
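
Acquisition time is easy to measure if the pool records how long each caller waits. The sketch below is a toy pool built on Python's queue module, written only to show where the timing goes; `TimedPool` is hypothetical, and real pool libraries typically expose an equivalent metric of their own.

```python
import queue
import threading
import time
from contextlib import contextmanager


class TimedPool:
    """A toy connection pool that records how long each acquisition waited."""

    def __init__(self, connections):
        self._idle = queue.Queue()
        for conn in connections:
            self._idle.put(conn)
        self._lock = threading.Lock()
        self.wait_seconds = []  # one sample per acquisition

    @contextmanager
    def connection(self, timeout: float = 5.0):
        start = time.perf_counter()
        conn = self._idle.get(timeout=timeout)  # blocks while the pool is exhausted
        waited = time.perf_counter() - start
        with self._lock:
            self.wait_seconds.append(waited)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back


pool = TimedPool(["conn-1"])  # deliberately undersized: one connection


def handle_request() -> None:
    with pool.connection():
        time.sleep(0.05)  # stands in for a 50 ms query


threads = [threading.Thread(target=handle_request) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"max wait: {max(pool.wait_seconds) * 1000:.0f} ms")
```

With three concurrent requests and one connection, the last request waits roughly two query durations before its own query starts, while the database itself never sees more than one query at a time.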

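PgBouncer with transaction-mode pooling allows many application threads to share a small number of database connections, because a server connection is assigned to a client only for the duration of a transaction and returns to the pool as soon as the transaction ends. A system with 200 application threads can operate comfortably against a pool of 20 database connections, provided transactions are short and the application does not rely on session state (such as session-level SET or advisory locks) that does not survive a connection switch.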
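
Why so few connections can be enough follows from Little's law: the average number of busy connections equals the transaction rate multiplied by the average transaction duration. The workload numbers below are assumptions chosen for illustration, not measurements.

```python
def average_busy_connections(transactions_per_second: float,
                             mean_transaction_seconds: float) -> float:
    """Little's law: average concurrency = arrival rate x average time in system."""
    return transactions_per_second * mean_transaction_seconds


# Hypothetical workload: 200 threads together issue 400 transactions per second,
# and each transaction holds a server connection for 25 ms on average.
busy = average_busy_connections(400, 0.025)
print(f"{busy:.1f} connections busy on average")  # prints "10.0 connections busy on average"
```

This is an average, so the pool still needs headroom for bursts; a pool of 20 against an average of 10 busy connections leaves that margin.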

Serialization and deserialization overhead. Parsing large JSON payloads, deserializing wide database rows where only a few columns are used, and rendering complex templates all consume CPU that shows up as application latency without obvious cause. SELECT * when you need three columns is wasteful. Selecting only the columns you need reduces row size, reduces network transfer, and reduces deserialization cost — often meaningfully for wide tables with large text or JSON columns.
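
The difference is easy to see with a wide table. The sketch below uses Python's built-in sqlite3 module and a made-up `users` table with a large JSON text column; the character count is only a rough proxy for transfer and deserialization cost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, profile_json TEXT)"
)
big_profile = '{"bio": "' + "x" * 10_000 + '"}'  # roughly 10 KB per row
conn.executemany(
    "INSERT INTO users (name, email, profile_json) VALUES (?, ?, ?)",
    [(f"user{i}", f"user{i}@example.com", big_profile) for i in range(100)],
)
conn.commit()


def payload_chars(rows) -> int:
    """Rough size of a result set: total characters across all values."""
    return sum(len(str(value)) for row in rows for value in row)


wide = conn.execute("SELECT * FROM users").fetchall()
narrow = conn.execute("SELECT id, name, email FROM users").fetchall()

print(payload_chars(wide), "chars with SELECT *")
print(payload_chars(narrow), "chars with three columns")
```

Both queries return the same 100 users, but the first drags about a megabyte of JSON text through the driver that the endpoint never reads.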

The Diagnostic Sequence

  1. Enable query logging with a slow-query threshold (for example, log queries over 100ms; in PostgreSQL this is the log_min_duration_statement setting)
  2. Run EXPLAIN ANALYZE on the top 5 slowest queries
  3. Profile the application — New Relic, Datadog APM, or a language-level profiler — to see where wall time is spent per request
  4. Check connection acquisition time in your pool metrics
  5. Look at external call latency — what is p95 latency for each external dependency?
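
For the last step, p95 per dependency can be computed from raw call timings if no APM provides it. The sketch below uses the nearest-rank percentile definition; the dependency names and sample values are made up for illustration.

```python
import math
from collections import defaultdict


def percentile(samples, pct: float):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def p95_by_dependency(calls):
    """calls: iterable of (dependency_name, latency_ms) pairs."""
    grouped = defaultdict(list)
    for name, latency_ms in calls:
        grouped[name].append(latency_ms)
    return {name: percentile(values, 95) for name, values in grouped.items()}


# Hypothetical timings collected around each outbound call.
calls = [("fraud-api", ms) for ms in range(10, 210, 10)]             # 10, 20, ..., 200
calls += [("geocoder", ms) for ms in (50, 60, 70, 80, 900)]

print(p95_by_dependency(calls))  # {'fraud-api': 190, 'geocoder': 900}
```

Averages would hide the geocoder's occasional 900ms call; the percentile is what shows up in user-facing tail latency.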

Work through this list before assuming the system needs more infrastructure. In most cases, one or two of these are the cause, and fixing them brings latency to acceptable levels without any new infrastructure.

The answer is rarely "add more servers." The answer is usually "fix the query" or "stop making that external call synchronously."
