Metrics and Alerts in Microservices: What You Should Actually Be Watching

by Eric Hanson, Backend Developer at Clean Systems Consulting

The monitoring gap between infrastructure and user experience

Your infrastructure metrics look fine. CPU at 30%, memory at 60%, pods healthy. But your error rate has been at 8% for the last four minutes and no alert has fired. Users are seeing failures. Your on-call engineer found out from a Slack message, not a PagerDuty notification.

The problem is that infrastructure metrics (CPU, memory, disk) don't directly reflect user experience. A service can consume 80% CPU and serve traffic perfectly. A service can consume 20% CPU and return 500 errors for 15% of requests. Alerting on infrastructure thresholds while ignoring user-facing signal metrics means your alerting is optimized for finding noisy servers, not broken services.

The four golden signals

Google's SRE Book defines four golden signals as the primary metrics for service health. These are the metrics that should drive your alerts:

Latency: how long requests take to process. Specifically, P50 (median), P95, and P99. P99 is what the slowest 1% of requests experience, and those are often your heaviest users and largest payloads. If P99 exceeds your SLA threshold, users are having a bad time even if P50 looks fine.

Traffic: requests per second (or other throughput measure appropriate for your service). Traffic metrics establish the baseline that makes other metrics meaningful. An error rate of 100 errors/minute is trivial at 100,000 requests/minute and catastrophic at 200 requests/minute.

Errors: the rate of failed requests. Distinguish between user errors (4xx, particularly 400, 422) which may be legitimate, and server errors (5xx) which always indicate a problem. Alert on server error rate.

Saturation: how close to capacity the service is. Connection pool utilization, queue depth, thread pool utilization. High saturation precedes failures — a connection pool at 95% utilization is about to cause request queuing and latency spikes.

# Prometheus alerts: golden signals for Order Service
groups:
- name: order-service
  rules:
  - alert: OrderServiceHighErrorRate
    expr: |
      sum(rate(http_server_requests_seconds_count{
        service="order-service", status=~"5.."
      }[5m]))
      /
      sum(rate(http_server_requests_seconds_count{
        service="order-service"
      }[5m])) > 0.01
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "Order Service error rate above 1% for 2 minutes"

  - alert: OrderServiceHighLatency
    expr: |
      histogram_quantile(0.99,
        sum by (le) (rate(http_server_requests_seconds_bucket{
          service="order-service"
        }[5m]))
      ) > 2.0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Order Service P99 latency above 2s"
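The two alerts above cover errors and latency; the fourth signal, saturation, deserves a rule of its own. A sketch, assuming the service exposes Micrometer's Tomcat thread pool gauges (`tomcat_threads_busy_threads` and `tomcat_threads_config_max_threads`; your runtime may export different names):

```yaml
  - alert: OrderServiceThreadPoolSaturation
    expr: |
      tomcat_threads_busy_threads{service="order-service"}
      /
      tomcat_threads_config_max_threads{service="order-service"} > 0.9
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Order Service thread pool above 90% utilization"
```

Firing at 90% rather than 100% gives the on-call engineer time to act before requests actually start queuing.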

RED versus USE: choosing the right model per layer

RED (Rate, Errors, Duration): the right model for request-handling services — anything with an API. Rate = requests/sec, Errors = error rate, Duration = latency. Apply this to every service endpoint.

USE (Utilization, Saturation, Errors): the right model for resources — databases, connection pools, queues, thread pools. Utilization = how busy, Saturation = how much work is queuing, Errors = error rate in the resource. Apply this to your infrastructure components.

For a database connection pool, USE gives you: Utilization (active connections / max connections), Saturation (requests waiting for a connection), Errors (connection acquisition failures). A connection pool at 90% utilization with a growing saturation queue is about to become a bottleneck. Alerting on this proactively prevents the cascade where connection pool exhaustion causes upstream service latency, which exhausts upstream thread pools.
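That USE breakdown maps directly to alert rules. A sketch, assuming Micrometer's HikariCP metric names (`hikaricp_connections_active`, `hikaricp_connections_max`, `hikaricp_connections_pending`) and a hypothetical pool label `order-db`; substitute whatever your pool actually exports:

```yaml
  - alert: OrderDbPoolNearExhaustion
    expr: |
      hikaricp_connections_active{pool="order-db"}
      /
      hikaricp_connections_max{pool="order-db"} > 0.9
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "order-db connection pool above 90% utilization"

  - alert: OrderDbPoolSaturated
    expr: |
      hikaricp_connections_pending{pool="order-db"} > 0
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "Requests are queuing for order-db connections"
```

The utilization alert warns early; the saturation alert (requests actually waiting for a connection) is the stronger signal and escalates to critical.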

Consumer lag for event-driven services

Services that consume from Kafka have an additional critical metric: consumer lag — the number of messages in a partition that have not yet been processed. Increasing consumer lag means your consumer is falling behind producers.

# Alert on Kafka consumer lag
- alert: KafkaConsumerLagHigh
  expr: |
    kafka_consumer_group_lag{
      group="inventory-service",
      topic="orders.confirmed"
    } > 1000
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Inventory Service is 1000+ messages behind on orders.confirmed"

Consumer lag growth indicates either the consumer is too slow (processing bottleneck), the producer is generating more volume than expected (load spike), or the consumer is down (lag grows rapidly to infinity). Each has a different remediation.
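One more expression helps tell those failure modes apart: alerting on lag growth rather than absolute lag catches a consumer that is falling behind before it crosses the fixed threshold. A sketch, reusing the metric name from the example above (your exporter may name it differently):

```yaml
- alert: KafkaConsumerLagGrowing
  expr: |
    deriv(kafka_consumer_group_lag{
      group="inventory-service",
      topic="orders.confirmed"
    }[10m]) > 0
  for: 15m
  labels:
    severity: warning
  annotations:
    summary: "Inventory Service lag on orders.confirmed has grown for 15 minutes"
```

A steadily positive derivative with moderate lag suggests a processing bottleneck; a sudden near-vertical climb usually means the consumer is down.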

Alerting philosophy: alert on symptoms, not causes

Alert on user-visible symptoms. Root cause analysis happens after the alert fires — not before.

Wrong: alert when CPU > 80%. CPU at 80% might not affect users at all. This produces false positives.

Right: alert when P99 latency > SLA threshold or error rate > X%. These directly impact users. This is what matters.

Causes (high CPU, connection pool saturation, slow queries) belong on dashboards, where they support diagnosis after the symptom alert fires. Alerting on causes without symptoms produces noisy, low-signal alerts that engineers learn to ignore, and an ignored alert is worse than no alert at all.

Keep your alert count low and actionable. An on-call rotation with twenty noisy alerts that fire regularly produces alert fatigue. Alert fatigue produces missed incidents. Three to five high-signal alerts per service that fire infrequently and always require action are worth more than twenty alerts that fire several times a week and are usually safe to ignore.

