Single Points of Failure Are Hiding in Your System Right Now

January 16, 2026

by Eric Hanson, Backend Developer at Clean Systems Consulting

The Obvious Ones and the Hidden Ones

Every engineer knows to look for obvious single points of failure (SPOFs): a single database primary, a single application server, a load balancer with no standby. These are easy to spot in an architecture diagram and easy to remediate with redundancy.

The failures that actually cause incidents are the ones that do not appear in the architecture diagram.

Your deployment pipeline. If all deployments run through a single CI/CD server with no redundancy, an outage of that server means nothing can be shipped, including the critical fix you need to get out while it is down. Managed CI/CD services (GitHub Actions, CircleCI) build redundancy into the platform. Self-hosted Jenkins with no HA configuration is a hidden SPOF.

Your DNS provider. A misconfiguration or provider outage at your DNS registrar or DNS hosting provider makes your entire domain unreachable. All your application redundancy becomes irrelevant if clients cannot resolve your domain. Using a DNS provider with high availability (Cloudflare DNS, Route 53) and maintaining secondary DNS or NS redundancy is a standard mitigation. Keeping all domains at a single registrar without 2FA on the registrar account is a hidden SPOF.

Configuration and secrets management. If your application fetches configuration or secrets from a single service at startup (a self-hosted Vault instance, or a config server running on one EC2 instance), that service becomes a SPOF for all deployments and restarts. Instances cannot start without it, so if it fails, your auto-scaling group cannot replace unhealthy instances.
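One common way to soften this dependency is to keep a last-known-good copy of the configuration on local disk and fall back to it when the remote store is unreachable. The sketch below is a minimal illustration, not a recommendation for any particular tool: `load_config`, `fetch_remote`, and the cache file are hypothetical names, and caching secrets on disk is a security trade-off that may only be acceptable for non-secret settings or an encrypted cache.

```python
import json
import os
import tempfile
from pathlib import Path


def load_config(fetch_remote, cache_path):
    """Return (config, source), preferring the remote store over a local cache.

    fetch_remote: hypothetical callable that returns a dict or raises on failure.
    cache_path:   file holding the last successfully fetched config.
    """
    cache = Path(cache_path)
    try:
        config = fetch_remote()
    except Exception:
        # Remote store is unavailable: start from the last known-good copy.
        if cache.exists():
            return json.loads(cache.read_text()), "cache"
        raise  # no cache yet, so the failure is still fatal

    # Write to a temp file and rename, so a crash never leaves a partial cache.
    fd, tmp_name = tempfile.mkstemp(dir=str(cache.parent))
    with os.fdopen(fd, "w") as tmp:
        json.dump(config, tmp)
    os.replace(tmp_name, cache)
    return config, "remote"
```

With something like this in place, an instance that has started successfully once can restart during a config-service outage. A brand-new instance with no cache still cannot, so this reduces the blast radius rather than removing the dependency.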

Your CDN or TLS certificate. A certificate expiration or a CDN misconfiguration can take down HTTPS access to your entire service in minutes. Automated certificate renewal (Let's Encrypt with certbot, AWS Certificate Manager) and CDN configuration in version-controlled infrastructure code reduce this risk.
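Even with automated renewal, it helps to check independently that renewal is actually happening. The following is a rough sketch of such a check using only the Python standard library. The function names are hypothetical, and any alert threshold you pick (14 days, say) is an assumption to tune for your own renewal cycle.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter timestamp.

    not_after uses the format returned by SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2030 GMT".
    """
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400


def fetch_not_after(host, port=443, timeout=5.0):
    """Connect to host and return the leaf certificate's notAfter string.

    Note: with the default context, an already-expired certificate makes
    the handshake raise an ssl.SSLError instead of returning a date.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A scheduled job could call `days_until_expiry(fetch_not_after("example.com"))` and page someone when the result drops below the chosen threshold, treating a handshake error as an alert in its own right.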

The Non-Infrastructure SPOFs

Implicit dependency on a single team member. The engineer who built the payment integration is the only one who understands it. They go on vacation. Payment processing has an issue. This is an organizational SPOF. It is not visible in any infrastructure diagram.

A single large database transaction that locks tables. A background job that runs one long-running UPDATE over a large table without batching holds its locks until the transaction commits. Depending on the database engine, that means a lock on every touched row or on the whole table, and concurrent writes (and in some engines reads) stall behind it. The job is infrequent enough that nobody noticed, until it ran on the first day of the month when traffic was highest. Not a hardware SPOF, but functionally equivalent to one.
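Batching here means splitting the one large UPDATE into many small transactions, so locks are held only briefly and other queries can run in between. The sketch below uses SQLite from the Python standard library purely so it is self-contained. The `invoices` table and `status` column are hypothetical, and the exact SQL for limiting an UPDATE varies between databases.

```python
import sqlite3


def update_in_batches(conn, batch_size=1000):
    """Mark pending invoices as billed, committing after each small batch.

    The table and column names (invoices, status) are hypothetical.
    Returns the total number of rows updated.
    """
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE invoices SET status = 'billed' "
            "WHERE id IN (SELECT id FROM invoices "
            "WHERE status = 'pending' ORDER BY id LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # release locks before starting the next batch
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
```

The trade-off is that the job is no longer atomic: if it stops halfway, some rows are updated and some are not, so the update has to be safe to resume. Here it is, because each pass only selects rows that are still pending.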

Centralized session storage without HA. Sessions stored in a single Redis instance with no replication mean a Redis failure logs out all active users simultaneously. Redis Sentinel or Redis Cluster provides HA.

The Audit Process

A useful exercise: for each critical user flow (checkout, authentication, data submission), trace the complete dependency chain and ask: "what single component failure stops this flow from working?"

# Example: checkout flow dependency trace

User -> Load Balancer -> App Server -> Session (Redis) -> Database (Primary)
                                    -> Payment API (external)
                                    -> Email Service (external)
                                    -> Fraud Check (internal service)

SPOFs identified:
- Session Redis: single instance -> add Redis Sentinel
- Database Primary: single AZ -> enable RDS Multi-AZ
- Payment API: no circuit breaker -> add circuit breaker + fallback message
- Fraud Check: synchronous in checkout -> evaluate async post-checkout
- Email Service: if checkout fails when email fails -> move to async queue
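The same trace can be kept as data next to the code, so the audit is repeatable instead of a one-off whiteboard exercise. This is only a sketch under simple assumptions: the `Dependency` model, the replica counts, and the `blocking` flag are all hypothetical values chosen to mirror the checkout trace above, and a real audit would also need to account for shared failure domains such as a single AZ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    name: str
    replicas: int          # independent instances able to serve traffic
    blocking: bool = True  # True if the flow fails when this dependency fails


def find_spofs(flow):
    """Return names of blocking dependencies that have no redundancy."""
    return [d.name for d in flow if d.blocking and d.replicas < 2]


# Hypothetical values mirroring the checkout trace above.
checkout = [
    Dependency("load-balancer", replicas=2),
    Dependency("app-server", replicas=3),
    Dependency("session-redis", replicas=1),
    Dependency("database-primary", replicas=1),
    Dependency("payment-api", replicas=1),
    Dependency("fraud-check", replicas=2),
    Dependency("email-service", replicas=1, blocking=False),
]

print(find_spofs(checkout))
# ['session-redis', 'database-primary', 'payment-api']
```

Note how the email service drops out of the result once it is marked non-blocking. That is the effect of moving it to an async queue: the component can still fail, but it no longer stops the flow.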

Do this for your top three critical flows. You will likely find at least one SPOF per flow that is not on your architecture diagram. Fix the highest-impact ones before your next incident.
