What a Deployment Strategy Actually Is and Why You Need One
by Eric Hanson, Backend Developer at Clean Systems Consulting
"We Just Deploy" Is a Strategy. A Bad One.
Ask a team how they deploy and the answer is usually mechanical: "We run kubectl set image and wait for the rollout" or "We trigger the pipeline and it updates the Auto Scaling Group." That describes the mechanics. It doesn't describe the strategy.
The strategy answers different questions: What happens to in-flight requests during the transition? How quickly does the new version replace the old one? When do you know it's safe to complete the rollout? What's the plan if you don't? What percentage of your users sees the new version first, and which users are they?
Teams that can't answer these questions don't have a strategy — they have a deployment process that works until it doesn't, at which point they discover the answers to these questions under pressure, at 11pm, with a P1 incident timer running.
The Risk Surface of a Deployment
Every deployment exposes three risk windows:
The transition window — the period during which both old and new versions are running simultaneously. This is unavoidable in any non-disruptive deployment. During this window, a user's request might be handled by the old version or the new version depending on which instance the load balancer selects. If the new version has an incompatible API change or a data format change, users get inconsistent behavior.
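The mixed-version hazard in the transition window can be sketched in a few lines. This is a toy simulation, not a real load balancer: the handler names and the response shapes are hypothetical, chosen to show how an incompatible format change surfaces as inconsistent behavior for the same client during rollout.

```python
import random

def old_handler(user_id):
    # v1 returns the name as a single string field
    return {"user": user_id, "name": "Ada Lovelace"}

def new_handler(user_id):
    # v2 splits the name into two fields: an incompatible format change
    return {"user": user_id, "first": "Ada", "last": "Lovelace"}

def load_balancer(instances, user_id):
    # random instance selection, standing in for what an ALB does
    return random.choice(instances)(user_id)

# During the transition window, both versions serve traffic at once.
instances = [old_handler, old_handler, new_handler]  # rollout in progress
shapes = {tuple(sorted(load_balancer(instances, 42))) for _ in range(100)}
print(len(shapes))  # the same client sees more than one response shape
```

The fix is not a different load balancer; it's making the two versions compatible during the overlap, or picking a strategy (blue-green) that eliminates the overlap.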
The validation window — the period after the new version is fully deployed but before you're confident it's stable. Deployment succeeded, but is it correct? Does it handle load? Are error rates elevated? This window requires monitoring and clear criteria for what "healthy" means.
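"Clear criteria for what 'healthy' means" is worth writing down as an executable predicate rather than a gut feeling. A minimal sketch, with illustrative thresholds (not recommendations) and hypothetical metric names:

```python
def is_healthy(metrics):
    """Decide from a metrics snapshot whether the new version is safe to keep."""
    return (
        metrics["error_rate"] <= 0.01          # under 1% failed responses
        and metrics["p99_latency_ms"] <= 500   # tail latency within budget
        and metrics["saturation"] <= 0.8       # CPU/memory headroom remains
    )

print(is_healthy({"error_rate": 0.002, "p99_latency_ms": 310, "saturation": 0.65}))
print(is_healthy({"error_rate": 0.040, "p99_latency_ms": 310, "saturation": 0.65}))
```

Whatever the real thresholds are, encoding them means the validation window closes on evidence, not on how long someone watched a dashboard.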
The rollback window — the period during which reverting to the previous version is straightforward. As time passes after a deployment, rollback becomes more complex: database migrations may have been applied, events may have been consumed, external state may have been mutated. A rollback strategy must account for how quickly this window closes.
The Four Main Strategies and Their Tradeoffs
Big bang (recreate) — shut down all old instances, start all new instances. Simplest to implement. Guarantees no mixed-version state during the transition. Also guarantees downtime and the longest rollback time (you have to spin up old instances again). Acceptable for internal tools or services with explicit maintenance windows. Not acceptable for production user-facing systems.
Rolling — replace old instances with new ones incrementally, a few at a time. The default in Kubernetes and ECS. No downtime. The transition window is the duration of the rollout (typically 5–15 minutes). The risk: mixed-version traffic is the default state during rollout. Requires that old and new versions are compatible during the overlap. Good default for stateless services with backward-compatible changes.
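The rolling mechanic, reduced to its core loop: replace a batch, health-check the batch, continue or halt. A simplified sketch of what Kubernetes does (it adds surge capacity, readiness gates, and more), with a toy health check:

```python
def rolling_update(fleet, new_version, batch_size, health_check):
    """Replace instances a batch at a time; halt if a batch is unhealthy."""
    fleet = list(fleet)
    for start in range(0, len(fleet), batch_size):
        batch = range(start, min(start + batch_size, len(fleet)))
        for i in batch:
            fleet[i] = new_version           # replace a few instances
        if not all(health_check(fleet[i]) for i in batch):
            return fleet, False              # halt the rollout mid-way
        # note: old and new versions both serve traffic until the loop ends
    return fleet, True

fleet, ok = rolling_update(["v1"] * 6, "v2", 2, lambda v: v == "v2")
print(fleet, ok)  # ['v2', 'v2', 'v2', 'v2', 'v2', 'v2'] True
```

The loop also shows why the transition window equals the rollout duration: mixed versions are the normal state of every iteration except the last.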
Blue-green — maintain two complete environments (blue and green). Switch traffic between them atomically via load balancer. Zero mixed-version traffic. Instant rollback (flip the load balancer back). The cost: twice the infrastructure during deployments and a more complex deployment orchestration. Best for services where mixed-version state is dangerous and rollback speed is critical.
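Blue-green is, structurally, a pointer flip: two complete environments exist, and the router holds a single reference, so cutover and rollback are the same atomic operation. A hypothetical sketch:

```python
class Router:
    """Stand-in for a load balancer fronting two full environments."""
    def __init__(self, blue, green):
        self.envs = {"blue": blue, "green": green}
        self.live = "blue"               # all traffic goes to one color

    def handle(self, request):
        return self.envs[self.live](request)

    def switch(self):
        # atomic cutover; rollback is the exact same operation
        self.live = "green" if self.live == "blue" else "blue"

router = Router(blue=lambda r: "v1:" + r, green=lambda r: "v2:" + r)
print(router.handle("req"))   # v1:req
router.switch()               # deploy: all traffic moves at once
print(router.handle("req"))   # v2:req
router.switch()               # rollback: flip back
print(router.handle("req"))   # v1:req
```

The price of that atomicity is visible in the constructor: both environments must exist, fully provisioned, at the same time.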
Canary — route a small percentage of traffic to the new version, validate behavior, gradually increase the percentage. Slowest to complete a deployment. Most risk control. Catches issues before they affect the majority of users. Requires sophisticated traffic splitting (weighted routing in ALB, Istio, or Nginx) and automated metrics analysis to decide whether to proceed or roll back. Best for high-traffic services where catching a 0.1% error rate increase matters.
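The two moving parts of a canary, traffic splitting and automated analysis, can be sketched separately. The weights, error rates, and tolerance below are illustrative placeholders, not tuning advice:

```python
import random

def route(canary_weight):
    """Send canary_weight fraction of traffic to the canary version."""
    return "canary" if random.random() < canary_weight else "stable"

def analyze(canary_error_rate, baseline_error_rate, tolerance=0.001):
    # promote only if the canary is not measurably worse than the baseline
    if canary_error_rate <= baseline_error_rate + tolerance:
        return "promote"
    return "rollback"

hits = sum(route(0.05) == "canary" for _ in range(10_000))
print(f"{hits} of 10000 requests hit the canary")  # roughly 500
print(analyze(0.0021, 0.0020))  # within tolerance
print(analyze(0.0150, 0.0020))  # regression caught at 5% of traffic
```

In production the `route` step is the load balancer's weighted routing and `analyze` runs against real metrics, but the shape of the decision is the same: a threshold comparison that either advances the weight or reverts it.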
Choosing the Right Strategy
The choice is not a one-time architectural decision — it's a per-service, per-change decision driven by risk profile:
If the change is backward-compatible and stateless:
→ Rolling deployment, default Kubernetes behavior
If rollback speed is critical (financial transactions, auth):
→ Blue-green, with pre-deployment validation
If the change affects a high-traffic path and error rate sensitivity is high:
→ Canary, with automated promotion/rollback criteria
If there's a maintenance window and simplicity matters:
→ Big bang recreate
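The decision tree above can be written as a function, which is also a useful artifact to keep next to the deployment pipeline. The risk-profile field names here are hypothetical; the branch order encodes priority (the narrow special cases are checked before the rolling default):

```python
def choose_strategy(change):
    """Map a change's risk profile to a deployment strategy."""
    if change.get("maintenance_window") and change.get("prefer_simplicity"):
        return "big-bang recreate"
    if change.get("rollback_speed_critical"):
        return "blue-green"
    if change.get("high_traffic") and change.get("error_rate_sensitive"):
        return "canary"
    return "rolling"  # default for backward-compatible, stateless changes

print(choose_strategy({"backward_compatible": True, "stateless": True}))
print(choose_strategy({"rollback_speed_critical": True}))
print(choose_strategy({"high_traffic": True, "error_rate_sensitive": True}))
```

Encoding the choice per change, rather than fixing it per service, matches the point above: the same service may warrant rolling for a copy change and blue-green for a payments change.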
Most teams use rolling deployments as the default and blue-green for high-risk releases. Canary is the most operationally complex and requires mature observability before it's useful — you can't analyze canary metrics you're not collecting.
The Rollback Question
Every deployment strategy must have a paired rollback strategy. Not "we'd roll back by doing the reverse of the deployment" — but a specific, documented, tested procedure with a known time-to-complete.
For rolling deployments in Kubernetes: kubectl rollout undo deployment/myapp. Time to complete: same as the forward rollout, typically 5–15 minutes. Database migration caveat: if the new version ran a migration, the old version may not be compatible with the new schema.
For blue-green: point the load balancer back to the old environment. Time to complete: under 60 seconds. No migration caveat — old environment was running against the old schema.
The migration caveat is the most common reason teams can't roll back when they want to. Plan migrations to be backward-compatible before rollback is needed, not after the incident starts.
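One common way to plan backward-compatible migrations is the expand-contract pattern: the "expand" step is purely additive, so the old version keeps working against the new schema, which is what keeps the rollback window open. A small sketch using SQLite, with a hypothetical users table:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (name) VALUES ('Ada')")

# Expand: additive and nullable, applied before the new version deploys.
db.execute("ALTER TABLE users ADD COLUMN email TEXT")

# Old-version code, unaware of the new column, still works after a rollback:
rows = db.execute("SELECT id, name FROM users").fetchall()
print(rows)  # [(1, 'Ada')]

# Only after the rollout is validated do you "contract" (backfill, add
# constraints, drop retired columns) -- the step that closes the window.
```

Sequencing migrations this way turns "can we roll back?" from a schema archaeology question into a known property of the release.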
Know your strategy. Know your rollback. Deploy with confidence.