Why Your Docker Image Works Locally But Breaks in Production

by Eric Hanson, Backend Developer at Clean Systems Consulting

The container that passes CI and fails in ECS

Your service works in local Docker Compose. It builds clean in CI. The image gets pushed to ECR. It deploys to ECS Fargate, and the health check fails. The logs show the application started but the health endpoint never responds. You spend two hours trying to SSH into nothing (Fargate doesn't provide SSH access), reading CloudWatch logs, and eventually discover the issue is a read-only filesystem mount the app was trying to write to.

This is the category of Docker problem that doesn't show up until production: the image is fine, the Dockerfile is fine, but the environment the image runs in is different in ways you didn't account for.

Here's a map of the most common mismatches, and how to close them.

Architecture: the ARM/AMD64 gap

If you develop on an Apple M-series Mac, you build ARM64 images by default. If production runs on x86-64 (most cloud instances, most CI runners), the image you built locally won't run in production — or worse, it will run via emulation and behave differently.

Verify your image architecture:

docker inspect your-image:tag | grep Architecture

Build for the production platform explicitly:

docker build --platform linux/amd64 -t your-image:tag .

Or use docker buildx for multi-platform builds that produce manifests supporting both:

docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t your-registry/your-image:tag \
  --push .

In CI, always set --platform linux/amd64 (or whatever your production target is) explicitly. Don't let the runner's native architecture determine the output architecture.
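To confirm what actually got pushed, you can inspect the registry manifest; docker buildx imagetools inspect lists every platform a tag supports (the image name below is a placeholder):

```shell
# Inspect the manifest in the registry to see which platforms
# a pushed tag actually supports.
docker buildx imagetools inspect your-registry/your-image:tag

# A multi-platform image lists both targets, e.g.:
#   Platform: linux/amd64
#   Platform: linux/arm64
```

If only linux/arm64 shows up, the image was built on a Mac without an explicit --platform and will not run on x86-64 hosts.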

File permissions and user mismatch

Locally, containers often run as root (the default when the Dockerfile has no USER instruction) or as a user whose UID happens to match your laptop's. In production environments — Kubernetes with runAsNonRoot: true, ECS task definitions with a user field, Fargate with restricted execution — the container user may differ.

If your application writes to a directory inside the container that was created by root during the build, a non-root runtime user will get permission denied errors.

# Copy files with ownership assigned to the non-root runtime user
FROM node:20-alpine
WORKDIR /app
COPY --chown=node:node . .
USER node

The --chown flag on COPY sets ownership at copy time. Do this for all COPY instructions when you intend to run as non-root. Also:

RUN mkdir -p /app/logs /app/tmp \
    && chown -R node:node /app/logs /app/tmp

Create any directories your application writes to during build, set ownership explicitly, then switch to the non-root user.
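A quick way to catch ownership mistakes before deploy is to run the image as its non-root user and attempt a write to each directory the app needs. A sketch, using the node user and the paths from the Dockerfile above (adjust both for your image):

```shell
# Run as the non-root user and verify the app's writable paths actually
# accept writes — a missed chown fails here instead of in production.
docker run --rm --user node your-image:tag \
  sh -c 'touch /app/logs/probe /app/tmp/probe && echo "writable: ok"'
```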

Environment variables: present locally, absent in production

In local development, environment variables come from a .env file loaded by Docker Compose or a local shell profile. In production, they come from Kubernetes secrets, ECS task definition environment fields, or a secrets manager at startup.

The failure mode: a variable is set in your local .env but missing from the production environment config. The application starts, reaches the code path that uses the variable, and either crashes or behaves unexpectedly.

Fail fast at startup for required variables:

// Node.js
const required = ['DATABASE_URL', 'JWT_SECRET', 'PORT'];
for (const key of required) {
  if (!process.env[key]) {
    console.error(`Missing required environment variable: ${key}`);
    process.exit(1);
  }
}
// Spring Boot — fail fast with @Value
@Value("${database.url:#{null}}")
private String databaseUrl;

@PostConstruct
public void validate() {
    if (databaseUrl == null) {
        throw new IllegalStateException("database.url must be configured");
    }
}

An application that crashes at startup with a clear error message (Missing required environment variable: DATABASE_URL) is infinitely easier to diagnose than one that starts, fails silently, and reports a 500 response three requests later.
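The same fail-fast check can live in the container's entrypoint, which helps when the application itself can't easily validate its config. A minimal sketch, assuming a POSIX shell entrypoint (the variable names are examples):

```shell
# check_env: fail when any named environment variable is unset or empty.
check_env() {
  for key in "$@"; do
    # Indirect expansion via eval, portable to plain /bin/sh
    eval "val=\${$key:-}"
    if [ -z "$val" ]; then
      echo "Missing required environment variable: $key" >&2
      return 1
    fi
  done
}

# In an entrypoint script, before starting the app:
#   check_env DATABASE_URL JWT_SECRET PORT || exit 1
#   exec node server.js
```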

Resource limits: unlimited locally, constrained in production

Local Docker runs don't have memory or CPU limits unless you explicitly set them. Production environments almost always do — Kubernetes resource limits, ECS task definition memory, Fargate task size.

The failure mode: your JVM application uses up to 4GB heap locally. In production it's limited to 1GB. The JVM's default heap sizing is based on the host's total memory, not the container's limit. In older JVMs (before JDK 10, or JDK 8 prior to update 191), the JVM ignored container memory limits entirely and sized its heap from host RAM, leading to OOMKilled containers.

For JVM applications in containers, always set explicit GC and heap options:

ENV JAVA_OPTS="-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -XX:InitialRAMPercentage=50.0"
ENTRYPOINT ["sh", "-c", "java $JAVA_OPTS -jar app.jar"]

-XX:+UseContainerSupport (default since JDK 8u191) makes the JVM respect container memory limits. -XX:MaxRAMPercentage=75.0 sets heap to 75% of the container's memory limit, leaving headroom for the JVM's off-heap memory and the OS.

Test locally with the same limits as production:

docker run --memory=512m --cpus=0.5 your-image:tag

If the app fails under these constraints, you want to know before production does.
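For the JVM case specifically, you can ask the JVM what heap it will actually resolve under a given container limit; -XX:+PrintFlagsFinal dumps the computed MaxHeapSize. A sketch against the 512MB limit from the command above (--entrypoint java assumes a JDK is on the image's PATH):

```shell
# Print the heap size the JVM resolves inside a 512MB container.
# With MaxRAMPercentage=75, expect roughly 384MB — not host-sized.
docker run --rm --memory=512m --entrypoint java your-image:tag \
  -XX:MaxRAMPercentage=75.0 -XX:+PrintFlagsFinal -version | grep -i maxheapsize
```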

Filesystem: writable locally, read-only in production

Kubernetes securityContext.readOnlyRootFilesystem: true mounts the container root filesystem as read-only. If your application writes anywhere inside the container filesystem (temp files, log files, PID files, JVM crash dumps), it will fail.

Common offenders:

  • Log files written to a path like /app/logs/
  • JVM's -XX:+HeapDumpOnOutOfMemoryError writing to the working directory
  • Applications writing temp files to /tmp

Solutions:

  • Write logs to stdout/stderr, not files (let the orchestrator handle log collection)
  • Mount a writable volume for any path that needs writes: /tmp, /app/logs, etc.
  • Configure JVM heap dumps to a mounted volume path

In your Kubernetes deployment:

securityContext:
  readOnlyRootFilesystem: true
volumeMounts:
  - name: tmp
    mountPath: /tmp
  - name: logs
    mountPath: /app/logs
volumes:
  - name: tmp
    emptyDir: {}
  - name: logs
    emptyDir: {}

Test this locally:

docker run --read-only --tmpfs /tmp your-image:tag

If the application starts cleanly under --read-only, it is very likely to work with readOnlyRootFilesystem: true in Kubernetes.

Networking: localhost means something different

In Docker Compose, services reach each other by service name. postgresql://postgres:5432/mydb works because Compose creates a network and registers DNS for service names. Your application assumes the same pattern in production.

In production (Kubernetes, ECS), the networking model is different: services are reached via cluster DNS (myservice.namespace.svc.cluster.local) or environment-injected service endpoints, not Compose service names.

The fix is ensuring your application's service endpoints are fully configurable via environment variables and that local defaults don't leak into production configs. Never hardcode localhost or Compose service names in application code. Everything that varies between environments goes into environment variables.
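In practice that means the same variable carries a different value per environment — a Compose service name locally, cluster DNS in production. The hostnames below are illustrative:

```shell
# Local: the Compose service name resolves on the Compose network
docker run --rm -e DATABASE_URL="postgresql://postgres:5432/mydb" your-image:tag

# Production: the orchestrator injects the real endpoint, e.g. in Kubernetes
#   DATABASE_URL=postgresql://postgres.db.svc.cluster.local:5432/mydb
# The application reads DATABASE_URL either way and never hardcodes a host.
```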

Close the gap intentionally

Add a docker run test to your CI pipeline that mimics production constraints before the image is pushed:

docker run \
  --read-only \
  --tmpfs /tmp \
  --memory=512m \
  --cpus=0.5 \
  --user 1001:1001 \
  --env-file .env.test \
  --platform linux/amd64 \
  your-image:tag \
  /bin/sh -c "echo 'startup check passed'"

This catches the most common environment mismatches before the image reaches a real environment. Not everything — but enough to stop the "works locally, fails in production" class of incidents before they happen.
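The echo override proves the container starts under those constraints; to also exercise the application itself, a slightly longer sketch can boot the service and probe a health endpoint. This assumes the app listens on 8080 and exposes /health — adjust both for your service:

```shell
# Start the container under production-like constraints, then hit the
# health endpoint before tearing it down. A non-zero exit fails CI.
docker run -d --name startup-check \
  --read-only --tmpfs /tmp \
  --memory=512m --cpus=0.5 \
  --env-file .env.test \
  -p 8080:8080 \
  your-image:tag
sleep 5   # crude; a retry loop is kinder to slow-starting apps
curl --fail --silent http://localhost:8080/health
status=$?
docker rm -f startup-check
exit $status
```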
