Your Dockerfile Works But Your Image Is Bigger Than It Needs to Be

by Arif Ikhsanudin, Backend Developer

The image that "works" but ships 900MB of nothing

Your Spring Boot app is maybe 50MB of compiled bytecode and dependencies. Yet docker images shows 900MB. Your colleague's Python service? 1.2GB. CI takes four minutes just to push the thing. Nobody files a ticket because it "works," and nobody wants to touch the Dockerfile because it's been there since the project started.

This is the image bloat problem, and it compounds: slower pushes, slower pulls, a bigger attack surface, higher registry storage costs, and longer cold starts in environments like AWS Fargate or Cloud Run, where a new instance may have to pull the image before it can start.

Here's what's actually inflating your image, and how to fix it without rewriting everything.

You're installing more than you need

The most common offender is the base image choice combined with unrestricted package installation.

FROM ubuntu:22.04
RUN apt-get update && apt-get install -y \
    curl \
    wget \
    vim \
    git \
    build-essential \
    python3 \
    openjdk-17-jdk

ubuntu:22.04 is roughly 77MB on disk (about 30MB compressed). After that package list, you're over 500MB before your app is even copied in. Most of those tools (vim, wget, git) exist because someone needed them once to debug a container interactively and never cleaned them up.

The fix is two-pronged: start with a minimal base, and install only what the runtime actually needs.

For JVM apps:

FROM eclipse-temurin:17-jre-alpine

eclipse-temurin:17-jre-alpine is around 180MB on disk, and it's the JRE, not the full JDK, which means no javac and none of the other development tools your app doesn't need at runtime. For a Spring Boot fat JAR, this is all you need.

For Python:

FROM python:3.12-slim

python:3.12-slim is ~45MB compressed versus ~330MB for python:3.12. The full image is built on a Debian base that bundles gcc, development headers, and other build tooling; the slim variant starts from a minimal Debian image and carries only what Python needs to run.
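
As a rough sketch of how a slim-based Python image might be laid out: the file names requirements.txt and app.py are hypothetical placeholders, not taken from any project discussed here, and your entry point will likely differ.

```dockerfile
# Sketch only: requirements.txt and app.py are hypothetical names.
FROM python:3.12-slim
WORKDIR /app

# Copy the dependency manifest first so this layer is reused until it changes.
COPY requirements.txt .

# --no-cache-dir stops pip from keeping downloaded packages in the image.
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code last; code edits do not invalidate the pip layer.
COPY app.py .

CMD ["python", "app.py"]
```

One caveat with this layout: the slim image has no compiler, so a dependency that must build a native extension will fail to install here. That case is better handled by building in a separate stage and copying the result across.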

Build artifacts are leaking into the image

Another major source of bloat: the build process itself.

FROM maven:3.9-eclipse-temurin-17
WORKDIR /app
COPY . .
RUN mvn package -DskipTests
CMD ["java", "-jar", "target/app.jar"]

This works. It also ships the full JDK, Maven itself, Maven's local repository (~300–500MB depending on your dependency tree), your source code, test files, and any intermediate build artifacts. All of that is in the final image because there's only one stage.

Multi-stage builds solve this cleanly:

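# Stage 1: build the JAR with Maven and the full JDK.
FROM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /app
# Resolve dependencies first so this layer is reused until pom.xml changes.
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package -DskipTests

# Stage 2: runtime image with only the JRE and the JAR.
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
COPY --from=build /app/target/app.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]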

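The final image is the JRE base plus exactly one added file: your JAR. Maven, the source tree, and the local .m2 cache all stay behind in the build stage. (This assumes the build produces target/app.jar; adjust the path to match your artifact name.) A typical Spring Boot app goes from 700–900MB down to 200–250MB with this change alone.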

Package manager caches are not cleaned up

Even when your base image is lean, package installation leaves caches behind:

RUN apt-get update && apt-get install -y curl

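This leaves the downloaded package lists in /var/lib/apt/lists, and they don't get cleared automatically. (The official Debian and Ubuntu images already clean the downloaded .deb files out of /var/cache/apt/archives after each install, but not the lists.) The fix is to clean in the same RUN layer: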

RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

The --no-install-recommends flag prevents apt from pulling in recommended packages, the optional extras it installs by default alongside the ones you asked for. The rm -rf /var/lib/apt/lists/* removes the package list cache. The cleanup has to happen in the same RUN instruction as the install: if you run it in a separate layer, Docker has already committed the cache to the previous layer, and deleting it later does nothing to the final image size.
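
For contrast, here is a sketch of the separate-layer version that does not shrink the image. The debian:bookworm-slim base is an assumption chosen for illustration; the same thing happens on any apt-based image.

```dockerfile
# Anti-pattern, shown for contrast: cleanup in its own layer.
# debian:bookworm-slim is an assumed base image for this sketch.
FROM debian:bookworm-slim

# Layer 1: the package lists are written and committed as part of this layer.
RUN apt-get update && apt-get install -y --no-install-recommends curl

# Layer 2: records the deletion, but layer 1 still carries the files,
# so the total image size does not go down.
RUN rm -rf /var/lib/apt/lists/*
```

Running docker history on an image built this way should show the first RUN layer at its full size, with the second adding almost nothing and removing nothing.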

Same principle applies to Alpine's apk:

RUN apk add --no-cache curl

--no-cache tells apk to fetch the package index on the fly and not store it in the image, so there is nothing to clean up afterwards.

Hidden files you're copying in

If you don't have a .dockerignore file, COPY . . copies everything: your .git directory, node_modules if it exists, target/ or build/ directories, local .env files, test fixtures, IDE config.

A minimal .dockerignore:

.git
.gitignore
target/
build/
*.log
.env
.DS_Store
node_modules/

On a project with a populated target/ directory and a .git folder with history, the build context sent to the Docker daemon can be 500MB+. Trimming this also speeds up builds because the daemon doesn't have to upload what it doesn't need.

How to diagnose what's in your image right now

Before optimizing, measure. Two tools are worth knowing:

docker history shows layer sizes:

docker history your-image:latest

This tells you which RUN instruction created a 200MB layer so you know where to focus.

dive (github.com/wagoodman/dive) is an interactive TUI that lets you browse the filesystem at each layer and see which files are eating space. Run it once on a bloated image and you'll immediately see what doesn't belong.

The tradeoffs worth knowing

Smaller base images often mean fewer debugging tools in production. You won't have curl to health-check a sidecar or ps to inspect processes, and with distroless images you won't have a shell at all. This is desirable from a security standpoint, since less tooling means less attack surface, but it makes live debugging harder.

The practical middle ground: use a minimal runtime image in production, keep a debug variant (with a tag like :debug) that extends it with tools, and use that variant only when actively diagnosing a production issue.
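
A debug variant along those lines could look roughly like this. It is a sketch under assumptions: the file name Dockerfile.debug is hypothetical, the production image is assumed to be tagged your-image:latest, and it is assumed to be Alpine-based (as eclipse-temurin:17-jre-alpine is), so apk is available.

```dockerfile
# Dockerfile.debug (hypothetical file name)
# Assumes the production image is tagged your-image:latest and is Alpine-based.
FROM your-image:latest

# Installing packages needs root; this sets the user explicitly
# in case the production image switched to a non-root USER.
USER root

# Debugging tools only: curl for HTTP checks, bind-tools for dig/nslookup.
# None of this ships in the production tag.
RUN apk add --no-cache curl bind-tools
```

Building it with docker build -f Dockerfile.debug -t your-image:debug . keeps the tooling in a separate tag, so the production image stays unchanged. Because the variant ends as root, it is better suited to short diagnostic sessions than to long-running deployments.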

What to do today

Run docker history your-image:tag on whatever you're shipping right now. If any layer is over 100MB and it's not your application runtime, that's your first target. Add a .dockerignore file if you don't have one; that takes five minutes and carries almost no risk, as long as you don't exclude something the build needs. Then move your build to a multi-stage Dockerfile if it isn't already.

The goal isn't a perfect image. It's a measurably smaller one by the end of this sprint.
