Your Dockerfile Works But Your Image Is Bigger Than It Needs to Be
by Arif Ikhsanudin, Backend Developer
The image that "works" but ships 900MB of nothing
Your Spring Boot app is maybe 50MB of compiled bytecode and dependencies. Yet docker images shows 900MB. Your colleague's Python service? 1.2GB. CI takes four minutes just to push the thing. Nobody files a ticket because it "works," and nobody wants to touch the Dockerfile because it's been there since the project started.
This is the image bloat problem, and it compounds: slower pushes, slower pulls, bigger attack surface, higher registry storage costs, and longer cold starts in environments like AWS Fargate or Cloud Run where images are pulled on every scale event.
Here's what's actually inflating your image, and how to fix it without rewriting everything.
You're installing more than you need
The most common offender is the base image choice combined with unrestricted package installation.
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y \
    curl \
    wget \
    vim \
    git \
    build-essential \
    python3 \
    openjdk-17-jdk
ubuntu:22.04 is only about 77MB uncompressed (roughly 30MB compressed). After that package list, you're over 500MB before your app is even copied in. Most of those tools (vim, wget, git) exist because someone needed them once to debug a container interactively and never cleaned up afterward.
The fix is two-pronged: start with a minimal base, and install only what the runtime actually needs.
For JVM apps:
FROM eclipse-temurin:17-jre-alpine
eclipse-temurin:17-jre-alpine is around 180MB uncompressed, and it's the JRE, not the full JDK, which means no javac and no other build tools your app doesn't need at runtime. For a Spring Boot fat JAR, this is all you need.
For Python:
FROM python:3.12-slim
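python:3.12-slim is ~45MB compressed versus ~330MB for python:3.12. The full image is built on buildpack-deps, a Debian image preloaded with gcc, development headers, and version-control tools, while the slim variant starts from debian-slim and adds only what Python needs to run. A pure runtime environment doesn't need the extras.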
Build artifacts are leaking into the image
Another major source of bloat: the build process itself.
FROM maven:3.9-eclipse-temurin-17
WORKDIR /app
COPY . .
RUN mvn package -DskipTests
CMD ["java", "-jar", "target/app.jar"]
This works. It also ships Maven's entire local repository (~300–500MB depending on your dependency tree) along with your source code, test files, and any intermediate build artifacts. All of that is in the final image because there's only one stage.
Multi-stage builds solve this cleanly:
FROM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package -DskipTests
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
COPY --from=build /app/target/app.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]
On top of the JRE base image, the final image adds exactly one file: your JAR. Maven, the JDK, the source tree, and the local .m2 cache all stay behind in the build stage. A typical Spring Boot app goes from 700–900MB down to 200–250MB with this change alone.
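The same pattern carries over to a Python service. The following is a minimal sketch, not a drop-in file: it assumes a requirements.txt at the project root and application code in a hypothetical app/ package with a main module, and it uses a virtual environment so the installed dependencies can be copied between stages as a single directory.

```dockerfile
# Build stage: the full image ships gcc and headers, so packages
# that need to compile native extensions can be installed here.
FROM python:3.12 AS build
WORKDIR /app
# Install everything into one directory that is easy to copy later.
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Runtime stage: slim base, no compiler, no pip download cache.
FROM python:3.12-slim
WORKDIR /app
COPY --from=build /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# "app" and "app.main" are placeholder names for this sketch.
COPY app ./app
CMD ["python", "-m", "app.main"]
```

Both stages use the same Python version so the copied virtual environment still points at a valid interpreter. If a dependency links against a system library at runtime, that library would still need to be installed in the slim stage.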
Package manager caches are not cleaned up
Even when your base image is lean, package installation leaves caches behind:
RUN apt-get update && apt-get install -y curl
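This leaves the package index files that apt-get update downloaded in /var/lib/apt/lists. The official Debian and Ubuntu images are configured to delete downloaded .deb archives from /var/cache/apt/archives after each install, but the index files are not cleared automatically. The fix is to clean in the same RUN layer: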
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
The --no-install-recommends flag prevents apt from pulling in recommended packages, which it otherwise installs by default. The rm -rf /var/lib/apt/lists/* removes the package index files. The cleanup has to happen in the same RUN instruction as the install: if you run it in a separate layer, Docker has already committed the files to the previous layer, and deleting them later does nothing to the final image size.
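To make the layer behavior concrete, here is a sketch that puts the pattern that does not help (commented out) next to the one that does. The base image and the curl package are only examples.

```dockerfile
FROM debian:bookworm-slim

# Does NOT shrink the image: the index files are already stored in
# the layer created by the first RUN; the second RUN only hides them.
# RUN apt-get update && apt-get install -y --no-install-recommends curl
# RUN rm -rf /var/lib/apt/lists/*

# Does shrink the image: install and cleanup are committed as one layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
```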
Same principle applies to Alpine's apk:
RUN apk add --no-cache curl
--no-cache tells apk not to create the cache in the first place.
Hidden files you're copying in
If you don't have a .dockerignore file, COPY . . copies everything: your .git directory, node_modules if it exists, target/ or build/ directories, local .env files, test fixtures, IDE config.
A minimal .dockerignore:
.git
.gitignore
target/
build/
*.log
.env
.DS_Store
node_modules/
On a project with a populated target/ directory and a .git folder with history, the build context sent to the Docker daemon can be 500MB+. Trimming this also speeds up builds because the daemon doesn't have to upload what it doesn't need.
How to diagnose what's in your image right now
Before optimizing, measure. Two tools are worth knowing:
docker history shows layer sizes:
docker history your-image:latest
This tells you which RUN instruction created a 200MB layer so you know where to focus.
dive (github.com/wagoodman/dive) is an interactive TUI that lets you browse the filesystem at each layer and see which files are eating space. Run it once on a bloated image and you'll immediately see what doesn't belong.
The tradeoffs worth knowing
Smaller base images often mean fewer debugging tools in production. You won't have curl to health-check a sidecar or ps to inspect processes, and with distroless images you won't have a shell at all. This is desirable from a security standpoint, since less tooling means less attack surface, but it makes live debugging harder.
The practical middle ground: use a minimal runtime image in production, keep a debug variant (with a tag like :debug) that extends it with tools, and use that variant only when actively diagnosing a production issue.
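One way to sketch such a debug variant, assuming a Debian-based production image (for example one built on python:3.12-slim) that is tagged your-image:latest. The image name and the Dockerfile.debug file name are placeholders.

```dockerfile
# Dockerfile.debug (hypothetical file name)
# Start from the exact image that runs in production.
FROM your-image:latest

# Switch to root in case the production image sets a non-root USER.
USER root

# curl for HTTP checks, procps for ps and top.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl procps \
    && rm -rf /var/lib/apt/lists/*
```

Building it with docker build -f Dockerfile.debug -t your-image:debug . keeps the production tag untouched, so the extra tools only exist in the image you deploy deliberately while diagnosing. An Alpine-based image would use apk add --no-cache in place of the apt-get line.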
What to do today
Run docker history your-image:tag on whatever you're shipping right now. If any layer is over 100MB and it's not your application runtime, that's your first target. Add a .dockerignore file if you don't have one — that's five minutes and zero risk. Then move your build to a multi-stage Dockerfile if it isn't already.
The goal isn't a perfect image. It's a measurably smaller one by the end of this sprint.