Choosing the Right Base Image Is More Important Than You Think
by Eric Hanson, Backend Developer at Clean Systems Consulting
The decision that shapes everything downstream
You're writing a new Dockerfile. Without much deliberation, you type:
FROM node:20
This pulls a Debian bookworm image with Node.js 20, npm, yarn, and a full Debian userland: around 340MB compressed, 1.1GB uncompressed. Your Node.js application is probably 50MB of code and 150MB of dependencies. You're shipping roughly 900MB (uncompressed) more than your app needs, and you pay that overhead on every build, push, and pull.
The base image selection is the foundation of everything that comes after — image size, vulnerability exposure, available system tools, how often you need security updates, and whether your build is reproducible. Treating it as a default rather than a decision is a mistake.
The main base image families and what they actually mean
Full official images (node:20, python:3.12, openjdk:17):
These are Debian-based with the full runtime environment plus development headers. They're designed to be convenient: you can do almost anything without installing additional packages. That convenience comes with size (300MB+ compressed) and a large vulnerability surface. Every package in the base image is a potential CVE, and Debian full images ship a lot of packages.
-slim variants (node:20-slim, python:3.12-slim):
Debian-based but with non-essential packages removed. Substantially smaller than the full image, typically well under half its size. No development headers, no build tools, fewer system utilities. The right choice when you need a Debian base (for glibc compatibility or specific Debian packages) but don't need a full development environment.
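One common way to get the convenience of the full image without shipping it is a multi-stage build: compile in node:20, run in node:20-slim. The following is a minimal sketch, not a definitive layout. The `build` npm script, the `dist/` output directory, and `dist/server.js` are hypothetical names chosen for illustration.

```dockerfile
# Build stage: the full image has the compilers and headers native modules need.
FROM node:20 AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
# Hypothetical build script; drop this line if the app has no build step.
RUN npm run build
# Remove devDependencies so only runtime packages are copied forward.
RUN npm prune --omit=dev

# Runtime stage: same Debian/glibc base, minus the build toolchain.
FROM node:20-slim
ENV NODE_ENV=production
WORKDIR /app
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY --from=build /app/package.json ./
# The official node images ship an unprivileged "node" user.
USER node
CMD ["node", "dist/server.js"]
```

Because both stages share the same Debian release and glibc, native modules compiled in the build stage should load unchanged in the slim runtime stage.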
Alpine-based (node:20-alpine, python:3.12-alpine):
Based on Alpine Linux, which uses musl libc instead of glibc and the apk package manager. Alpine images are dramatically smaller — node:20-alpine is ~60MB compressed versus ~340MB for node:20. The attack surface is proportionally smaller: fewer packages means fewer CVEs.
The tradeoff: musl libc is not drop-in compatible with glibc. Native extensions compiled against glibc won't run on Alpine. Python packages with C extensions sometimes have Alpine-specific issues. Node.js itself works fine on Alpine; npm packages with native addons may not.
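When a native addon has no prebuilt musl binary, it usually has to be compiled inside an Alpine stage. A sketch of that pattern, assuming the addon builds with node-gyp and that a `.dockerignore` excludes the host's `node_modules`; `server.js` is a hypothetical entry point:

```dockerfile
FROM node:20-alpine AS build
# Toolchain node-gyp needs so native addons compile against musl.
RUN apk add --no-cache python3 make g++
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

FROM node:20-alpine
WORKDIR /app
COPY . .
# Copy the musl-built modules last so they win over anything in the build context.
COPY --from=build /app/node_modules ./node_modules
USER node
CMD ["node", "server.js"]
```

Keeping the compiler in a separate stage means the final image stays close to the size of node:20-alpine itself. If the addon still fails to build or misbehaves at runtime, that is a signal to fall back to a -slim image.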
Distroless (gcr.io/distroless/nodejs20, gcr.io/distroless/java17):
Google's distroless images contain only the language runtime and its direct dependencies — no shell, no package manager, no utilities at all. They're the most minimal option for runtime images and have the smallest vulnerability surface by a significant margin.
The tradeoff: debugging is difficult because there's no shell to exec into. You need a separate debug image variant (gcr.io/distroless/java17:debug) for troubleshooting. Some application bootstrap patterns that exec shell scripts break entirely. Use distroless when you have the operational maturity to work without in-container debugging.
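A distroless runtime stage might look like the sketch below. The image name `gcr.io/distroless/nodejs20-debian12` is an assumption about the currently published name for the Node.js 20 variant; check the distroless repository for the exact tag you want. `server.js` is a hypothetical entry point.

```dockerfile
# Install dependencies in an image that has npm and a shell.
FROM node:20-slim AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY . .

# Assumed image name: the Debian 12 flavor of the distroless Node.js 20 image.
FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=build /app /app
# The image's entrypoint is already the node binary, so CMD is just the script.
# Exec form is required: there is no shell to interpret a shell-form CMD.
CMD ["server.js"]
```

Anything that needs a shell, such as `RUN` steps or entrypoint scripts, has to happen in the build stage, because the final stage cannot execute them.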
scratch:
An empty base, literally nothing. It is viable only for fully self-contained executables, such as Go binaries built with CGO disabled (which produces fully static binaries).
FROM scratch
COPY --from=build /app/server /server
CMD ["/server"]
If your binary doesn't need any OS facilities, this produces the smallest possible image and has zero inherited vulnerabilities.
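The snippet above assumes a stage named `build` already exists. A fuller sketch of the whole file, assuming a Go module with `go.mod` and `go.sum` at the repository root and a `main` package in that same directory:

```dockerfile
FROM golang:1.22 AS build
WORKDIR /src
# Download modules first so this layer is cached until go.mod/go.sum change.
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# CGO_ENABLED=0 produces a static binary with no libc dependency.
RUN CGO_ENABLED=0 go build -o /app/server .

FROM scratch
# scratch has no CA bundle; copy one if the binary makes outbound TLS calls.
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /app/server /server
CMD ["/server"]
```

The CA certificate line illustrates the "no OS facilities" caveat: things a normal base image provides silently, such as trust roots or timezone data, have to be copied in explicitly when the binary needs them.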
The JVM base image landscape
Java deserves specific attention because there are several competing image families:
- eclipse-temurin: The Eclipse Foundation's distribution of OpenJDK. Well-maintained, available for multiple JDK versions in JDK and JRE variants on Ubuntu and Alpine, among others.
- amazoncorretto: Amazon's OpenJDK distribution, optimized for AWS workloads, AL2023-based.
- azul/zulu: Azul Systems' OpenJDK, with long support windows.
- openjdk: The official Docker Hub image, deprecated since 2022. Avoid new usage.
For most teams: eclipse-temurin:17-jre-alpine for Java 17 or eclipse-temurin:21-jre-alpine for Java 21. It's the JRE (not JDK), Alpine-based, and maintained by a neutral foundation. amazoncorretto is a reasonable alternative if you're heavily invested in AWS and want Amazon's support window.
Explicitly: use JRE images for runtime, not JDK images. The JDK includes javac, the compiler, and development tools that serve no purpose in a production container and add ~100MB and additional CVE surface.
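In practice this usually means a JDK build stage and a JRE runtime stage. A minimal sketch, assuming a project with a Maven wrapper (`./mvnw`) that produces `target/app.jar`; both of those names are hypothetical and will differ per project:

```dockerfile
# Build stage: the JDK is needed here for javac and the build tooling.
FROM eclipse-temurin:17-jdk AS build
WORKDIR /src
COPY . .
# Hypothetical build: assumes a Maven wrapper that produces target/app.jar.
RUN ./mvnw -q package -DskipTests

# Runtime stage: JRE only, no compiler or development tools.
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
COPY --from=build /src/target/app.jar app.jar
CMD ["java", "-jar", "app.jar"]
```

Only the JAR crosses the stage boundary, so the JDK, the build cache, and the source tree never reach the production image.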
Tagging strategy and reproducibility
FROM node:20 tracks the latest minor and patch release of Node 20, on whatever Debian base the tag currently points to. This means your builds can change between Monday and Tuesday if a new release is published or the image is rebuilt. For most teams this is acceptable; for regulated environments or strict reproducibility requirements, pin to a specific version:
FROM node:20.12.2-alpine3.19
For absolute reproducibility, pin to the image digest:
FROM node:20.12.2-alpine3.19@sha256:bf77dc26e48ea95fca9d1aceb5acfa69d2e546b765ec2abfb502975f1a2d4def
The digest guarantees that no matter what the tag points to in the future, you build against this exact byte sequence. Renovate and Dependabot both support Dockerfile pin updates — you can have reproducibility without manually tracking versions.
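If several stages share the same base, one way to keep the pin in a single place is a build argument declared before the first FROM. This is a sketch of that approach rather than the only option; the `NODE_VERSION` and `ALPINE_VERSION` argument names and `server.js` are chosen for illustration.

```dockerfile
# ARGs declared before the first FROM are only in scope for FROM lines.
ARG NODE_VERSION=20.12.2
ARG ALPINE_VERSION=3.19
FROM node:${NODE_VERSION}-alpine${ALPINE_VERSION}
WORKDIR /app
COPY package.json package-lock.json ./
# npm ci installs exactly what the lockfile records, or fails if it is out of sync.
RUN npm ci --omit=dev
COPY . .
CMD ["node", "server.js"]
```

A different version can then be tried without editing the file, for example with `docker build --build-arg NODE_VERSION=...`. Note that a pinned base image fixes only one input; the lockfile-driven `npm ci` is what keeps the dependency layer equally repeatable.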
Vulnerability surface: why fewer packages matters
When a CVE affects a package in your base image, your security team gets an alert, your scanner reports a finding, and someone has to triage whether the vulnerability is exploitable in your context. Each unnecessary package is a potential source of these alerts.
Compare typical scan results:
- node:20: 200–500 findings at any given time (many informational, some critical)
- node:20-slim: 50–150 findings
- node:20-alpine: 20–60 findings
- distroless/nodejs20: 5–20 findings
These numbers vary by day and scanner, but the magnitude difference is consistent. Every package you remove from your base image is a category of vulnerability you never have to triage.
The practical decision tree
- Is your application a statically compiled binary with no OS dependencies? → scratch
- Is your application a JVM fat JAR? → eclipse-temurin:VERSION-jre-alpine
- Does your application have C extensions or glibc dependencies? → -slim Debian variant
- Is your application pure interpreted code or compiled without native extensions? → Alpine variant
- Do you need maximum security posture with the operational maturity to support it? → distroless
Check what you're using today against this list. For an application without native addons, the migration from node:20 to node:20-alpine often takes about an hour and saves roughly 280MB compressed per image. Multiply that by the number of images in your registry and the number of pulls per day, and the savings are real.