Choosing the Right Base Image Is More Important Than You Think

by Eric Hanson, Backend Developer at Clean Systems Consulting

The decision that shapes everything downstream

You're writing a new Dockerfile. Without much deliberation, you type:

FROM node:20

This pulls a Debian bookworm image with Node.js 20, npm, yarn, and a full Debian userland: around 340MB compressed, 1.1GB uncompressed. Your Node.js application is probably 50MB of code and 150MB of dependencies. The base image is several times larger than the application it carries, most of it userland your code never touches, and that overhead rides along with every build and every deploy.

The base image selection is the foundation of everything that comes after — image size, vulnerability exposure, available system tools, how often you need security updates, and whether your build is reproducible. Treating it as a default rather than a decision is a mistake.

The main base image families and what they actually mean

Full official images (node:20, python:3.12):

These are Debian-based with the full runtime environment plus development headers. They're designed to be convenient: you can do almost anything without installing additional packages. That convenience comes with size (300MB+ compressed) and a large vulnerability surface. Every package in the base image is a potential CVE, and Debian full images ship a lot of packages.

-slim variants (node:20-slim, python:3.12-slim):

Debian-based but with non-essential packages removed. Substantially smaller than the full image: node:20-slim is typically well under a third of node:20's size. No development headers, no build tools, fewer system utilities. The right choice when you need a Debian base (for glibc compatibility or specific Debian packages) but don't need a full development environment.
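
One common way to get the convenience of the full image at build time and the footprint of -slim at runtime is a multi-stage build. The following is a minimal sketch, not a definitive layout: it assumes an npm project with a package-lock.json and a hypothetical entry point named server.js.

```dockerfile
# Build stage: the full image ships the compilers and headers that
# native addons need during npm install.
FROM node:20 AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY . .

# Runtime stage: same Debian/glibc base, without the build toolchain.
FROM node:20-slim
WORKDIR /app
COPY --from=build /app /app
# The official Node.js images provide an unprivileged "node" user.
USER node
# server.js is a hypothetical entry point; substitute your own.
CMD ["node", "server.js"]
```

Because both stages share a Debian/glibc base, addons compiled in the first stage should load in the second.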

Alpine-based (node:20-alpine, python:3.12-alpine):

Based on Alpine Linux, which uses musl libc instead of glibc and the apk package manager. Alpine images are dramatically smaller — node:20-alpine is ~60MB compressed versus ~340MB for node:20. The attack surface is proportionally smaller: fewer packages means fewer CVEs.

The tradeoff: musl libc is not drop-in compatible with glibc. Native extensions compiled against glibc won't run on Alpine. Python packages with C extensions may not publish musl-compatible wheels, in which case pip falls back to a source build that needs a compiler and headers. Node.js itself works fine on Alpine; npm packages with native addons may need to be compiled in the image or may not build at all.
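
When a native addon does support musl, compiling it inside an Alpine build stage is usually enough. This is a sketch under stated assumptions: an npm project with a lockfile, a hypothetical server.js entry point, and a .dockerignore that excludes the host's node_modules.

```dockerfile
FROM node:20-alpine AS build
# node-gyp needs Python, make, and a C++ compiler to build addons against musl.
RUN apk add --no-cache python3 make g++
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

FROM node:20-alpine
WORKDIR /app
# Only the compiled dependencies cross over; the toolchain stays behind.
COPY --from=build /app/node_modules ./node_modules
# Assumes .dockerignore excludes node_modules so this does not overwrite them.
COPY . .
USER node
# server.js is a hypothetical entry point; substitute your own.
CMD ["node", "server.js"]
```

If the addon fails to compile or misbehaves at runtime under musl, that is the signal to fall back to a -slim Debian base.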

Distroless (gcr.io/distroless/nodejs20, gcr.io/distroless/java17):

Google's distroless images contain only the language runtime and its direct dependencies — no shell, no package manager, no utilities at all. They're the most minimal option for runtime images and have the smallest vulnerability surface by a significant margin.

The tradeoff: debugging is difficult because there's no shell to exec into. You need a separate debug image variant (gcr.io/distroless/java17:debug) for troubleshooting. Some application bootstrap patterns that exec shell scripts break entirely. Use distroless when you have the operational maturity to work without in-container debugging.
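
A distroless runtime stage might look like the sketch below. The image name with a Debian suffix (gcr.io/distroless/nodejs20-debian12) reflects how the distroless project currently names its images and is an assumption relative to the names above; server.js is a hypothetical entry point.

```dockerfile
# Install dependencies in an image that has npm and a shell.
FROM node:20-slim AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY . .

# Runtime: no shell, no package manager, only the Node.js runtime.
FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=build /app /app
# The distroless Node.js image already uses node as its entrypoint,
# so CMD carries only the script path. Use exec (JSON) form:
# shell form cannot work without /bin/sh.
CMD ["server.js"]
```

Any startup logic that previously lived in a shell script has to move into the application itself or into the build stage.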

scratch:

An empty base, literally nothing. Viable only for fully self-contained executables, such as Go binaries built with CGO disabled (which produces fully static binaries).

FROM scratch
COPY --from=build /app/server /server
CMD ["/server"]

If your binary doesn't need any OS facilities, this produces the smallest possible image and has zero inherited vulnerabilities.
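
For context, here is a sketch of the build stage that the snippet above refers to. It assumes a Go module with its main package at the repository root; the golang:1.22 tag is an illustrative choice, not a recommendation.

```dockerfile
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
# CGO_ENABLED=0 avoids linking against libc, so the binary is fully static.
RUN CGO_ENABLED=0 go build -o /app/server .

FROM scratch
# scratch has no CA bundle; copy one only if the binary makes TLS calls.
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /app/server /server
CMD ["/server"]
```

The CA bundle line illustrates the kind of OS facility a binary may quietly depend on. Time zone data and user database files fall in the same category.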

The JVM base image landscape

Java deserves specific attention because there are several competing image families:

  • eclipse-temurin: The Eclipse Adoptium project's distribution of OpenJDK. Well-maintained, available for multiple Java versions in JDK and JRE variants on Ubuntu (the default), Alpine, and UBI.
  • amazoncorretto: Amazon's OpenJDK distribution, optimized for AWS workloads, built on Amazon Linux with Alpine variants available.
  • azul/zulu-openjdk: Azul Systems' OpenJDK, with long support windows.
  • openjdk: The official Docker Hub image, deprecated since 2022. Avoid new usage.

For most teams: eclipse-temurin:17-jre-alpine for Java 17 or eclipse-temurin:21-jre-alpine for Java 21. It's the JRE (not JDK), Alpine-based, and maintained by a neutral foundation. amazoncorretto is a reasonable alternative if you're heavily invested in AWS and want Amazon's support window.

Explicitly: use JRE images for runtime, not JDK images. The JDK includes javac, the compiler, and development tools that serve no purpose in a production container and add ~100MB and additional CVE surface.
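
The JDK/JRE split maps naturally onto a multi-stage build. The sketch below assumes a Maven project with the Maven wrapper (./mvnw) checked in and a build that produces a hypothetical fat JAR at target/app.jar; adjust both for your build tool.

```dockerfile
# Build stage: the JDK is needed here for javac.
FROM eclipse-temurin:21-jdk AS build
WORKDIR /src
COPY . .
RUN ./mvnw -q package -DskipTests

# Runtime stage: JRE only, no compiler or development tools.
FROM eclipse-temurin:21-jre-alpine
WORKDIR /app
# target/app.jar is a hypothetical artifact name.
COPY --from=build /src/target/app.jar /app/app.jar
CMD ["java", "-jar", "/app/app.jar"]
```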

Tagging strategy and reproducibility

FROM node:20 is a moving tag: it tracks the latest minor and patch release of Node 20, and the image behind it is also rebuilt when its Debian base is updated. This means your builds can change between Monday and Tuesday without any change to your Dockerfile. For most teams this is acceptable; for regulated environments or strict reproducibility requirements, pin to a specific version:

FROM node:20.12.2-alpine3.19

For absolute reproducibility, pin to the image digest:

FROM node:20.12.2-alpine3.19@sha256:bf77dc26e48ea95fca9d1aceb5acfa69d2e546b765ec2abfb502975f1a2d4def

The digest guarantees that no matter what the tag points to in the future, you build against this exact byte sequence. Renovate and Dependabot both support Dockerfile pin updates — you can have reproducibility without manually tracking versions.

Vulnerability surface: why fewer packages matters

When a CVE affects a package in your base image, your security team gets an alert, your scanner reports a finding, and someone has to triage whether the vulnerability is exploitable in your context. Each unnecessary package is a potential source of these alerts.

Compare typical scan results:

  • node:20: 200–500 findings at any given time (many informational, some critical)
  • node:20-slim: 50–150 findings
  • node:20-alpine: 20–60 findings
  • distroless/nodejs20: 5–20 findings

These numbers vary by day and scanner, but the magnitude difference is consistent. Every package you remove from your base image is a category of vulnerability you never have to triage.

The practical decision tree

  1. Is your application a statically compiled binary with no OS dependencies? → scratch
  2. Is your application a JVM fat JAR? → eclipse-temurin:VERSION-jre-alpine
  3. Does your application have C extensions or glibc dependencies? → -slim Debian variant
  4. Is your application pure interpreted code or compiled without native extensions? → Alpine variant
  5. Do you need maximum security posture with the operational maturity to support it? → distroless

Check what you're using today against this list. For an application without native addons, the migration from node:20 to node:20-alpine can often be done in about an hour and saves roughly 280MB compressed per image. Multiply that by the number of images in your registry and the number of pulls per day, and the savings are real.
