Your Docker Image Has More Inside It Than You Think

by Arif Ikhsanudin, Backend Developer

The image nobody has actually opened

Your production image has been running for eight months. It's built by CI, pushed to a registry, pulled by Kubernetes. Hundreds of deploys. Nobody has ever opened it to see what's inside.

This is normal. It's also a problem. Images accumulate content from base layers, build artifacts, and COPY instructions in ways that aren't obvious from reading the Dockerfile. Sensitive configuration files, debug utilities, package manager caches, test fixtures — any of these can end up in a layer and stay there until someone specifically looks.

Here's how to actually inspect your image, and what to do when you find something that shouldn't be there.

Layer anatomy: what you're actually looking at

A Docker image is a stack of read-only layers. Each layer is a tarball of filesystem changes — files added, modified, or deleted relative to the previous layer. The final image filesystem is the union of all layers.

The important implication: deleting a file in a later layer doesn't remove it from the image. The file is present in the earlier layer's tarball. If you add a file in RUN step 3 and delete it in RUN step 7, the file is still in the layer created by step 3, and therefore still in the image. Anyone who unpacks the image or inspects the layers can retrieve it.

This is why cache cleanup must happen in the same RUN instruction that creates the cache:

# Wrong — cache is in one layer, cleanup is in another
RUN apt-get update && apt-get install -y build-essential
RUN rm -rf /var/lib/apt/lists/*   # too late — the lists are in the previous layer

# Right — cleanup in same RUN, same layer
RUN apt-get update && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*
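
To see why the first form fails, you can reproduce the mechanism without Docker. The sketch below fakes two layers as plain tarballs: the first writes a file, the second "deletes" it with a whiteout marker (.wh.<name>), which is how the OCI layer format records a deletion. The paths and the key value are made up for illustration.

```shell
# Simulate two image layers as plain tarballs (no Docker required).
work=$(mktemp -d)
mkdir -p "$work/layer1/app" "$work/layer2/app"

# "Layer 1": an early step writes a secret (hypothetical value).
echo "API_KEY=hypothetical-value" > "$work/layer1/app/.env"
tar -cf "$work/layer1.tar" -C "$work/layer1" .

# "Layer 2": a later step deletes it. The layer only carries a
# whiteout marker; it cannot reach back and edit layer 1.
: > "$work/layer2/app/.wh..env"
tar -cf "$work/layer2.tar" -C "$work/layer2" .

# Anyone holding the tarballs can still read the file from layer 1.
recovered=$(tar -xOf "$work/layer1.tar" ./app/.env)
echo "recovered from layer 1: $recovered"
```

The merged filesystem would no longer show /app/.env, but the first tarball still hands it over to anyone who asks.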

How to inspect what's actually in your image

docker history: start here

docker history --no-trunc your-image:tag

This shows each layer, the instruction that created it, and the layer size. Large layers are your first investigation target. A 200MB layer created by COPY . . means the build context contained something heavy (dependency directories, build output, a .git directory) and all of it was copied in.
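
When an image has dozens of layers, it helps to filter the history down to the ones worth investigating. A minimal sketch, assuming sizes arrive in the units docker history normally prints (B, kB, MB, GB). big_layers is a hypothetical helper name and the 100 MB threshold is arbitrary.

```shell
# big_layers THRESHOLD_MB
# Reads "SIZE CREATED_BY" lines on stdin and prints only the layers
# larger than the threshold. SIZE is the first whitespace-separated field.
big_layers() {
  awk -v limit_mb="$1" '
    {
      size = $1; mb = 0
      # awk converts "77.8MB" to 77.8 by reading the leading number.
      if      (size ~ /GB$/) mb = size * 1000
      else if (size ~ /MB$/) mb = size + 0
      else if (size ~ /kB$/) mb = size / 1000
      else if (size ~ /B$/)  mb = size / 1000000
      if (mb > limit_mb) print
    }'
}

# Usage (requires Docker):
#   docker history --no-trunc --format '{{.Size}} {{.CreatedBy}}' your-image:tag \
#     | big_layers 100
```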

Exploring the filesystem with docker run

docker run --rm -it your-image:tag sh

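If the image has a shell, you can browse interactively. (If the image defines an ENTRYPOINT, the trailing sh is passed to it as an argument; use --entrypoint sh instead.) Check: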

ls -la /app           # what's in your working directory?
ls -la /tmp           # temp files that shouldn't be there?
find / -name "*.env" 2>/dev/null   # env files anywhere in the image?
find / -name "*.pem" 2>/dev/null   # certificates or private keys?
find / -name ".git" -type d 2>/dev/null  # git history?

Exporting and unpacking layers

For images without a shell (distroless, scratch-based), or for a systematic audit:

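mkdir -p /tmp/image-audit
docker save your-image:tag | tar -x -C /tmp/image-audit/

This extracts the image to disk with one tarball per layer. The layout depends on the Docker version: older releases write <layer-id>/layer.tar, while newer releases write an OCI layout with the layers under blobs/sha256/. You can then list the contents of each layer:

# Legacy layout shown; on an OCI layout, loop over /tmp/image-audit/blobs/sha256/* instead
for layer in /tmp/image-audit/*/layer.tar; do
  echo "=== $layer ==="
  # Largest files first. Size is field 3 in GNU tar's verbose listing (field 5 in bsdtar).
  tar -tv -f "$layer" | sort -k3 -rn | head -20
done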
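
Sorting by size finds bulk; a name-based scan finds secrets, including ones a later layer "deleted". The sketch below is one way to do that, under the assumption that every layer is a tar archive somewhere under the extraction directory. scan_layers is a hypothetical helper, and the patterns are a starting point (.pem will also match public CA certificates).

```shell
# scan_layers DIR
# Lists suspicious entries in every layer tarball under DIR.
# It tries each regular file and skips anything tar cannot read
# (manifests, JSON configs), so it does not care which save layout was used.
scan_layers() {
  find "$1" -type f | while IFS= read -r blob; do
    tar -tf "$blob" >/dev/null 2>&1 || continue
    tar -tf "$blob" 2>/dev/null \
      | grep -E '(^|/)\.env$|\.pem$|(^|/)\.git/' \
      | while IFS= read -r entry; do
          printf '%s: %s\n' "$blob" "$entry"
        done
  done
}

# Usage, after extracting with docker save:
#   scan_layers /tmp/image-audit
```

Each hit names the layer it came from, which you can match against docker history to find the instruction responsible.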

dive: the practical tool

dive (github.com/wagoodman/dive) provides an interactive TUI for browsing image layers and seeing exactly which files changed in each layer. Install it and run:

dive your-image:tag

The left panel shows layers with their size. The right panel shows the filesystem diff for the selected layer — green for added, yellow for modified, red for removed. Files "removed" in a layer show as red but are still present in earlier layers — that's the problem the UI helps you visualize.

dive also has a CI mode that skips the TUI and exits non-zero when the image breaks one of its rules, for example efficiency below a minimum or wasted space above a maximum:

dive --ci your-image:tag
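
The thresholds for CI mode can live in a .dive-ci file next to your Dockerfile. The rule names below follow the dive README; the values are arbitrary starting points, so verify both against the version you install.

```shell
# Write a .dive-ci rules file in the current directory.
cat > .dive-ci <<'EOF'
rules:
  # Fail if overall image efficiency drops below 95%.
  lowestEfficiency: 0.95
  # Fail if more than 20MB is wasted on duplicated or deleted files.
  highestWastedBytes: 20MB
  # Fail if wasted space exceeds 20% of the layers you added on top of the base.
  highestUserWastedPercent: 0.20
EOF

# Then, where dive and Docker are available:
#   dive --ci your-image:tag
```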

Common things that shouldn't be there

Build artifacts and intermediate files

Source code, test directories, compiled intermediate objects that didn't make it into the final artifact. Common in single-stage builds that didn't clean up.

Package manager caches

/root/.m2 (Maven), /root/.cache/pip (pip), /root/.npm (npm cache), /var/cache/apt/ (apt). These are left by dependency installation and don't serve any purpose at runtime.

Development credentials and configuration

.env files, application-local.yml, AWS credential files in ~/.aws/, private keys copied in during build. These end up in images when .dockerignore is absent or incomplete.

Version control history

.git/ directories containing your full commit history. This is surprisingly common when COPY . . is used without a .dockerignore. A .git directory in an image means anyone who can pull the image can recover your repository history, including files and secrets that were deleted in later commits.

Build tools that weren't removed

gcc, make, curl, wget installed for build-time use and never cleaned up. These tools make container escape and lateral movement easier for an attacker who has code execution in the container.
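
Several of the items above enter through COPY . . with no filter on the build context. A .dockerignore along these lines keeps most of them out. The entries are assumptions about a typical project, not a complete list; adjust them to your stack.

```shell
# Write a starter .dockerignore in the build context root.
cat > .dockerignore <<'EOF'
# Version control
.git
.gitignore

# Credentials and local configuration, at any depth
**/.env
**/*.pem
**/application-local.yml
.aws

# Dependency directories and build output (adjust to your stack)
**/node_modules
**/target
EOF
```

.dockerignore only filters what COPY and ADD can see. It does nothing about files that a RUN step creates inside the image.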

Fixing what you find

If you find something that shouldn't be in the image, the fix depends on where it entered:

Via COPY: add it to .dockerignore.

Via RUN: ensure cleanup happens in the same RUN instruction, or switch to a multi-stage build so the intermediate layer never enters the final stage.

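From the base image: switch to a more minimal base (alpine, slim, or distroless variants). Deleting the file in your own Dockerfile only hides it, because it still lives in the base image's layer. To drop it for real, use a multi-stage build and copy only what you need into a clean final stage.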

Credentials baked in: this is a multi-part fix. Remove them from the image, rotate them immediately (assume they've been compromised if the image was ever pushed to a registry), and implement runtime secret injection.
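
For the RUN and base-image cases, a multi-stage build is usually the cleanest fix. Here is a minimal sketch, assuming a Go service with its entry point in ./cmd/app; the file name, paths, and image tags are illustrative, not taken from any real project.

```shell
# Write a hypothetical two-stage Dockerfile.
cat > Dockerfile.multistage <<'EOF'
# Stage 1: full toolchain. Nothing here ships unless it is copied out explicitly.
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app ./cmd/app

# Stage 2: minimal runtime base with no shell, compiler, or package manager.
FROM gcr.io/distroless/static-debian12
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
EOF

# Build it where Docker is available:
#   docker build -f Dockerfile.multistage -t your-image:tag .
```

Only the layers of the final stage end up in the tagged image, so source code, caches, and build tools from the first stage never reach the registry. A secret copied into the first stage can still persist in the local build cache, so keep credentials out of the build context as well.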

Making image auditing part of the process

Ad hoc audits find problems after the fact. Better to integrate inspection into CI:

  1. docker scout or trivy for known vulnerabilities (covered separately)
  2. dive --ci for image efficiency and space wasted by files duplicated or deleted across layers
  3. A simple script that checks for known-bad patterns:
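# Fail if .env or .git appear in the image (needs sh and find inside the image)
# --entrypoint sh makes this work even when the image defines its own ENTRYPOINT
docker run --rm --entrypoint sh your-image:tag -c '
  (find / -name ".env" 2>/dev/null | grep -q .) && exit 1
  (find / -name ".git" -type d 2>/dev/null | grep -q .) && exit 1
  exit 0
'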
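
The check above needs sh and find inside the image. For distroless or scratch-based images, one option is to list the container filesystem from the outside with docker create and docker export. A sketch, with forbidden_paths as a hypothetical helper; note that it sees only the final filesystem, not files hidden in earlier layers.

```shell
# forbidden_paths
# Reads one path per line on stdin, prints any forbidden path,
# and returns 1 if at least one was found.
forbidden_paths() {
  if grep -E '(^|/)\.env$|(^|/)\.git(/|$)'; then
    return 1
  fi
  return 0
}

# Usage (requires Docker):
#   cid=$(docker create your-image:tag)
#   docker export "$cid" | tar -t | forbidden_paths
#   status=$?
#   docker rm "$cid" >/dev/null
#   exit "$status"
```

docker create never starts the container, so this works even when there is nothing inside the image that could run a script.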

The first time you run this on an existing image, you will find something. Almost every project does. The goal isn't to be surprised by a security audit — it's to find these things yourself first.
