Your Docker Setup Is Not as Secure as You Think
by Eric Hanson, Backend Developer at Clean Systems Consulting
The assumption that containers are isolated enough
Docker containers provide namespace and cgroup isolation — they're not VMs. The kernel is shared. A container process running as root with full Linux capabilities on a default Docker installation has more access to the host than most developers realize. The Docker socket, if mounted into a container, gives that container complete control over the host's Docker daemon. Privileged containers have near-unrestricted host access.
None of this is news to Docker's maintainers — these are documented behaviors and known tradeoffs. The problem is that most production Docker setups were configured once, were never reviewed from a security standpoint, and have silently accumulated risk.
Here's where that risk lives and how to address it.
The Docker socket: your most dangerous exposure
The Docker daemon socket (/var/run/docker.sock) allows full control of Docker on the host. Any process that can communicate with it can start containers, mount host filesystems, extract images, and escalate to root on the host.
This is mounted into containers more often than it should be:
services:
  app:
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
Common reasons: CI/CD tools that build images (Jenkins, Drone), container management UIs (Portainer), monitoring tools that enumerate containers. Legitimate use cases exist — but each one is effectively a root escalation path.
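A quick way to find out where the socket is already mounted is to scan docker inspect output. The following is a minimal Python sketch, not a complete audit tool: it assumes the JSON array that docker inspect $(docker ps -q) prints, and the function names and sample data are illustrative.

```python
import json

# Both spellings are checked because /var/run is a symlink to /run on many hosts.
SOCKET_PATHS = {"/var/run/docker.sock", "/run/docker.sock"}


def mounts_docker_socket(container: dict) -> bool:
    """Return True if a `docker inspect` record mounts the Docker socket."""
    mounts = container.get("Mounts") or []
    return any(m.get("Source") in SOCKET_PATHS for m in mounts)


def containers_with_socket(inspect_json: str) -> list[str]:
    """List the names of containers that have the socket mounted."""
    containers = json.loads(inspect_json)
    return [c.get("Name", "<unnamed>") for c in containers if mounts_docker_socket(c)]


# Hand-written sample in the shape of `docker inspect` output (hypothetical data).
SAMPLE = json.dumps([
    {"Name": "/ci-runner", "Mounts": [
        {"Type": "bind", "Source": "/var/run/docker.sock",
         "Destination": "/var/run/docker.sock"}]},
    {"Name": "/api", "Mounts": []},
])

print(containers_with_socket(SAMPLE))
```

Against a live host, capture the output of docker inspect $(docker ps -q) and pass that text to containers_with_socket.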
Alternatives where possible:
- Rootless Docker: run the Docker daemon as a non-root user. The socket is still exposed but access grants user-level, not root-level, host access.
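- Docker-in-Docker (DinD): run a separate Docker daemon inside the CI container, in privileged mode, without mounting the host socket. This keeps the host daemon out of reach, but the DinD container is itself privileged, so treat it as a remaining escalation path rather than a full fix.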
- Kaniko or Buildah: container image build tools that don't require the Docker daemon at all. Build from Dockerfiles without Docker socket access.
- Registry-level builds: AWS CodeBuild, Cloud Build, GitHub Actions — build images in the CI provider's infrastructure without bringing the socket into your containers.
If you must mount the socket, restrict access with a proxy like docker-socket-proxy (Tecnativa) that exposes only specific Docker API endpoints, filtered to what the tool actually needs.
Root processes in containers
The default is to run as root. The risk: if there's a vulnerability in your application or a dependency, an attacker with code execution inside the container has root privileges inside that container. Combined with other misconfigurations (excessive capabilities, privileged mode), this can become host root.
Fix in the Dockerfile:
FROM node:20-alpine
# ... app setup ...
# switch to non-root before the final CMD
USER node
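For Alpine-based images without a built-in service user (the flags below are BusyBox syntax; Debian-based images use groupadd and useradd instead):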
RUN addgroup -S -g 1001 appgroup && adduser -S -u 1001 -G appgroup appuser
USER appuser
Enforce at the orchestrator level too. In Kubernetes:
securityContext:
runAsNonRoot: true
runAsUser: 1001
runAsGroup: 1001
runAsNonRoot: true causes the kubelet to reject the pod if the image is configured to run as root — a safety net for images you don't control.
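For containers running under plain Docker, the configured user shows up in the Config.User field of docker inspect. Below is a small sketch of how that field could be classified. The helper name runs_as_root is hypothetical, and the check only recognizes an empty value, the name root, or UID 0; it cannot detect a custom account name that maps to UID 0 inside the image.

```python
def runs_as_root(user_field: str) -> bool:
    """Classify the `Config.User` value from `docker inspect`.

    The field may be empty (no USER set, so the process runs as root),
    a name, a UID, or a `user:group` pair.
    """
    user = (user_field or "").split(":", 1)[0].strip()
    return user in ("", "0", "root")


for value in ["", "root", "0:0", "node", "1001:1001"]:
    print(repr(value), "->", "root" if runs_as_root(value) else "non-root")
```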
Linux capabilities: drop what you don't need
Linux capabilities are fine-grained privileges granted to processes. A root container, by default, gets the set of capabilities that Docker passes to the container runtime. Many of these are unnecessary for typical backend applications.
Default capabilities included in Docker containers: CHOWN, DAC_OVERRIDE, FOWNER, FSETID, KILL, SETGID, SETUID, SETPCAP, NET_BIND_SERVICE, NET_RAW, SYS_CHROOT, MKNOD, AUDIT_WRITE, SETFCAP.
Most backend apps need none of these at runtime. Drop all and add back only what's required:
# Docker Compose
services:
  app:
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE # only if binding port < 1024
    security_opt:
      - no-new-privileges:true
# Kubernetes
securityContext:
  capabilities:
    drop:
      - ALL
  allowPrivilegeEscalation: false
no-new-privileges (Docker) and allowPrivilegeEscalation: false (Kubernetes) prevent a process from gaining privileges it did not start with, for example by executing a setuid binary. A common container escape technique relies on setuid binaries, and this setting blocks it.
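To confirm which capabilities a container actually holds, the kernel exposes them as hexadecimal bitmasks in /proc/<pid>/status (the CapEff and CapBnd lines). The sketch below decodes such a mask into names; decode_caps is an illustrative helper, and the name table covers only bits 0 to 31 from linux/capability.h.

```python
# Capability names indexed by bit number, per linux/capability.h (bits 0-31).
CAP_NAMES = [
    "CHOWN", "DAC_OVERRIDE", "DAC_READ_SEARCH", "FOWNER", "FSETID", "KILL",
    "SETGID", "SETUID", "SETPCAP", "LINUX_IMMUTABLE", "NET_BIND_SERVICE",
    "NET_BROADCAST", "NET_ADMIN", "NET_RAW", "IPC_LOCK", "IPC_OWNER",
    "SYS_MODULE", "SYS_RAWIO", "SYS_CHROOT", "SYS_PTRACE", "SYS_PACCT",
    "SYS_ADMIN", "SYS_BOOT", "SYS_NICE", "SYS_RESOURCE", "SYS_TIME",
    "SYS_TTY_CONFIG", "MKNOD", "LEASE", "AUDIT_WRITE", "AUDIT_CONTROL",
    "SETFCAP",
]


def decode_caps(mask_hex: str) -> list[str]:
    """Turn a hex mask such as a CapEff value into capability names."""
    mask = int(mask_hex, 16)
    names = []
    bit = 0
    while mask:
        if mask & 1:
            # Bits beyond the table are reported by number instead of name.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"CAP_{bit}")
        mask >>= 1
        bit += 1
    return names


# 00000000a80425fb corresponds to the 14 default capabilities listed above.
print(decode_caps("00000000a80425fb"))
print(decode_caps("0000000000000400"))
```

Inside a container, grep Cap /proc/1/status prints the masks. A non-root process normally shows an empty CapEff either way, so CapBnd (the bounding set) is the more telling line: with cap_drop: ALL and no cap_add it should decode to an empty list.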
Seccomp profiles: limit system calls
Seccomp (Secure Computing Mode) filters which Linux system calls a container process can make. Docker applies a default seccomp profile that blocks ~44 of the 300+ available system calls — ones that are rarely needed and frequently exploited.
For most applications, Docker's default profile is a good baseline. For higher security requirements, create a custom profile that only allows the specific syscalls your application uses. This is complex to maintain but significantly reduces the attack surface.
The Docker default profile blocks: clone with certain flags (namespace creation), mount, kexec_load, open_by_handle_at, and others that are commonly used in container escape exploits.
Enable it explicitly in environments where it might be disabled:
# Compose
security_opt:
  - seccomp:/path/to/custom-profile.json
# To keep Docker's default profile, omit the seccomp option entirely
# and make sure nothing sets seccomp=unconfined.
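One way to verify that a filter is actually active is the Seccomp field in /proc/<pid>/status, which the kernel sets to 0 (disabled), 1 (strict), or 2 (filter). A minimal sketch of reading it follows; seccomp_mode is an illustrative helper and the sample text is hand-written rather than captured from a real container.

```python
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str) -> str:
    """Extract the seccomp mode from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        # Match "Seccomp:" exactly so "Seccomp_filters:" is not picked up.
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1].strip())
            return SECCOMP_MODES.get(value, f"unknown ({value})")
    return "unknown (no Seccomp field)"


# Hand-written excerpt in the format of /proc/<pid>/status (hypothetical values).
SAMPLE_STATUS = "Name:\tnode\nNoNewPrivs:\t1\nSeccomp:\t2\nSeccomp_filters:\t1\n"

print(seccomp_mode(SAMPLE_STATUS))
```

A container started with a seccomp profile should report filter; disabled suggests the profile was turned off somewhere in the deployment chain.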
Resource limits: preventing one container from taking everything
Without resource limits, a single container can consume all CPU and memory on the host, starving other containers and the host OS. This is a denial-of-service risk, not just a performance concern.
# Docker Compose
services:
  app:
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 256M
        reservations:
          cpus: '0.1'
          memory: 128M
# Kubernetes
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi
JVM applications need memory limits set carefully. Set JVM heap relative to the container limit using -XX:MaxRAMPercentage, not as an absolute value:
JAVA_OPTS=-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0
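The arithmetic behind that setting is worth spelling out, because the heap is not the only memory the JVM uses. The sketch below approximates the heap ceiling for the 256M limit shown earlier, assuming the limit is read as 256 MiB; max_heap_bytes is an illustrative helper, and the real JVM may round the result differently.

```python
MIB = 1024 * 1024


def max_heap_bytes(container_limit_bytes: int, max_ram_percentage: float) -> int:
    """Approximate the heap ceiling implied by -XX:MaxRAMPercentage."""
    return int(container_limit_bytes * max_ram_percentage / 100)


limit = 256 * MIB
heap = max_heap_bytes(limit, 75.0)

# Whatever is not heap must cover metaspace, thread stacks, and native buffers.
print(f"heap ceiling: {heap // MIB} MiB")
print(f"left for non-heap memory: {(limit - heap) // MIB} MiB")
```

With these numbers the heap tops out at 192 MiB, leaving 64 MiB for everything else. If the application is killed for exceeding its limit while the heap is not full, lowering the percentage is usually a better first step than raising the limit.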
Read-only filesystems
# Compose
services:
  app:
    read_only: true
    tmpfs:
      - /tmp
A read-only root filesystem prevents writing to paths that don't have explicit volume mounts. An attacker with code execution can't write malicious files to the container filesystem, can't modify application binaries, and can't persist state between exploits.
Test this before deploying — your application needs to be explicitly designed for it (all writes go to mounted volumes or tmpfs).
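A simple probe can help with that testing by showing which directories the process can still write to. This is a sketch under assumptions: is_writable is a hypothetical helper, and the paths in the loop are examples rather than a list any particular application needs.

```python
import tempfile


def is_writable(directory: str) -> bool:
    """Return True if a temporary file can be created in `directory`."""
    try:
        # The file is deleted automatically when the context manager exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers read-only filesystems, missing paths, and permission errors.
        return False


# Example paths only; replace them with the directories your application uses.
for path in ["/tmp", "/app", "/var/log"]:
    print(path, "writable" if is_writable(path) else "not writable")
```

Run inside a container started with read_only: true, only the tmpfs and volume paths should report as writable.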
The security audit checklist
Review your Docker configuration against these questions:
- Is /var/run/docker.sock mounted into any container? If yes, is it justified?
- Does every container run as a non-root user?
- Is --privileged used anywhere? (Find it: docker inspect -f '{{.HostConfig.Privileged}}' $(docker ps -q))
- Are capabilities dropped in production service definitions?
- Are memory and CPU limits set?
- Does any container use network_mode: host without a specific technical reason?
Items 1–4 are exploitable. Items 5–6 are operational risks. Address the exploitable ones first.
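Several of these questions can be answered from the HostConfig section of docker inspect. The sketch below covers privileged mode, dropped capabilities, resource limits, and host networking for a single container record. The function name and the wording of the findings are illustrative, and the sample record is hand-written rather than real output.

```python
def audit_host_config(container: dict) -> list[str]:
    """Return findings for one record from `docker inspect` output."""
    host = container.get("HostConfig") or {}
    findings = []

    if host.get("Privileged"):
        findings.append("privileged mode enabled")

    # Accept both "ALL" and "CAP_ALL" spellings, in any letter case.
    dropped = {c.upper().removeprefix("CAP_") for c in (host.get("CapDrop") or [])}
    if "ALL" not in dropped:
        findings.append("capabilities not dropped")

    # Docker reports 0 for unset limits.
    if not host.get("Memory"):
        findings.append("no memory limit")
    if not (host.get("NanoCpus") or host.get("CpuQuota")):
        findings.append("no CPU limit")

    if host.get("NetworkMode") == "host":
        findings.append("host network mode")

    return findings


# Hand-written sample records (hypothetical data).
RISKY = {"HostConfig": {"Privileged": True, "CapDrop": None, "Memory": 0,
                        "NanoCpus": 0, "NetworkMode": "host"}}
HARDENED = {"HostConfig": {"Privileged": False, "CapDrop": ["ALL"],
                           "Memory": 268435456, "NanoCpus": 500000000,
                           "NetworkMode": "bridge"}}

print(audit_host_config(RISKY))
print(audit_host_config(HARDENED))
```

An empty list means none of these checks fired for that container; it does not mean the container is secure.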