Least Privilege in Docker: Why It Matters for Backend Apps
by Arif Ikhsanudin, Backend Developer
Code execution in a container is not the end of the incident
An attacker who achieves code execution inside a container has not necessarily achieved much — if the container is configured correctly. They're in an isolated process namespace, running as a non-root user with no capabilities, unable to write to the filesystem, unable to reach services they shouldn't reach, and unable to access the host. The incident is contained.
The same attacker in a poorly configured container — running as root, with the Docker socket mounted, with host network access, with full capabilities — may be one command away from owning the host.
Least privilege is the discipline of removing every permission the application doesn't need, so that a compromised container is a small problem rather than a large one.
The principle applied concretely
Least privilege in Docker means: your container has exactly the access it needs to perform its function, and no more. That breaks down into:
- Runs as a non-root user
- Has only the Linux capabilities it actually uses
- Can only write to the filesystem paths it legitimately writes to
- Can only reach the network services it actually talks to
- Has no access to host resources (socket, host networking, host PID namespace) unless technically required
- Has resource limits that prevent resource exhaustion attacks
Treat each item as a constraint you actively enforce, not a checklist to half-complete.
Non-root user: the baseline
Already covered in depth elsewhere, but it's the first layer of least privilege:
FROM eclipse-temurin:17-jre-alpine
RUN addgroup -S -g 1001 app && adduser -S -u 1001 -G app app
WORKDIR /app
COPY --chown=app:app target/app.jar .
USER app
ENTRYPOINT ["java", "-jar", "app.jar"]
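Running as a non-root user doesn't prevent all attacks: a misconfigured capability or a container escape vulnerability can still be exploited. But it removes the trivial paths. Unless user namespace remapping is enabled, root in the container is UID 0 on the host, so a root process can write to host-mounted volumes with root's file permissions, holds Docker's default capabilities, and starts any escape attempt from the strongest position. A non-root process has none of that, and with no-new-privileges set it cannot regain it through setuid binaries either.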
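A fail-fast guard can make the non-root requirement hard to regress. The following is a minimal sketch in Python, not something the article prescribes: the function names are hypothetical, and a JVM service would implement the same check in its own startup code or entrypoint script.

```python
import os


def running_as_root(euid: int) -> bool:
    """Return True when the given effective UID is root (0)."""
    return euid == 0


def check_identity() -> str:
    """Abort startup if the process is root; otherwise describe the identity."""
    euid = os.geteuid()
    if running_as_root(euid):
        raise SystemExit("refusing to start: container process is running as root")
    return f"uid={euid} gid={os.getegid()}"
```

Calling check_identity() at the top of the service's entrypoint turns a missing USER directive into a startup failure instead of a silent downgrade.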
Capabilities: remove what the JVM/Node/Python app doesn't need
Linux capabilities divide root's monolithic privilege into discrete units. Docker grants a default set. Most backend applications need none of them.
The default capabilities granted to Docker containers:
- CAP_CHOWN: change file ownership
- CAP_NET_RAW: raw network packets (used for ping, some monitoring tools)
- CAP_SYS_CHROOT: change root directory
- CAP_AUDIT_WRITE: write to kernel audit log
- And others
A Spring Boot REST API doesn't need any of these. A Go binary serving HTTP doesn't need these. Drop all and verify nothing breaks:
# docker-compose.yml
services:
  app:
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
# Kubernetes container spec (capabilities are set on the container's securityContext, not the pod's)
securityContext:
  capabilities:
    drop:
      - ALL
  allowPrivilegeEscalation: false
Test your application under this configuration. If something breaks, add back the specific capability that's needed — don't revert to keeping all capabilities.
The usual exception: an application that binds to a port below 1024 typically needs CAP_NET_BIND_SERVICE. Prefer listening on a high port (8080, 8443) and letting the load balancer handle external port exposure.
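To confirm the drop took effect, you can read the capability masks the kernel reports in /proc/self/status from inside the container. The sketch below is illustrative only: the helper names are hypothetical, and the name table covers just Docker's default set (bit positions taken from linux/capability.h), so any other bit is printed by number. It defaults to the bounding set (CapBnd) because a non-root process already shows an empty effective set (CapEff) even when nothing was dropped.

```python
# Bit positions (linux/capability.h) for Docker's default capability set.
DEFAULT_CAP_BITS = {
    0: "CAP_CHOWN",
    1: "CAP_DAC_OVERRIDE",
    3: "CAP_FOWNER",
    4: "CAP_FSETID",
    5: "CAP_KILL",
    6: "CAP_SETGID",
    7: "CAP_SETUID",
    8: "CAP_SETPCAP",
    10: "CAP_NET_BIND_SERVICE",
    13: "CAP_NET_RAW",
    18: "CAP_SYS_CHROOT",
    27: "CAP_MKNOD",
    29: "CAP_AUDIT_WRITE",
    31: "CAP_SETFCAP",
}


def decode_caps(mask: int) -> list:
    """Translate a capability bitmask into names, lowest bit first."""
    names = []
    bit = 0
    while mask >> bit:
        if mask & (1 << bit):
            # Bits outside the default set are reported by number.
            names.append(DEFAULT_CAP_BITS.get(bit, f"bit {bit}"))
        bit += 1
    return names


def caps_from_status(status_text: str, field: str = "CapBnd") -> list:
    """Decode one Cap* line from the text of /proc/<pid>/status."""
    prefix = field + ":"
    for line in status_text.splitlines():
        if line.startswith(prefix):
            return decode_caps(int(line.split()[1], 16))
    raise ValueError(f"no {field} line found")


def current_caps(field: str = "CapBnd") -> list:
    """Read the running process's own capability set (Linux only)."""
    with open("/proc/self/status") as fh:
        return caps_from_status(fh.read(), field)
```

With cap_drop: ALL applied, current_caps() should return an empty list. If a capability has to be added back later, it will show up here by name.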
Filesystem: read-only root with explicit write mounts
A read-only container filesystem means that even with code execution, an attacker can't write malicious files to the container, can't modify the application binary, and can't persist a backdoor:
services:
  app:
    read_only: true
    tmpfs:
      - /tmp:size=64m,mode=1777
    volumes:
      - app_logs:/app/logs   # writable only for intended paths
# Kubernetes (securityContext and volumeMounts go on the container; volumes goes on the pod spec)
securityContext:
  readOnlyRootFilesystem: true
volumeMounts:
  - name: tmp
    mountPath: /tmp
  - name: logs
    mountPath: /app/logs
volumes:
  - name: tmp
    emptyDir: {}
  - name: logs
    emptyDir: {}
Test this locally before deploying. Common failures:
- JVM writes heap dumps to the working directory: configure -XX:HeapDumpPath=/app/logs
- Applications create temp files in the working directory: redirect to /tmp
- Log frameworks write to ./logs/: configure them to write to a mounted path or stdout
The test is simple:
docker run --read-only --tmpfs /tmp your-image:tag
If it starts and serves requests, the read-only configuration works.
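A started process only proves the startup path works. To see which directories are actually writable at runtime, a small probe can be run inside the container. This is a sketch under assumptions: the function names are hypothetical, and the paths in the usage note simply mirror the mounts used in this article.

```python
import os
import tempfile


def is_writable(directory: str) -> bool:
    """Try to create and remove a temp file; False on any OS-level refusal."""
    try:
        fd, path = tempfile.mkstemp(dir=directory)
    except OSError:
        # Covers a read-only filesystem, missing directory, or permission denied.
        return False
    os.close(fd)
    os.unlink(path)
    return True


def writable_paths(candidates) -> list:
    """Return the subset of candidate directories that accept writes."""
    return [path for path in candidates if is_writable(path)]
```

Under the read-only configuration above, writable_paths(["/", "/app", "/tmp", "/app/logs"]) would be expected to return only /tmp and /app/logs. Anything else in the result is a path an attacker could also write to.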
Network: limit what containers can reach
A compromised container on your default Compose network can reach every other container in the project. If your API container is compromised, it can directly reach your database container — bypassing application-level authentication entirely.
Segment your network so services only reach what they need:
networks:
  frontend:   # proxy <-> app
  backend:    # app <-> database, app <-> redis
services:
  proxy:
    networks: [frontend]
  app:
    networks: [frontend, backend]
  db:
    networks: [backend]   # only reachable via the backend network
  redis:
    networks: [backend]
A compromised proxy container can't reach db or redis directly. It can only reach app. A compromised app container can still reach db and redis, which its function requires, but the blast radius is smaller than if everything were on the same flat network.
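Segmentation is easy to verify from inside a container with a plain TCP probe. The sketch below assumes the service names from the Compose file above and conventional ports (5432 for the database, 6379 for Redis), which the article does not specify. The function name is hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Name resolution failures, refusals, and timeouts are all OSError.
        return False
```

Run from the proxy container, can_connect("db", 5432) and can_connect("redis", 6379) should both return False, since those names do not resolve outside the backend network. Run from the app container, both should return True.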
In Kubernetes, use NetworkPolicy resources to enforce this at the cluster level:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-policy
spec:
  podSelector:
    matchLabels:
      app: database
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: backend-api
      ports:
        - port: 5432
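This allows only pods labeled app: backend-api in the same namespace to connect to the database pods on port 5432. Once a policy selects a pod for ingress, all other inbound connections to it are blocked at the network level. Enforcement depends on the cluster's network plugin: on a CNI that doesn't implement NetworkPolicy, the resource is accepted but has no effect.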
Resource limits: preventing DoS from within
Without resource limits, a single container can consume all available CPU or memory on the host, affecting every other service running there. In a multi-tenant environment, this is a security concern, not just an operational one.
services:
  app:
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 512M
For JVM applications, pair this with heap configuration:
JAVA_OPTS=-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -XX:InitialRAMPercentage=25.0
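UseContainerSupport makes the JVM size itself from the container's memory limit rather than the host's RAM. It is enabled by default on JDK 10 and later (and on 8u191 and later), so on current JVMs the flag documents intent rather than changing behavior. On older JVMs without it, the heap is sized from the host's total RAM and the kernel OOM-kills the container once it exceeds the limit. Note that the java launcher does not read JAVA_OPTS on its own: with an exec-form ENTRYPOINT, pass these flags through JAVA_TOOL_OPTIONS or directly on the command line.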
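The arithmetic behind those percentages is worth checking against your limit. The sketch below is an approximation, not the JVM's exact sizing logic: it assumes the 512M limit means 512 MiB, that the container runs under cgroup v2 (where the limit appears in /sys/fs/cgroup/memory.max), and the helper names are hypothetical.

```python
MIB = 1024 * 1024


def parse_memory_max(text: str):
    """Parse cgroup v2 memory.max contents: a byte count, or 'max' for no limit."""
    value = text.strip()
    return None if value == "max" else int(value)


def heap_bytes(limit_bytes: int, percentage: float) -> int:
    """Approximate heap size the JVM derives from a RAM percentage flag."""
    return int(limit_bytes * percentage / 100)


def describe(limit_bytes: int, max_pct: float = 75.0, initial_pct: float = 25.0) -> str:
    """Summarize initial and maximum heap for a container memory limit."""
    return (
        f"limit={limit_bytes // MIB}MiB "
        f"initial_heap={heap_bytes(limit_bytes, initial_pct) // MIB}MiB "
        f"max_heap={heap_bytes(limit_bytes, max_pct) // MIB}MiB"
    )
```

For the 512M limit above, this gives roughly 128 MiB of initial heap and 384 MiB of maximum heap, leaving about 128 MiB for metaspace, thread stacks, and other off-heap memory.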
Putting it together: the secure service template
services:
  app:
    image: your-registry/your-app:${VERSION}
    user: "1001:1001"
    read_only: true
    tmpfs:
      - /tmp:size=64m
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    networks:
      - frontend
      - backend
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 256M
    volumes:
      - app_logs:/app/logs
    environment:
      DATABASE_URL: ${DATABASE_URL}
# named networks and volumes must also be declared at the top level
networks:
  frontend:
  backend:
volumes:
  app_logs:
This template doesn't require application changes for most services — it's entirely runtime configuration. Start with cap_drop: ALL and no-new-privileges this week. Add read_only: true after testing. Add network segmentation when you have time to reorganize your Compose file. Each change independently reduces the blast radius of a compromise.
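Once the template is deployed, it helps to check that the running container actually carries these settings. The following sketch audits one entry of docker inspect output. The audit and inspect_and_audit names and the finding strings are hypothetical, and the checks cover only the settings discussed in this article.

```python
import json
import subprocess


def audit(container: dict) -> list:
    """Return least-privilege findings for one `docker inspect` entry."""
    config = container.get("Config") or {}
    host = container.get("HostConfig") or {}
    findings = []

    # An empty User field means no user was configured, so the process is root.
    uid = (config.get("User") or "").split(":")[0]
    if uid in ("", "0", "root"):
        findings.append("runs as root")

    if "ALL" not in [cap.upper() for cap in host.get("CapDrop") or []]:
        findings.append("capabilities not dropped")

    hardened = {"no-new-privileges", "no-new-privileges:true", "no-new-privileges=true"}
    if not hardened.intersection(host.get("SecurityOpt") or []):
        findings.append("no-new-privileges not set")

    if not host.get("ReadonlyRootfs"):
        findings.append("root filesystem is writable")
    if host.get("Privileged"):
        findings.append("privileged mode")
    if host.get("NetworkMode") == "host":
        findings.append("host networking")
    if host.get("PidMode") == "host":
        findings.append("host PID namespace")
    if not host.get("Memory"):
        findings.append("no memory limit")
    if any("docker.sock" in bind for bind in host.get("Binds") or []):
        findings.append("Docker socket mounted")
    return findings


def inspect_and_audit(name: str) -> list:
    """Run `docker inspect` for a container name or ID and audit the result."""
    result = subprocess.run(
        ["docker", "inspect", name], capture_output=True, text=True, check=True
    )
    return audit(json.loads(result.stdout)[0])
```

An empty list from inspect_and_audit means every check passed. Wiring it into a deployment smoke test keeps later Compose edits from quietly removing a constraint.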