Every Backend Job Requires Docker. Few Engineers Use It Well.
Docker appears in virtually every backend and DevOps job posting. It's the standard for packaging, shipping, and running applications. But there's a massive gap between "I can write a Dockerfile" and "I build production-grade container images."
The gap manifests as 2GB images that take 10 minutes to pull, containers running as root with full filesystem write access, applications that don't respond to shutdown signals, and builds that break caching on every commit. These aren't theoretical problems — they're the source of real production incidents, slow deployments, and security vulnerabilities.
This guide covers the Docker practices that separate hobby projects from production-grade infrastructure. Every pattern here addresses a real failure mode we've seen in production.
Multi-Stage Builds: Smaller, Safer Images
A typical Dockerfile installs build tools, dependencies, compiles code, and serves the application — all in one image. The result is an image packed with compilers, package managers, and development headers that have no business being in production.
Multi-stage builds solve this:
# Stage 1: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build && npm prune --omit=dev
# Stage 2: Production
FROM node:20-alpine AS production
WORKDIR /app
RUN addgroup -g 1001 appgroup && \
adduser -u 1001 -G appgroup -s /bin/sh -D appuser
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./
USER appuser
EXPOSE 3000
CMD ["node", "dist/server.js"]
What this achieves: The final image contains only the compiled application and production dependencies. Build tools, source code, and dev dependencies are left behind in the builder stage. Image size typically drops 60-80%.
Python Multi-Stage Example
# Stage 1: Build dependencies
FROM python:3.12-slim AS builder
WORKDIR /app
RUN pip install --no-cache-dir poetry
COPY pyproject.toml poetry.lock ./
RUN poetry export -f requirements.txt --output requirements.txt --without-hashes
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt
# Stage 2: Production
FROM python:3.12-slim AS production
WORKDIR /app
COPY --from=builder /install /usr/local
COPY src/ ./src/
RUN useradd --create-home --shell /bin/bash appuser
USER appuser
EXPOSE 8000
CMD ["python", "-m", "uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
Choose the Right Base Image
Use -alpine or -slim variants of base images. The difference is dramatic: python:3.12 is roughly 1GB, python:3.12-slim about 150MB, and python:3.12-alpine around 50MB. One caveat: Alpine uses musl libc, so Python packages with compiled extensions may lack prebuilt wheels and have to compile from source. For even smaller images, consider distroless images from Google (gcr.io/distroless/python3) which contain only the runtime and no shell at all — the ultimate security hardening.
Layer Caching: Fast Builds in CI
Docker builds layers from top to bottom. When a layer changes, every subsequent layer is rebuilt. The order of your Dockerfile instructions directly impacts build speed.
Bad: Breaks Cache on Every Code Change
COPY . .
RUN pip install -r requirements.txt
Every code change copies all files, which invalidates the cache for the dependency install layer — even if dependencies didn't change.
Good: Dependencies Cached Separately
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
Dependencies are only reinstalled when requirements.txt changes. Code changes only rebuild the final COPY layer.
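With BuildKit you can go further and persist the package manager's download cache across builds, so even when requirements.txt changes, previously downloaded wheels are reused. A sketch (requires BuildKit; the cache path is pip's default, and --no-cache-dir must be dropped for the cache to be useful):

```
# syntax=docker/dockerfile:1
COPY requirements.txt .
# Cache mount persists pip's download cache between builds (BuildKit only)
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
COPY . .
```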
.dockerignore
Always include a .dockerignore file. Without it, COPY . . sends your entire directory — including .git, node_modules, .env files, and test data — to the Docker daemon.
.git
.env
.env.*
node_modules
__pycache__
*.pyc
.pytest_cache
.coverage
dist
build
*.md
docker-compose*.yml
Security: Don't Run as Root
By default, Docker containers run as root. This means a vulnerability in your application gives the attacker root access inside the container, and potentially access to the host via container escape vulnerabilities.
Create a Non-Root User
RUN addgroup -g 1001 appgroup && \
adduser -u 1001 -G appgroup -s /bin/sh -D appuser
USER appuser
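Note that files copied before the USER switch are owned by root. That is usually fine for read-only application code, but anything the app must write to should be copied with --chown — an illustrative fragment, with the paths as assumptions:

```
# Give the non-root user ownership of writable app files
COPY --chown=appuser:appgroup ./config /app/config
```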
Read-Only Root Filesystem
Run containers with a read-only filesystem and only write to designated volumes:
# docker-compose.yml
services:
api:
image: myapp:latest
read_only: true
tmpfs:
- /tmp
volumes:
- app-data:/app/data
Drop All Capabilities
Linux capabilities give fine-grained control over what a container process can do. Drop all capabilities and add back only what you need:
services:
api:
image: myapp:latest
cap_drop:
- ALL
cap_add:
- NET_BIND_SERVICE # Only if binding to ports < 1024
Never Use --privileged
Running a container with --privileged gives it full access to the host system, including all devices, all capabilities, and the ability to modify the host kernel. There is almost never a legitimate reason to use --privileged in production. If you think you need it, you almost certainly need a specific capability or device mount instead.
Health Checks: Let the Orchestrator Help
Without health checks, Docker (and Kubernetes) can only tell if your container process is running — not if it's actually healthy and serving requests. A container can be "running" while the application inside is deadlocked, out of memory, leaking connections, or stuck in an infinite loop.
Dockerfile HEALTHCHECK
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
CMD wget --no-verbose --tries=1 --spider http://localhost:8000/health || exit 1
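One caveat: Debian -slim images ship without wget or curl (Alpine includes a minimal BusyBox wget). If your runtime is Python anyway, a tiny stdlib probe avoids installing extra tools — a sketch, with the URL and port as assumptions:

```python
# healthcheck.py — minimal HTTP probe using only the standard library
import urllib.request


def check(url: str, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False
```

Wire it up with something like HEALTHCHECK CMD python -c "import healthcheck, sys; sys.exit(0 if healthcheck.check('http://localhost:8000/health') else 1)".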
Application-Level Health Endpoints
Build a dedicated health endpoint that checks real dependencies:
@app.get("/health")
async def health():
checks = {
"database": await check_db_connection(),
"redis": await check_redis_connection(),
}
healthy = all(checks.values())
return JSONResponse(
status_code=200 if healthy else 503,
content={"status": "healthy" if healthy else "degraded", "checks": checks},
)
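The check_db_connection and check_redis_connection helpers above are assumptions; one way to implement them is a generic wrapper that bounds every dependency probe with a timeout, so a hung database can never hang the health endpoint itself. A minimal sketch:

```python
import asyncio
from typing import Awaitable, Callable


async def check_dependency(probe: Callable[[], Awaitable[None]],
                           timeout: float = 2.0) -> bool:
    """Run a dependency probe; timeouts and errors both count as unhealthy."""
    try:
        await asyncio.wait_for(probe(), timeout)
        return True
    except Exception:
        return False
```

Each concrete check then wraps its own probe, e.g. a SELECT 1 through your database driver.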
Graceful Shutdown: Handle SIGTERM
When Docker stops a container, it sends SIGTERM and waits 10 seconds (the default grace period, configurable with --stop-timeout) before sending SIGKILL. Your application must handle SIGTERM to drain connections and finish in-flight requests.
The Problem with Shell Form CMD
# Bad: runs via /bin/sh, which doesn't forward signals
CMD python server.py
# Good: exec form, process receives signals directly
CMD ["python", "server.py"]
Always use the exec form (CMD ["executable", "arg"]) so your application process is PID 1 and receives signals directly.
Handle SIGTERM in Application Code
import signal
import sys
def graceful_shutdown(signum, frame):
print("Received SIGTERM, shutting down gracefully...")
# Close database connections
# Finish in-flight requests
# Flush logs and metrics
sys.exit(0)
signal.signal(signal.SIGTERM, graceful_shutdown)
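For asyncio services, a common pattern (sketched here, not tied to any particular framework) is to turn SIGTERM into an event, await it, and drain outstanding work within a bounded window before exiting:

```python
import asyncio
import signal


async def serve() -> str:
    stop = asyncio.Event()
    loop = asyncio.get_running_loop()
    # Turn SIGTERM/SIGINT into a normal asyncio event (Unix only)
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, stop.set)

    await stop.wait()  # serve requests here until asked to stop

    # Drain: give in-flight tasks a bounded window to finish
    pending = [t for t in asyncio.all_tasks()
               if t is not asyncio.current_task()]
    if pending:
        await asyncio.wait(pending, timeout=5)
    return "shutdown complete"
```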
Logging: stdout, Not Files
Docker expects applications to write logs to stdout and stderr. The Docker logging driver then captures these logs and routes them to whatever backend you've configured (json-file, fluentd, CloudWatch, etc.).
Don't write logs to files inside the container. This fills up the container's writable layer, makes logs inaccessible to docker logs, and breaks log aggregation in orchestration platforms.
import logging
logging.basicConfig(
level=logging.INFO,
format='{"time":"%(asctime)s","level":"%(levelname)s","message":"%(message)s"}',
handlers=[logging.StreamHandler()], # stdout
)
Use JSON-formatted structured logging. This makes logs parseable by automated systems without regex gymnastics.
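The format-string approach above produces invalid JSON as soon as a message contains a quote or newline; a custom Formatter that serializes through json.dumps is safer because escaping is handled for you. A minimal sketch:

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log record; json.dumps handles escaping."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
        })


handler = logging.StreamHandler(sys.stdout)  # stdout, never a file
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])
```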
Secrets Management: Never Bake Secrets Into Images
# NEVER do this
ENV DATABASE_PASSWORD=mysecretpassword
COPY .env /app/.env
Secrets baked into images persist in every layer and are visible to anyone with access to the image.
Instead, inject secrets at runtime:
- Environment variables: Pass via docker run -e or Docker Compose environment:. Simple but visible in docker inspect.
- Docker secrets: Native secrets management for Docker Swarm. Mounted as files in /run/secrets/.
- External secret managers: HashiCorp Vault, AWS Secrets Manager, Azure Key Vault. Pull secrets at application startup. The most secure approach for production.
Build-Time Secrets
If you need secrets during the build (e.g., pulling from a private package registry), use Docker BuildKit's --mount=type=secret:
RUN --mount=type=secret,id=npm_token \
NPM_TOKEN=$(cat /run/secrets/npm_token) npm ci
The secret is available during the build step but is never committed to any image layer.
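The secret itself is supplied at build time from a file outside the build context — an illustrative invocation, assuming the token lives in a local .npm_token file:

```shell
# Pass the token to the build without writing it into any layer
DOCKER_BUILDKIT=1 docker build \
  --secret id=npm_token,src=.npm_token \
  -t myapp:latest .
```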
Image Scanning and Updates
Production images should be scanned for known vulnerabilities before deployment:
# Scan with Docker Scout (built into Docker Desktop)
docker scout cves myapp:latest
# Scan with Trivy (open source)
trivy image myapp:latest
Integrate image scanning into your CI pipeline. Block deployments if critical or high-severity CVEs are detected. For a complete CI/CD pipeline setup, see our guide on CI/CD with GitHub Actions and Docker.
Rebuild images regularly (at least monthly) to pick up base image security patches, even if your application code hasn't changed.
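If you use GitHub Actions, a scheduled trigger is one way to enforce regular rebuilds without anyone remembering to do them — a sketch (the workflow file path and cadence are assumptions):

```yaml
# .github/workflows/rebuild.yml (hypothetical) — rebuild weekly for base image patches
on:
  schedule:
    - cron: "0 4 * * 1"   # every Monday, 04:00 UTC
  push:
    branches: [main]
```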
Production Dockerfile Checklist
Before deploying any Docker image to production, verify:
- Multi-stage build — production image contains only runtime dependencies
- Non-root user — container runs as an unprivileged user
- Minimal base image — using -slim, -alpine, or distroless
- .dockerignore — excludes .git, .env, node_modules, test files
- Layer caching — dependency install before code copy
- HEALTHCHECK — meaningful health endpoint, not just process alive
- Exec form CMD — signals forwarded to application process
- No secrets in image — environment variables or external secret managers
- Image scanning — no critical CVEs in base image or dependencies
- Structured logging — JSON to stdout/stderr
- Read-only filesystem — where possible, with tmpfs for write needs
- Pinned versions — specific base image tags, not latest
Conclusion
Docker is a solved problem at the surface level — anyone can containerize an application. The difference between a containerized application and a production-grade container is security, reliability, and operational efficiency.
The practices in this guide — multi-stage builds, non-root execution, health checks, graceful shutdown, proper logging, and secrets management — aren't optional extras. They're the baseline for any container that handles real traffic. Skip any one of them and you're accumulating technical debt that will surface as production incidents.
Start with your most important service. Apply these patterns one at a time. Each improvement reduces your attack surface, speeds up your deployments, and makes your containers more reliable. If you're moving from single containers to orchestration, our guide on Docker and Kubernetes orchestration covers the next step in the journey.
Want to practice this hands-on?
CloudaQube generates complete labs from a simple description. Try it free.