
Optimization

Learn how to optimize Docker images, containers, and overall Docker performance

Docker Optimization

Optimizing Docker involves improving image size, build time, runtime performance, and resource utilization. These optimizations lead to faster deployments, reduced costs, and better application performance. A well-optimized Docker environment ensures efficient resource usage, faster CI/CD pipelines, improved application startup times, and enhanced security posture.

Docker optimization can be approached from multiple angles:

  • Image optimization: Reducing size and improving build efficiency
  • Build performance: Accelerating the image creation process
  • Runtime optimization: Enhancing container execution efficiency
  • Resource management: Controlling CPU, memory, and I/O usage
  • Infrastructure optimization: Tuning the Docker daemon and host system

Image Optimization

Multi-Stage Builds

Use multi-stage builds to create smaller production images by separating build-time dependencies from runtime requirements:

# Build stage - includes all build dependencies and tools
FROM node:16 AS builder
WORKDIR /app
# Copy dependency manifests first to leverage layer caching
COPY package*.json ./
# npm ci installs exactly what package-lock.json specifies (reproducible builds)
RUN npm ci
# Copy application code
COPY . .
# Build the application
RUN npm run build

# Production stage - much smaller base image with only runtime requirements
FROM nginx:alpine
# Copy only the build artifacts from the builder stage
COPY --from=builder /app/build /usr/share/nginx/html
# Optional: Copy only specific files needed for runtime
COPY --from=builder /app/config/nginx.conf /etc/nginx/conf.d/default.conf
# Run as non-root for better security (requires nginx to listen on an
# unprivileged port such as 8080 in nginx.conf)
USER nginx

Multi-stage builds provide several benefits:

  • Dramatically smaller final images (often 10-20x reduction in size)
  • Separation of build-time and runtime dependencies
  • Reduced attack surface with fewer installed packages
  • Better layer caching for faster rebuilds
  • More secure images without build tools or source code
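To illustrate how far the size reduction can go, a compiled language can target an empty base image. A minimal sketch for a hypothetical Go service (the module layout and `cmd/server` path are assumptions):

```dockerfile
# Build stage with the full Go toolchain (~800MB)
FROM golang:1.21 AS builder
WORKDIR /src
COPY . .
# Static binary so it runs without a libc
RUN CGO_ENABLED=0 go build -o /app ./cmd/server

# Final stage: nothing but the binary
FROM scratch
COPY --from=builder /app /app
ENTRYPOINT ["/app"]
```

The resulting image contains only the binary itself, often just a few megabytes.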

Minimize Layers

Docker images are composed of layers, and each layer adds overhead. Optimize by reducing unnecessary layers:

  • Combine related commands into a single RUN instruction
  • Use && to chain commands logically
  • Clean up temporary files and package caches in the same layer
  • Group installations by stability/change frequency
# Bad example - each command creates a new layer
# RUN apt-get update
# RUN apt-get install -y package1
# RUN apt-get install -y package2
# RUN apt-get clean

# Good example - single layer with proper cleanup
RUN apt-get update && \
    apt-get install -y \
      package1 \
      package2 \
      --no-install-recommends && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* && \
    # Application-specific cleanup
    rm -rf /tmp/* /var/tmp/* && \
    # Ensure all package installations succeeded
    package1 --version && package2 --version

Every layer:

  • Adds metadata overhead (typically ~4KB per layer)
  • Impacts build and pull times
  • Affects layer cache efficiency
  • Contributes to overall image size complexity

While Docker's older AUFS storage driver imposed a hard limit of 127 layers, modern drivers allow more; either way, a good rule of thumb is to aim for fewer than 20 layers in a production image.

Use .dockerignore

Create a .dockerignore file to exclude unnecessary files from the build context. This reduces build time, context size, and prevents sensitive information from being included in images:

# Version control
.git
.gitignore
.svn
.hg

# Development artifacts
node_modules
bower_components
vendor
*.o
*.obj
*.exe
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib64/
parts/
sdist/
var/
.installed.cfg

# Logs and databases
*.log
logs/
*.sql
*.sqlite
*.sqlite3
*.db

# Environment and secrets
.env
.env.*
.venv
.aws/
.ssh/
.config/
.npm/
.cache/
*.pem
*.key
*_rsa
*_dsa
*_ed25519
*_ecdsa
credentials.json

# IDE files
.idea/
.vscode/
*.swp
*.swo
.DS_Store
Thumbs.db

# Testing
coverage/
.coverage
htmlcov/
.pytest_cache/
.tox/
.nox/

# Docker specific
Dockerfile*
docker-compose*
.dockerignore

Benefits of a well-configured .dockerignore:

  • Reduces build context size (can improve build time by orders of magnitude)
  • Prevents unnecessary cache invalidation
  • Improves security by excluding secrets and credentials
  • Makes builds more deterministic by excluding variable content
  • Reduces bandwidth usage when transferring build context to remote Docker daemons
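You can approximate how much the build context shrinks by tarring the directory with and without an exclusion, which is roughly what the CLI does when it sends the context. A rough sketch with toy data (tar's pattern semantics differ slightly from .dockerignore's):

```shell
# Create a toy project with a large dependency directory
mkdir -p demo/node_modules demo/src
head -c 1048576 /dev/zero > demo/node_modules/blob.bin   # 1 MiB of junk
echo 'console.log("hi")' > demo/src/index.js

# Compare the context size with and without the exclusion
full=$(tar -cf - -C demo . | wc -c)
trimmed=$(tar -cf - -C demo --exclude='./node_modules' . | wc -c)
echo "full context: ${full} bytes, with node_modules ignored: ${trimmed} bytes"
rm -rf demo
```

On real projects the difference is frequently hundreds of megabytes.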

You can also use pattern matching similar to .gitignore:

  • **/temp* - matches any file or directory starting with "temp"
  • !important.log - negates a previous pattern, including important.log
  • #comment - comments for documentation
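Negation makes broad patterns practical; the classic example excludes all markdown files except the one the build needs:

```
# Exclude all markdown files...
*.md
# ...but keep README.md in the context
!README.md
```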

Choose Appropriate Base Images

Base image selection significantly impacts image size, security, and performance:

  • Use official slim or alpine variants
    • Alpine: Extremely small (~5MB) but uses musl libc instead of glibc
    • Slim variants: Trimmed official images with minimal packages (~40-60MB)
    • Debian/Ubuntu-based images: Better compatibility but larger size
  • Consider distroless images for production
    • Contains only your application and its runtime dependencies
    • No package manager, shell, or other utilities
    • Minimal attack surface and smaller size
    • Examples: gcr.io/distroless/java, gcr.io/distroless/nodejs
    • Challenging to debug but excellent for security
  • Be specific with image tags
    • Always use explicit version tags (e.g., node:16.14.2-alpine3.15)
    • Avoid latest tag for reproducible builds
    • Consider using image digests for immutability: node@sha256:3e36d7d8458e14...
    • Balance freshness with stability when choosing versions
  • Choose based on application requirements
    • CPU architecture compatibility (x86_64, ARM64, etc.)
    • Required system libraries and dependencies
    • Security patch update frequency
    • Community support and documentation

Base image size comparison:

node:16                 # ~910MB
node:16-slim            # ~175MB
node:16-alpine          # ~110MB
gcr.io/distroless/nodejs:16  # ~80MB

Consider creating standardized, security-hardened base images for your organization that include common configurations, monitoring agents, and security patches.

Build Performance

# Enable BuildKit (Docker 18.09+)
export DOCKER_BUILDKIT=1

# Or enable permanently in daemon.json
# { "features": { "buildkit": true } }

# Build with BuildKit and verbose output
docker build --progress=plain -t my-app .

# Use build cache from specific images (helps in CI/CD)
docker build --cache-from my-app:previous -t my-app:latest .

# Advanced BuildKit features
# Mount a secret during build (won't be in final image)
docker build --secret id=npmrc,src=.npmrc -t my-app .

# Mount SSH agent for private repository access
docker build --ssh default -t my-app .

# Export build cache to registry for distributed caching (requires buildx)
docker buildx build --push -t my-app:latest \
  --cache-to type=registry,ref=my-registry.io/cache/my-app:buildcache \
  --cache-from type=registry,ref=my-registry.io/cache/my-app:buildcache .

# Embed inline cache metadata in the pushed image for registry caching
docker buildx build --push \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  -t my-app:latest .

# Parallelize independent stages
# (BuildKit automatically parallelizes independent stages)
docker build -t my-app --target production .

In Dockerfile, use BuildKit-specific features with syntax directive:

# syntax=docker/dockerfile:1.4
FROM node:16-alpine AS builder
# Mount secret without exposing in layer
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm install
# Mount SSH agent for private repo access
RUN --mount=type=ssh mkdir -p ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
# Mount cache for package manager
RUN --mount=type=cache,target=/root/.npm npm install
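The same cache-mount idea applies to system package managers. A sketch for apt (the first RUN disables the image's default behavior of deleting downloaded packages, and sharing=locked serializes cache access so parallel builds don't corrupt it):

```dockerfile
# syntax=docker/dockerfile:1.4
FROM ubuntu:22.04
# Keep downloaded .debs instead of auto-deleting them
RUN rm -f /etc/apt/apt.conf.d/docker-clean && \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
      > /etc/apt/apt.conf.d/keep-cache
# Reuse the apt cache across builds via cache mounts
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends curl
```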

Container Runtime Optimization

Runtime optimization ensures that containers operate efficiently, reliably, and securely when deployed. These optimizations affect resource utilization, startup time, stability, and security posture.

Resource Constraints

Setting appropriate resource limits prevents containers from consuming excessive resources or affecting neighboring containers:

# Limit CPU and memory (flags explained below)
docker run -d --name app \
  --cpus=1.5 \
  --memory=512m \
  --memory-reservation=256m \
  --memory-swappiness=0 \
  --cpu-shares=1024 \
  --cpuset-cpus="0,1" \
  --pids-limit=100 \
  --blkio-weight=500 \
  --ulimit nofile=8192:16384 \
  --oom-score-adj=500 \
  my-app:latest

# --cpus=1.5               limit to 1.5 CPU cores
# --memory=512m            hard memory limit (container is killed if exceeded)
# --memory-reservation     soft reservation the container tries to stay below
# --memory-swappiness=0    0-100; lower means less swapping
# --cpu-shares=1024        relative CPU scheduling weight during contention
# --cpuset-cpus="0,1"      restrict to specific CPU cores
# --pids-limit=100         cap process count (prevents fork bombs)
# --blkio-weight=500       block I/O weight (10-1000)
# --ulimit nofile          file descriptor soft:hard limits
# --oom-score-adj=500      -1000 to 1000; higher = more likely to be OOM-killed

These constraints provide several benefits:

  • Predictable performance through resource isolation
  • Protection against noisy neighbor problems
  • Improved host stability by preventing resource exhaustion
  • More accurate capacity planning and scheduling
  • Better Quality of Service (QoS) management

Resource constraints should be tailored to each application's needs and based on performance profiling rather than arbitrary values.
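One flag pair deserves special attention: --memory-swap is the total of memory plus swap, not the swap amount by itself. The swap a container may actually use is therefore the difference:

```shell
# With --memory=512m and --memory-swap=1g, the container may use
# 512 MiB of RAM plus this much swap:
memory_mb=512
memory_swap_mb=1024
echo "swap allowed: $(( memory_swap_mb - memory_mb ))m"
# → swap allowed: 512m
# Setting --memory and --memory-swap to the same value disables swap entirely
```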

Read-Only Filesystem

Using read-only filesystems provides significant security benefits by preventing modification of container contents at runtime:

# Mount root filesystem as read-only, with targeted writable mounts
docker run -d --name app \
  --read-only \
  --tmpfs /tmp:rw,size=256m,mode=1777 \
  --tmpfs /var/run:rw,size=64m \
  --tmpfs /var/cache:rw,size=128m \
  -v app-data:/data:rw \
  -v /etc/config:/etc/app-config:ro \
  --security-opt="no-new-privileges:true" \
  my-app:latest

# --read-only        makes the container's root filesystem immutable
# --tmpfs            in-memory writable mounts for scratch space
# -v app-data:...    named volume for data that must persist
# -v ...:ro          host directory mounted read-only for configuration

This approach:

  • Prevents runtime modification of application binaries
  • Mitigates the impact of certain types of attacks
  • Forces proper externalization of persistent data
  • Makes containers more immutable and predictable
  • Helps identify application assumptions about filesystem access

Applications may require adaptation to work with read-only filesystems, particularly those that attempt to write to their installation directories or expect to create temporary files in non-standard locations.

Proper Stop Signal

Configuring appropriate stop signals ensures graceful container termination, preventing data corruption and service disruption:

# Define stop signal in Dockerfile
STOPSIGNAL SIGTERM

# Override stop signal at runtime
docker run -d --stop-signal SIGTERM my-app:latest

# Configure stop timeout (seconds before SIGKILL)
docker run -d --stop-timeout=30 my-app:latest

Understanding container lifecycle signals:

  1. SIGTERM (default): Requests graceful termination, allows cleanup
  2. SIGKILL: Immediate forceful termination, no cleanup possible
  3. SIGINT: Terminal interrupt (like Ctrl+C), may be handled differently
  4. SIGHUP: Terminal disconnect, some applications use for config reload

Proper termination handling includes:

  • Application traps and handles SIGTERM
  • Completes in-progress transactions
  • Closes database connections properly
  • Finishes writing to disk and flushes buffers
  • De-registers from service discovery
  • Returns appropriate exit code

For custom applications, implement proper signal handlers:

// Node.js example
process.on('SIGTERM', async () => {
  console.log('Graceful shutdown initiated');
  await closeConnections();
  await flushBuffers();
  console.log('Graceful shutdown completed');
  process.exit(0);
});
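For shell-based entrypoints the same idea can be sketched with trap. This self-contained demo (the cleanup steps are placeholders for real work such as flushing and deregistering) delivers SIGTERM to itself instead of waiting on a real workload:

```shell
# Trap-based graceful shutdown sketch for a shell entrypoint
sh -c '
  cleanup() {
    echo "closing connections, flushing buffers..."
    echo "graceful shutdown complete"
    exit 0
  }
  trap cleanup TERM INT
  echo "app running"
  # A real entrypoint would "exec" the app or "wait" on a child here;
  # for demonstration we send SIGTERM to ourselves
  kill -TERM $$
  sleep 5   # never finishes: the trap fires first
'
```

Note that if the application runs as PID 1 without a signal handler, SIGTERM is silently ignored; running with --init or an explicit trap avoids that.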

Healthchecks

Healthchecks help Docker monitor container health and automatically restart unhealthy containers. They distinguish between running containers and properly functioning applications:

# Add healthcheck to Dockerfile
HEALTHCHECK --interval=30s --timeout=3s --start-period=60s --retries=3 \
  CMD curl -f http://localhost/health || exit 1

# Define healthcheck at runtime
docker run -d --name app \
  --health-cmd="curl -f http://localhost:8080/actuator/health || exit 1" \
  --health-interval=30s \
  --health-timeout=5s \
  --health-retries=3 \
  --health-start-period=60s \
  my-app:latest

Healthcheck parameters explained:

  • interval: Time between checks (default: 30s)
  • timeout: Maximum time for check to complete (default: 30s)
  • start-period: Grace period for initialization (default: 0s)
  • retries: Number of consecutive failures before unhealthy (default: 3)

Effective healthcheck commands should:

  • Be lightweight and quick to execute
  • Check critical application functionality
  • Have minimal dependencies
  • Return appropriate exit codes (0 = healthy, 1 = unhealthy)
  • Avoid false positives/negatives
  • Include reasonable timeouts

Examples for different application types:

# Web application
HEALTHCHECK CMD curl -f http://localhost/ || exit 1

# Database
HEALTHCHECK CMD pg_isready -U postgres || exit 1

# API service
HEALTHCHECK CMD wget -qO- http://localhost:8080/actuator/health | grep UP || exit 1

# Worker/background service
HEALTHCHECK CMD pgrep -f worker.js || exit 1

Healthchecks enable orchestration platforms to make intelligent scheduling decisions and provide automatic remediation for failed containers.

Network Performance

Network performance can significantly impact container communication latency, throughput, and overall application performance. Docker offers several network optimization options:

# Use host network for better performance (when appropriate)
# Eliminates network translation overhead but reduces isolation
docker run -d --network host my-app:latest

# Optimize container DNS to prevent lookup delays
docker run -d --dns 8.8.8.8 --dns 1.1.1.1 --dns-search example.com my-app:latest

# Configure DNS options in daemon.json for all containers
# {
#   "dns": ["8.8.8.8", "8.8.4.4"],
#   "dns-search": ["example.com"]
# }

# Configure MTU for better network performance
# Useful when working with VPNs, overlay networks, or unusual network topologies
docker network create --opt com.docker.network.driver.mtu=1400 my-network

# Use dedicated MAC address for predictable networking
docker run -d --mac-address="02:42:ac:11:00:02" my-app:latest

# Configure a static IP (requires a user-defined network; the default
# bridge network does not support --ip)
docker network create --subnet=172.25.0.0/16 my-static-net
docker run -d --network my-static-net --ip 172.25.0.10 my-app:latest

# Use specific network driver for performance
docker network create --driver=bridge --subnet=172.28.0.0/16 --gateway=172.28.0.1 \
  --ip-range=172.28.5.0/24 --aux-address="my-router=172.28.1.5" \
  -o "com.docker.network.bridge.enable_icc=true" \
  -o "com.docker.network.driver.mtu=1500" \
  my-network

# Reduce port range for specific services
docker run -d --sysctl net.ipv4.ip_local_port_range="10000 10500" my-app:latest

# Tune TCP parameters for high-performance workloads
docker run -d \
  --sysctl net.core.somaxconn=1024 \
  --sysctl net.ipv4.tcp_max_syn_backlog=1024 \
  --sysctl net.ipv4.tcp_fin_timeout=30 \
  --sysctl net.ipv4.tcp_keepalive_time=300 \
  my-app:latest

Network mode considerations:

  • bridge: Default mode, provides isolation with NAT (small overhead)
  • host: Shares host's network stack (best performance, reduced isolation)
  • overlay: Multi-host networking, higher overhead but necessary for swarm
  • macvlan: Assigns MAC address to container (near-native performance)
  • none: No network connectivity, highest isolation

Performance impact factors:

  • Inter-container communication overhead
  • NAT performance on high-throughput services
  • DNS resolution delays
  • MTU mismatches causing fragmentation
  • Network driver implementation differences
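MTU mismatches are worth a quick sanity check: encapsulating drivers add header bytes, so the inner MTU must shrink by the overhead or packets fragment. For a typical VXLAN overlay (roughly 50 bytes of overhead, though the exact figure varies by configuration):

```shell
physical_mtu=1500
vxlan_overhead=50   # typical VXLAN encapsulation cost; varies by setup
echo "overlay MTU should be at most $(( physical_mtu - vxlan_overhead ))"
# → overlay MTU should be at most 1450
```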

Storage Optimization

Memory Management

Proper memory management is critical for container stability, host protection, and efficient resource utilization. Containers without memory limits can consume all available system memory, causing instability.

Memory Limits

  • Set appropriate memory limits
    • Based on application profiling, not guesswork
    • Include headroom for garbage collection and peak usage
    • Consider both resident set size (RSS) and virtual memory
    • Balance between too restrictive (crashes) and too generous (waste)
    • Example: --memory=512m --memory-reservation=384m
    • Different applications have different memory usage patterns:
      • JVM: Consider heap size + metaspace + native memory
      • Node.js: V8 heap + buffer allocations
      • Python: Interpreter overhead + application memory
  • Configure swap behavior
    • --memory-swap: Total memory+swap limit
    • --memory-swappiness=0: Reduce swapping to minimum
    • Disable swap entirely for latency-sensitive applications
    • For databases, configure appropriate swappiness
    • Example: --memory=1g --memory-swap=1g (no swap)
  • Monitor memory usage
    • Track both current and peak memory usage
    • Identify memory leaks with trending data
    • Monitor garbage collection frequency and duration
    • Analyze OOM kill events
    • Tools: docker stats, cAdvisor, Prometheus metrics
    • Commands:
      # Live memory usage
      docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}"
      
      # Container memory details
      docker inspect $(docker ps -q) | jq '.[].HostConfig.Memory'
      
  • Use cgroup constraints
    • Kernel memory limits (--kernel-memory) are deprecated and ignored on cgroup v2 hosts
    • CPU and memory pressure correlation
    • Soft vs hard limits: reservation vs limit
    • cgroup v2 support for improved memory accounting
    • Example: --memory-reservation=256m --memory=512m
  • Implement OOM handling
    • --oom-kill-disable: Prevent OOM killer (use carefully)
    • --oom-score-adj: Adjust OOM kill priority (-1000 to 1000)
    • Proper application-level error handling
    • Graceful degradation under memory pressure
    • Health monitoring to detect memory-related issues
    • OOM-aware application design
    • Example: --oom-score-adj=-500 (less likely to be killed)
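As a rough sizing sketch for a JVM workload (the numbers are illustrative, not a recommendation): the container limit can be estimated from heap, metaspace, and native overhead plus headroom:

```shell
heap_mb=384        # -Xmx
metaspace_mb=128   # -XX:MaxMetaspaceSize
native_mb=64       # threads, JIT code cache, buffers (estimate)
headroom_pct=20    # safety margin for spikes and GC
total=$(( (heap_mb + metaspace_mb + native_mb) * (100 + headroom_pct) / 100 ))
echo "suggested --memory=${total}m"
# → suggested --memory=691m
```

The estimate should then be validated against observed peak usage under load.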

Memory Settings in Compose

Define comprehensive memory constraints in Docker Compose for reproducible multi-container deployments:

version: '3.8'
services:
  app:
    image: my-app:latest
    deploy:
      resources:
        limits:
          memory: 512M  # Hard limit - container is killed if exceeded
          cpus: '0.5'   # CPU limits often affect memory usage patterns
        reservations:
          memory: 256M  # Soft limit - guaranteed minimum memory
          cpus: '0.25'  # Guaranteed CPU reservation
    environment:
      # Application-specific memory settings
      JAVA_OPTS: "-Xms256m -Xmx384m -XX:MaxMetaspaceSize=128m"
      NODE_OPTIONS: "--max-old-space-size=384"
    # Additional memory-related settings
    ulimits:
      memlock: -1  # Unlimited memlock for databases
    # OOM settings (requires host privileges)
    oom_score_adj: 500  # Higher value = higher chance of being killed by OOM killer
    # Memory swappiness control (0-100)
    mem_swappiness: 0   # Prefer to drop page cache than swap out memory
    # With swarm, restart policy must be nested under the deploy key:
    # deploy:
    #   restart_policy:
    #     condition: on-failure
    #     max_attempts: 3
    #     delay: 5s

  database:
    image: postgres:13
    deploy:
      resources:
        limits:
          memory: 1G
        reservations:
          memory: 512M
    environment:
      POSTGRES_PASSWORD: example
    # Memory tuning goes through server flags: the official postgres image
    # does not read parameters such as shared_buffers from the environment
    command: postgres -c shared_buffers=256MB -c effective_cache_size=768MB -c work_mem=16MB

For production use, consider these additional memory optimization techniques:

  • Setting JVM/runtime-specific memory flags via environment variables
  • Adjusting application server parameters (workers, threads, connection pools)
  • Fine-tuning database memory allocation parameters
  • Implementing graceful degradation under memory pressure
  • Using health checks that monitor memory usage

Monitoring Memory

Comprehensive memory monitoring helps identify issues before they cause outages and provides data for proper sizing:

# Check container memory usage with pretty table format
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}\t{{.CPUPerc}}"

# Get detailed memory stats in JSON format for processing
docker stats --no-stream --format "{{json .}}" container_name | jq '.'

# Extract specific memory metrics
docker stats --no-stream --format "{{.MemUsage}}" container_name | awk '{print $1}'

# Monitor memory usage over time (samples every 5 seconds)
while true; do 
  docker stats --no-stream --format "{{.Name}},{{.MemUsage}},{{.MemPerc}}" >> memory_log.csv
  sleep 5
done

# Check kernel memory events (OOM kills)
dmesg | grep -i "out of memory"

# Inspect container memory limits
docker inspect --format '{{.HostConfig.Memory}}' container_name

# For detailed memory metrics, use cAdvisor or Prometheus
# Example metrics to track:
# - container_memory_usage_bytes
# - container_memory_cache
# - container_memory_rss
# - container_memory_swap
# - container_memory_failcnt (memory limit hit count)
# - container_memory_mapped_file

# Find containers with no memory limits
docker ps -q | xargs docker inspect -f '{{.Name}} {{.HostConfig.Memory}}' | grep " 0$"

# Memory reservation vs actual usage comparison
docker ps -q | xargs docker inspect -f '{{.Name}}: Limit={{.HostConfig.Memory}} Reservation={{.HostConfig.MemoryReservation}}' | sort
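The CSV written by the sampling loop above can be summarized with awk. A sketch using sample data, assuming the name,mem-usage,mem-percent layout produced by the loop:

```shell
# Sample data in the format written by the monitoring loop
cat > memory_log.csv <<'EOF'
app,210MiB / 512MiB,41.02%
app,305MiB / 512MiB,59.57%
app,258MiB / 512MiB,50.39%
EOF

# Report the peak memory percentage seen for each container
awk -F',' '{ gsub(/%/, "", $3); v = $3 + 0
             if (v > max[$1]) max[$1] = v }
           END { for (c in max) printf "%s peak: %s%%\n", c, max[c] }' memory_log.csv
# → app peak: 59.57%
rm -f memory_log.csv
```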

Advanced memory analysis tools:

  • cAdvisor for detailed container memory metrics
  • Prometheus + Grafana for visualization and alerting
  • docker top for process-level memory visibility within containers
  • memory-profiler for application-specific memory analysis
  • pmap for process memory mapping (requires privileged access)


CPU Optimization

Proper CPU management ensures fair resource allocation, predictable performance, and efficient host utilization:

# Limit CPU usage (absolute limit in number of cores)
docker run -d --cpus=0.5 my-app:latest

# Set CPU shares (relative priority weight, default is 1024)
# Higher values get more CPU time during contention
docker run -d --cpu-shares=512 my-app:latest

# Pin to specific CPUs (useful for NUMA architectures or CPU-sensitive workloads)
docker run -d --cpuset-cpus="0,1" my-app:latest

# Configure real-time scheduling priority (requires privileged access)
docker run -d --cpu-rt-runtime=950000 --cpu-rt-period=1000000 my-app:latest

# Allow processes in the container to raise their own nice priority
docker run -d --cap-add=SYS_NICE my-app:latest

# Throttle CPU CFS (Completely Fair Scheduler) period
docker run -d --cpu-period=100000 --cpu-quota=50000 my-app:latest

# Complex example with multiple CPU constraints
docker run -d \
  --cpus=1.5 \
  --cpu-shares=1024 \
  --cpuset-cpus="0,2,4" \
  --cpu-period=100000 \
  --cpu-quota=150000 \
  high-priority-app:latest

Understanding CPU allocation options:

  • cpus: Simple absolute limit (e.g., 0.5 = half a CPU core)
  • cpu-shares: Relative weight during contention (no effect when CPU is abundant)
  • cpuset-cpus: Hard binding to specific CPU cores
  • cpu-period/cpu-quota: Fine-grained CFS control (quota/period = CPU limit)
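The quota/period relationship is simple arithmetic: the effective limit is quota divided by period, so --cpu-period=100000 --cpu-quota=150000 is equivalent to --cpus=1.5:

```shell
quota=150000
period=100000
# awk handles the floating-point division in POSIX shell
awk -v q="$quota" -v p="$period" 'BEGIN { printf "%.1f CPUs\n", q / p }'
# → 1.5 CPUs
```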

CPU optimization considerations:

  • Match CPU limits to application needs based on profiling
  • Consider NUMA effects for memory-intensive applications
  • Balance between isolation and efficient resource usage
  • Monitor CPU throttling metrics to detect misconfiguration
  • Consider CPU affinity for cache-sensitive workloads
  • Set appropriate limits for both burst and sustained workloads

Docker Daemon Optimization

The Docker daemon itself can be tuned for better performance, security, and resource utilization. These settings affect all containers on the host.

Daemon Configuration

Edit /etc/docker/daemon.json to configure global settings:

{
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 64000,
      "Soft": 64000
    },
    "nproc": {
      "Name": "nproc",
      "Hard": 32768,
      "Soft": 32768
    }
  },
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3",
    "compress": "true"
  },
  "max-concurrent-downloads": 10,
  "max-concurrent-uploads": 10,
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.size=20G"
  ],
  "registry-mirrors": [
    "https://mirror.gcr.io"
  ],
  "dns": ["8.8.8.8", "8.8.4.4"],
  "default-address-pools": [
    {"base": "172.30.0.0/16", "size": 24}
  ],
  "experimental": true,
  "metrics-addr": "0.0.0.0:9323",
  "features": {
    "buildkit": true
  },
  "live-restore": true,
  "default-runtime": "runc",
  "exec-opts": ["native.cgroupdriver=systemd"],
  "init": true,
  "no-new-privileges": true,
  "icc": false
}

Important daemon settings to consider:

  • storage-driver: Select appropriate driver for workload (overlay2 recommended); note that overlay2.size requires an xfs backing filesystem mounted with pquota
  • live-restore: Keep containers running during daemon restart
  • max-concurrent-downloads/uploads: Adjust based on network and disk I/O
  • registry-mirrors: Set up pull-through cache for frequently used images
  • default-runtime: Select container runtime (runc, containerd, etc.)
  • init: Always use init process for better signal handling and zombie prevention
  • icc: Inter-container communication controls (false for better security)
  • default-ulimits: Set appropriate file descriptor and process limits
  • metrics-addr: Enable Prometheus metrics endpoint
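A malformed daemon.json prevents the daemon from starting, so it's worth syntax-checking the file before restarting Docker. A minimal sketch using a temporary file (recent Docker versions can also do this natively via dockerd --validate --config-file):

```shell
# Write a sample config to a temporary file and syntax-check it;
# point this at /etc/docker/daemon.json on a real host
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": { "max-size": "10m", "max-file": "3" }
}
EOF
python3 -m json.tool "$cfg" > /dev/null && echo "daemon.json syntax OK"
rm -f "$cfg"
```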

Logging Configuration

  • Use appropriate log drivers
    • json-file: Default, stores logs as JSON files on host
    • local: Newer optimized local logging driver
    • syslog: Forward to syslog daemon
    • journald: Forward to systemd journal
    • splunk, awslogs, gelf: Send to external logging systems
    • none: Disable logging for performance-critical applications
  • Configure log rotation
    • Set max-size to limit individual log file size
    • Set max-file to control number of log files to retain
    • Enable compress to reduce disk usage
    • Consider time-based rotation for compliance requirements
    • Implement host-level log management for Docker's own logs
  • Consider centralized logging
    • Aggregate container logs in Elasticsearch, Splunk, or similar
    • Implement structured logging for better searchability
    • Configure proper retention policies for different log types
    • Set up monitoring and alerting based on log patterns
    • Preserve logs from ephemeral containers
  • Limit log file size
    • Prevent disk space exhaustion from chatty containers
    • Set appropriate limits based on application verbosity
    • Monitor log volume and growth rate
    • Alert on unusual log volume increases
    • Implement emergency pruning for runaway logging
  • Control logging verbosity
    • Configure application log levels appropriately
    • Filter irrelevant logs before storage
    • Implement sampling for high-volume log sources
    • Use environment variables to control log verbosity
    • Consider different verbosity for different environments

System-level Optimizations

  • Increase file descriptor limits in host OS
  • Configure appropriate I/O scheduler for storage devices
  • Adjust kernel parameters for container workloads:
    # /etc/sysctl.conf adjustments
    fs.file-max = 2097152
    fs.inotify.max_user_watches = 524288
    vm.max_map_count = 262144
    net.ipv4.ip_forward = 1
    
  • Disable SWAP for predictable container performance
  • Use high-performance storage for Docker data directory
  • Configure appropriate CPU governor for workload type


Production Optimization

For production environments, optimize containers for reliability, security, performance, and observability:

version: '3.8'
services:
  optimized-app:
    image: my-app:latest
    # High availability settings
    restart: unless-stopped
    deploy:
      replicas: 2
      update_config:
        parallelism: 1
        delay: 10s
        order: start-first
        failure_action: rollback
      rollback_config:
        parallelism: 1
        delay: 5s
      resources:
        limits:
          cpus: '0.50'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 128M
    
    # Security hardening
    read_only: true
    tmpfs:
      - /tmp:size=64M,mode=1777
      - /var/cache:size=128M
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
    security_opt:
      - no-new-privileges:true
      - seccomp=default.json
    
    # Health monitoring
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    
    # Efficient logging
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
        compress: "true"
        tag: "{{.Name}}/{{.ID}}"
    
    # Network optimization
    dns:
      - 1.1.1.1
      - 8.8.8.8
    dns_opt:
      - timeout:3
      - attempts:5
    
    # Application tuning
    environment:
      - NODE_ENV=production
      - SERVER_MAX_CONNECTIONS=1000
      - GOMAXPROCS=1
    ulimits:
      nofile:
        soft: 20000
        hard: 40000
    
    # Dependency management
    depends_on:
      database:
        condition: service_healthy
    
    # Secrets management
    secrets:
      - source: app_config
        target: /app/config.json
        mode: 0400
      - source: ssl_cert
        target: /app/cert.pem
        mode: 0400

  database:
    image: postgres:13-alpine
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 512M
    volumes:
      - db-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
      POSTGRES_DB: appdb
    secrets:
      - db_password

volumes:
  db-data:
    driver: local
    driver_opts:
      type: 'none'
      o: 'bind'
      device: '/mnt/data/postgres'

secrets:
  app_config:
    file: ./config/production.json
  ssl_cert:
    file: ./config/ssl/cert.pem
  db_password:
    file: ./config/secrets/db_password.txt

Production optimization involves multiple dimensions:

  1. High availability: Replicas, rolling updates, health checks
  2. Security: Read-only filesystem, dropped capabilities, seccomp profiles
  3. Resource management: Precise CPU/memory limits and reservations
  4. Observability: Logging, health checks, monitoring
  5. Performance tuning: Application-specific optimizations
  6. Dependency management: Proper startup ordering and health checking
  7. Secrets handling: Secure injection of credentials and certificates
  8. Data persistence: Properly configured volumes with appropriate drivers

Performance Measurement

Systematic performance measurement is essential to validate optimizations and identify bottlenecks. Implement comprehensive benchmarking and monitoring practices to understand container performance characteristics.

Benchmarking

  • Use Docker stats for real-time metrics
    # Live monitoring with custom columns
    docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}\t{{.NetIO}}\t{{.BlockIO}}"
    
    # Capture point-in-time statistics
    docker stats --no-stream > stats-$(date +%Y%m%d-%H%M%S).txt
    
    # Measure startup time
    time docker run --name test-startup my-app:latest
    
  • Implement application-level metrics
    • Instrument code with Prometheus client libraries
    • Measure critical business operations latency
    • Track resource utilization from within the application
    • Monitor cache hit rates and database query performance
    • Example metrics to track:
      • Request latency percentiles (p50, p95, p99)
      • Request throughput (requests per second)
      • Error rates and types
      • Business transactions per second
      • Queue depths and processing times
  • Compare before and after optimizations
    • Create reproducible benchmark scenarios
    • Document baseline performance metrics
    • Isolate variables when testing changes
    • Measure impact on both average and tail latencies
    • Consider cost implications of optimizations
    • Example benchmarking workflow:
      # Run load test on current version
      ./load-test.sh --target=v1 --duration=10m > results-v1.txt
      
      # Apply optimization
      docker build -t my-app:v2 -f Dockerfile.optimized .
      
      # Run identical load test on optimized version
      ./load-test.sh --target=v2 --duration=10m > results-v2.txt
      
      # Compare results
      ./compare-results.sh results-v1.txt results-v2.txt
      
  • Track build and startup times
    • Measure full CI/CD pipeline duration
    • Break down build phases for targeted optimization
    • Track image pull times in different environments
    • Measure time-to-first-request for applications
    • Monitor cold start vs. warm start performance
    • Example measurement script:
      #!/bin/bash
      echo "Build time:"
      time docker build -t test-app .
      
      echo "Image size:"
      docker images test-app --format "{{.Size}}"
      
      echo "Pull time (simulated):"
      docker save test-app -o test-app.tar
      wc -c < test-app.tar | awk '{print $1/1024/1024 " MB"}'
      docker rmi test-app
      time docker load < test-app.tar
      rm test-app.tar
      
      echo "Startup time:"
      time docker run --rm test-app
      
  • Measure image sizes
    • Track layer composition and sizes
    • Compare compressed vs. uncompressed image sizes
    • Analyze image composition with tools like dive
    • Correlate image size with pull time
    • Track evolution of image size over time
    • Example commands:
      # Basic image size
      docker images my-app
      
      # Detailed breakdown with dive
      docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock wagoodman/dive:latest my-app:latest
      
      # Layer sizes and history
      docker history --no-trunc --format "table {{.CreatedBy}}\t{{.Size}}" my-app:latest
      
      # Calculate compressed size (approximating pull size)
      docker save my-app:latest | gzip -c | wc -c
      

Monitoring Tools

  • Docker stats: Basic built-in resource monitoring
    # Continuous monitoring of all containers
    docker stats
    
    # Custom format for specific metrics
    docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
    
  • cAdvisor: Container-specific metrics with historical data
    # Run cAdvisor container
    docker run \
      --volume=/:/rootfs:ro \
      --volume=/var/run:/var/run:ro \
      --volume=/sys:/sys:ro \
      --volume=/var/lib/docker/:/var/lib/docker:ro \
      --publish=8080:8080 \
      --detach=true \
      --name=cadvisor \
      --privileged \
      --device=/dev/kmsg \
      gcr.io/cadvisor/cadvisor:v0.45.0
    
  • Prometheus: Time-series metrics storage and querying
    • Collect container and host metrics
    • Store historical performance data
    • Create alerts based on performance thresholds
    • PromQL examples for container monitoring:
      # CPU utilization by container
      rate(container_cpu_usage_seconds_total{name=~".+"}[1m])
      
      # Memory usage by container
      container_memory_usage_bytes{name=~".+"}
      
      # Disk I/O by container
      rate(container_fs_reads_bytes_total{name=~".+"}[1m])
      
  • Grafana dashboards: Visualization and dashboarding
    • Create custom dashboards for different stakeholders
    • Set up alerts based on performance thresholds
    • Visualize trends over time
    • Correlate metrics across system components
    • Example dashboard panels:
      • Container CPU and memory usage
      • Network I/O by container
      • Disk operations and throughput
      • Application-specific metrics
      • Health check status and history
  • Docker Desktop metrics: Local development monitoring
    • Resource usage monitoring in Docker Desktop
    • Container inspection and troubleshooting
    • Network traffic visualization
    • Disk usage analysis
    • Extension plugins for enhanced monitoring
  • Specialized monitoring tools
    • Sysdig: Deep container and system monitoring
    • Datadog: Commercial APM and infrastructure monitoring
    • New Relic: Application and infrastructure performance
    • Dynatrace: Full-stack monitoring with AI capabilities
    • Elastic APM: Open-source application performance monitoring
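Whichever tool ingests the metrics, `docker stats` reports memory in mixed units (KiB/MiB/GiB), which complicates diffing snapshots with ordinary tools. A small normalization helper, sketched under the assumption that values use Docker's binary-unit suffixes:

```shell
#!/bin/bash
# to_mib: normalize a docker stats memory figure (e.g. "512MiB", "2GiB") to MiB.
to_mib() {
  case "$1" in
    *GiB) echo "$1" | awk '{ sub(/GiB/, ""); printf "%.0f\n", $1 * 1024 }' ;;
    *MiB) echo "$1" | awk '{ sub(/MiB/, ""); printf "%.0f\n", $1 }' ;;
    *KiB) echo "$1" | awk '{ sub(/KiB/, ""); printf "%.0f\n", $1 / 1024 }' ;;
    *)    echo "unsupported unit: $1" >&2; return 1 ;;
  esac
}

# Typical use: feed it the MemUsage column of
#   docker stats --no-stream --format "{{.Name}} {{.MemUsage}}"
to_mib 2GiB    # 2048
to_mib 512MiB  # 512
```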

Cleanup and Maintenance

Regular cleanup and maintenance are essential for maintaining a healthy Docker environment, preventing disk space issues, and optimizing performance:

# Remove unused containers (stopped containers)
docker container prune

# Remove unused images (dangling images not referenced by any container)
docker image prune

# Remove unused images more aggressively (including unused tagged images)
docker image prune -a

# Remove unused volumes (not referenced by any container)
docker volume prune

# Remove unused networks
docker network prune

# Remove all unused objects (containers, images, volumes, networks)
docker system prune -a

# Remove all unused objects including volumes (use with caution!)
docker system prune -a --volumes

# Remove containers based on status (-a is required to list exited containers)
docker rm $(docker ps -aq -f status=exited)

# Remove stopped containers created more than a week ago
docker container prune --filter "until=168h"

# Clean up specific images by pattern
docker images | grep "pattern" | awk '{print $3}' | xargs docker rmi

# View disk usage details
docker system df -v

# Set up scheduled cleanup (daily prune of dangling resources)
cat > /etc/cron.daily/docker-cleanup << 'EOF'
#!/bin/bash
/usr/bin/docker container prune -f --filter "until=24h"
/usr/bin/docker image prune -f
/usr/bin/docker network prune -f --filter "until=24h"
EOF
chmod +x /etc/cron.daily/docker-cleanup

# More conservative weekly cleanup (including unused images)
cat > /etc/cron.weekly/docker-deep-cleanup << 'EOF'
#!/bin/bash
/usr/bin/docker system prune -f
EOF
chmod +x /etc/cron.weekly/docker-deep-cleanup

# Log rotation for container logs (copytruncate can drop lines; prefer the
# json-file driver's max-size/max-file log-opts where possible)
cat > /etc/logrotate.d/docker-container-logs << 'EOF'
/var/lib/docker/containers/*/*.log {
    rotate 7
    daily
    compress
    missingok
    delaycompress
    copytruncate
}
EOF

# Cleanup script with safety checks and reporting
cat > /usr/local/bin/docker-smart-cleanup << 'EOF'
#!/bin/bash
set -e

echo "=== Docker Cleanup Report ($(date)) ==="
echo "Before cleanup:"
docker system df

# Stop containers that have been running for more than 1 week
# (this checks creation time; true idle detection would need activity metrics)
old_containers=$(docker ps --format "{{.ID}} {{.CreatedAt}}" | awk -v cutoff="$(date -d 'now - 1 week' +'%Y-%m-%d')" '$2 <= cutoff {print $1}')
if [ -n "$old_containers" ]; then
  echo "Stopping long-running containers: $old_containers"
  docker stop $old_containers
fi

# Remove exited containers
docker container prune -f

# Remove dangling images
docker image prune -f

# Remove unused networks
docker network prune -f

# Check disk usage
current_usage=$(df -h /var/lib/docker | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "$current_usage" -gt 80 ]; then
  echo "High disk usage detected ($current_usage%), performing deep cleanup..."
  # Tagged but unused images older than 1 month
  docker image prune -a -f --filter "until=720h"
fi

echo "After cleanup:"
docker system df

echo "Top 10 largest images:"
docker images --format "{{.Size}}\t{{.Repository}}:{{.Tag}}" | sort -hr | head -n 10
EOF
chmod +x /usr/local/bin/docker-smart-cleanup

Maintenance best practices:

  1. Regular pruning: Implement scheduled cleanups with appropriate filters
  2. Monitoring: Set up alerts for disk space usage in Docker directories
  3. Tiered approach: Use different cleanup strategies based on environment criticality
  4. Retention policies: Define how long to keep unused resources
  5. Image lifecycle: Establish policies for image versioning and cleanup
  6. Backup important data: Ensure volumes with important data are backed up
  7. Audit image usage: Regularly review which images are actively used
  8. Log management: Configure proper log rotation and retention
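Retention policies (point 4) can start as a simple cutoff on creation date. A sketch that selects entries older than N days from `ID date` lines; it assumes GNU `date -d`, and the sample file stands in for real `docker images --format "{{.ID}} {{.CreatedAt}}"` output:

```shell
#!/bin/bash
# older_than: print the IDs from "ID YYYY-MM-DD" lines older than N days.
older_than() {
  local days=$1 file=$2
  local cutoff
  cutoff=$(date -d "now - ${days} days" +%Y-%m-%d)   # GNU date
  awk -v cutoff="$cutoff" '$2 < cutoff { print $1 }' "$file"
}

# Illustrative sample (stand-in for real docker output)
printf 'abc123 2020-01-01\ndef456 2999-01-01\n' > images.txt
older_than 30 images.txt   # prints abc123
# The IDs it prints could then be reviewed and passed to `docker rmi`.
```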

Advanced Optimization Techniques

Optimization Checklist

A comprehensive approach to Docker optimization should cover all aspects of the container lifecycle, from image building to runtime performance.

Image Optimization

  • Multi-stage builds
    • Separate build-time and runtime dependencies
    • Use appropriate base images for each stage
    • Copy only necessary artifacts between stages
    • Consider separate stages for testing and production
  • Minimal base images
    • Use lightweight variants (alpine, slim, distroless)
    • Remove unnecessary packages and tools
    • Select appropriate base for compatibility vs. size
    • Consider custom organizational base images
    • Keep base images updated for security
  • Layer optimization
    • Combine related operations in single RUN instructions
    • Order layers from least to most frequently changed
    • Clean up package manager caches in same layer
    • Use multi-line arguments for readability
    • Keep layer count reasonable (under 20 layers)
  • Proper caching
    • Leverage BuildKit's enhanced caching
    • Use .dockerignore to prevent unnecessary context
    • Split dependencies and application code
    • Consider explicit cache mounting for dependencies
    • Implement registry-based caching for CI/CD
  • Efficient COPY and RUN
    • Copy only what's needed (avoid COPY . .)
    • Use wildcard patterns for related files
    • Implement chained commands with proper cleanup
    • Set appropriate working directories
    • Use ADD only when extract functionality is needed
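The layer-count guideline above can be enforced mechanically. A sketch that counts the layer rows of saved `docker history` output against a budget; the sample file and the default budget of 20 are illustrative:

```shell
#!/bin/bash
# check_layers: warn when an image exceeds a layer budget.
# Expects a file containing `docker history <image>` output (header + rows).
check_layers() {
  local file=$1 budget=${2:-20}
  local layers
  layers=$(( $(wc -l < "$file") - 1 ))   # subtract the header line
  if (( layers > budget )); then
    echo "WARN: $layers layers (budget $budget)"
    return 1
  fi
  echo "OK: $layers layers"
}

# Illustrative sample of `docker history my-app` output (header + 3 layers)
printf 'IMAGE CREATED CREATED BY SIZE\nl1\nl2\nl3\n' > history.txt
check_layers history.txt   # OK: 3 layers
```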

Build Optimization

  • BuildKit enabled
    • Set DOCKER_BUILDKIT=1 environment variable
    • Configure in daemon.json for system-wide usage
    • Use BuildKit-specific features when appropriate
    • Implement SSH and secret mounting for secure builds
    • Leverage parallel stage execution
  • Ordered instructions
    • Place infrequently changing commands first
    • Group related commands logically
    • Split long RUN commands strategically
    • Organize for maximum cache efficiency
    • Maintain readability and maintainability
  • Build caching
    • Implement proper layer caching strategy
    • Use cache mounts for package managers
    • Consider external cache storage for CI/CD
    • Implement cache warming for frequently built images
    • Clean up unused cache periodically
  • CI/CD integration
    • Configure appropriate caching in pipelines
    • Implement matrix builds where appropriate
    • Use build arguments for environment-specific builds
    • Tag images with meaningful metadata
    • Implement vulnerability scanning in pipeline
  • Parallel builds
    • Design multi-stage builds for parallelization
    • Keep stages independent where possible
    • Leverage BuildKit's automatic parallelization
    • Scale build infrastructure appropriately
    • Monitor build performance metrics
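Enabling BuildKit both ways is a one-liner plus a small daemon config. A sketch that writes the `features.buildkit` key to a staging path rather than the live `/etc/docker/daemon.json` (merge into the real file instead of overwriting it if it already carries other settings):

```shell
#!/bin/bash
# Per-invocation: any build in this shell session uses BuildKit.
export DOCKER_BUILDKIT=1

# System-wide: set the feature flag in the daemon config. Staged to a local
# path here; the real file lives at /etc/docker/daemon.json.
CONFIG=./daemon.json
cat > "$CONFIG" << 'EOF'
{
  "features": {
    "buildkit": true
  }
}
EOF
grep -q '"buildkit": true' "$CONFIG" && echo "BuildKit enabled in $CONFIG"
```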

Runtime Optimization

  • Resource limits
    • Set appropriate CPU and memory constraints
    • Configure proper swap behavior
    • Implement resource monitoring
    • Use resource reservations in orchestration
    • Balance resource efficiency and reliability
  • Proper networking
    • Select appropriate network drivers
    • Configure DNS settings optimally
    • Implement connection pooling
    • Consider service mesh for complex applications
    • Monitor network performance metrics
  • Volume optimization
    • Use appropriate volume drivers for workloads
    • Implement caching strategies for mounted volumes
    • Configure proper mount options
    • Monitor volume performance
    • Consider specialized storage for I/O-intensive workloads
  • Logging configuration
    • Select appropriate logging drivers
    • Implement log rotation and size limits
    • Consider structured logging formats
    • Configure centralized logging
    • Implement log level management
  • Monitoring setup
    • Deploy comprehensive container monitoring
    • Implement health checks
    • Configure alerts for resource constraints
    • Track application-specific metrics
    • Implement tracing for distributed systems
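Resource limits are easier to keep consistent across hosts when derived from host capacity rather than hard-coded. A hypothetical helper that turns a percentage of MemTotal into a `--memory` flag; the sample meminfo file stands in for `/proc/meminfo`:

```shell
#!/bin/bash
# mem_limit_flag: emit a --memory flag sized to pct% of the host's MemTotal.
# Parses the MemTotal line of a /proc/meminfo-style file (value in kB).
mem_limit_flag() {
  local meminfo=$1 pct=$2
  awk -v pct="$pct" '/^MemTotal:/ { printf "--memory=%dm\n", $2 / 1024 * pct / 100 }' "$meminfo"
}

# Illustrative sample meminfo (16 GiB host)
printf 'MemTotal: 16777216 kB\n' > meminfo.txt
mem_limit_flag meminfo.txt 25   # --memory=4096m
# Usage (illustrative): docker run $(mem_limit_flag /proc/meminfo 25) my-app
```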

Troubleshooting Performance