Docker Swarm is Docker's native clustering and orchestration solution that turns a group of Docker hosts into a single virtual host, providing high availability and load balancing for containerized applications. Unlike external orchestrators, Swarm is built into the Docker engine, offering a simpler approach to container orchestration with minimal setup overhead while maintaining the familiar Docker CLI experience.
Manager nodes: Maintain cluster state, orchestrate services, and handle API requests; an odd number of managers (3, 5, or 7) is recommended for high availability
Distributed state store (Raft consensus): Built-in database that maintains consistent cluster state across manager nodes; requires a majority of managers (N/2+1) to remain available (a quick quorum check is sketched after this list)
API endpoints: Expose the Docker API for cluster management; automatically load-balanced across manager nodes
Orchestration decision making: Schedule tasks, reconcile desired state, handle node failures, and perform rolling updates
High availability configuration: Replicated state across multiple manager nodes with leader election for fault tolerance
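Because of the quorum rule above, a cluster of N managers tolerates the loss of (N-1)/2 of them. A quick, minimal way to check this from the CLI (a sketch; it only counts nodes, nothing more):
# Count manager nodes and work out quorum and fault tolerance
MANAGERS=$(docker node ls --filter role=manager --format '{{.Hostname}}' | wc -l)
echo "Managers: $MANAGERS"
echo "Quorum needed: $(( MANAGERS / 2 + 1 ))"
echo "Failures tolerated: $(( (MANAGERS - 1) / 2 ))"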
# Initialize a new swarm (on manager node)
docker swarm init --advertise-addr <MANAGER-IP>
# --advertise-addr specifies the address other nodes will use to connect to this manager
# If omitted, Docker will attempt to auto-detect the IP, which may pick the wrong interface on hosts with multiple network interfaces
# Output will provide a token to join worker nodes
# Example output:
# docker swarm join --token SWMTKN-1-49nj1cmql0jkz5s954yi3oex3nedyz0fb0xx14ie39trti4wxv-8vxv8rssmk743ojnwacrr2e7c <MANAGER-IP>:2377
# This token authenticates new nodes to the swarm, ensuring secure cluster expansion
# Join a node as worker
docker swarm join --token <TOKEN> <MANAGER-IP>:2377
# Workers receive and execute tasks but cannot manage the cluster state
# Port 2377 is the standard swarm management port
# Join a node as manager
docker swarm join-token manager
# Run this on an existing manager; it prints the join command with a manager-specific token
# Then use the provided token to join as manager
docker swarm join --token <MANAGER-TOKEN> <MANAGER-IP>:2377
# Manager nodes participate in the Raft consensus and can manage the cluster
# List nodes in the swarm
docker node ls
# Shows all nodes with their roles, availability status, and manager status
# MANAGER STATUS column shows "Leader" for the primary manager node
In Docker Swarm, services are the definitions of tasks to execute on the manager or worker nodes:
Services define the container image, replicas, networks, and more, forming the core abstraction layer for deploying applications
Tasks are the individual containers running across the swarm, with each task corresponding to one container instance
The scheduler distributes tasks across available nodes based on resource availability, placement constraints, and service requirements
The orchestrator maintains the desired state by continuously monitoring and reconciling the actual state with the desired configuration, automatically replacing failed containers and handling scaling operations
Services follow a declarative model, where you specify the desired state of your application (what you want to happen), and Swarm continuously works to ensure that state is maintained, handling failures and updates automatically.
# Create a replicated service
docker service create --name webserver \
--replicas 3 \ # Deploy 3 instances of the container
--publish 80:80 \ # Publish port 80 on all swarm nodes
--update-delay 10s \ # Wait 10s between updating each container
--update-parallelism 1 \ # Update one container at a time
--restart-condition on-failure \ # Automatically restart containers that exit non-zero
nginx:latest # Container image to use
# List services
docker service ls
# Shows all services with their ID, name, mode (replicated/global), replicas, image, and ports
# Inspect a service
docker service inspect webserver
# Provides detailed JSON output of the service configuration, including:
# - Container configuration
# - Resource constraints
# - Networks
# - Update and rollback configuration
# - Placement constraints
# View service tasks (containers)
docker service ps webserver
# Shows all tasks (containers) for the service with their:
# - ID, name, image
# - Node assignment
# - Desired and current state
# - Error messages (if any)
# - When the task was created and updated
# Scale a service
docker service scale webserver=5
# Increases or decreases the number of replicas
# Swarm will create or remove containers to match the desired count
# Scale multiple services at once with space-separated arguments: docker service scale webserver=5 api=3
# Update a service
docker service update --image nginx:1.21 \ # Change the image version
--limit-cpu 0.5 \ # Add CPU limits
--limit-memory 512M \ # Add memory limits
--rollback-parallelism 2 \ # Configure rollback behavior
webserver
# Remove a service
docker service rm webserver
# Removes the service and stops all associated containers
# Create service with specific publishing mode
docker service create --name web \
--publish mode=host,target=80,published=8080 \
nginx
# 'host' mode bypasses the routing mesh: the port is published only on nodes running a task, so at most one task per node can use that port
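For comparison, the default routing-mesh behaviour can be spelled out explicitly with mode=ingress. This is a sketch with an arbitrary service name and published port:
# Equivalent long-form syntax for the default (routing mesh) publishing mode
docker service create --name web-ingress \
--publish mode=ingress,target=80,published=8080 \
nginx
# With ingress mode, port 8080 answers on every swarm node and is load-balanced across tasks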
Docker Swarm provides built-in service discovery, allowing containers to find and communicate with each other using service names:
# Service discovery example
docker service create --name db \
--network backend-network \ # Join the overlay network
--mount type=volume,source=db-data,target=/var/lib/postgresql/data \ # Persistent storage
--env POSTGRES_PASSWORD_FILE=/run/secrets/db_password \ # Use secrets for security
--replicas 1 \ # Database typically runs a single instance
postgres:13 # Database image
docker service create --name api \
--network backend-network \ # Same network as the db service
--env DB_HOST=db \ # Reference the database by service name
--env DB_PORT=5432 \ # Standard PostgreSQL port
--replicas 3 \ # Run multiple API instances
--update-order start-first \ # Start new tasks before stopping old ones during updates
myapi:latest # API image
Service discovery works through:
Internal DNS server: Every container in Swarm can query the embedded DNS server
Service VIPs (Virtual IPs): Each service gets a virtual IP in the overlay network
DNS round-robin: Services created with --endpoint-mode dnsrr resolve their name to the individual task IPs instead of a single VIP
Example connection from inside a container:
# Connect to database from inside the api container
docker exec -it $(docker ps -q -f name=api) bash
ping db # Resolves to the service VIP
psql -h db -U postgres -d myapp # Connect to database using service name
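To observe the DNS round-robin behaviour described above rather than a single VIP, a service can be created with dnsrr endpoint mode. A minimal sketch, reusing the backend network and an assumed image:
# Create a service whose name resolves to individual task IPs instead of a VIP
docker service create --name api-dnsrr \
--network backend-network \
--endpoint-mode dnsrr \
--replicas 3 \
myapi:latest
# nslookup api-dnsrr from another container on the network returns one A record per task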
For services with multiple replicas, connections are automatically load-balanced:
# Create frontend service that connects to the API
docker service create --name frontend \
--network backend-network \
--env API_URL=http://api:8000/ \ # API service name is automatically resolved and load-balanced
--publish 80:80 \
frontend:latest
Stacks allow you to define and deploy complete applications with multiple services using a compose file:
# Deploy a stack
docker stack deploy -c docker-compose.yml myapp
# This creates all services, networks, and volumes defined in the compose file
# All resources are labeled with the stack name for easy management
# Services are named as: <stack-name>_<service-name>
# List stacks
docker stack ls
# Shows all stacks with the number of services in each
# List services in a stack
docker stack services myapp
# Shows detailed information about each service in the stack, including:
# - Replica status (current/desired)
# - Image and configuration
# - Port mappings
# List tasks in a stack
docker stack ps myapp
# Shows all running containers across the stack with:
# - Status (running, failed, etc.)
# - Error messages if applicable
# - Node assignment
# - Current and desired state
# Remove a stack
docker stack rm myapp
# Removes all services and networks created by the stack
# Does NOT remove volumes to protect data
Stacks provide several advantages:
Declarative deployment: Define entire application topology in a single file
Version control: Store compose files in source control for application versioning
Application integrity: Deploy or remove all components together
Simplified management: Group related services for easier monitoring and updates
Resource isolation: Each stack has its own namespace for services
# docker-compose.yml for Swarm deployment
version: '3.8' # Compose file format version with swarm support
services:
web:
image: nginx:latest
ports:
- "80:80" # Published port (accessible from outside)
deploy:
mode: replicated # 'replicated' (scaled) or 'global' (one per node)
replicas: 3 # Number of container instances
update_config:
parallelism: 1 # Update one container at a time
delay: 10s # Wait 10s between updates
order: start-first # Start new tasks before stopping old ones
failure_action: rollback # Auto-rollback on failure
restart_policy:
condition: on-failure # Restart if container exits non-zero
max_attempts: 3 # Maximum restart attempts
window: 120s # Time window to evaluate restart attempts
placement:
constraints:
- node.role == worker # Only run on worker nodes
preferences:
- spread: node.labels.datacenter # Spread across datacenters
resources:
limits:
cpus: '0.5' # Maximum CPU usage
memory: 512M # Maximum memory usage
reservations:
cpus: '0.1' # Guaranteed CPU allocation
memory: 128M # Guaranteed memory allocation
api:
image: myapi:latest
deploy:
replicas: 3
placement:
constraints:
- node.labels.zone == frontend # Only run on nodes with this label
max_replicas_per_node: 1 # Limit instances per node for HA
update_config:
parallelism: 2 # Update two containers at once
rollback_config:
parallelism: 3 # Faster rollback if needed
environment:
- DB_HOST=db # Service discovery by name
- API_KEY_FILE=/run/secrets/api_key # Reference a secret
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s # Initial grace period
networks:
- backend # Connect to backend network
secrets:
- api_key # Reference to secret defined below
db:
image: postgres:13
volumes:
- db-data:/var/lib/postgresql/data # Persistent storage
deploy:
placement:
constraints:
- node.labels.zone == database # Only run on database nodes
replicas: 1 # Single database instance
restart_policy:
condition: any # Always restart
environment:
- POSTGRES_PASSWORD_FILE=/run/secrets/db_password
networks:
- backend
secrets:
- db_password
networks:
backend:
driver: overlay # Multi-host network
attachable: true # Allow standalone containers to connect
driver_opts:
encrypted: "true" # Encrypt traffic on this network
volumes:
db-data: # Named volume for persistent data
driver: local # Use local storage (default)
# For production, consider using a volume driver that supports replication
secrets:
api_key:
file: ./secrets/api_key.txt # Load from local file during deployment
db_password:
file: ./secrets/db_password.txt
# Create a secret
echo "mypassword" | docker secret create db_password -
# Secrets are stored encrypted in the Raft log
# They are never written to disk unencrypted
# `-` at the end indicates input from stdin
# Create a service with a secret
docker service create --name db \
--secret db_password \ # Reference the created secret
--env POSTGRES_PASSWORD_FILE=/run/secrets/db_password \ # Tell application where to find it
--secret source=ssl_cert,target=server.crt \ # Custom target path for a secret
--secret source=ssl_key,target=server.key,mode=0400 \ # Custom file permissions
postgres:13
# Inside the container, secrets appear as files in /run/secrets/
# Example:
# /run/secrets/db_password
# /run/secrets/server.crt
# /run/secrets/server.key
# List secrets
docker secret ls
# Shows all secrets with creation time and ID
# Inspect a secret (shows metadata only, not the actual content)
docker secret inspect db_password
# Remove a secret
docker secret rm db_password
# Note: Cannot remove secrets used by running services
# Must update or remove services first
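# Rotating a secret in place (a sketch; the _v2 name is illustrative):
# create a new secret, then swap it on the service while keeping the same mount path
echo "newpassword" | docker secret create db_password_v2 -
docker service update \
--secret-rm db_password \
--secret-add source=db_password_v2,target=db_password \
db
# Once no service references the old secret, it can be removed with docker secret rm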
# Create a secret from a file
docker secret create ssl_cert ./server.crt
Configs allow you to store and distribute configuration files across the swarm:
# Create a config
docker config create nginx_conf nginx.conf
# Configs are similar to secrets but intended for non-sensitive data
# Unlike secrets, configs are not encrypted at rest and are mounted directly into the container's filesystem
# Also stored in the Swarm Raft log for distribution
# Create service with config
docker service create --name webserver \
--config source=nginx_conf,target=/etc/nginx/nginx.conf \ # Mount at specific path
--config source=proxy_params,target=/etc/nginx/proxy.conf,mode=0444 \ # Read-only
--publish 80:80 \
nginx:latest
# Inside the container, configs appear as files at the specified paths
# Unlike volumes, configs are immutable after creation
# To update a config, create a new one and update the service
# List configs
docker config ls
# Inspect a config
docker config inspect nginx_conf
# Shows metadata and creation information
# Remove a config
docker config rm nginx_conf
# Like secrets, configs in use cannot be removed
# Update a service to use a new config
docker service update --config-rm nginx_conf --config-add source=nginx_conf_v2,target=/etc/nginx/nginx.conf webserver
Configs are ideal for:
Application configuration files
Static web content
Default settings
SSL certificates (if not sensitive)
Any file that needs to be identical across all service instances
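A small end-to-end sketch of the points above (the names and paths are illustrative):
# Create a config from stdin and mount it into a throwaway service
echo "greeting=hello" | docker config create app_settings -
docker service create --name config-demo \
--config source=app_settings,target=/etc/app-settings.conf \
--replicas 1 \
alpine:latest sleep 3600
# On the node running the task, confirm the file arrived
docker exec -it $(docker ps -q -f name=config-demo) cat /etc/app-settings.conf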
# Create a service with resource constraints
docker service create --name resource-limited \
--limit-cpu 0.5 \ # Maximum CPU usage (50% of one core)
--limit-memory 512M \ # Maximum memory usage (512MB)
--reserve-cpu 0.2 \ # Guaranteed CPU (20% of one core)
--reserve-memory 256M \ # Guaranteed memory (256MB)
--generic-resource "gpu=1" \ # Request 1 GPU (the node must advertise it via node-generic-resources in the daemon config)
--ulimit nofile=65536:65536 \ # Set file descriptor limits
--ulimit nproc=4096:4096 \ # Process limits
nginx:latest
# Resource constraints ensure:
# 1. Fair resource distribution across services
# 2. Protection against noisy neighbors
# 3. Predictable performance
# 4. Informed placement decisions by the scheduler
# The scheduler uses reservations for placement: a task only lands on a node
# with enough unreserved capacity to satisfy the reservation
# Limits, in contrast, cap the resources a running container may consume
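# To confirm what the scheduler will enforce, read the stored resource spec back (sketch):
docker service inspect --format '{{json .Spec.TaskTemplate.Resources}}' resource-limited
# Prints the Limits and Reservations recorded for the service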
# Update with rolling update strategy
docker service update \
--image nginx:1.21 \ # New image to deploy
--update-parallelism 2 \ # Update 2 tasks at a time
--update-delay 20s \ # Wait 20s between updating each batch
--update-order start-first \ # Start new tasks before stopping old ones
--update-failure-action pause \ # Pause updates if a task fails
--update-monitor 30s \ # Monitor new tasks for 30s before proceeding
--update-max-failure-ratio 0.2 \ # Allow 20% of tasks to fail before pausing
--health-cmd "curl -f http://localhost/ || exit 1" \ # Health check command
--health-interval 5s \ # Check health every 5s during update
--health-retries 3 \ # Number of retries for health check
webserver
# The rolling update process:
# 1. Start 2 new containers with the updated image
# 2. Wait for them to be healthy (30s monitoring period)
# 3. If healthy, stop 2 old containers
# 4. Wait 20s (update-delay)
# 5. Repeat until all containers are updated
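# The progress of an in-flight update can be followed via the recorded update status (sketch):
docker service inspect --format '{{.UpdateStatus.State}}: {{.UpdateStatus.Message}}' webserver
# Only meaningful after at least one update; states include updating, paused,
# completed, rollback_started, and rollback_completed
watch docker service ps webserver
# Live view of old tasks shutting down and new ones starting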
# Rollback to previous version
docker service update --rollback webserver
# This reverts to the configuration before the last update
# Configure rollback behavior
docker service update \
--rollback-parallelism 3 \ # Rollback 3 tasks at a time
--rollback-delay 10s \ # Wait 10s between batches
--rollback-failure-action continue \ # Continue rollback even if tasks fail
--rollback-order stop-first \ # Stop old tasks before starting new ones
--rollback-monitor 10s \ # Monitor new tasks for 10s before proceeding
webserver
# Check update/rollback status
docker service inspect --pretty webserver
# Shows current update status, progress, and configuration
# View update history
docker service ps --no-trunc webserver
# Shows all previous versions of tasks with their images
Rolling updates and rollbacks allow you to:
Deploy new versions without downtime
Test updates with canary-style partial rollouts (see the sketch after this list)
Quickly revert problematic changes
Maintain control over the update process and timing
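One simple way to approximate a canary using only the flags shown above: update a single task and set a long delay so it can be observed before the rest follow (a sketch with an illustrative image tag and delay).
# Update one task, then wait 10 minutes before touching the next
docker service update \
--image nginx:1.22 \
--update-parallelism 1 \
--update-delay 10m \
--update-failure-action rollback \
webserver
# If the first updated task misbehaves, roll back manually before the delay expires
docker service update --rollback webserver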
# List nodes
docker node ls
# Output shows:
# - NODE ID: Unique identifier for each node
# - HOSTNAME: Node's hostname
# - STATUS: Ready, Down, or Disconnected
# - AVAILABILITY: Active, Pause, or Drain
# - MANAGER STATUS: Leader, Reachable, or Unreachable (for manager nodes)
# - ENGINE VERSION: Docker engine version
# Inspect node
docker node inspect node-1
# Returns detailed information about the node:
# - Labels and node attributes
# - Resource availability
# - Platform and architecture
# - Network addresses
# - TLS certificate information
# - Status and health
# Format inspect output for specific information
docker node inspect -f '{{.Status.Addr}}' node-1
# Returns just the node's IP address
# Set node availability
docker node update --availability drain node-1 # Prepare for maintenance
# Drain: Stops scheduling new tasks and removes existing tasks from the node
# Existing tasks are gracefully rescheduled to other nodes
# Perfect for maintenance operations and updates
docker node update --availability active node-1 # Return to service
# Active: Node accepts new tasks and retains existing tasks
# Pause: Node keeps existing tasks but won't receive new ones
# Add labels to nodes
docker node update --label-add zone=frontend node-1
docker node update --label-add datacenter=east --label-add cpu=high-performance node-1
# Labels enable sophisticated scheduling strategies:
# - Geographic distribution
# - Hardware differentiation
# - Environment separation (prod/staging)
# - Specialized workload targeting
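# Example: target the labels added above with placement constraints (a sketch)
docker service create --name edge-proxy \
--constraint 'node.labels.zone==frontend' \
--constraint 'node.labels.datacenter==east' \
nginx:latest
# Tasks are only scheduled on nodes that satisfy every --constraint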
# Filter nodes by labels
docker node ls --filter node.label=zone=frontend
# Promote worker to manager
docker node promote node-2
# Converts worker to manager role
# Joins the Raft consensus group
# Receives a copy of the cluster state
# Can now accept management commands
# Demote manager to worker
docker node demote node-2
# Removes node from management plane
# If this would break quorum, the operation fails
# Best practice: Demote before removing manager nodes
# Remove a node from the swarm
# First, on the node to remove:
docker swarm leave
# Then on a manager node:
docker node rm node-3
# Backup swarm state (on a manager node)
# Stop Docker first so the Raft data on disk is consistent, then restart it afterwards
systemctl stop docker
tar -czvf swarm-backup.tar.gz /var/lib/docker/swarm
systemctl start docker
# This includes:
# - Raft logs and consensus data
# - TLS certificates and keys
# - Secret data (encrypted)
# - Node information and join tokens
# For a more comprehensive backup, include:
docker service ls > services.txt
docker service inspect $(docker service ls -q) > service-details.json
docker secret ls > secrets.txt
docker config ls > configs.txt
docker network ls --filter driver=overlay > networks.txt
# Restore swarm
# 1. Stop Docker on the manager
systemctl stop docker
# 2. Restore the backup
tar -xzvf swarm-backup.tar.gz -C /
# Backup should be restored to the same path structure
# 3. Start Docker
systemctl start docker
# Docker will initialize using the restored swarm state
# The node resumes its role (leader or follower)
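# If the restore is onto a fresh machine, or the other managers are lost, re-initialize
# from the restored state so this node becomes the single manager of a new cluster:
docker swarm init --force-new-cluster
# Remaining managers and workers must then re-join using freshly generated tokens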
# 4. Verify the restore
docker node ls
docker service ls
# Alternative backup approach if the cluster runs Docker UCP (Universal Control Plane)
# The UCP backup file also captures the swarm configuration
docker container run --log-driver none --rm -i --name ucp \
-v /var/run/docker.sock:/var/run/docker.sock \
docker/ucp:latest backup --id $(docker container ls -q --filter name=ucp-controller) \
--passphrase "secret" > ucp-backup.tar.gz
For a complete disaster recovery plan:
Regular automated backups of swarm state (a script sketch follows this list)
Documentation of all custom configurations
Scripts to recreate services if backup is unavailable
Testing of restore procedures in a sandbox environment
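A minimal automation sketch for the first point, combining the commands shown earlier (paths, scheduling, and retention are assumptions to adapt):
#!/bin/bash
# swarm-backup.sh - run periodically (e.g. via cron) on a non-leader manager
set -euo pipefail
BACKUP_DIR=/backups/swarm
DATE=$(date +%Y%m%d-%H%M%S)
mkdir -p "$BACKUP_DIR"
# Export service, secret, config, and network definitions for reference
docker service ls > "$BACKUP_DIR/services-$DATE.txt"
docker service inspect $(docker service ls -q) > "$BACKUP_DIR/service-details-$DATE.json"
docker secret ls > "$BACKUP_DIR/secrets-$DATE.txt"
docker config ls > "$BACKUP_DIR/configs-$DATE.txt"
docker network ls --filter driver=overlay > "$BACKUP_DIR/networks-$DATE.txt"
# Stop Docker briefly so the Raft data is consistent, then archive and restart
systemctl stop docker
tar -czf "$BACKUP_DIR/swarm-state-$DATE.tar.gz" /var/lib/docker/swarm
systemctl start docker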
# Check swarm status
docker info | grep Swarm
# Verify if node is in swarm mode and its role (manager/worker)
# View task logs
docker service logs service_name
# Add --details to see metadata including which node is running each task
docker service logs --details service_name
# Check failed tasks
docker service ps --filter "desired-state=shutdown" service_name
# Failed tasks are given the desired state "shutdown"; check the CURRENT STATE and ERROR columns
# (there is no "actual-state" filter; supported filters include id, name, node, and desired-state)
# View task history
docker service ps --no-trunc service_name
# Shows all tasks including previously failed attempts
# Verify network connectivity
docker run --rm --network service_network alpine ping service_name
# Tests DNS resolution and connectivity within the overlay network (the network must have been created with --attachable)
# Check service configuration
docker service inspect --pretty service_name
# Human-readable service configuration
# Examine network configuration
docker network inspect service_network
# Shows network details, connected containers, and subnet information
# Check node status and health
docker node inspect --pretty node_name
# Human-readable node information including health status
# View service constraints
docker service inspect --format '{{.Spec.TaskTemplate.Placement.Constraints}}' service_name
# Shows placement constraints that might prevent scheduling
# Debug service discovery
docker run --rm --network service_network nicolaka/netshoot \
dig service_name
# Advanced network debugging using specialized tools