Docker Swarm for Orchestration

Learn how to use Docker's native clustering and orchestration capabilities with Docker Swarm

Docker Swarm

Docker Swarm is Docker's native clustering and orchestration solution that turns a group of Docker hosts into a single virtual host, providing high availability and load balancing for containerized applications. Unlike external orchestrators, Swarm is built into the Docker engine, offering a simpler approach to container orchestration with minimal setup overhead while maintaining the familiar Docker CLI experience.

Swarm Architecture

Control Plane

  • Manager nodes: Maintain cluster state, orchestrate services, and handle API requests; recommended to have 3, 5, or 7 for high availability
  • Distributed state store (Raft consensus): Built-in database that maintains consistent cluster state across manager nodes; requires a majority of managers (N/2+1) to be available
  • API endpoints: Expose the Docker API for cluster management; requests can be sent to any manager and are forwarded to the current leader
  • Orchestration decision making: Schedule tasks, reconcile desired state, handle node failures, and perform rolling updates
  • High availability configuration: Replicated state across multiple manager nodes with leader election for fault tolerance

Data Plane

  • Worker nodes: Execute containerized workloads as assigned by managers; can scale to thousands of nodes
  • Task execution: Run container instances as directed by the orchestration layer with health monitoring
  • Container runtime: Standard Docker engine running on each node for consistent container execution
  • Network overlay: Multi-host networking with automatic service discovery and load balancing
  • Load distribution: Spread services across nodes based on placement constraints, resource availability, and high availability requirements
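
A quick way to check which plane a given node belongs to is to query the engine's swarm metadata (docker info and docker node ls are both standard commands; the exact output depends on your engine version):

# Is the local node a manager (control plane) or only a worker (data plane)?
docker info --format 'state: {{.Swarm.LocalNodeState}}, manager: {{.Swarm.ControlAvailable}}'
# "state: active, manager: true"  -> participates in the control plane
# "state: active, manager: false" -> runs workloads only

# From any manager, list every node with its role and manager status
docker node ls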

Setting Up a Swarm Cluster

# Initialize a new swarm (on manager node)
docker swarm init --advertise-addr <MANAGER-IP>
# --advertise-addr specifies the address other nodes will use to connect to this manager
# If omitted, Docker attempts to auto-detect the IP, which may pick the wrong address on servers with multiple network interfaces

# Output will provide a token to join worker nodes
# Example output:
# docker swarm join --token SWMTKN-1-49nj1cmql0jkz5s954yi3oex3nedyz0fb0xx14ie39trti4wxv-8vxv8rssmk743ojnwacrr2e7c <MANAGER-IP>:2377
# This token authenticates new nodes to the swarm, ensuring secure cluster expansion

# Join a node as worker
docker swarm join --token <TOKEN> <MANAGER-IP>:2377
# Workers receive and execute tasks but cannot manage the cluster state
# Port 2377 is the standard swarm management port

# Join a node as manager
docker swarm join-token manager
# Run this on an existing manager to retrieve the join token for manager nodes
# Then use the provided token to join as manager
docker swarm join --token <MANAGER-TOKEN> <MANAGER-IP>:2377
# Manager nodes participate in the Raft consensus and can manage the cluster

# List nodes in the swarm
docker node ls
# Shows all nodes with their roles, availability status, and manager status
# MANAGER STATUS column shows "Leader" for the primary manager node
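
Before joining nodes, make sure the swarm ports are open between all hosts. A minimal sketch using ufw (adapt to whatever firewall you run):

# Ports Docker Swarm needs between cluster nodes
ufw allow 2377/tcp    # cluster management traffic (to manager nodes)
ufw allow 7946/tcp    # node-to-node gossip
ufw allow 7946/udp
ufw allow 4789/udp    # overlay network (VXLAN) data traffic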

Services and Tasks

Creating and Managing Services

# Create a replicated service
docker service create --name webserver \
  --replicas 3 \                        # Deploy 3 instances of the container
  --publish 80:80 \                     # Publish port 80 on all swarm nodes
  --update-delay 10s \                  # Wait 10s between updating each container
  --update-parallelism 1 \              # Update one container at a time
  --restart-condition on-failure \      # Automatically restart containers that exit non-zero
  nginx:latest                          # Container image to use

# List services
docker service ls
# Shows all services with their ID, name, mode (replicated/global), replicas, image, and ports

# Inspect a service
docker service inspect webserver
# Provides detailed JSON output of the service configuration, including:
# - Container configuration
# - Resource constraints
# - Networks
# - Update and rollback configuration
# - Placement constraints

# View service tasks (containers)
docker service ps webserver
# Shows all tasks (containers) for the service with their:
# - ID, name, image
# - Node assignment
# - Desired and current state
# - Error messages (if any)
# - When the task was created and updated

# Scale a service
docker service scale webserver=5
# Increases or decreases the number of replicas
# Swarm will create or remove containers to match the desired count
# Scale several services at once with space-separated pairs: docker service scale webserver=5 api=3

# Update a service
docker service update --image nginx:1.21 \  # Change the image version
  --limit-cpu 0.5 \                         # Add CPU limits
  --limit-memory 512M \                     # Add memory limits  
  --rollback-parallelism 2 \                # Configure rollback behavior
  webserver

# Remove a service
docker service rm webserver
# Removes the service and stops all associated containers

Swarm Networking

Overlay Networks

# Create an overlay network
docker network create --driver overlay \     # Multi-host network driver
  --attachable \                             # Allow standalone containers to connect
  --subnet 10.0.9.0/24 \                     # Define custom subnet (optional)
  --gateway 10.0.9.1 \                       # Define custom gateway (optional)
  --opt encrypted \                           # Enable encryption for traffic (optional)
  backend-network                            # Network name

# Create service with network
docker service create --name api \
  --network backend-network \                # Attach to the overlay network
  --replicas 3 \                             # Run 3 instances
  --endpoint-mode dnsrr \                    # DNS round-robin mode (alternative to VIP)
  myapi:latest

Overlay networks provide:

  • Cross-host communication between containers
  • Isolated network segments for multi-tier applications
  • Internal DNS resolution for service discovery
  • Optional encryption for sensitive traffic
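
Because the network above was created with --attachable, you can verify name resolution and cross-host connectivity from a throwaway container. A quick check, assuming the api service from the previous example is running:

# Attach a standalone container to the overlay network and resolve the service name
docker run --rm --network backend-network alpine nslookup api
# Returns the service's virtual IP (or every task IP when the service uses dnsrr mode)

docker run --rm --network backend-network alpine ping -c 3 api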

Ingress Network

  • Built-in load balancing: Transparent load balancing for published ports using Linux IPVS
  • Routing mesh for service ports: Exposes published ports on every node, regardless of whether that node is running a task for the service
  • Distributes requests across nodes: Requests arriving at any node are routed to an active container, even if it is running on a different node
  • Automatic service discovery: Services can communicate by name without manual linking or IP configuration
  • Container-to-container communication: Containers can communicate securely across hosts within the same overlay network

The ingress network load-balances published ports at the connection level (layer 4), with no application-level session affinity:

[External Client] → [Any Swarm Node Port 80] → [Routing Mesh] → [Container Running Service]
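
In practice this means a published port answers on every node. A small demonstration (node IPs are placeholders):

# Publish a service through the routing mesh (the default ingress mode)
docker service create --name hello --replicas 2 --publish published=8080,target=80 nginx

# Any node in the swarm answers on port 8080, even nodes running no task for the service
curl http://<node1-ip>:8080
curl http://<node3-ip>:8080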

For more control over routing:

# Create service with specific publishing mode
docker service create --name web \
  --publish mode=host,target=80,published=8080 \
  nginx
# 'host' mode bypasses the routing mesh but requires manual port assignment per node

Service Discovery

Docker Swarm provides built-in service discovery, allowing containers to find and communicate with each other using service names:

# Service discovery example
docker service create --name db \
  --network backend-network \               # Join the overlay network
  --mount type=volume,source=db-data,target=/var/lib/postgresql/data \  # Persistent storage
  --env POSTGRES_PASSWORD_FILE=/run/secrets/db_password \  # Use secrets for security
  --replicas 1 \                            # Database typically runs a single instance
  postgres:13                               # Database image

docker service create --name api \
  --network backend-network \               # Same network as the db service
  --env DB_HOST=db \                        # Reference the database by service name
  --env DB_PORT=5432 \                      # Standard PostgreSQL port
  --replicas 3 \                            # Run multiple API instances
  --update-order start-first \              # Start new tasks before stopping old ones during updates
  myapi:latest                              # API image

Service discovery works through:

  1. Internal DNS server: Every container in Swarm can query the embedded DNS server
  2. Service VIPs (Virtual IPs): Each service gets a virtual IP in the overlay network
  3. DNS round-robin: Containers can resolve service names to all task IPs using DNS
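
To see the difference between the VIP and the individual task IPs, query both names from a container attached to the same overlay network (tasks.<service-name> is part of Swarm's built-in DNS):

# Run these inside any container on backend-network
nslookup api         # resolves to the service's virtual IP (one address)
nslookup tasks.api   # resolves to the IPs of every individual task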

Example connection from inside a container:

# Connect to database from inside the api container
docker exec -it $(docker ps -q -f name=api) bash
ping db       # Resolves to the service VIP
psql -h db -U postgres -d myapp  # Connect to database using service name

For services with multiple replicas, connections are automatically load-balanced:

# Create frontend service that connects to the API
docker service create --name frontend \
  --network backend-network \
  --env API_URL=http://api:8000/ \  # API service name is automatically resolved and load-balanced
  --publish 80:80 \
  frontend:latest

Swarm Deployment with Stacks

Docker Compose for Swarm

# docker-compose.yml for Swarm deployment
version: '3.8'  # Compose file format version with swarm support

services:
  web:
    image: nginx:latest
    ports:
      - "80:80"  # Published port (accessible from outside)
    deploy:
      mode: replicated       # 'replicated' (scaled) or 'global' (one per node)
      replicas: 3            # Number of container instances
      update_config:
        parallelism: 1       # Update one container at a time
        delay: 10s           # Wait 10s between updates
        order: start-first   # Start new tasks before stopping old ones
        failure_action: rollback  # Auto-rollback on failure
      restart_policy:
        condition: on-failure    # Restart if container exits non-zero
        max_attempts: 3          # Maximum restart attempts
        window: 120s             # Time window to evaluate restart attempts
      placement:
        constraints:
          - node.role == worker  # Only run on worker nodes
        preferences:
          - spread: node.labels.datacenter  # Spread across datacenters
      resources:
        limits:
          cpus: '0.5'       # Maximum CPU usage
          memory: 512M      # Maximum memory usage
        reservations:
          cpus: '0.1'       # Guaranteed CPU allocation
          memory: 128M      # Guaranteed memory allocation
  
  api:
    image: myapi:latest
    deploy:
      replicas: 3
      placement:
        constraints:
          - node.labels.zone == frontend  # Only run on nodes with this label
        max_replicas_per_node: 1          # Limit instances per node for HA
      update_config:
        parallelism: 2                    # Update two containers at once
      rollback_config:
        parallelism: 3                    # Faster rollback if needed
    environment:
      - DB_HOST=db                     # Service discovery by name
      - API_KEY_FILE=/run/secrets/api_key  # Reference a secret
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s                # Initial grace period
    networks:
      - backend                        # Connect to backend network
    secrets:
      - api_key                        # Reference to secret defined below
  
  db:
    image: postgres:13
    volumes:
      - db-data:/var/lib/postgresql/data  # Persistent storage
    deploy:
      placement:
        constraints:
          - node.labels.zone == database  # Only run on database nodes
      replicas: 1                         # Single database instance
      restart_policy:
        condition: any                    # Always restart
    environment:
      - POSTGRES_PASSWORD_FILE=/run/secrets/db_password
    networks:
      - backend
    secrets:
      - db_password

networks:
  backend:
    driver: overlay               # Multi-host network
    attachable: true              # Allow standalone containers to connect
    driver_opts:
      encrypted: "true"           # Encrypt traffic on this network

volumes:
  db-data:                        # Named volume for persistent data
    driver: local                 # Use local storage (default)
    # For production, consider using a volume driver that supports replication

secrets:
  api_key:
    file: ./secrets/api_key.txt   # Load from local file during deployment
  db_password:
    file: ./secrets/db_password.txt
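
To run this file on a swarm, deploy it as a stack from a manager node. A minimal workflow, assuming the file is saved as docker-compose.yml (the stack name myapp is arbitrary):

# Deploy (or update) the stack
docker stack deploy -c docker-compose.yml myapp

# Inspect what was created
docker stack services myapp    # services with replica counts
docker stack ps myapp          # individual tasks and the nodes they run on

# Remove the stack and all of its services
docker stack rm myapp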

High Availability and Fault Tolerance

Manager High Availability

  • Deploy multiple manager nodes (3, 5, or 7 recommended):
    • 3 managers tolerate 1 failure
    • 5 managers tolerate 2 failures
    • 7 managers tolerate 3 failures
    • Going beyond 7 is discouraged: the extra consensus overhead hurts performance for little practical gain in fault tolerance
  • Maintain quorum for consensus:
    • Requires majority (N/2+1) of managers to be operational
    • Loss of quorum prevents changes to cluster state
    • Critical operations require consensus:
      • Service creation/updates
      • Node additions/removals
      • Secret management
  • Distribute managers across failure domains:
    • Place on different physical hosts
    • Spread across availability zones
    • Use different network segments
    • Consider power and cooling redundancy
  • Automatic leader election:
    • Raft protocol selects a leader manager
    • Handles failover without manual intervention
    • Typically completes within seconds
    • Only one leader actively makes orchestration decisions
  • State replication between managers:
    • Raft consensus ensures consistent state
    • Each manager maintains a complete copy of cluster state
    • Changes are propagated to all managers
    • Persistent state stored in /var/lib/docker/swarm/
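
If quorum is ever lost permanently (a majority of managers are unrecoverable), the documented escape hatch is to rebuild the control plane from a surviving manager. A hedged sketch; run it only on the one manager you intend to keep:

# Check manager health first
docker node ls    # MANAGER STATUS shows Leader / Reachable / Unreachable

# On the surviving manager, force a new single-manager cluster
docker swarm init --force-new-cluster --advertise-addr <THIS-MANAGER-IP>
# Services and their state are retained; re-join or promote additional managers afterwards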

Worker Fault Tolerance

  • Automatic task rescheduling:
    • Tasks (containers) from failed nodes automatically rescheduled
    • New tasks maintain service configurations and constraints
    • System attempts to distribute load evenly
    • Honors placement constraints during rescheduling
  • Health checks:
    • Container health checks monitor application health
    • Node health monitoring detects infrastructure failures
    • Proactive replacement of unhealthy containers
    • Customizable health check parameters:
      healthcheck:
        test: ["CMD", "curl", "-f", "http://localhost/health"]
        interval: 30s
        timeout: 10s
        retries: 3
      
  • Service recreation:
    • Failed services automatically recreated
    • Maintains desired replica count
    • Preserves service configuration
    • Attempts to restart with exponential backoff
  • Rolling updates:
    • Update services without downtime
    • Control parallelism and delay between updates
    • Monitor health during updates
    • Automatic rollback on failure (when configured)
  • Restart policies (restart condition; CLI example below):
    • any: Always restart the task, whatever the exit status (default)
    • on-failure: Restart only on non-zero exit codes
    • none: Never automatically restart
    • Configurable restart delay, maximum attempts, and evaluation window
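
The same restart behaviour can be set on the CLI when creating a service (myworker:latest is a placeholder image):

# Restart only on failure, at most 3 attempts within a 120s window, 5s apart
docker service create --name worker \
  --restart-condition on-failure \
  --restart-max-attempts 3 \
  --restart-window 120s \
  --restart-delay 5s \
  myworker:latest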

Swarm Secrets Management

# Create a secret
echo "mypassword" | docker secret create db_password -
# Secrets are stored encrypted in the Raft log
# They are never written to disk unencrypted
# `-` at the end indicates input from stdin

# Create a service with a secret
docker service create --name db \
  --secret db_password \                           # Reference the created secret
  --env POSTGRES_PASSWORD_FILE=/run/secrets/db_password \  # Tell application where to find it
  --secret source=ssl_cert,target=server.crt \     # Custom target path for a secret
  --secret source=ssl_key,target=server.key,mode=0400 \  # Custom file permissions
  postgres:13

# Inside the container, secrets appear as files in /run/secrets/
# Example:
# /run/secrets/db_password
# /run/secrets/server.crt
# /run/secrets/server.key

# List secrets
docker secret ls
# Shows all secrets with creation time and ID

# Inspect a secret (shows metadata only, not the actual content)
docker secret inspect db_password

# Remove a secret
docker secret rm db_password
# Note: Cannot remove secrets used by running services
# Must update or remove services first

# Create a secret from a file
docker secret create ssl_cert ./server.crt

Swarm Configs
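
Configs work like secrets but are meant for non-sensitive data such as configuration files: they are stored in the Raft log and mounted into service containers (at /<config-name> by default). A minimal sketch, using nginx_conf as an example name:

# Create a config from a local file
docker config create nginx_conf ./nginx.conf

# List configs and view their metadata
docker config ls
docker config inspect nginx_conf

# Mount the config into a service at a specific path
docker service create --name web \
  --config source=nginx_conf,target=/etc/nginx/nginx.conf,mode=0444 \
  --publish 80:80 \
  nginx:latest

# Rotate by creating a new config and swapping it on the service
docker config create nginx_conf_v2 ./nginx.conf
docker service update \
  --config-rm nginx_conf \
  --config-add source=nginx_conf_v2,target=/etc/nginx/nginx.conf \
  web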

Resource Constraints

# Create a service with resource constraints
docker service create --name resource-limited \
  --limit-cpu 0.5 \                   # Maximum CPU usage (50% of one core)
  --limit-memory 512M \               # Maximum memory usage (512MB)
  --reserve-cpu 0.2 \                 # Guaranteed CPU (20% of one core)
  --reserve-memory 256M \             # Guaranteed memory (256MB)
  --generic-resource "gpu=1" \        # Request 1 generic resource named gpu (requires node-generic-resources on the node)
  --ulimit nofile=65536:65536 \       # Set file descriptor limits
  --ulimit nproc=4096:4096 \          # Process limits
  nginx:latest

# Resource constraints ensure:
# 1. Fair resource distribution across services
# 2. Protection against noisy neighbors
# 3. Predictable performance
# 4. Efficient bin-packing of containers

# The scheduler uses reservations to make placement decisions
# Limits enforce maximum resource usage
# A node must have available resources matching the reservation

Rolling Updates and Rollbacks

Rolling Updates

# Update with rolling update strategy
docker service update \
  --image nginx:1.21 \                      # New image to deploy
  --update-parallelism 2 \                  # Update 2 tasks at a time
  --update-delay 20s \                      # Wait 20s between updating each batch
  --update-order start-first \              # Start new tasks before stopping old ones
  --update-failure-action pause \           # Pause updates if a task fails
  --update-monitor 30s \                    # Monitor new tasks for 30s before proceeding
  --update-max-failure-ratio 0.2 \          # Allow 20% of tasks to fail before pausing
  --health-cmd "curl -f http://localhost/ || exit 1" \  # Health check command
  --health-interval 5s \                    # Check health every 5s during update
  --health-retries 3 \                      # Number of retries for health check
  webserver

# The rolling update process:
# 1. Start 2 new containers with the updated image
# 2. Wait for them to be healthy (30s monitoring period)
# 3. If healthy, stop 2 old containers
# 4. Wait 20s (update-delay)
# 5. Repeat until all containers are updated

Rollbacks

# Rollback to previous version
docker service update --rollback webserver
# This reverts to the configuration before the last update

# Configure rollback behavior
docker service update \
  --rollback-parallelism 3 \                # Rollback 3 tasks at a time
  --rollback-delay 10s \                    # Wait 10s between batches
  --rollback-failure-action continue \      # Continue rollback even if tasks fail
  --rollback-order stop-first \             # Stop old tasks before starting new ones
  --rollback-monitor 10s \                  # Monitor new tasks for 10s before proceeding
  webserver

# Check update/rollback status
docker service inspect --pretty webserver
# Shows current update status, progress, and configuration

# View update history
docker service ps --no-trunc webserver
# Shows all previous versions of tasks with their images

Rolling updates and rollbacks allow you to:

  1. Deploy new versions without downtime
  2. Test updates with canary deployments (partial updates)
  3. Quickly revert problematic changes
  4. Maintain control over the update process and timing
  5. Implement blue-green deployment patterns

Node Management

# List nodes
docker node ls
# Output shows:
# - NODE ID: Unique identifier for each node
# - HOSTNAME: Node's hostname
# - STATUS: Ready, Down, or Disconnected
# - AVAILABILITY: Active, Pause, or Drain
# - MANAGER STATUS: Leader, Reachable, or Unreachable (for manager nodes)
# - ENGINE VERSION: Docker engine version

# Inspect node
docker node inspect node-1
# Returns detailed information about the node:
# - Labels and node attributes
# - Resource availability
# - Platform and architecture
# - Network addresses
# - Join tokens and certificates
# - Status and health

# Format inspect output for specific information
docker node inspect -f '{{.Status.Addr}}' node-1
# Returns just the node's IP address

# Set node availability
docker node update --availability drain node-1  # Prepare for maintenance
# Drain: Stops scheduling new tasks and removes existing tasks from the node
# Existing tasks are gracefully rescheduled to other nodes
# Perfect for maintenance operations and updates

docker node update --availability active node-1 # Return to service
# Active: Node accepts new tasks and retains existing tasks
# Pause: Node keeps existing tasks but won't receive new ones

# Add labels to nodes
docker node update --label-add zone=frontend node-1
docker node update --label-add datacenter=east --label-add cpu=high-performance node-1
# Labels enable sophisticated scheduling strategies:
# - Geographic distribution
# - Hardware differentiation
# - Environment separation (prod/staging)
# - Specialized workload targeting

# Filter nodes by labels
docker node ls --filter node.label=zone=frontend

# Promote worker to manager
docker node promote node-2
# Converts worker to manager role
# Joins the Raft consensus group
# Receives a copy of the cluster state
# Can now accept management commands

# Demote manager to worker
docker node demote node-2
# Removes node from management plane
# If this would break quorum, the operation fails
# Best practice: Demote before removing manager nodes

# Remove a node from the swarm
# First, on the node to remove:
docker swarm leave
# Then on a manager node:
docker node rm node-3

Monitoring Swarm
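
Swarm has no dedicated monitoring UI of its own; for dashboards, alerting, or historical metrics you would typically add an external stack such as Prometheus and Grafana. The built-in commands cover the basics:

# Cluster-level events (node joins, service updates, task state changes)
docker events --filter scope=swarm

# Service and task state
docker service ls
docker service ps webserver

# Per-container resource usage on the current node
docker stats

# Node health at a glance
docker node ls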

Swarm vs. Kubernetes

Swarm Advantages

  • Simpler learning curve: Easier to learn and deploy with minimal new concepts
  • Native Docker integration: Built into Docker Engine without additional components
  • Lightweight implementation: Lower resource overhead and simpler architecture
  • Consistent Docker CLI experience: Uses familiar Docker commands and syntax
  • Faster deployment for smaller clusters: Quicker setup time for small to medium deployments
  • Lower operational complexity: Fewer moving parts and configuration options
  • Seamless Docker Compose integration: Direct deployment of Compose files with stack deploy
  • Integrated secret management: Built-in handling of sensitive data

Kubernetes Advantages

  • More extensive ecosystem: Larger community with more tools, extensions, and integrations
  • Advanced scheduling capabilities: More sophisticated pod placement and affinity rules
  • Broader industry adoption: Wider use in production environments across various industries
  • More extensible architecture: Custom Resource Definitions and Operators for extending functionality
  • Greater scalability for large deployments: Better performance at very large scale (1000+ nodes)
  • More granular control: Fine-grained configuration of networking, security, and resources
  • Declarative configuration: Stronger emphasis on GitOps and infrastructure-as-code patterns
  • Robust self-healing: More advanced health checking and automatic recovery mechanisms
  • Standardized approach: Becoming the industry standard for container orchestration

Backup and Restore

# Backup swarm state (on a manager node)
# For a consistent backup, stop Docker first or back up from a non-leader manager, then:
tar -czvf swarm-backup.tar.gz /var/lib/docker/swarm
# This includes:
# - Raft logs and consensus data
# - TLS certificates and keys
# - Secret data (encrypted)
# - Node information and join tokens

# For a more comprehensive backup, include:
docker service ls > services.txt
docker service inspect $(docker service ls -q) > service-details.json
docker secret ls > secrets.txt
docker config ls > configs.txt
docker network ls --filter driver=overlay > networks.txt

# Restore swarm
# 1. Stop Docker on the manager
systemctl stop docker

# 2. Restore the backup
tar -xzvf swarm-backup.tar.gz -C /
# Backup should be restored to the same path structure

# 3. Start Docker
systemctl start docker
# Docker will initialize using the restored swarm state
# The node resumes its role (leader or follower)

# 4. Verify the restore
docker node ls
docker service ls

# Alternative backup approach for clusters managed with Docker UCP (Universal Control Plane)
# Save the UCP backup file which includes swarm configuration
docker container run --log-driver none --rm -i --name ucp \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker/ucp:latest backup --id $(docker container ls -q --filter name=ucp-controller) \
  --passphrase "secret" > ucp-backup.tar.gz

For a complete disaster recovery plan:

  1. Regular automated backups of swarm state (see the sketch after this list)
  2. Documentation of all custom configurations
  3. Scripts to recreate services if backup is unavailable
  4. Testing of restore procedures in a sandbox environment
  5. Consideration of stateful services and their data
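
A minimal automation sketch for item 1 (illustrative paths and retention; prefer a non-leader manager so stopping the engine does not disturb the current Raft leader):

#!/bin/sh
# Back up the swarm state directory as a date-stamped archive
BACKUP_DIR=/backups/swarm
mkdir -p "$BACKUP_DIR"
systemctl stop docker                                            # stop the engine for a consistent copy
tar -czf "$BACKUP_DIR/swarm-$(date +%F).tar.gz" /var/lib/docker/swarm
systemctl start docker
find "$BACKUP_DIR" -name 'swarm-*.tar.gz' -mtime +14 -delete     # keep roughly two weeks of backups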

Best Practices

Troubleshooting

Common Issues

  • Manager node availability
    • Symptoms: Unable to create/update services, quorum loss
    • Causes: Network issues, hardware failure, improper scaling
    • Resolution: Restore manager nodes, ensure proper distribution
    # Check manager status
    docker node ls
    # Look for "Leader", "Reachable", or "Unreachable" in MANAGER STATUS column
    
  • Network connectivity problems
    • Symptoms: Services can't communicate, DNS resolution failures
    • Causes: Firewall rules, overlay network issues, DNS configuration
    • Resolution: Check firewall, verify network configuration
    # Inspect overlay network configuration
    docker network inspect service_network
    
    # Check if required ports are open between nodes
    # TCP port 2377 - cluster management
    # TCP/UDP port 7946 - node-to-node communication
    # UDP port 4789 - overlay network traffic
    nc -zv manager-node 2377
    
  • Task placement constraints
    • Symptoms: Tasks pending but not starting, "no suitable node" errors
    • Causes: Impossible constraints, resource unavailability
    • Resolution: Adjust constraints, add resources
    # View failed task details
    docker service ps --no-trunc service_name
    
    # Check placement constraints
    docker service inspect --format '{{.Spec.TaskTemplate.Placement}}' service_name
    
    # List nodes matching constraints
    docker node ls --filter node.label=region=east
    
  • Resource limitations
    • Symptoms: Services failing to start, OOM kills
    • Causes: Insufficient memory/CPU, incorrect resource specifications
    • Resolution: Adjust resource limits, scale infrastructure
    # Check node resources
    docker node inspect node_name --format '{{.Description.Resources}}'
    
    # View service resource requirements
    docker service inspect --format '{{.Spec.TaskTemplate.Resources}}' service_name
    
    # Monitor resource usage
    docker stats $(docker ps --format "{{.Names}}" --filter label=com.docker.swarm.service.name=service_name)
    
  • Image availability
    • Symptoms: "image not found" errors, services stuck in "preparing" state
    • Causes: Private registry auth issues, image not existing, network problems
    • Resolution: Verify registry access, check image path
    # Test image pull manually
    docker pull image_name:tag
    
    # Configure registry authentication
    docker login registry.example.com
    
    # Pass your registry credentials to the swarm agents when creating the service
    docker service create --name myservice \
      --with-registry-auth \
      registry.example.com/myimage:latest
    
    # Refresh credentials for an existing service
    docker service update --with-registry-auth myservice
    
  • Service update/scaling issues
    • Symptoms: Updates hanging, inconsistent replica count
    • Causes: Health check failures, resource constraints
    • Resolution: Check health checks, adjust update config
    # Monitor update progress
    docker service inspect --format '{{.UpdateStatus}}' service_name
    
    # Reset service update if stuck
    docker service update --force service_name
    

Diagnostic Commands

# Check swarm status
docker info | grep Swarm
# Verify if node is in swarm mode and its role (manager/worker)

# View task logs
docker service logs service_name
# Add --details to see metadata including which node is running each task
docker service logs --details service_name

# Check failed tasks
docker service ps --no-trunc --filter "desired-state=shutdown" service_name
# Failed tasks end up in desired state "shutdown"; check the CURRENT STATE and ERROR columns for why they stopped

# View task history
docker service ps --no-trunc service_name
# Shows all tasks including previously failed attempts

# Verify network connectivity
docker run --rm --network service_network alpine ping service_name
# Tests DNS resolution and connectivity within overlay network

# Check service configuration
docker service inspect --pretty service_name
# Human-readable service configuration

# Examine network configuration
docker network inspect service_network
# Shows network details, connected containers, and subnet information

# Check node status and health
docker node inspect --pretty node_name
# Human-readable node information including health status

# View service constraints
docker service inspect --format '{{.Spec.TaskTemplate.Placement.Constraints}}' service_name
# Shows placement constraints that might prevent scheduling

# Debug service discovery
docker run --rm --network service_network nicolaka/netshoot \
  dig service_name
# Advanced network debugging using specialized tools