Kubernetes Ephemeral Containers

Understanding and utilizing ephemeral containers for debugging and troubleshooting in Kubernetes

Introduction to Ephemeral Containers

Ephemeral containers are a powerful Kubernetes feature (stable since v1.25) that lets you run temporary containers in an existing pod for debugging and troubleshooting. Unlike regular containers in a pod, ephemeral containers are designed to be short-lived and are not restarted automatically when they exit.

This feature addresses a common challenge in Kubernetes: how to debug running pods, especially those with minimal or distroless container images that lack debugging tools. Traditionally, debugging containerized applications required either adding debugging tools to production images (increasing their attack surface) or redeploying pods with modified images (disrupting the application). This limitation presented a significant operational challenge for teams adopting security best practices while still needing effective troubleshooting capabilities.

Distroless and minimal container images, which contain only the application and its runtime dependencies without shells, package managers, or debugging utilities, have become increasingly popular for security reasons. They offer several security benefits:

  1. Reduced attack surface: Fewer components mean fewer potential vulnerabilities
  2. Smaller image size: Faster deployments and reduced storage requirements
  3. Improved security posture: No shell access for potential attackers
  4. Compliance advantages: Easier to meet regulatory requirements for minimal components
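
To make the trade-off concrete, attempting to open a shell in a distroless container fails outright because there is no shell binary to execute (the pod and container names below are placeholders):

# Exec-based debugging has nothing to run in a distroless image
kubectl exec -it mypod -c app -- sh   # fails: no shell binary in the image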

However, these security benefits come with a significant operational cost: troubleshooting production issues becomes much more difficult. Ephemeral containers resolve this dilemma by letting operators temporarily attach a debug container to a running pod, without modifying the pod's definition, restarting it, or compromising the security benefits of minimal production images. The result is a powerful debugging experience that preserves production security best practices.

How Ephemeral Containers Work

Ephemeral containers are added to running pods through the Kubernetes API and share the pod's namespaces, allowing them to access and troubleshoot the pod's environment:

Pod Namespace Sharing

  • Ephemeral containers share the pod's network namespace
  • They can communicate with other containers via localhost
  • They have access to the same DNS configuration as the pod
  • They can see and interact with all processes in the pod (if sharing PID namespace)
  • They share the pod's filesystem mounts (including volumes)
  • They can access the same ServiceAccount tokens as the pod
  • They can view the same environment variables (optionally)
  • They operate within the same security context constraints
  • This comprehensive sharing enables deep inspection of the pod's state while maintaining appropriate security boundaries
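
A quick sketch of what this sharing looks like in practice (pod, container, port, and path names are placeholders): because the network namespace is shared, the debug container can reach the application over localhost, and with --target it can also see the application's processes:

# Attach a tooling container that targets the application container
kubectl debug -it mypod --image=busybox --target=app

# Inside the debug container: the shared network namespace means the app is reachable on localhost
wget -qO- http://localhost:8080/healthz

# With --target (and a supporting container runtime), the app's processes are visible
ps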

Lifecycle Management

  • Created by updating the pod's ephemeralcontainers subresource
  • Not defined in the pod's initial specification (PodSpec)
  • Won't automatically restart if they exit or crash
  • Remain in pod status even after termination for log examination
  • Can be created as long as the pod is running
  • Cannot be removed once added (until the pod is deleted)
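
A brief sketch of this lifecycle (pod and container names are placeholders): once added, ephemeral containers are appended to the pod's spec and tracked in its status, and their logs remain available after the debug session ends:

# Ephemeral containers are recorded in the pod spec and status
kubectl get pod mypod -o jsonpath='{.spec.ephemeralContainers[*].name}'
kubectl get pod mypod -o jsonpath='{.status.ephemeralContainerStatuses[*].state}'

# Logs of an ephemeral container remain readable after it exits
kubectl logs mypod -c debugger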

Resource Constraints

  • Not considered in scheduling decisions; the pod was placed before the debug container existed
  • Cannot declare their own resource requests or limits (pod resource allocations are immutable, so the resources field is disallowed for ephemeral containers)
  • Do not change the pod's QoS class or resource guarantees
  • Consume spare node capacity, so heavyweight debugging workloads can contribute to node pressure and eviction
  • Not considered for HorizontalPodAutoscaler metrics
  • Will not prevent pod termination even if still running
  • Best suited to lightweight debugging tools and operations
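
This is also visible in the pod description: ephemeral containers are listed in their own section, but without the Requests and Limits entries that regular containers carry (the pod name is a placeholder):

# Ephemeral containers appear in the describe output with no resource requests or limits
kubectl describe pod mypod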

Security Context

  • Run within the pod's existing security boundaries and use the pod's ServiceAccount
  • Validated by the same admission controllers and Pod Security Standards as regular pods, so they cannot be used to bypass cluster security policy
  • Cannot elevate privileges beyond what the pod's SecurityContext and cluster policy allow
  • Can be further restricted with a container-level securityContext of their own
  • Traffic to and from them is governed by the NetworkPolicies that apply to the pod
  • Creation can be limited through RBAC on the pods/ephemeralcontainers subresource
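
To confirm which security settings a debug container actually received, you can inspect the ephemeral container entries recorded in the pod spec (the pod name is a placeholder):

# Show the securityContext recorded for each ephemeral container in the pod
kubectl get pod mypod -o jsonpath='{.spec.ephemeralContainers[*].securityContext}'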

Creating Ephemeral Containers

Ephemeral containers can be added to pods using the kubectl debug command or directly via the Kubernetes API:

# Basic usage with kubectl
kubectl debug -it pod/mypod --image=busybox --target=container-name

# Copy the pod and add a debug container
kubectl debug pod/mypod -it --image=ubuntu --copy-to=mypod-debug

# Copy the pod with process namespace sharing enabled
# (--share-processes takes effect together with --copy-to)
kubectl debug -it pod/mypod --image=ubuntu --copy-to=mypod-debug-pid --share-processes

# Debug using a custom debugging image with tools
kubectl debug -it pod/mypod --image=nicolaka/netshoot --target=container-name

# Specify a container name for the ephemeral container
kubectl debug -it pod/mypod --image=ubuntu --container=debugger

# Set environment variables in the debug container
kubectl debug -it pod/mypod --image=ubuntu --env="DEBUG=true" --env="VERBOSITY=high"

# Debug with specific command instead of default shell
kubectl debug -it pod/mypod --image=ubuntu -- /bin/bash -c "ls -la /proc/1/root"

# Install curl in the debug container and probe the application's HTTP endpoint over localhost
kubectl debug -it pod/mypod --image=ubuntu -- /bin/bash -c "apt update && apt install -y curl && curl localhost:8080/health"

# Apply a predefined debugging profile to the debug container's security settings
# (available in recent kubectl versions; e.g. 'restricted' applies a restrictive security context)
kubectl debug -it pod/mypod --image=ubuntu --profile=restricted

Use Cases and Examples

Advanced Debugging Techniques

Ephemeral containers enable several advanced debugging techniques:

Process Inspection and Tracing

  • Share process namespace to inspect running processes
  • Use strace, ltrace to monitor system calls
  • Analyze process metrics and resource usage
  • Examine file descriptors and network connections
  • Inspect memory maps and resource consumption
  • Analyze thread activity and scheduling patterns
  • Monitor inter-process communication
  • Trace system calls across process boundaries
  • Examine process hierarchies and relationships
  • Example process debugging:
    # Attach process tools, joining the app container's process namespace via --target
    # (install tools inside the debug container first, e.g.
    #  apt-get update && apt-get install -y strace ltrace gdb lsof procps)
    kubectl debug -it pod/app --image=ubuntu --target=<container-name>
    
    # Inside the debug container
    $ ps auxf  # Process tree format
    $ top -c   # Command with arguments
    $ strace -f -p 1  # Follow forks
    $ ltrace -f -p 1  # Trace library calls
    $ lsof -p 1  # Open files and sockets
    $ cat /proc/1/status  # Process status information
    $ cat /proc/1/maps  # Memory mappings
    $ pstree -p  # Process tree with PIDs
    $ gdb -p 1 -batch -ex "thread apply all bt"  # Stack traces for all threads
    $ perf record -p 1 -g -- sleep 10  # Collect performance data
    $ perf report  # Analyze collected data
    

Core Dump Analysis

  • Capture and analyze core dumps from crashed applications
  • Use debugging symbols and tools for post-mortem analysis
  • Example core dump analysis:
    # Attach debugging tools, joining the app container's process namespace
    # (gdb must be installed in the debug container first)
    kubectl debug -it pod/app --image=ubuntu --target=<container-name>
    
    # Inside the debug container
    # Note: core_pattern is a node-wide kernel setting; writing it usually
    # requires a privileged debug container, and /proc/sys may be read-only
    $ mkdir /tmp/cores
    $ echo "/tmp/cores/core.%e.%p" > /proc/sys/kernel/core_pattern
    $ kill -ABRT $(pgrep app)
    $ gdb /proc/1/root/app /tmp/cores/core.app.*
    

Specialized Debugging Containers

  • Create purpose-built debugging images
  • Include language-specific tools and profilers
  • Example specialized Java debugging container:
    FROM eclipse-temurin:17
    
    # Install debugging tools (JDK utilities such as jcmd, jstat and jmap
    # already ship with the eclipse-temurin base image)
    RUN apt-get update && apt-get install -y --no-install-recommends \
        curl unzip procps net-tools lsof htop sysstat \
        iproute2 tcpdump dnsutils netcat-openbsd \
        strace ltrace gdb binutils \
        jattach \
        && rm -rf /var/lib/apt/lists/*
    
    # Add Java profiling tools
    RUN curl -sLO https://github.com/jvm-profiling-tools/async-profiler/releases/download/v2.9/async-profiler-2.9-linux-x64.tar.gz \
        && tar -xzf async-profiler-2.9-linux-x64.tar.gz -C /opt \
        && rm async-profiler-2.9-linux-x64.tar.gz
    
    # Install Java Flight Recorder tools
    RUN mkdir -p /opt/jfr-tools && \
        curl -sL https://github.com/adoptium/jmc-build/releases/download/8.3.0/org.openjdk.jmc-8.3.0-jdk17.tar.gz | \
        tar -xzf - -C /opt/jfr-tools
    
    # Install Arthas Java diagnostics tool
    RUN curl -sL https://arthas.aliyun.com/arthas-boot.jar -o /opt/arthas-boot.jar
    
    # Install VisualVM
    RUN curl -sL https://github.com/oracle/visualvm/releases/download/2.1.5/visualvm_215.zip -o visualvm.zip && \
        unzip visualvm.zip -d /opt && \
        rm visualvm.zip
    
    # Set environment variables
    ENV PATH="/opt/async-profiler-2.9-linux-x64/bin:/opt/visualvm_215/bin:${PATH}"
    ENV JAVA_TOOL_OPTIONS="-XX:+FlightRecorder -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints"
    
    # Create helper scripts
    RUN echo '#!/bin/bash\njava -jar /opt/arthas-boot.jar "$@"' > /usr/local/bin/arthas && chmod +x /usr/local/bin/arthas
    RUN echo '#!/bin/bash\njattach $(pgrep -f "java.*-jar") jcmd JFR.start duration=60s filename=/tmp/recording.jfr' > /usr/local/bin/start-jfr && chmod +x /usr/local/bin/start-jfr
    
    COPY debug-scripts/ /opt/debug-scripts/
    
    CMD ["bash"]
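    
  • Using the image with kubectl debug (the registry path, pod, and container names below are placeholders):
    # Build and publish the debugging image
    docker build -t registry.example.com/debug/jvm-tools:latest .
    docker push registry.example.com/debug/jvm-tools:latest
    
    # Attach it to a running Java pod, targeting the application container
    kubectl debug -it pod/java-app --image=registry.example.com/debug/jvm-tools:latest --target=app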
    

Custom Debug Scripts

  • Prepare debug scripts in advance
  • Automate common troubleshooting procedures
  • Example automated debug script:
    # Create a debug script
    cat > debug-app.sh << 'EOF'
    #!/bin/bash
    # Comprehensive diagnostic script for containerized applications
    
    echo "========================================================"
    echo "=== Kubernetes Pod Diagnostic Report $(date -u) ==="
    echo "========================================================"
    
    echo -e "\n=== System Information ==="
    uname -a
    echo "Hostname: $(hostname)"
    echo "Container Runtime: $(cat /proc/self/cgroup | grep -o 'docker\|containerd\|crio'|uniq)"
    uptime
    echo "CPU Info: $(cat /proc/cpuinfo | grep 'model name' | head -1)"
    echo "Memory Info: $(free -h)"
    echo "Disk Space: $(df -h / | tail -1)"
    
    echo -e "\n=== Container Processes ==="
    ps auxf
    echo -e "\nTop Processes by CPU:"
    ps aux --sort=-%cpu | head -10
    echo -e "\nTop Processes by Memory:"
    ps aux --sort=-%mem | head -10
    
    echo -e "\n=== Network Information ==="
    echo "Interfaces:"
    ip addr
    echo -e "\nRouting Table:"
    ip route
    echo -e "\nNetwork Connections:"
    netstat -tuln
    echo -e "\nEstablished Connections:"
    netstat -tn | grep ESTABLISHED
    echo -e "\nDNS Configuration:"
    cat /etc/resolv.conf
    echo -e "\nNameserver Check:"
    for ns in $(grep nameserver /etc/resolv.conf | awk '{print $2}'); do
      echo "Testing $ns: $(dig @$ns kubernetes.default.svc.cluster.local +short || echo 'Failed')"
    done
    
    echo -e "\n=== Kubernetes Service Discovery Test ==="
    echo "Resolving kubernetes service: $(dig kubernetes.default.svc.cluster.local +short || echo 'Failed')"
    echo "Connectivity to kubernetes API: $(curl -k -s https://kubernetes.default.svc.cluster.local/healthz || echo 'Failed')"
    
    echo -e "\n=== File System Information ==="
    echo "Mounted Volumes:"
    mount | grep -v "proc\|sysfs\|cgroup\|tmpfs"
    echo -e "\nLargest Directories:"
    du -hd1 / 2>/dev/null | sort -hr | head -10
    echo -e "\nRecent File Changes:"
    find / -type f -mmin -60 2>/dev/null | grep -v "proc\|sys\|tmp" | head -20
    
    echo -e "\n=== Application Logs ==="
    for logfile in $(find /var/log -name "*.log" 2>/dev/null); do
      echo -e "\n--- Last 30 lines of $logfile ---"
      tail -n 30 $logfile
    done
    
    echo -e "\n=== Application Configuration ==="
    for conffile in $(find /etc -name "*.conf" -o -name "*.yaml" -o -name "*.properties" 2>/dev/null | grep -v "fonts\|X11"); do
      echo -e "\n--- Configuration file: $conffile ---"
      cat $conffile | grep -v "^#" | grep -v "^$" | head -20
      [[ $(cat $conffile | wc -l) -gt 20 ]] && echo "... (truncated)"
    done
    
    echo -e "\n=== Environment Variables ==="
    env | sort
    
    echo -e "\n=== Security Information ==="
    echo "Current User: $(id)"
    echo "Capabilities: $(capsh --print)"
    echo "SecurityContext:"
    if [ -d /proc/1/root ]; then
      ls -la /proc/1/root
    fi
    
    echo -e "\n=== End of Diagnostic Report ==="
    echo "========================================================" 
    EOF
    
    # Run the debug script in an ephemeral container
    # (the stock ubuntu image lacks several tools the script uses, such as dig,
    # netstat and capsh; a tooling image like nicolaka/netshoot covers most of them)
    kubectl debug -it pod/app --image=nicolaka/netshoot -- /bin/bash -c "$(cat debug-app.sh)"
    

Limiting and Controlling Ephemeral Containers

While ephemeral containers are powerful debugging tools, it's important to control their usage in production environments:

  1. RBAC controls
    • Restrict ephemeral container creation with RBAC
    • Limit permissions to specific users or groups
    • Example RBAC configuration:
      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        name: ephemeral-container-creator
        namespace: production
      rules:
      - apiGroups: [""]
        resources: ["pods/ephemeralcontainers"]
        verbs: ["update", "patch"]
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: debug-team-ephemeral-containers
        namespace: production
      subjects:
      - kind: Group
        name: debug-team
        apiGroup: rbac.authorization.k8s.io
      roleRef:
        kind: Role
        name: ephemeral-container-creator
        apiGroup: rbac.authorization.k8s.io
      
  2. Audit logging
    • Enable audit logging for ephemeral container operations
    • Track who created ephemeral containers and when
    • Example audit policy:
      apiVersion: audit.k8s.io/v1
      kind: Policy
      rules:
      - level: RequestResponse
        resources:
        - group: ""
          resources: ["pods/ephemeralcontainers"]
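
With controls like these in place, engineers can quickly check whether they are permitted to attach debug containers in a given namespace:

# Returns "yes" or "no" depending on the caller's RBAC permissions
kubectl auth can-i update pods/ephemeralcontainers -n production
kubectl auth can-i patch pods/ephemeralcontainers -n production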
      

Best Practices

Real-World Debugging Scenarios

Let's explore some real-world scenarios where ephemeral containers prove invaluable:

  1. Debugging a distroless application that's crashing
    • Problem: Application crashes with no debug information
    • Solution:
      # Add a debugging container that joins the app container's process namespace
      kubectl debug -it pod/crashing-app --image=ubuntu --target=<container-name>
      
      # Check for core dumps and analyze logs
      ls -la /proc/1/root/
      cat /proc/1/root/var/log/app.log
      dmesg | grep -i 'killed process'
      
  2. Investigating network connectivity issues
    • Problem: Application can't connect to backend service
    • Solution:
      # Add network debugging tools
      kubectl debug -it pod/frontend --image=nicolaka/netshoot
      
      # Inside the container
      # Check DNS resolution
      dig backend-service.namespace.svc.cluster.local
      dig +search backend-service
      
      # Examine DNS configuration
      cat /etc/resolv.conf
      
      # Test TCP connectivity
      curl -v telnet://backend-service:8080
      nc -zv backend-service 8080
      
      # Capture network traffic
      tcpdump -i eth0 port 8080 -nn -v
      
      # Check routing
      ip route
      traceroute backend-service.namespace.svc.cluster.local
      
      # Examine socket statistics
      ss -tuna
      
      # Test the service endpoint directly (bypassing DNS): look up the ClusterIP
      # from your workstation, then curl it from inside the debug container
      kubectl get svc backend-service -n namespace -o jsonpath='{.spec.clusterIP}'
      curl -v http://10.96.x.y:8080/healthz
      
      # From your workstation: check whether NetworkPolicies are blocking traffic
      kubectl get networkpolicies -n namespace
      
      # Verify that the service has endpoints
      kubectl get endpoints backend-service -n namespace
      
  3. Memory leak investigation
    • Problem: Java application consuming increasing memory
    • Solution:
      # Add Java debugging tools (use a JDK image matching the application's Java version)
      kubectl debug -it pod/java-app --image=openjdk:11 --target=app-container
      
      # Capture heap information from the target JVM (PID 1 in the shared process namespace);
      # the dump is written by the target JVM, so read it via /proc/1/root/tmp/heap.bin
      jmap -dump:format=b,file=/tmp/heap.bin 1
      jcmd 1 GC.heap_info
      

Future Developments

As Kubernetes continues to evolve, ephemeral containers are likely to see enhancements:

  1. Enhanced security controls
    • More granular permissions for ephemeral containers
    • Enhanced isolation options
    • Additional security context options
    • Container-specific RBAC for ephemeral containers
    • Scoped access tokens for debugging sessions
    • Time-limited debugging privileges
    • Enhanced audit logging for debug activities
    • Debug session recording for compliance
    • Fine-grained control over debug capabilities
    • Network isolation for debugging containers
    • Automated cleanup of debugging artifacts
  2. Improved user experience
    • Simplified debugging workflows
    • IDE integration for Kubernetes debugging
    • Automated diagnostic collection
    • GUI-based debugging interfaces
    • Pre-packaged debugging containers for common scenarios
    • Built-in diagnostic wizards for common problems
    • Auto-recommendation of debugging approaches
    • Seamless local-to-cluster debugging transitions
    • Cross-container debugging coordination
    • Standardized debug protocol implementations
    • Interactive troubleshooting tutorials
    • AI-assisted debugging suggestions
  3. Standardized debugging patterns
    • Common patterns for language-specific debugging
    • Integration with service mesh observability
    • Advanced profiling capabilities
    • Framework-aware debugging extensions
    • Runtime-specific debugging tools
    • Integrated APM (Application Performance Monitoring)
    • Continuous debugging telemetry
    • Distributed tracing integration
    • Automated correlation between metrics, logs, and traces
    • Chaos engineering integration for proactive debugging
    • GitOps for debugging configuration
    • Debugging as Code (DaC) patterns
    • Federated debugging across multi-cluster environments

Ephemeral containers represent a significant advancement in Kubernetes troubleshooting capabilities, enabling developers and operators to debug applications in production environments without compromising security or stability. By understanding and effectively utilizing this feature, teams can significantly improve their ability to diagnose and resolve issues in containerized applications.

As container security best practices continue to evolve towards minimal, distroless images in production, ephemeral containers bridge the critical gap between security and operability. This technology allows organizations to embrace the principle of least privilege in their production environments while maintaining the ability to perform effective troubleshooting when issues arise.

The ephemeral container pattern also aligns perfectly with modern immutable infrastructure approaches, where production artifacts are never modified after deployment. Instead of modifying running containers or rebuilding images with debugging tools, operations teams can temporarily augment existing pods with the precise debugging capabilities needed for a specific situation, then remove them when the investigation is complete.

Organizations implementing Kubernetes at scale should develop comprehensive debugging strategies that incorporate ephemeral containers, along with appropriate security controls, documentation, and training. By treating debugging as a first-class operational concern rather than an afterthought, teams can significantly reduce mean time to resolution (MTTR) for production incidents while maintaining robust security posture.

In essence, ephemeral containers exemplify the Kubernetes philosophy of providing flexible building blocks that can be composed to solve complex operational challenges, enabling teams to strike the right balance between security, performance, and operational excellence.