Kubernetes Ephemeral Containers

Understanding and utilizing ephemeral containers for debugging and troubleshooting in Kubernetes

Introduction to Ephemeral Containers

Ephemeral containers are a powerful Kubernetes feature (stable since v1.25) that lets you run temporary containers in an existing pod for debugging and troubleshooting. Unlike regular containers in a pod, ephemeral containers are designed to be short-lived and are not restarted automatically when they exit.

This feature addresses a common challenge in Kubernetes: how to debug running pods, especially those with minimal or distroless container images that lack debugging tools. Traditionally, debugging containerized applications required either adding debugging tools to production images (increasing their attack surface) or redeploying pods with modified images (disrupting the application). This limitation presented a significant operational challenge for teams adopting security best practices while still needing effective troubleshooting capabilities.

Distroless and minimal container images, which contain only the application and its runtime dependencies without shells, package managers, or debugging utilities, have become increasingly popular for security reasons. They offer several security benefits:

  1. Reduced attack surface: Fewer components mean fewer potential vulnerabilities
  2. Smaller image size: Faster deployments and reduced storage requirements
  3. Improved security posture: No shell access for potential attackers
  4. Compliance advantages: Easier to meet regulatory requirements for minimal components
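
To make the trade-off concrete, attempting to open a shell in a distroless container fails outright because there is no shell binary to execute (the pod and container names below are placeholders):

# Exec-based debugging has nothing to run in a distroless image
kubectl exec -it mypod -c app -- sh   # fails: no shell binary in the image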

However, these security benefits come with a significant operational cost: troubleshooting production issues becomes much more difficult. Ephemeral containers resolve this dilemma by letting operators temporarily attach a debug container to a running pod, without modifying the pod's definition, restarting it, or compromising the security benefits of minimal production images. The result is a powerful debugging experience that preserves production security best practices.

How Ephemeral Containers Work

Ephemeral containers are added to running pods through the Kubernetes API and share the pod's namespaces, allowing them to access and troubleshoot the pod's environment:

Pod Namespace Sharing

  • Ephemeral containers share the pod's network namespace
  • They can communicate with other containers via localhost
  • They have access to the same DNS configuration as the pod
  • They can see and interact with all processes in the pod (if sharing PID namespace)
  • They share the pod's filesystem mounts (including volumes)
  • They can access the same ServiceAccount tokens as the pod
  • They can view the same environment variables (optionally)
  • They operate within the same security context constraints
  • This comprehensive sharing enables deep inspection of the pod's state while maintaining appropriate security boundaries
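
A quick sketch of what this sharing looks like in practice (pod, container, port, and path names are placeholders): because the network namespace is shared, the debug container can reach the application over localhost, and with --target it can also see the application's processes:

# Attach a tooling container that targets the application container
kubectl debug -it mypod --image=busybox --target=app

# Inside the debug container: the shared network namespace means the app is reachable on localhost
wget -qO- http://localhost:8080/healthz

# With --target (and a supporting container runtime), the app's processes are visible
ps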

Lifecycle Management

  • Created by updating the pod's ephemeralcontainers subresource
  • Not defined in the pod's initial specification (PodSpec)
  • Won't automatically restart if they exit or crash
  • Remain in pod status even after termination for log examination
  • Can be created as long as the pod is running
  • Cannot be removed once added (until the pod is deleted)
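
A brief sketch of this lifecycle (pod and container names are placeholders): once added, ephemeral containers are appended to the pod's spec and tracked in its status, and their logs remain available after the debug session ends:

# Ephemeral containers are recorded in the pod spec and status
kubectl get pod mypod -o jsonpath='{.spec.ephemeralContainers[*].name}'
kubectl get pod mypod -o jsonpath='{.status.ephemeralContainerStatuses[*].state}'

# Logs of an ephemeral container remain readable after it exits
kubectl logs mypod -c debugger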

Resource Constraints

  • Not considered in scheduling decisions; the pod was placed before the debug container existed
  • Cannot declare their own resource requests or limits (pod resource allocations are immutable, so the resources field is disallowed for ephemeral containers)
  • Do not change the pod's QoS class or resource guarantees
  • Consume spare node capacity, so heavyweight debugging workloads can contribute to node pressure and eviction
  • Not considered for HorizontalPodAutoscaler metrics
  • Will not prevent pod termination even if still running
  • Best suited to lightweight debugging tools and operations
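
This is also visible in the pod description: ephemeral containers are listed in their own section, but without the Requests and Limits entries that regular containers carry (the pod name is a placeholder):

# Ephemeral containers appear in the describe output with no resource requests or limits
kubectl describe pod mypod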

Security Context

  • Run within the pod's existing security boundaries and use the pod's ServiceAccount
  • Validated by the same admission controllers and Pod Security Standards as regular pods, so they cannot be used to bypass cluster security policy
  • Cannot elevate privileges beyond what the pod's SecurityContext and cluster policy allow
  • Can be further restricted with a container-level securityContext of their own
  • Traffic to and from them is governed by the NetworkPolicies that apply to the pod
  • Creation can be limited through RBAC on the pods/ephemeralcontainers subresource
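
To confirm which security settings a debug container actually received, you can inspect the ephemeral container entries recorded in the pod spec (the pod name is a placeholder):

# Show the securityContext recorded for each ephemeral container in the pod
kubectl get pod mypod -o jsonpath='{.spec.ephemeralContainers[*].securityContext}'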

Creating Ephemeral Containers

Ephemeral containers can be added to pods using the kubectl debug command or directly via the Kubernetes API:

# Basic usage with kubectl
kubectl debug -it pod/mypod --image=busybox --target=container-name

# Copy the pod and add a debug container
kubectl debug pod/mypod -it --image=ubuntu --copy-to=mypod-debug

# Copy the pod with process namespace sharing enabled
# (--share-processes takes effect together with --copy-to)
kubectl debug -it pod/mypod --image=ubuntu --copy-to=mypod-debug-pid --share-processes

# Debug using a custom debugging image with tools
kubectl debug -it pod/mypod --image=nicolaka/netshoot --target=container-name

# Specify a container name for the ephemeral container
kubectl debug -it pod/mypod --image=ubuntu --container=debugger

# Set environment variables in the debug container
kubectl debug -it pod/mypod --image=ubuntu --env="DEBUG=true" --env="VERBOSITY=high"

# Debug with specific command instead of default shell
kubectl debug -it pod/mypod --image=ubuntu -- /bin/bash -c "ls -la /proc/1/root"

# Install curl in the debug container and probe the application's HTTP endpoint over localhost
kubectl debug -it pod/mypod --image=ubuntu -- /bin/bash -c "apt update && apt install -y curl && curl localhost:8080/health"

# Apply a predefined debugging profile to the debug container's security settings
# (available in recent kubectl versions; e.g. 'restricted' applies a restrictive security context)
kubectl debug -it pod/mypod --image=ubuntu --profile=restricted

Use Cases and Examples

Advanced Debugging Techniques

Ephemeral containers enable several advanced debugging techniques:

Process Inspection and Tracing

  • Share process namespace to inspect running processes
  • Use strace, ltrace to monitor system calls
  • Analyze process metrics and resource usage
  • Examine file descriptors and network connections
  • Inspect memory maps and resource consumption
  • Analyze thread activity and scheduling patterns
  • Monitor inter-process communication
  • Trace system calls across process boundaries
  • Examine process hierarchies and relationships
  • Example process debugging:
    # Attach process tools, joining the app container's process namespace via --target
    # (install tools inside the debug container first, e.g.
    #  apt-get update && apt-get install -y strace ltrace gdb lsof procps)
    kubectl debug -it pod/app --image=ubuntu --target=<container-name>
    
    # Inside the debug container
    $ ps auxf  # Process tree format
    $ top -c   # Command with arguments
    $ strace -f -p 1  # Follow forks
    $ ltrace -f -p 1  # Trace library calls
    $ lsof -p 1  # Open files and sockets
    $ cat /proc/1/status  # Process status information
    $ cat /proc/1/maps  # Memory mappings
    $ pstree -p  # Process tree with PIDs
    $ gdb -p 1 -batch -ex "thread apply all bt"  # Stack traces for all threads
    $ perf record -p 1 -g -- sleep 10  # Collect performance data
    $ perf report  # Analyze collected data
    

Core Dump Analysis

  • Capture and analyze core dumps from crashed applications
  • Use debugging symbols and tools for post-mortem analysis
  • Example core dump analysis:
    # Attach debugging tools, joining the app container's process namespace
    # (gdb must be installed in the debug container first)
    kubectl debug -it pod/app --image=ubuntu --target=<container-name>
    
    # Inside the debug container
    # Note: core_pattern is a node-wide kernel setting; writing it usually
    # requires a privileged debug container, and /proc/sys may be read-only
    $ mkdir /tmp/cores
    $ echo "/tmp/cores/core.%e.%p" > /proc/sys/kernel/core_pattern
    $ kill -ABRT $(pgrep app)
    $ gdb /proc/1/root/app /tmp/cores/core.app.*
    

Specialized Debugging Containers

  • Create purpose-built debugging images
  • Include language-specific tools and profilers
  • Example specialized Java debugging container:
    FROM eclipse-temurin:17
    
    # Install debugging tools (JDK utilities such as jcmd, jstat and jmap
    # already ship with the eclipse-temurin base image)
    RUN apt-get update && apt-get install -y --no-install-recommends \
        curl unzip procps net-tools lsof htop sysstat \
        iproute2 tcpdump dnsutils netcat-openbsd \
        strace ltrace gdb binutils \
        jattach \
        && rm -rf /var/lib/apt/lists/*
    
    # Add Java profiling tools
    RUN curl -sLO https://github.com/jvm-profiling-tools/async-profiler/releases/download/v2.9/async-profiler-2.9-linux-x64.tar.gz \
        && tar -xzf async-profiler-2.9-linux-x64.tar.gz -C /opt \
        && rm async-profiler-2.9-linux-x64.tar.gz
    
    # Install Java Flight Recorder tools
    RUN mkdir -p /opt/jfr-tools && \
        curl -sL https://github.com/adoptium/jmc-build/releases/download/8.3.0/org.openjdk.jmc-8.3.0-jdk17.tar.gz | \
        tar -xzf - -C /opt/jfr-tools
    
    # Install Arthas Java diagnostics tool
    RUN curl -sL https://arthas.aliyun.com/arthas-boot.jar -o /opt/arthas-boot.jar
    
    # Install VisualVM
    RUN curl -sL https://github.com/oracle/visualvm/releases/download/2.1.5/visualvm_215.zip -o visualvm.zip && \
        unzip visualvm.zip -d /opt && \
        rm visualvm.zip
    
    # Set environment variables
    ENV PATH="/opt/async-profiler-2.9-linux-x64/bin:/opt/visualvm_215/bin:${PATH}"
    ENV JAVA_TOOL_OPTIONS="-XX:+FlightRecorder -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints"
    
    # Create helper scripts
    RUN echo '#!/bin/bash\njava -jar /opt/arthas-boot.jar "$@"' > /usr/local/bin/arthas && chmod +x /usr/local/bin/arthas
    RUN echo '#!/bin/bash\njattach $(pgrep -f "java.*-jar") jcmd JFR.start duration=60s filename=/tmp/recording.jfr' > /usr/local/bin/start-jfr && chmod +x /usr/local/bin/start-jfr
    
    COPY debug-scripts/ /opt/debug-scripts/
    
    CMD ["bash"]
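    
  • Using the image with kubectl debug (the registry path, pod, and container names below are placeholders):
    # Build and publish the debugging image
    docker build -t registry.example.com/debug/jvm-tools:latest .
    docker push registry.example.com/debug/jvm-tools:latest
    
    # Attach it to a running Java pod, targeting the application container
    kubectl debug -it pod/java-app --image=registry.example.com/debug/jvm-tools:latest --target=app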
    

Custom Debug Scripts

  • Prepare debug scripts in advance
  • Automate common troubleshooting procedures
  • Example automated debug script:
    # Create a debug script
    cat > debug-app.sh << 'EOF'
    #!/bin/bash
    # Comprehensive diagnostic script for containerized applications
    
    echo "========================================================"
    echo "=== Kubernetes Pod Diagnostic Report $(date -u) ==="
    echo "========================================================"
    
    echo -e "\n=== System Information ==="
    uname -a
    echo "Hostname: $(hostname)"
    echo "Container Runtime: $(cat /proc/self/cgroup | grep -o 'docker\|containerd\|crio'|uniq)"
    uptime
    echo "CPU Info: $(cat /proc/cpuinfo | grep 'model name' | head -1)"
    echo "Memory Info: $(free -h)"
    echo "Disk Space: $(df -h / | tail -1)"
    
    echo -e "\n=== Container Processes ==="
    ps auxf
    echo -e "\nTop Processes by CPU:"
    ps aux --sort=-%cpu | head -10
    echo -e "\nTop Processes by Memory:"
    ps aux --sort=-%mem | head -10
    
    echo -e "\n=== Network Information ==="
    echo "Interfaces:"
    ip addr
    echo -e "\nRouting Table:"
    ip route
    echo -e "\nNetwork Connections:"
    netstat -tuln
    echo -e "\nEstablished Connections:"
    netstat -tn | grep ESTABLISHED
    echo -e "\nDNS Configuration:"
    cat /etc/resolv.conf
    echo -e "\nNameserver Check:"
    for ns in $(grep nameserver /etc/resolv.conf | awk '{print $2}'); do
      echo "Testing $ns: $(dig @$ns kubernetes.default.svc.cluster.local +short || echo 'Failed')"
    done
    
    echo -e "\n=== Kubernetes Service Discovery Test ==="
    echo "Resolving kubernetes service: $(dig kubernetes.default.svc.cluster.local +short || echo 'Failed')"
    echo "Connectivity to kubernetes API: $(curl -k -s https://kubernetes.default.svc.cluster.local/healthz || echo 'Failed')"
    
    echo -e "\n=== File System Information ==="
    echo "Mounted Volumes:"
    mount | grep -v "proc\|sysfs\|cgroup\|tmpfs"
    echo -e "\nLargest Directories:"
    du -hd1 / 2>/dev/null | sort -hr | head -10
    echo -e "\nRecent File Changes:"
    find / -type f -mmin -60 2>/dev/null | grep -v "proc\|sys\|tmp" | head -20
    
    echo -e "\n=== Application Logs ==="
    for logfile in $(find /var/log -name "*.log" 2>/dev/null); do
      echo -e "\n--- Last 30 lines of $logfile ---"
      tail -n 30 $logfile
    done
    
    echo -e "\n=== Application Configuration ==="
    for conffile in $(find /etc -name "*.conf" -o -name "*.yaml" -o -name "*.properties" 2>/dev/null | grep -v "fonts\|X11"); do
      echo -e "\n--- Configuration file: $conffile ---"
      cat $conffile | grep -v "^#" | grep -v "^$" | head -20
      [[ $(cat $conffile | wc -l) -gt 20 ]] && echo "... (truncated)"
    done
    
    echo -e "\n=== Environment Variables ==="
    env | sort
    
    echo -e "\n=== Security Information ==="
    echo "Current User: $(id)"
    echo "Capabilities: $(capsh --print)"
    echo "SecurityContext:"
    if [ -d /proc/1/root ]; then
      ls -la /proc/1/root
    fi
    
    echo -e "\n=== End of Diagnostic Report ==="
    echo "========================================================" 
    EOF
    
    # Run the debug script in an ephemeral container
    # (the stock ubuntu image lacks several tools the script uses, such as dig,
    # netstat and capsh; a tooling image like nicolaka/netshoot covers most of them)
    kubectl debug -it pod/app --image=nicolaka/netshoot -- /bin/bash -c "$(cat debug-app.sh)"
    

Limiting and Controlling Ephemeral Containers

While ephemeral containers are powerful debugging tools, it's important to control their usage in production environments:

  1. RBAC controls
    • Restrict ephemeral container creation with RBAC
    • Limit permissions to specific users or groups
    • Example RBAC configuration:
      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        name: ephemeral-container-creator
        namespace: production
      rules:
      - apiGroups: [""]
        resources: ["pods/ephemeralcontainers"]
        verbs: ["update", "patch"]
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: debug-team-ephemeral-containers
        namespace: production
      subjects:
      - kind: Group
        name: debug-team
        apiGroup: rbac.authorization.k8s.io
      roleRef:
        kind: Role
        name: ephemeral-container-creator
        apiGroup: rbac.authorization.k8s.io
      
  2. Audit logging
    • Enable audit logging for ephemeral container operations
    • Track who created ephemeral containers and when
    • Example audit policy:
      apiVersion: audit.k8s.io/v1
      kind: Policy
      rules:
      - level: RequestResponse
        resources:
        - group: ""
          resources: ["pods/ephemeralcontainers"]
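
With controls like these in place, engineers can quickly check whether they are permitted to attach debug containers in a given namespace:

# Returns "yes" or "no" depending on the caller's RBAC permissions
kubectl auth can-i update pods/ephemeralcontainers -n production
kubectl auth can-i patch pods/ephemeralcontainers -n production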
      

Best Practices

Real-World Debugging Scenarios

Let's explore some real-world scenarios where ephemeral containers prove invaluable:

  1. Debugging a distroless application that's crashing
    • Problem: Application crashes with no debug information
    • Solution:
      # Add a debugging container that joins the app container's process namespace
      kubectl debug -it pod/crashing-app --image=ubuntu --target=<container-name>
      
      # Check for core dumps and analyze logs
      ls -la /proc/1/root/
      cat /proc/1/root/var/log/app.log
      dmesg | grep -i 'killed process'
      
  2. Investigating network connectivity issues
    • Problem: Application can't connect to backend service
    • Solution:
      # Add network debugging tools
      kubectl debug -it pod/frontend --image=nicolaka/netshoot
      
      # Inside the container
      # Check DNS resolution
      dig backend-service.namespace.svc.cluster.local
      dig +search backend-service
      
      # Examine DNS configuration
      cat /etc/resolv.conf
      
      # Test TCP connectivity
      curl -v telnet://backend-service:8080
      nc -zv backend-service 8080
      
      # Capture network traffic
      tcpdump -i eth0 port 8080 -nn -v
      
      # Check routing
      ip route
      traceroute backend-service.namespace.svc.cluster.local
      
      # Examine socket statistics
      ss -tuna
      
      # Test the service endpoint directly (bypassing DNS): look up the ClusterIP
      # from your workstation, then curl it from inside the debug container
      kubectl get svc backend-service -n namespace -o jsonpath='{.spec.clusterIP}'
      curl -v http://10.96.x.y:8080/healthz
      
      # From your workstation: check whether NetworkPolicies are blocking traffic
      kubectl get networkpolicies -n namespace
      
      # Verify that the service has endpoints
      kubectl get endpoints backend-service -n namespace
      
  3. Memory leak investigation
    • Problem: Java application consuming increasing memory
    • Solution:
      # Add Java debugging tools (use a JDK image matching the application's Java version)
      kubectl debug -it pod/java-app --image=openjdk:11 --target=app-container
      
      # Capture heap information from the target JVM (PID 1 in the shared process namespace);
      # the dump is written by the target JVM, so read it via /proc/1/root/tmp/heap.bin
      jmap -dump:format=b,file=/tmp/heap.bin 1
      jcmd 1 GC.heap_info
      

Future Developments

As Kubernetes continues to evolve, ephemeral containers are likely to see enhancements:

  1. Enhanced security controls
    • More granular permissions for ephemeral containers
    • Enhanced isolation options
    • Additional security context options
    • Container-specific RBAC for ephemeral containers
    • Scoped access tokens for debugging sessions
    • Time-limited debugging privileges
    • Enhanced audit logging for debug activities
    • Debug session recording for compliance
    • Fine-grained control over debug capabilities
    • Network isolation for debugging containers
    • Automated cleanup of debugging artifacts
  2. Improved user experience
    • Simplified debugging workflows
    • IDE integration for Kubernetes debugging
    • Automated diagnostic collection
    • GUI-based debugging interfaces
    • Pre-packaged debugging containers for common scenarios
    • Built-in diagnostic wizards for common problems
    • Auto-recommendation of debugging approaches
    • Seamless local-to-cluster debugging transitions
    • Cross-container debugging coordination
    • Standardized debug protocol implementations
    • Interactive troubleshooting tutorials
    • AI-assisted debugging suggestions
  3. Standardized debugging patterns
    • Common patterns for language-specific debugging
    • Integration with service mesh observability
    • Advanced profiling capabilities
    • Framework-aware debugging extensions
    • Runtime-specific debugging tools
    • Integrated APM (Application Performance Monitoring)
    • Continuous debugging telemetry
    • Distributed tracing integration
    • Automated correlation between metrics, logs, and traces
    • Chaos engineering integration for proactive debugging
    • GitOps for debugging configuration
    • Debugging as Code (DaC) patterns
    • Federated debugging across multi-cluster environments

Ephemeral containers represent a significant advancement in Kubernetes troubleshooting capabilities, enabling developers and operators to debug applications in production environments without compromising security or stability. By understanding and effectively utilizing this feature, teams can significantly improve their ability to diagnose and resolve issues in containerized applications.

As container security best practices continue to evolve towards minimal, distroless images in production, ephemeral containers bridge the critical gap between security and operability. This technology allows organizations to embrace the principle of least privilege in their production environments while maintaining the ability to perform effective troubleshooting when issues arise.

The ephemeral container pattern also aligns perfectly with modern immutable infrastructure approaches, where production artifacts are never modified after deployment. Instead of modifying running containers or rebuilding images with debugging tools, operations teams can temporarily augment existing pods with the precise debugging capabilities needed for a specific situation, then remove them when the investigation is complete.

Organizations implementing Kubernetes at scale should develop comprehensive debugging strategies that incorporate ephemeral containers, along with appropriate security controls, documentation, and training. By treating debugging as a first-class operational concern rather than an afterthought, teams can significantly reduce mean time to resolution (MTTR) for production incidents while maintaining robust security posture.

In essence, ephemeral containers exemplify the Kubernetes philosophy of providing flexible building blocks that can be composed to solve complex operational challenges, enabling teams to strike the right balance between security, performance, and operational excellence.