Kubernetes Resource Requests & Limits

Comprehensive guide to configuring, optimizing, and managing compute resources in Kubernetes clusters for improved performance and cost efficiency

Introduction to Kubernetes Resource Management

Resource management is a fundamental aspect of Kubernetes that determines how compute resources (CPU, memory, and other types) are allocated, utilized, and constrained across workloads. Proper resource configuration directly impacts:

  • Cluster stability: Prevent resource starvation and cascade failures
  • Application performance: Ensure workloads have sufficient resources to meet requirements
  • Cost efficiency: Optimize resource utilization and reduce cloud spending
  • Scaling predictability: Create consistent, reliable scaling behavior
  • Multi-tenant isolation: Fairly distribute resources across teams and applications

This comprehensive guide explores the concepts, implementation patterns, and best practices for configuring resource requests and limits in Kubernetes environments, helping you achieve the right balance between performance, reliability, and cost efficiency.

Resource Requests and Limits Fundamentals

Core Concepts

Kubernetes provides two primary mechanisms for resource management:

  1. Requests: The minimum amount of resources that the scheduler guarantees to a container
  2. Limits: The maximum amount of resources that a container can consume
# Pod with resource requests and limits
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: nginx:latest
    resources:
      requests:
        memory: "128Mi"
        cpu: "100m"
      limits:
        memory: "256Mi"
        cpu: "500m"

The differences between requests and limits are critical to understand:

| Resource Requests | Resource Limits |
|---|---|
| Used by the scheduler to find a suitable node | Used by the kubelet to enforce upper bounds |
| Never exceed node capacity during scheduling | Can be set higher than actual node capacity |
| Guaranteed to be available to the container | Container cannot exceed these values |
| Determine the pod's QoS class | Affect OOM score and termination priority |
| Used for initial resource allocation | Used for resource constraint enforcement |

Resource Units and Notation

Kubernetes uses specific units for resource quantities:

  • CPU: measured in cores, where 1 is one full core (one vCPU/hyperthread on cloud instances); fractional values use millicores, with 1000m equal to 1 core (so 100m is 0.1 core)
  • Memory: measured in bytes, using binary suffixes Ki, Mi, Gi (powers of 1024) or decimal suffixes k, M, G (powers of 1000)
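
A minimal illustration of equivalent notations (the quantities are arbitrary examples):

# Equivalent ways to express the same quantities
resources:
  requests:
    cpu: "0.5"        # identical to 500m
    memory: "128Mi"   # 128 × 1024 × 1024 bytes
  limits:
    cpu: "1"          # one full core (1000m)
    memory: "1G"      # decimal suffix: 1,000,000,000 bytes (not 1Gi)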

QoS Classes and Pod Priority

Quality of Service (QoS) Classes

Kubernetes automatically assigns one of three QoS classes to pods based on their resource configurations:

  • Guaranteed: every container has CPU and memory requests equal to its limits
  • Burstable: at least one container has a request or limit set, but the pod does not qualify as Guaranteed
  • BestEffort: no container has any requests or limits set

# Guaranteed QoS example
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-pod
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        memory: "256Mi"
        cpu: "500m"
      limits:
        memory: "256Mi"  # Must match requests
        cpu: "500m"      # Must match requests
# Burstable QoS example
apiVersion: v1
kind: Pod
metadata:
  name: burstable-pod
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        memory: "128Mi"
        cpu: "100m"
      limits:
        memory: "256Mi"  # Higher than requests
        cpu: "500m"      # Higher than requests
# BestEffort QoS example
apiVersion: v1
kind: Pod
metadata:
  name: besteffort-pod
spec:
  containers:
  - name: app
    image: nginx
    # No resource requests or limits defined

These QoS classes determine eviction order during node resource pressure:

  1. Guaranteed: Highest priority, only killed if exceeding memory limits or in critical system failures
  2. Burstable: Medium priority, killed before Guaranteed pods when node is under memory pressure
  3. BestEffort: Lowest priority, first to be killed when node is under memory pressure
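
The assigned class is recorded in pod status and can be checked directly, for example for the guaranteed-pod defined above:

# Check a pod's assigned QoS class
kubectl get pod guaranteed-pod -o jsonpath='{.status.qosClass}'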

Pod Priority and Preemption

For finer control over pod importance, use Pod Priority:

# Priority Class definition
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Critical production workloads"
---
# Pod using priority class
apiVersion: v1
kind: Pod
metadata:
  name: critical-app
spec:
  containers:
  - name: app
    image: critical-app:latest
  priorityClassName: high-priority

Higher priority pods can preempt (evict) lower priority pods during scheduling constraints, ensuring critical workloads get resources first.
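
Built-in and custom priority classes, along with their relative values, can be listed with:

# List priority classes and their values
kubectl get priorityclasses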

Resource Management in Practice

Setting Appropriate Requests and Limits

Determining optimal resource settings requires a methodical approach:

  1. Performance testing: Measure actual resource usage under various loads
  2. Monitoring data analysis: Review historical resource utilization patterns
  3. Application profiling: Understand the resource consumption characteristics
  4. Overhead accounting: Add buffer for JVM, runtime environment, etc.
  5. Growth projection: Account for expected usage increases

Guidelines for common workload types:

| Workload Type | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| Web service | p50 + 20% | 2-3x request | p90 + 30% | 1.5-2x request |
| Batch job | Average | 2x request | Peak + 10% | 1.2x request |
| Database | p90 | None or 2x | p99 | 1.5x request |
| Cache | p50 | 2x request | Fixed size | Same as request |
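
As a sketch of step 2 above, assuming Prometheus is scraping the standard cAdvisor metrics and your pods match a hypothetical my-app name, percentiles like p90 can be derived with PromQL:

# p90 of memory working set over the last 7 days (bytes)
quantile_over_time(0.9, container_memory_working_set_bytes{pod=~"my-app-.*", container!=""}[7d])

# Average CPU usage in cores over the last 7 days
avg_over_time(rate(container_cpu_usage_seconds_total{pod=~"my-app-.*", container!=""}[5m])[7d:5m])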

Resource Quotas

ResourceQuotas limit the total resources used within a namespace:

# Namespace ResourceQuota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "20"
    persistentvolumeclaims: "10"

This creates a resource boundary for a team or application, preventing one group from consuming all cluster resources.
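
Consumption against the quota can be checked at any time:

# Compare current usage with the hard limits
kubectl describe resourcequota team-quota -n team-a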

LimitRanges

LimitRanges define default, minimum, and maximum resource values at the namespace level:

# LimitRange for containers
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a
spec:
  limits:
  - default:  # Default limits if not specified
      cpu: "500m"
      memory: 256Mi
    defaultRequest:  # Default requests if not specified
      cpu: "100m"
      memory: 128Mi
    min:  # Minimum allowed values
      cpu: "50m"
      memory: 64Mi
    max:  # Maximum allowed values
      cpu: "2"
      memory: 1Gi
    type: Container

LimitRanges ensure consistent resource configurations across a namespace and prevent extremes (too small or too large).
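
To confirm the defaults take effect, create a pod without a resources block (test-defaults is an arbitrary name) and inspect what the admission controller injected:

# Verify the injected default requests and limits
kubectl run test-defaults --image=nginx -n team-a
kubectl get pod test-defaults -n team-a -o jsonpath='{.spec.containers[0].resources}'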

Resource Monitoring and Optimization

Monitoring Resource Usage

Effective resource management requires comprehensive monitoring:

# View pod resource usage
kubectl top pod -n <namespace>

# View node resource usage
kubectl top node
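
To surface the heaviest consumers, add sorting and per-container detail:

# Sort pods by memory usage, with per-container breakdown
kubectl top pod -n <namespace> --sort-by=memory --containers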

For deeper insights, implement dedicated monitoring tools:

  1. Prometheus + Grafana: Industry standard for Kubernetes monitoring
  2. Kubernetes Dashboard: Simple built-in visualization
  3. Kubernetes Resource Report: Cost and utilization analysis
  4. Vertical Pod Autoscaler in recommendation mode: Automated resource suggestion

Vertical Pod Autoscaler

The Vertical Pod Autoscaler (VPA) automatically adjusts resource requests:

# VPA configuration
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"  # or "Off", "Initial", "Recreate"
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 1
        memory: 512Mi

VPA modes provide flexibility in how recommendations are applied:

  • Off: Only generates recommendations, no automatic updates
  • Initial: Sets resources only at pod creation time
  • Auto: Automatically updates pod resources (requires pod restart)
  • Recreate: Like Auto, but forces pod recreation for updates
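
Whatever the mode, current recommendations are published in the VPA object's status:

# Inspect the full recommendation output
kubectl describe vpa my-app-vpa

# Or extract just the per-container recommendations
kubectl get vpa my-app-vpa -o jsonpath='{.status.recommendation.containerRecommendations}'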

Resource Optimization Strategies

Implement these strategies to optimize resource usage:

  1. Right-sizing: Adjust requests and limits to match actual usage patterns
  2. Bin-packing improvement: Set appropriate requests to optimize node utilization
  3. Workload segregation: Separate resource-intensive and lightweight workloads
  4. Node affinity tuning: Place workloads on appropriate node types
  5. Cost-aware scheduling: Consider resource costs in placement decisions

Advanced Resource Management

CPU Management Policies

Configure CPU management for performance-sensitive workloads:

# Pod with guaranteed CPU cores
apiVersion: v1
kind: Pod
metadata:
  name: cpu-sensitive-app
spec:
  containers:
  - name: app
    image: performance-app:latest
    resources:
      requests:
        cpu: "2"         # Integer CPU value required for exclusive cores
        memory: "2Gi"
      limits:
        cpu: "2"         # Must equal requests
        memory: "2Gi"    # Memory must also be set: Guaranteed QoS is required
  restartPolicy: Never

Kubelet CPU manager policies:

  • none: Default policy, shared CPU scheduling
  • static: Exclusive allocation of entire CPU cores for Guaranteed QoS pods with integer CPU requests

Enable with kubelet configuration:

# kubelet configuration
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static

Memory Management Policies

Optimize memory allocation and utilization:

# Pod with memory optimization
apiVersion: v1
kind: Pod
metadata:
  name: memory-optimized
spec:
  containers:
  - name: app
    image: memory-intensive:latest
    resources:
      requests:
        memory: "8Gi"
      limits:
        memory: "8Gi"
    env:
    - name: MALLOC_ARENA_MAX
      value: "2"  # Reduce glibc memory overhead

Memory management strategies:

  1. Huge pages: Allocate large memory pages for performance (the resource name is hugepages-<size>, and requests must equal limits)
    spec:
      containers:
      - name: app
        resources:
          requests:
            hugepages-2Mi: "1Gi"
            memory: "1Gi"
          limits:
            hugepages-2Mi: "1Gi"
            memory: "1Gi"
    
  2. Memory Manager: Reserve NUMA-aligned memory for Guaranteed QoS pods via the kubelet's static Memory Manager policy
    # kubelet configuration
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    memoryManagerPolicy: Static
    

Resource Overcommitment

Strategic overcommitment, setting container limits above requests so that the sum of limits on a node can exceed its capacity, can improve resource utilization. When overcommitting, account for per-pod runtime overhead explicitly, for example with a RuntimeClass:

# RuntimeClass declaring per-pod resource overhead
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: overcommit-class
handler: runc
scheduling:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
overhead:
  podFixed:
    cpu: "100m"
    memory: "128Mi"

Overcommitment considerations:

  1. Risk assessment: Evaluate the impact of resource contention
  2. Workload compatibility: Identify which workloads tolerate resource pressure
  3. Safety mechanisms: Implement monitoring and alerts for resource saturation
  4. QoS differentiation: Use QoS classes to protect critical workloads
  5. Node segregation: Isolate overcommitted workloads on specific nodes
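
A node's current commitment level is visible in its Allocated resources section, which reports requests and limits as percentages of allocatable capacity:

# Review a node's committed requests and limits
kubectl describe node <node-name> | grep -A 8 "Allocated resources"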

Resource-Based Autoscaling

Horizontal Pod Autoscaler

Scale replicas based on resource utilization:

# HPA based on CPU utilization
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

HPA best practices:

  1. Appropriate thresholds: Set realistic utilization targets (typically 60-80%)
  2. Scale behavior tuning: Configure stabilization windows to prevent thrashing (see the sketch after this list)
  3. Multiple metrics: Use both CPU and memory metrics for balanced scaling
  4. Custom metrics: Add application-specific metrics for more precise scaling
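
A minimal behavior stanza for the HPA above, sketching point 2 (the window and policy values are illustrative assumptions, not recommendations):

# Scale behavior tuning, appended under the HPA spec
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes before scaling down
      policies:
      - type: Percent
        value: 50                      # remove at most 50% of replicas per period
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0    # scale up immediately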

Cluster Autoscaler

Automatically adjust cluster size based on pod resource requirements:

# Deploy Cluster Autoscaler
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.23.0
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --nodes=2:10:my-node-group
        - --scale-down-utilization-threshold=0.5

Cluster Autoscaler optimizes node resources by:

  1. Adding nodes: When pods can't be scheduled due to insufficient resources
  2. Removing nodes: When node utilization is below threshold for an extended period
  3. Respecting pod disruption budgets: Ensuring availability during scale-down (example below)
  4. Considering node taints/affinities: Respecting workload placement constraints
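
A PodDisruptionBudget makes the third point concrete: assuming the my-app Deployment's pods carry the label app: my-app, this keeps at least two replicas running during voluntary disruptions such as scale-down:

# PDB protecting availability during node scale-down
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app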

Resource-Aware Scheduling

Node Affinity and Anti-Affinity

Place pods on nodes with appropriate resources:

# Pod with resource-based node affinity
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node.kubernetes.io/instance-type
            operator: In
            values:
            - g4dn.xlarge
            - p3.2xlarge
  containers:
  - name: gpu-container
    image: gpu-workload:latest
    resources:
      requests:
        nvidia.com/gpu: 1
      limits:
        nvidia.com/gpu: 1

Affinity rules help optimize resource placement by:

  1. Hardware targeting: Place workloads on nodes with specialized hardware
  2. Resource efficiency: Match workload requirements to appropriate node types
  3. Cost optimization: Direct high-resource pods to cost-optimized nodes
  4. Performance isolation: Separate resource-intensive workloads

Taints and Tolerations

Reserve nodes for specific resource-intensive workloads:

# Taint a node for memory-intensive workloads
kubectl taint nodes node1 workload=memory-intensive:NoSchedule

# Pod with toleration for memory-intensive nodes
apiVersion: v1
kind: Pod
metadata:
  name: large-memory-app
spec:
  tolerations:
  - key: "workload"
    operator: "Equal"
    value: "memory-intensive"
    effect: "NoSchedule"
  containers:
  - name: app
    image: memory-app:latest
    resources:
      requests:
        memory: "24Gi"
      limits:
        memory: "24Gi"

This approach reserves specialized nodes for workloads that need them, preventing resource contention.

Custom Schedulers

For complex resource allocation needs, implement custom schedulers:

# Pod using custom scheduler
apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled-pod
spec:
  schedulerName: resource-optimizing-scheduler
  containers:
  - name: app
    image: app:latest
    resources:
      requests:
        cpu: "2"
        memory: "4Gi"

Custom schedulers can implement advanced resource strategies:

  1. Topology-aware placement: Consider hardware topology (NUMA, cache)
  2. Resource fragmentation prevention: Optimize bin-packing algorithms
  3. Workload-specific optimizations: Special handling for specific application types
  4. Cost-aware scheduling: Balance resource needs against infrastructure costs

Resource Management Best Practices

Production Guidelines

Follow these best practices for production environments:

  1. Always set requests and limits: Never deploy pods without resource specifications
  2. Be conservative with requests: Set requests to p95 of normal usage
  3. Set memory limits carefully: Memory limits that are too low cause terminations
  4. Consider not setting CPU limits: CPU is compressible and limits can reduce burst capacity (see the sketch after this list)
  5. Implement namespace quotas: Prevent resource monopolization by teams/apps
  6. Use appropriate QoS classes: Match QoS to workload criticality
  7. Review and adjust regularly: Resource needs change as applications evolve
  8. Implement robust monitoring: Watch for resource saturation and OOM events
  9. Document resource decisions: Record rationale for resource settings
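
A common pattern reflecting points 3 and 4, shown here as a sketch rather than a universal rule (prod-app is a placeholder): guarantee memory by setting requests equal to limits, while leaving CPU unlimited so the container can burst into idle capacity:

# Guaranteed memory, burstable CPU
apiVersion: v1
kind: Pod
metadata:
  name: prod-app
spec:
  containers:
  - name: app
    image: prod-app:latest
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        memory: "512Mi"  # no CPU limit: the container may burst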

Common Pitfalls

Avoid these common resource management mistakes:

  1. Identical requests and limits for CPU: Eliminates ability to burst during load spikes
  2. Memory limits too close to requests: Increases OOM kill risk
  3. Setting resources without measurement: Guessing leads to inefficiency
  4. Ignoring resource overhead: Container runtime and system components need resources too
  5. Overcommitting critical nodes: Control plane nodes need stable resources
  6. Resource fragmentation: Small requests can waste resources due to fragmentation
  7. Ignoring pod startup resources: Some applications need more resources during startup
  8. Insufficient headroom: Nodes running at high utilization have no room for bursting

Conclusion

Effective resource management is essential for running reliable, performant, and cost-efficient Kubernetes workloads. By understanding the nuances of resource requests and limits, QoS classes, and scheduling mechanisms, you can create a resource strategy that balances competing priorities.

Start with conservative resource settings based on actual measurements, implement appropriate monitoring, and continuously refine your approach as you gain operational experience. Remember that resource management is not a one-time configuration but an ongoing process that evolves with your applications and infrastructure.

By applying the principles and practices outlined in this guide, you can optimize resource utilization while maintaining reliability, ultimately delivering better performance at lower cost across your Kubernetes environments.