Kubernetes Resource Requests & Limits

Comprehensive guide to configuring, optimizing, and managing compute resources in Kubernetes clusters for improved performance and cost efficiency

Introduction to Kubernetes Resource Management

Resource management is a fundamental aspect of Kubernetes that determines how compute resources (CPU, memory, and other types) are allocated, utilized, and constrained across workloads. Proper resource configuration directly impacts:

  • Cluster stability: Prevent resource starvation and cascade failures
  • Application performance: Ensure workloads have sufficient resources to meet requirements
  • Cost efficiency: Optimize resource utilization and reduce cloud spending
  • Scaling predictability: Create consistent, reliable scaling behavior
  • Multi-tenant isolation: Fairly distribute resources across teams and applications

This comprehensive guide explores the concepts, implementation patterns, and best practices for configuring resource requests and limits in Kubernetes environments, helping you achieve the right balance between performance, reliability, and cost efficiency.

Resource Requests and Limits Fundamentals

Core Concepts

Kubernetes provides two primary mechanisms for resource management:

  1. Requests: The minimum amount of resources that the scheduler guarantees to a container
  2. Limits: The maximum amount of resources that a container can consume
# Pod with resource requests and limits
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: nginx:latest
    resources:
      requests:
        memory: "128Mi"
        cpu: "100m"
      limits:
        memory: "256Mi"
        cpu: "500m"

The differences between requests and limits are critical to understand:

| Resource Requests | Resource Limits |
|---|---|
| Used by the scheduler to find a suitable node | Used by the kubelet to enforce upper bounds |
| Never exceed node capacity during scheduling | Can be set higher than actual node capacity |
| Guaranteed to be available to the container | Container cannot exceed these values |
| Determine the pod's QoS class | Affect OOM score and termination priority |
| Used for initial resource allocation | Used for resource constraint enforcement |

Resource Units and Notation

Kubernetes uses specific units for resource quantities:

  • CPU: measured in cores, where 1 is one full core (one vCPU/hyperthread on cloud instances); fractional values use millicores, with 1000m equal to 1 core (so 100m is 0.1 core)
  • Memory: measured in bytes, using binary suffixes Ki, Mi, Gi (powers of 1024) or decimal suffixes k, M, G (powers of 1000)
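
A minimal illustration of equivalent notations (the quantities are arbitrary examples):

# Equivalent ways to express the same quantities
resources:
  requests:
    cpu: "0.5"        # identical to 500m
    memory: "128Mi"   # 128 × 1024 × 1024 bytes
  limits:
    cpu: "1"          # one full core (1000m)
    memory: "1G"      # decimal suffix: 1,000,000,000 bytes (not 1Gi)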

QoS Classes and Pod Priority

Quality of Service (QoS) Classes

Kubernetes automatically assigns one of three QoS classes to pods based on their resource configurations:

  • Guaranteed: every container has CPU and memory requests equal to its limits
  • Burstable: at least one container has a request or limit set, but the pod does not qualify as Guaranteed
  • BestEffort: no container has any requests or limits set

# Guaranteed QoS example
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-pod
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        memory: "256Mi"
        cpu: "500m"
      limits:
        memory: "256Mi"  # Must match requests
        cpu: "500m"      # Must match requests
# Burstable QoS example
apiVersion: v1
kind: Pod
metadata:
  name: burstable-pod
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        memory: "128Mi"
        cpu: "100m"
      limits:
        memory: "256Mi"  # Higher than requests
        cpu: "500m"      # Higher than requests
# BestEffort QoS example
apiVersion: v1
kind: Pod
metadata:
  name: besteffort-pod
spec:
  containers:
  - name: app
    image: nginx
    # No resource requests or limits defined

These QoS classes determine eviction order during node resource pressure:

  1. Guaranteed: Highest priority, only killed if exceeding memory limits or in critical system failures
  2. Burstable: Medium priority, killed before Guaranteed pods when node is under memory pressure
  3. BestEffort: Lowest priority, first to be killed when node is under memory pressure
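
The assigned class is recorded in pod status and can be checked directly, for example for the guaranteed-pod defined above:

# Check a pod's assigned QoS class
kubectl get pod guaranteed-pod -o jsonpath='{.status.qosClass}'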

Pod Priority and Preemption

For finer control over pod importance, use Pod Priority:

# Priority Class definition
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Critical production workloads"
---
# Pod using priority class
apiVersion: v1
kind: Pod
metadata:
  name: critical-app
spec:
  containers:
  - name: app
    image: critical-app:latest
  priorityClassName: high-priority

Higher priority pods can preempt (evict) lower priority pods during scheduling constraints, ensuring critical workloads get resources first.
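
Built-in and custom priority classes, along with their relative values, can be listed with:

# List priority classes and their values
kubectl get priorityclasses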

Resource Management in Practice

Setting Appropriate Requests and Limits

Determining optimal resource settings requires a methodical approach:

  1. Performance testing: Measure actual resource usage under various loads
  2. Monitoring data analysis: Review historical resource utilization patterns
  3. Application profiling: Understand the resource consumption characteristics
  4. Overhead accounting: Add buffer for JVM, runtime environment, etc.
  5. Growth projection: Account for expected usage increases

Guidelines for common workload types:

| Workload Type | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| Web service | p50 + 20% | 2-3x request | p90 + 30% | 1.5-2x request |
| Batch job | Average | 2x request | Peak + 10% | 1.2x request |
| Database | p90 | None or 2x | p99 | 1.5x request |
| Cache | p50 | 2x request | Fixed size | Same as request |
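
As a sketch of step 2 above, assuming Prometheus is scraping the standard cAdvisor metrics and your pods match a hypothetical my-app name, percentiles like p90 can be derived with PromQL:

# p90 of memory working set over the last 7 days (bytes)
quantile_over_time(0.9, container_memory_working_set_bytes{pod=~"my-app-.*", container!=""}[7d])

# Average CPU usage in cores over the last 7 days
avg_over_time(rate(container_cpu_usage_seconds_total{pod=~"my-app-.*", container!=""}[5m])[7d:5m])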

Resource Quotas

ResourceQuotas limit the total resources used within a namespace:

# Namespace ResourceQuota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "20"
    persistentvolumeclaims: "10"

This creates a resource boundary for a team or application, preventing one group from consuming all cluster resources.
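
Consumption against the quota can be checked at any time:

# Compare current usage with the hard limits
kubectl describe resourcequota team-quota -n team-a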

LimitRanges

LimitRanges define default, minimum, and maximum resource values at the namespace level:

# LimitRange for containers
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a
spec:
  limits:
  - default:  # Default limits if not specified
      cpu: "500m"
      memory: 256Mi
    defaultRequest:  # Default requests if not specified
      cpu: "100m"
      memory: 128Mi
    min:  # Minimum allowed values
      cpu: "50m"
      memory: 64Mi
    max:  # Maximum allowed values
      cpu: "2"
      memory: 1Gi
    type: Container

LimitRanges ensure consistent resource configurations across a namespace and prevent extremes (too small or too large).
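
To confirm the defaults take effect, create a pod without a resources block (test-defaults is an arbitrary name) and inspect what the admission controller injected:

# Verify the injected default requests and limits
kubectl run test-defaults --image=nginx -n team-a
kubectl get pod test-defaults -n team-a -o jsonpath='{.spec.containers[0].resources}'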

Resource Monitoring and Optimization

Monitoring Resource Usage

Effective resource management requires comprehensive monitoring:

# View pod resource usage
kubectl top pod -n <namespace>

# View node resource usage
kubectl top node
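
To surface the heaviest consumers, add sorting and per-container detail:

# Sort pods by memory usage, with per-container breakdown
kubectl top pod -n <namespace> --sort-by=memory --containers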

For deeper insights, implement dedicated monitoring tools:

  1. Prometheus + Grafana: Industry standard for Kubernetes monitoring
  2. Kubernetes Dashboard: Simple built-in visualization
  3. Kubernetes Resource Report: Cost and utilization analysis
  4. Vertical Pod Autoscaler in recommendation mode: Automated resource suggestion

Vertical Pod Autoscaler

The Vertical Pod Autoscaler (VPA) automatically adjusts resource requests:

# VPA configuration
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"  # or "Off", "Initial", "Recreate"
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 1
        memory: 512Mi

VPA modes provide flexibility in how recommendations are applied:

  • Off: Only generates recommendations, no automatic updates
  • Initial: Sets resources only at pod creation time
  • Auto: Automatically updates pod resources (requires pod restart)
  • Recreate: Like Auto, but forces pod recreation for updates
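
Whatever the mode, current recommendations are published in the VPA object's status:

# Inspect the full recommendation output
kubectl describe vpa my-app-vpa

# Or extract just the per-container recommendations
kubectl get vpa my-app-vpa -o jsonpath='{.status.recommendation.containerRecommendations}'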

Resource Optimization Strategies

Implement these strategies to optimize resource usage:

  1. Right-sizing: Adjust requests and limits to match actual usage patterns
  2. Bin-packing improvement: Set appropriate requests to optimize node utilization
  3. Workload segregation: Separate resource-intensive and lightweight workloads
  4. Node affinity tuning: Place workloads on appropriate node types
  5. Cost-aware scheduling: Consider resource costs in placement decisions

Advanced Resource Management

CPU Management Policies

Configure CPU management for performance-sensitive workloads:

# Pod with guaranteed CPU cores
apiVersion: v1
kind: Pod
metadata:
  name: cpu-sensitive-app
spec:
  containers:
  - name: app
    image: performance-app:latest
    resources:
      requests:
        cpu: "2"         # Integer CPU value required for exclusive cores
        memory: "2Gi"
      limits:
        cpu: "2"         # Must equal requests
        memory: "2Gi"    # Memory must also be set: Guaranteed QoS is required
  restartPolicy: Never

Kubelet CPU manager policies:

  • none: Default policy, shared CPU scheduling
  • static: Exclusive allocation of entire CPU cores for Guaranteed QoS pods with integer CPU requests

Enable with kubelet configuration:

# kubelet configuration
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static

Memory Management Policies

Optimize memory allocation and utilization:

# Pod with memory optimization
apiVersion: v1
kind: Pod
metadata:
  name: memory-optimized
spec:
  containers:
  - name: app
    image: memory-intensive:latest
    resources:
      requests:
        memory: "8Gi"
      limits:
        memory: "8Gi"
    env:
    - name: MALLOC_ARENA_MAX
      value: "2"  # Reduce glibc memory overhead

Memory management strategies:

  1. Huge pages: Allocate large memory pages for performance (the resource name is hugepages-<size>, and requests must equal limits)
    spec:
      containers:
      - name: app
        resources:
          requests:
            hugepages-2Mi: "1Gi"
            memory: "1Gi"
          limits:
            hugepages-2Mi: "1Gi"
            memory: "1Gi"
    
  2. Memory Manager: Reserve NUMA-aligned memory for Guaranteed QoS pods via the kubelet's static Memory Manager policy
    # kubelet configuration
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    memoryManagerPolicy: Static
    

Resource Overcommitment

Strategic overcommitment, setting container limits above requests so that the sum of limits on a node can exceed its capacity, can improve resource utilization. When overcommitting, account for per-pod runtime overhead explicitly, for example with a RuntimeClass:

# RuntimeClass declaring per-pod resource overhead
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: overcommit-class
handler: runc
scheduling:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
overhead:
  podFixed:
    cpu: "100m"
    memory: "128Mi"

Overcommitment considerations:

  1. Risk assessment: Evaluate the impact of resource contention
  2. Workload compatibility: Identify which workloads tolerate resource pressure
  3. Safety mechanisms: Implement monitoring and alerts for resource saturation
  4. QoS differentiation: Use QoS classes to protect critical workloads
  5. Node segregation: Isolate overcommitted workloads on specific nodes
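
A node's current commitment level is visible in its Allocated resources section, which reports requests and limits as percentages of allocatable capacity:

# Review a node's committed requests and limits
kubectl describe node <node-name> | grep -A 8 "Allocated resources"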

Resource-Based Autoscaling

Horizontal Pod Autoscaler

Scale replicas based on resource utilization:

# HPA based on CPU utilization
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

HPA best practices:

  1. Appropriate thresholds: Set realistic utilization targets (typically 60-80%)
  2. Scale behavior tuning: Configure stabilization windows to prevent thrashing (see the sketch after this list)
  3. Multiple metrics: Use both CPU and memory metrics for balanced scaling
  4. Custom metrics: Add application-specific metrics for more precise scaling
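
A minimal behavior stanza for the HPA above, sketching point 2 (the window and policy values are illustrative assumptions, not recommendations):

# Scale behavior tuning, appended under the HPA spec
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5 minutes before scaling down
      policies:
      - type: Percent
        value: 50                      # remove at most 50% of replicas per period
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0    # scale up immediately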

Cluster Autoscaler

Automatically adjust cluster size based on pod resource requirements:

# Deploy Cluster Autoscaler
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.23.0
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        - --nodes=2:10:my-node-group
        - --scale-down-utilization-threshold=0.5

Cluster Autoscaler optimizes node resources by:

  1. Adding nodes: When pods can't be scheduled due to insufficient resources
  2. Removing nodes: When node utilization is below threshold for an extended period
  3. Respecting pod disruption budgets: Ensuring availability during scale-down (example below)
  4. Considering node taints/affinities: Respecting workload placement constraints
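
A PodDisruptionBudget makes the third point concrete: assuming the my-app Deployment's pods carry the label app: my-app, this keeps at least two replicas running during voluntary disruptions such as scale-down:

# PDB protecting availability during node scale-down
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app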

Resource-Aware Scheduling

Node Affinity and Anti-Affinity

Place pods on nodes with appropriate resources:

# Pod with resource-based node affinity
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node.kubernetes.io/instance-type
            operator: In
            values:
            - g4dn.xlarge
            - p3.2xlarge
  containers:
  - name: gpu-container
    image: gpu-workload:latest
    resources:
      requests:
        nvidia.com/gpu: 1
      limits:
        nvidia.com/gpu: 1

Affinity rules help optimize resource placement by:

  1. Hardware targeting: Place workloads on nodes with specialized hardware
  2. Resource efficiency: Match workload requirements to appropriate node types
  3. Cost optimization: Direct high-resource pods to cost-optimized nodes
  4. Performance isolation: Separate resource-intensive workloads

Taints and Tolerations

Reserve nodes for specific resource-intensive workloads:

# Taint a node for memory-intensive workloads
kubectl taint nodes node1 workload=memory-intensive:NoSchedule

# Pod with toleration for memory-intensive nodes
apiVersion: v1
kind: Pod
metadata:
  name: large-memory-app
spec:
  tolerations:
  - key: "workload"
    operator: "Equal"
    value: "memory-intensive"
    effect: "NoSchedule"
  containers:
  - name: app
    image: memory-app:latest
    resources:
      requests:
        memory: "24Gi"
      limits:
        memory: "24Gi"

This approach reserves specialized nodes for workloads that need them, preventing resource contention.

Custom Schedulers

For complex resource allocation needs, implement custom schedulers:

# Pod using custom scheduler
apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled-pod
spec:
  schedulerName: resource-optimizing-scheduler
  containers:
  - name: app
    image: app:latest
    resources:
      requests:
        cpu: "2"
        memory: "4Gi"

Custom schedulers can implement advanced resource strategies:

  1. Topology-aware placement: Consider hardware topology (NUMA, cache)
  2. Resource fragmentation prevention: Optimize bin-packing algorithms
  3. Workload-specific optimizations: Special handling for specific application types
  4. Cost-aware scheduling: Balance resource needs against infrastructure costs

Resource Management Best Practices

Production Guidelines

Follow these best practices for production environments:

  1. Always set requests and limits: Never deploy pods without resource specifications
  2. Be conservative with requests: Set requests to p95 of normal usage
  3. Set memory limits carefully: Memory limits that are too low cause terminations
  4. Consider not setting CPU limits: CPU is compressible and limits can reduce burst capacity (see the sketch after this list)
  5. Implement namespace quotas: Prevent resource monopolization by teams/apps
  6. Use appropriate QoS classes: Match QoS to workload criticality
  7. Review and adjust regularly: Resource needs change as applications evolve
  8. Implement robust monitoring: Watch for resource saturation and OOM events
  9. Document resource decisions: Record rationale for resource settings
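
A common pattern reflecting points 3 and 4, shown here as a sketch rather than a universal rule (prod-app is a placeholder): guarantee memory by setting requests equal to limits, while leaving CPU unlimited so the container can burst into idle capacity:

# Guaranteed memory, burstable CPU
apiVersion: v1
kind: Pod
metadata:
  name: prod-app
spec:
  containers:
  - name: app
    image: prod-app:latest
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        memory: "512Mi"  # no CPU limit: the container may burst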

Common Pitfalls

Avoid these common resource management mistakes:

  1. Identical requests and limits for CPU: Eliminates ability to burst during load spikes
  2. Memory limits too close to requests: Increases OOM kill risk
  3. Setting resources without measurement: Guessing leads to inefficiency
  4. Ignoring resource overhead: Container runtime and system components need resources too
  5. Overcommitting critical nodes: Control plane nodes need stable resources
  6. Resource fragmentation: Small requests can waste resources due to fragmentation
  7. Ignoring pod startup resources: Some applications need more resources during startup
  8. Insufficient headroom: Nodes running at high utilization have no room for bursting

Conclusion

Effective resource management is essential for running reliable, performant, and cost-efficient Kubernetes workloads. By understanding the nuances of resource requests and limits, QoS classes, and scheduling mechanisms, you can create a resource strategy that balances competing priorities.

Start with conservative resource settings based on actual measurements, implement appropriate monitoring, and continuously refine your approach as you gain operational experience. Remember that resource management is not a one-time configuration but an ongoing process that evolves with your applications and infrastructure.

By applying the principles and practices outlined in this guide, you can optimize resource utilization while maintaining reliability, ultimately delivering better performance at lower cost across your Kubernetes environments.