Kubernetes Resource Requests & Limits
Comprehensive guide to configuring, optimizing, and managing compute resources in Kubernetes clusters for improved performance and cost efficiency
Introduction to Kubernetes Resource Management
Resource management is a fundamental aspect of Kubernetes that determines how compute resources (CPU, memory, and other types) are allocated, utilized, and constrained across workloads. Proper resource configuration directly impacts:
- Cluster stability: Prevent resource starvation and cascading failures
- Application performance: Ensure workloads have sufficient resources to meet requirements
- Cost efficiency: Optimize resource utilization and reduce cloud spending
- Scaling predictability: Create consistent, reliable scaling behavior
- Multi-tenant isolation: Fairly distribute resources across teams and applications
This comprehensive guide explores the concepts, implementation patterns, and best practices for configuring resource requests and limits in Kubernetes environments, helping you achieve the right balance between performance, reliability, and cost efficiency.
Resource Requests and Limits Fundamentals
Core Concepts
Kubernetes provides two primary mechanisms for resource management:
- Requests: The amount of resources reserved for a container; the scheduler only places a pod on a node whose unreserved capacity can cover its requests
- Limits: The maximum amount of resources a container is allowed to consume
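Both are declared per container in the pod spec. A minimal sketch (name, image, and values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: app
      image: nginx:1.27     # illustrative image
      resources:
        requests:
          cpu: "250m"       # reserved at scheduling time
          memory: "256Mi"
        limits:
          cpu: "500m"       # upper bound enforced by the kubelet
          memory: "512Mi"
```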
The differences between requests and limits are critical to understand:
| Resource Requests | Resource Limits |
|---|---|
| Used by the scheduler to find a node with enough unreserved capacity | Enforced by the kubelet as an upper bound at runtime |
| The sum of requests on a node cannot exceed its allocatable capacity | The sum of limits on a node can exceed its capacity (overcommitment) |
| Reserved for the container | Cannot be exceeded by the container |
| Helps determine the pod's QoS class | Affects OOM score and termination priority |
| Used for initial resource allocation | Used for resource constraint enforcement |

Enforcement also differs by resource type: a container exceeding its CPU limit is throttled (CPU is compressible), while one exceeding its memory limit is OOM-killed.
Resource Units and Notation
Kubernetes uses specific units for resource quantities. CPU is measured in cores: `1` (or `1000m`) is one full core or vCPU, and fractional amounts are usually written in millicores, e.g. `250m` for a quarter of a core. Memory is measured in bytes, using either decimal suffixes (`K`, `M`, `G`) or the more common binary suffixes (`Ki`, `Mi`, `Gi`, where `1Mi` = 2^20 bytes), e.g. `512Mi`.
QoS Classes and Pod Priority
Quality of Service (QoS) Classes
Kubernetes automatically assigns one of three QoS classes to pods based on their resource configurations: Guaranteed (every container sets requests equal to limits for both CPU and memory), Burstable (at least one container sets a request or limit, but the Guaranteed criteria are not met), and BestEffort (no container sets any requests or limits).
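For example, the following resources stanza (values illustrative) yields the Guaranteed class; dropping the limits would make the pod Burstable, and omitting the block entirely would make it BestEffort:

```yaml
# Guaranteed: requests equal limits for both CPU and memory in every container
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
```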
These QoS classes determine eviction order during node resource pressure:
- Guaranteed: Highest priority; killed only when exceeding its memory limits or during critical system failures
- Burstable: Medium priority; killed before Guaranteed pods when the node is under memory pressure
- BestEffort: Lowest priority; first to be killed when the node is under memory pressure
Pod Priority and Preemption
For finer control over pod importance, use Pod Priority:
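A minimal sketch of a PriorityClass and a pod that opts into it (names and values are illustrative):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-service
value: 1000000                 # higher values win during scheduling and preemption
globalDefault: false
description: "Reserved for revenue-critical services"
---
apiVersion: v1
kind: Pod
metadata:
  name: payments-api
spec:
  priorityClassName: critical-service
  containers:
    - name: app
      image: payments-api:v1   # illustrative image
```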
Higher priority pods can preempt (evict) lower priority pods during scheduling constraints, ensuring critical workloads get resources first.
Resource Management in Practice
Setting Appropriate Requests and Limits
Determining optimal resource settings requires a methodical approach:
- Performance testing: Measure actual resource usage under various loads
- Monitoring data analysis: Review historical resource utilization patterns
- Application profiling: Understand the resource consumption characteristics
- Overhead accounting: Add buffer for JVM, runtime environment, etc.
- Growth projection: Account for expected usage increases
Guidelines for common workload types (percentiles refer to observed usage, e.g. p90 = 90th percentile):
| Workload Type | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| Web service | p50 + 20% | 2-3x request | p90 + 30% | 1.5-2x request |
| Batch job | Average | 2x request | Peak + 10% | 1.2x request |
| Database | p90 | None or 2x request | p99 | 1.5x request |
| Cache | p50 | 2x request | Fixed size | Same as request |
Resource Quotas
ResourceQuotas limit the total resources used within a namespace:
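A sketch of a quota capping a team namespace (namespace name and values are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"       # total CPU requests across the namespace
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "50"               # optional cap on object count
```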
This creates a resource boundary for a team or application, preventing one group from consuming all cluster resources.
LimitRanges
LimitRanges define default, minimum, and maximum resource values at the namespace level:
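A sketch defining per-container defaults and bounds (values are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-limits
  namespace: team-a
spec:
  limits:
    - type: Container
      default:               # applied as limits when a container sets none
        cpu: "500m"
        memory: 512Mi
      defaultRequest:        # applied as requests when a container sets none
        cpu: "250m"
        memory: 256Mi
      min:
        cpu: "50m"
        memory: 64Mi
      max:
        cpu: "4"
        memory: 8Gi
```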
LimitRanges ensure consistent resource configurations across a namespace and prevent extremes (too small or too large).
Resource Monitoring and Optimization
Monitoring Resource Usage
Effective resource management requires comprehensive monitoring. The built-in tooling is a reasonable first pass: `kubectl top nodes` and `kubectl top pods` (backed by metrics-server) report current usage, and `kubectl describe node` shows the requests and limits already committed on a node.
For deeper insights, implement dedicated monitoring tools:
- Prometheus + Grafana: Industry standard for Kubernetes monitoring
- Kubernetes Dashboard: Simple built-in visualization
- Kubernetes Resource Report: Cost and utilization analysis
- Vertical Pod Autoscaler in recommendation mode: Automated resource suggestion
Vertical Pod Autoscaler
The Vertical Pod Autoscaler (VPA) automatically adjusts resource requests:
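A sketch of a VPA object in recommendation-only mode, assuming the VPA components are installed in the cluster (names are illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Off"        # recommendation-only; see the modes below
```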
VPA modes provide flexibility in how recommendations are applied:
- Off: Only generates recommendations, no automatic updates
- Initial: Sets resources only at pod creation time
- Auto: Automatically applies recommendations; currently this evicts and recreates pods (equivalent to Recreate), with restart-free in-place updates planned
- Recreate: Applies recommendations by evicting running pods so they restart with the updated requests
Resource Optimization Strategies
Implement these strategies to optimize resource usage:
- Right-sizing: Adjust requests and limits to match actual usage patterns
- Bin-packing improvement: Set appropriate requests to optimize node utilization
- Workload segregation: Separate resource-intensive and lightweight workloads
- Node affinity tuning: Place workloads on appropriate node types
- Cost-aware scheduling: Consider resource costs in placement decisions
Advanced Resource Management
CPU Management Policies
For performance-sensitive workloads, configure the kubelet's CPU manager, which controls how containers are placed on physical cores. The available policies are:
- none: Default policy, shared CPU scheduling
- static: Exclusive allocation of entire CPU cores for Guaranteed QoS pods with integer CPU requests
Enable with kubelet configuration:
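A sketch of the relevant KubeletConfiguration fields; the static policy also requires reserving some CPU for system daemons (values are illustrative):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
reservedSystemCPUs: "0,1"    # cores kept for system and Kubernetes daemons
```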
Memory Management Policies
Memory allocation and utilization can be optimized with the following strategies:
- Huge pages: Allocate large, pre-reserved memory pages for performance-sensitive workloads (see the sketch after this list)
- Memory QoS: Fine-tune memory quality of service with cgroups v2
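A sketch of a pod using pre-allocated 2Mi huge pages; the node must have huge pages configured, and huge page requests must equal their limits (names and values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-app
spec:
  containers:
    - name: app
      image: database:v1          # illustrative image
      resources:
        requests:
          memory: "1Gi"
          hugepages-2Mi: "512Mi"
        limits:
          memory: "1Gi"
          hugepages-2Mi: "512Mi"  # must equal the request
      volumeMounts:
        - name: hugepages
          mountPath: /hugepages
  volumes:
    - name: hugepages
      emptyDir:
        medium: HugePages
```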
Resource Overcommitment
Strategic overcommitment can improve resource utilization:
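In practice this means setting limits well above requests: the scheduler packs nodes by the smaller request while the kubelet still allows bursting. A minimal sketch (values are illustrative):

```yaml
resources:
  requests:
    cpu: "250m"       # what the scheduler reserves
    memory: "256Mi"
  limits:
    cpu: "1"          # 4x burst headroom; limits across a node may exceed capacity
    memory: "512Mi"   # keep memory headroom modest, since excess risks OOM kills
```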
Overcommitment considerations:
- Risk assessment: Evaluate the impact of resource contention
- Workload compatibility: Identify which workloads tolerate resource pressure
- Safety mechanisms: Implement monitoring and alerts for resource saturation
- QoS differentiation: Use QoS classes to protect critical workloads
- Node segregation: Isolate overcommitted workloads on specific nodes
Resource-Based Autoscaling
Horizontal Pod Autoscaler
Scale replicas based on resource utilization:
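A sketch of an autoscaling/v2 HPA targeting CPU utilization, with a stabilization window to damp scale-down (names and values are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70         # percent of requested CPU
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300    # damp scale-down thrashing
```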
HPA best practices:
- Appropriate thresholds: Set realistic utilization targets (typically 60-80%)
- Scale behavior tuning: Configure stabilization windows to prevent thrashing
- Multiple metrics: Use both CPU and memory metrics for balanced scaling
- Custom metrics: Add application-specific metrics for more precise scaling
Cluster Autoscaler
Automatically adjust cluster size based on pod resource requirements:
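Cluster Autoscaler runs as a cluster component whose behavior is tuned through flags. A sketch of commonly used flags from its container spec (the image tag is illustrative and provider-specific flags are omitted):

```yaml
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0  # illustrative tag
    command:
      - ./cluster-autoscaler
      - --expander=least-waste                  # pick the node group that wastes least
      - --scale-down-utilization-threshold=0.5  # nodes below 50% requested are candidates
      - --scale-down-unneeded-time=10m          # must stay underutilized this long
      - --balance-similar-node-groups=true
      - --skip-nodes-with-system-pods=false
```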
Cluster Autoscaler optimizes node resources by:
- Adding nodes: When pods can't be scheduled due to insufficient resources
- Removing nodes: When node utilization is below threshold for an extended period
- Respecting pod disruption budgets: Ensuring availability during scale-down
- Considering node taints/affinities: Respecting workload placement constraints
Resource-Aware Scheduling
Node Affinity and Anti-Affinity
Place pods on nodes with appropriate resources:
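A sketch steering a memory-heavy workload to nodes carrying an assumed workload-class=memory-optimized label:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: analytics
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: workload-class    # assumed custom node label
                operator: In
                values:
                  - memory-optimized
  containers:
    - name: app
      image: analytics:v1              # illustrative image
```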
Affinity rules help optimize resource placement by:
- Hardware targeting: Place workloads on nodes with specialized hardware
- Resource efficiency: Match workload requirements to appropriate node types
- Cost optimization: Direct high-resource pods to cost-optimized nodes
- Performance isolation: Separate resource-intensive workloads
Taints and Tolerations
Reserve nodes for specific resource-intensive workloads:
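A sketch: taint the reserved nodes, then give the permitted workloads a matching toleration (and typically a node selector); the dedicated=high-memory key/value is an assumption:

```yaml
# Taint applied out of band, e.g.:
#   kubectl taint nodes node-1 dedicated=high-memory:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: in-memory-cache
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "high-memory"
      effect: "NoSchedule"
  nodeSelector:
    dedicated: high-memory     # assumed matching node label
  containers:
    - name: cache
      image: redis:7           # illustrative image
```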
This approach reserves specialized nodes for workloads that need them, preventing resource contention.
Custom Schedulers
For complex resource allocation needs, implement custom schedulers:
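Workloads opt into a custom scheduler by name in their pod spec. A minimal sketch, assuming a scheduler deployed as my-custom-scheduler:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: numa-sensitive-job
spec:
  schedulerName: my-custom-scheduler   # assumed custom scheduler name
  containers:
    - name: app
      image: hpc-job:v1                # illustrative image
```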
Custom schedulers can implement advanced resource strategies:
- Topology-aware placement: Consider hardware topology (NUMA, cache)
- Resource fragmentation prevention: Optimize bin-packing algorithms
- Workload-specific optimizations: Special handling for specific application types
- Cost-aware scheduling: Balance resource needs against infrastructure costs
Resource Management Best Practices
Production Guidelines
Follow these best practices for production environments:
- Always set requests and limits: Never deploy pods without resource specifications
- Be conservative with requests: Set requests to p95 of normal usage
- Set memory limits carefully: Memory limits that are too low cause OOM kills
- Consider not setting CPU limits: CPU is compressible (excess usage is throttled, not killed), and limits can needlessly cap burst capacity
- Implement namespace quotas: Prevent resource monopolization by teams/apps
- Use appropriate QoS classes: Match QoS to workload criticality
- Review and adjust regularly: Resource needs change as applications evolve
- Implement robust monitoring: Watch for resource saturation and OOM events
- Document resource decisions: Record rationale for resource settings
Common Pitfalls
Avoid these common resource management mistakes:
- Identical requests and limits for CPU: Eliminates ability to burst during load spikes
- Memory limits too close to requests: Increases OOM kill risk
- Setting resources without measurement: Guessing leads to inefficiency
- Ignoring resource overhead: Container runtime and system components need resources too
- Overcommitting critical nodes: Control plane nodes need stable resources
- Resource fragmentation: Small requests can waste resources due to fragmentation
- Ignoring pod startup resources: Some applications need more resources during startup
- Insufficient headroom: Nodes running at high utilization have no room for bursting
Conclusion
Effective resource management is essential for running reliable, performant, and cost-efficient Kubernetes workloads. By understanding the nuances of resource requests and limits, QoS classes, and scheduling mechanisms, you can create a resource strategy that balances competing priorities.
Start with conservative resource settings based on actual measurements, implement appropriate monitoring, and continuously refine your approach as you gain operational experience. Remember that resource management is not a one-time configuration but an ongoing process that evolves with your applications and infrastructure.
By applying the principles and practices outlined in this guide, you can optimize resource utilization while maintaining reliability, ultimately delivering better performance at lower cost across your Kubernetes environments.