Traditional Kubernetes autoscaling mechanisms such as the Horizontal Pod Autoscaler (HPA) rely primarily on CPU and memory metrics to scale workloads. Modern cloud-native applications, however, often require more sophisticated scaling based on application-specific metrics and event patterns. Kubernetes Event-Driven Autoscaling (KEDA) addresses this need by enabling autoscaling based on event sources and custom metrics:
- Event-based scaling: Scale based on the number of events or messages in queues and streams
- Application-specific metrics: Use metrics that directly reflect application workload
- Zero-to-many scaling: Scale from zero to handle workloads efficiently
- Diverse event sources: Support for a wide range of messaging systems and data sources
- Custom metrics: Flexibility to define custom scaling metrics

This guide explores how to implement event-driven autoscaling in Kubernetes environments, enabling more responsive and efficient scaling for diverse workloads.
KEDA consists of several key components that work together to provide event-driven autoscaling:
- Controller: The central component that monitors ScaledObjects and manages scaling operations
- Metrics Server: Exposes external metrics to the Kubernetes Metrics API
- ScaledObject: Custom resource that defines scaling rules and triggers
- ScaledJob: Custom resource for event-driven jobs (similar to CronJobs, but event-triggered)
- Scalers: Adapters for different event sources (Kafka, RabbitMQ, Prometheus, etc.)

The architecture follows a Kubernetes-native approach:
+-----------------+      +------------------+      +----------------+
| Kubernetes API  |<---->| KEDA Controller  |<---->| Event Sources  |
+-----------------+      +------------------+      +----------------+
        ^                          |
        |                          v
        |                 +------------------+
        +---------------->|  Metrics Server  |
                          +------------------+
Installing KEDA using Helm:
# Add the KEDA Helm repository
helm repo add kedacore https://kedacore.github.io/charts
# Update your Helm chart repository
helm repo update
# Install KEDA in your cluster
helm install keda kedacore/keda --namespace keda --create-namespace
Alternatively, using YAML manifests:
# Apply the KEDA CRDs and components
kubectl apply -f https://github.com/kedacore/keda/releases/download/v2.10.1/keda-2.10.1.yaml
Verifying the installation:
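For example, checking that the KEDA operator pods are running and that the external metrics API served by KEDA is registered (both are standard kubectl checks):

# Confirm the KEDA pods are running
kubectl get pods -n keda

# Confirm the external metrics API is registered
kubectl get apiservice v1beta1.external.metrics.k8s.io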
The ScaledObject is the primary custom resource for defining how a deployment should scale based on event sources:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-scaler
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: consumer-app
  pollingInterval: 15
  cooldownPeriod: 30
  minReplicaCount: 0
  maxReplicaCount: 30
  triggers:
    - type: rabbitmq
      metadata:
        protocol: amqp
        queueName: orders
        host: rabbitmq.default.svc.cluster.local
        queueLength: '50'
Key fields in the ScaledObject:
- scaleTargetRef: References the deployment to scale
- pollingInterval: How frequently to check the event source (in seconds)
- cooldownPeriod: Time to wait before scaling down (in seconds)
- minReplicaCount/maxReplicaCount: Scaling boundaries
- triggers: Array of event sources that trigger scaling

For batch-oriented workloads, ScaledJob creates Kubernetes Jobs based on events:
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: kafka-batch-processor
  namespace: default
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: processor
            image: my-processor:latest
            resources:
              requests:
                memory: "64Mi"
                cpu: "100m"
              limits:
                memory: "128Mi"
                cpu: "200m"
        restartPolicy: Never
  pollingInterval: 30
  maxReplicaCount: 50
  successfulJobsHistoryLimit: 10
  failedJobsHistoryLimit: 10
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.default.svc.cluster.local:9092
        consumerGroup: batch-processor
        topic: batch-tasks
        lagThreshold: '100'
Scaling based on message queues is one of the most common KEDA use cases:
triggers:
  - type: rabbitmq
    metadata:
      protocol: amqp
      queueName: orders
      host: rabbitmq.default.svc.cluster.local
      queueLength: '50'
      # Optional authentication
      vhost: '/'
      username: user
      passwordFromEnv: RABBITMQ_PASSWORD
Scaling on Kafka consumer lag:

triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.default.svc.cluster.local:9092
      consumerGroup: order-processor
      topic: orders
      lagThreshold: '100'
      offsetResetPolicy: latest
Scaling on Azure Service Bus queue depth:

triggers:
  - type: azure-servicebus
    metadata:
      queueName: orders
      connectionFromEnv: AzureServiceBusConnection
      messageCount: '5'
Scaling based on database metrics and storage systems:
triggers:
  - type: postgresql
    metadata:
      connectionFromEnv: POSTGRESQL_CONN_STR
      query: "SELECT COUNT(*) FROM tasks WHERE status='pending'"
      targetQueryValue: "10"
      activationTargetQueryValue: "1"
Scaling on a MongoDB document count:

triggers:
  - type: mongodb
    metadata:
      connectionStringFromEnv: MONGODB_CONN_STR
      collection: tasks
      database: taskdb
      query: '{"status": "pending"}'
      queryValue: "10"
Scaling on Redis list length:

triggers:
  - type: redis
    metadata:
      address: redis.default.svc.cluster.local:6379
      listName: pending-tasks
      listLength: "10"
      passwordFromEnv: REDIS_PASSWORD
Using custom Prometheus metrics for scaling:
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
      metricName: http_requests_total
      threshold: '100'
      query: sum(rate(http_requests_total{app="my-app"}[2m]))
Combining event-driven scaling with time-based patterns:
triggers:
  - type: cron
    metadata:
      timezone: UTC
      start: 30 * * * *
      end: 45 * * * *
      desiredReplicas: "10"
KEDA supports multiple triggers on a single ScaledObject, combined with logical OR semantics, so scaling up occurs whenever any trigger activates:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: multi-trigger-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: processing-service
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.default.svc.cluster.local:9092
        consumerGroup: processor
        topic: high-priority
        lagThreshold: '10'
    - type: kafka
      metadata:
        bootstrapServers: kafka.default.svc.cluster.local:9092
        consumerGroup: processor
        topic: standard
        lagThreshold: '100'
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        threshold: '50'
        query: sum(rate(processing_queue_length{service="processor"}[2m]))
Secure authentication for various scalers:
Referencing secrets via environment variables:

triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.svc:9092
      consumerGroup: order-processor
      topic: orders
      lagThreshold: '100'
      sasl: plaintext
      username: user
      passwordFromEnv: KAFKA_PASSWORD
Referencing Kubernetes Secrets directly is handled through a TriggerAuthentication resource (shown in full later in this guide), which a trigger points to via authenticationRef:

triggers:
  - type: postgresql
    metadata:
      connectionFromEnv: POSTGRESQL_CONN_STR
      query: "SELECT COUNT(*) FROM tasks WHERE status='pending'"
      targetQueryValue: "10"
    authenticationRef:
      name: postgresql-trigger-auth
Securing connections with TLS:

triggers:
  - type: rabbitmq
    metadata:
      protocol: amqps
      host: rabbitmq.svc:5671
      queueName: orders
      queueLength: '50'
      tls: "enable"
      ca: "/mnt/certs/ca.crt"
      cert: "/mnt/certs/tls.crt"
      key: "/mnt/certs/tls.key"
One of KEDA's powerful features is scaling from zero, which requires special consideration:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: zero-to-scale
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: event-processor
  minReplicaCount: 0
  maxReplicaCount: 10
  advanced:
    restoreToOriginalReplicaCount: true
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Percent
              value: 100
              periodSeconds: 15
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.svc:9092
        consumerGroup: processor
        topic: events
        lagThreshold: '1'
Key considerations for scaling from zero:
- Activation triggers: Set an appropriate threshold for activation (see the sketch after this list)
- Cold start time: Account for application startup time
- Resource provisioning: Ensure resources are available for rapid scaling
- State management: Handle stateful applications carefully
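Since KEDA 2.8, most scalers expose a separate activation parameter that governs only the zero-to-one transition, distinct from the target used once the workload is running. A minimal sketch for the Kafka scaler, with illustrative values:

triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.svc:9092
      consumerGroup: processor
      topic: events
      lagThreshold: '100'          # target lag per replica once active
      activationLagThreshold: '5'  # lag required before waking from zero

KEDA works alongside the Horizontal Pod Autoscaler, extending its capabilities: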
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: hybrid-scaling
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicaCount: 1
  maxReplicaCount: 20
  advanced:
    horizontalPodAutoscalerConfig:
      name: web-api-hpa
  triggers:
    # CPU utilization, handled by the HPA that KEDA manages
    - type: cpu
      metricType: Utilization
      metadata:
        value: "50"
    # Request rate from Prometheus
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        threshold: '100'
        query: sum(rate(http_requests_total{service="web-api"}[2m]))
KEDA also defines its own custom resources beyond ScaledObject and ScaledJob; the TriggerAuthentication resource decouples credentials from scaling definitions:
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-trigger-auth
spec:
  secretTargetRef:
    - parameter: sasl
      name: kafka-secrets
      key: sasl
    - parameter: username
      name: kafka-secrets
      key: username
    - parameter: password
      name: kafka-secrets
      key: password
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaledobject
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kafka-consumer
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.svc:9092
        consumerGroup: order-processor
        topic: orders
        lagThreshold: '50'
      authenticationRef:
        name: kafka-trigger-auth
Monitoring KEDA operations with Prometheus and Grafana:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: keda-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: keda-operator
  endpoints:
    - port: metrics
Example Prometheus queries for KEDA monitoring:
# Active scaled objects
keda_scaled_object_active_total
# Scale decisions by trigger type
sum(keda_scaled_object_metrics_value) by (type)
# Scaling errors
keda_scaled_object_errors_total
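If the Prometheus Operator is in use, the error counter above can back an alert rule. A minimal sketch, where the alert name, window, and severity are illustrative assumptions:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: keda-alerts
  namespace: monitoring
spec:
  groups:
    - name: keda
      rules:
        - alert: KedaScaledObjectErrors
          # Fire when KEDA reports any new errors for a ScaledObject
          expr: increase(keda_scaled_object_errors_total[5m]) > 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: KEDA is reporting errors for a ScaledObject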
Fine-tuning KEDA for optimal performance:
Polling interval:
- Balance responsiveness against resource consumption
- Default is 30 seconds; reduce for latency-sensitive applications
- Consider scaler-specific limitations (API rate limits)

Cooldown period:
- Prevents scaling thrashing
- Typically 300 seconds for scale down
- Shorter for dynamic workloads, longer for stable patterns

Thresholds:
- Set appropriate thresholds based on workload characteristics
- Consider baseline and peak patterns
- Implement gradual scaling with multiple trigger points

Example of optimized scaling parameters:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: optimized-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: consumer-service
  pollingInterval: 15   # Check every 15 seconds
  cooldownPeriod: 300   # 5 minutes cooldown before scaling down
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 0
          policies:
            - type: Percent
              value: 100
              periodSeconds: 15
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Percent
              value: 20
              periodSeconds: 60
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        threshold: '100'
        query: sum(rate(processing_queue_length{service="consumer"}[2m]))
Ensuring appropriate resources for KEDA components:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keda-operator
  namespace: keda
spec:
  # ... other fields
  template:
    spec:
      containers:
        - name: keda-operator
          image: ghcr.io/kedacore/keda:2.10.1
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 1000m
              memory: 1Gi
Setting appropriate scaling boundaries with namespace quotas:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: pods-high
  namespace: scaling-apps
spec:
  hard:
    pods: "100"
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
Event-driven scaling for a microservices architecture:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processing-service
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.svc:9092
        consumerGroup: order-processor
        topic: orders
        lagThreshold: '10'
Using ScaledJobs for batch processing:
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: data-processor
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: data-processor
            image: data-processor:latest
            env:
              - name: BATCH_SIZE
                value: "100"
        restartPolicy: Never
  pollingInterval: 30
  maxReplicaCount: 50
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 10
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/data-processing-queue
        queueLength: "100"
        awsRegion: us-east-1
        identityOwner: pod
Creating a serverless-like experience with KEDA:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  cooldownPeriod: 300
  minReplicaCount: 0
  maxReplicaCount: 10
  advanced:
    restoreToOriginalReplicaCount: true
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        threshold: '1'
        query: sum(rate(http_requests_total{service="api-gateway",path=~"/api/.*"}[2m]))
Diagnosing and resolving KEDA issues:
When scaling does not occur as expected:
- Check ScaledObject status
- Verify trigger metrics and thresholds
- Confirm connectivity to event sources
- Check authentication credentials
- Review polling intervals
- Check for resource constraints
- Examine scaling behavior configuration
- Verify event source latency

When scaling is too aggressive or unstable:
- Adjust thresholds based on workload
- Implement more granular scaling policies
- Consider adding stabilization windows
- Evaluate multiple triggers for complex scenarios

Example debugging commands:
# Check ScaledObject status
kubectl get scaledobject order-processor -o yaml
# Examine KEDA logs
kubectl logs -n keda -l app=keda-operator
# Check HPA created by KEDA
kubectl get hpa
# Inspect metrics
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/kafka-lag"
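KEDA also records Kubernetes events against its resources, which often surface trigger connection or authentication failures directly; both of these are standard kubectl commands:

# Show conditions and recent events for a ScaledObject
kubectl describe scaledobject order-processor

# List events emitted for ScaledObjects across the namespace
kubectl get events --field-selector involvedObject.kind=ScaledObject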
Enhanced logging configuration for troubleshooting:
apiVersion: v1
kind: ConfigMap
metadata:
  name: keda-logging-config
  namespace: keda
data:
  KEDA_LOG_LEVEL: debug
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keda-operator
  namespace: keda
spec:
  template:
    spec:
      containers:
        - name: keda-operator
          envFrom:
            - configMapRef:
                name: keda-logging-config
The KEDA ecosystem continues to evolve with new capabilities:
- HTTP-based scaling: Direct scaling based on HTTP traffic patterns
- ML-based predictive scaling: Using machine learning to predict scaling needs
- Cross-cluster scaling: Coordinating scaling across multiple Kubernetes clusters
- Composite metrics: Combining multiple metrics with weighted importance
- Scaling profiles: Time- or condition-based scaling profiles for different scenarios

KEDA's integration with emerging cloud-native technologies:
# Example of KEDA with Knative
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: knative-function
spec:
  scaleTargetRef:
    apiVersion: serving.knative.dev/v1
    kind: Service
    name: my-function
  advanced:
    targetPodController: knative-deployment
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        threshold: '1'
        query: sum(rate(function_invocations_total{service="my-function"}[1m]))
Kubernetes Event-Driven Autoscaling represents a significant advancement in the way applications scale in cloud-native environments. By providing a Kubernetes-native way to scale workloads based on actual application demand signals rather than just resource utilization, KEDA enables more responsive, efficient, and cost-effective scaling strategies.
As organizations continue to adopt event-driven architectures and message-based systems, KEDA's ability to scale based on queue depths, custom metrics, and diverse event sources becomes increasingly valuable. The project's growing ecosystem of scalers and integrations ensures that it can adapt to a wide range of application patterns and infrastructures.
By implementing event-driven autoscaling with KEDA, teams can build more resilient applications that efficiently handle variable workloads, scale to zero when idle, and rapidly respond to demand spikes – ultimately delivering better performance and resource utilization in Kubernetes environments.