Traditional Kubernetes autoscaling mechanisms such as the Horizontal Pod Autoscaler (HPA) rely primarily on CPU and memory metrics to scale workloads. Modern cloud-native applications, however, often require more sophisticated scaling based on application-specific metrics and event patterns. Kubernetes Event-Driven Autoscaling (KEDA) addresses this need by enabling autoscaling based on event sources and custom metrics:
- Event-based scaling: Scale based on the number of events or messages in queues and streams
- Application-specific metrics: Use metrics that directly reflect application workload
- Zero-to-many scaling: Scale from zero to handle workloads efficiently
- Diverse event sources: Support for a wide range of messaging systems and data sources
- Custom metrics: Flexibility to define custom scaling metrics

This guide explores how to implement event-driven autoscaling in Kubernetes environments, enabling more responsive and efficient scaling for diverse workloads.
KEDA consists of several key components that work together to provide event-driven autoscaling:
- Controller: The central component that monitors ScaledObjects and manages scaling operations
- Metrics Server: Exposes external metrics to the Kubernetes Metrics API
- ScaledObject: Custom resource that defines scaling rules and triggers
- ScaledJob: Custom resource for event-driven jobs (similar to CronJobs, but event-triggered)
- Scalers: Adapters for different event sources (Kafka, RabbitMQ, Prometheus, etc.)

The architecture follows a Kubernetes-native approach:
+-----------------+      +------------------+      +----------------+
| Kubernetes API  |<---->| KEDA Controller  |<---->| Event Sources  |
+-----------------+      +------------------+      +----------------+
        ^                          |
        |                          v
        |                 +------------------+
        +---------------->|  Metrics Server  |
                          +------------------+
Installing KEDA using Helm:
# Add the KEDA Helm repository
helm repo add kedacore https://kedacore.github.io/charts
# Update your Helm chart repository
helm repo update
# Install KEDA in your cluster
helm install keda kedacore/keda --namespace keda --create-namespace
Alternatively, using YAML manifests:
# Apply the KEDA CRDs and components
kubectl apply -f https://github.com/kedacore/keda/releases/download/v2.10.1/keda-2.10.1.yaml
Verifying the installation:
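For example, checking that the KEDA operator pods are running and that the external metrics API served by KEDA is registered (both are standard kubectl checks):

# Confirm the KEDA pods are running
kubectl get pods -n keda

# Confirm the external metrics API is registered
kubectl get apiservice v1beta1.external.metrics.k8s.io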
The ScaledObject is the primary custom resource for defining how a deployment should scale based on event sources:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-scaler
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: consumer-app
  pollingInterval: 15
  cooldownPeriod: 30
  minReplicaCount: 0
  maxReplicaCount: 30
  triggers:
    - type: rabbitmq
      metadata:
        protocol: amqp
        queueName: orders
        host: rabbitmq.default.svc.cluster.local
        queueLength: '50'
Key fields in the ScaledObject:
- scaleTargetRef: References the deployment to scale
- pollingInterval: How frequently to check the event source (in seconds)
- cooldownPeriod: Time to wait before scaling down (in seconds)
- minReplicaCount/maxReplicaCount: Scaling boundaries
- triggers: Array of event sources that trigger scaling

For batch-oriented workloads, ScaledJob creates Kubernetes Jobs based on events:
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: kafka-batch-processor
  namespace: default
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: processor
            image: my-processor:latest
            resources:
              requests:
                memory: "64Mi"
                cpu: "100m"
              limits:
                memory: "128Mi"
                cpu: "200m"
        restartPolicy: Never
  pollingInterval: 30
  maxReplicaCount: 50
  successfulJobsHistoryLimit: 10
  failedJobsHistoryLimit: 10
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.default.svc.cluster.local:9092
        consumerGroup: batch-processor
        topic: batch-tasks
        lagThreshold: '100'
Scaling based on message queues is one of the most common KEDA use cases:
triggers:
  - type: rabbitmq
    metadata:
      protocol: amqp
      queueName: orders
      host: rabbitmq.default.svc.cluster.local
      queueLength: '50'
      # Optional authentication
      vhost: '/'
      username: user
      passwordFromEnv: RABBITMQ_PASSWORD
Scaling on Kafka consumer lag:

triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.default.svc.cluster.local:9092
      consumerGroup: order-processor
      topic: orders
      lagThreshold: '100'
      offsetResetPolicy: latest
Scaling on Azure Service Bus queue depth:

triggers:
  - type: azure-servicebus
    metadata:
      queueName: orders
      connectionFromEnv: AzureServiceBusConnection
      messageCount: '5'
Scaling based on database metrics and storage systems:
triggers:
  - type: postgresql
    metadata:
      connectionFromEnv: POSTGRESQL_CONN_STR
      query: "SELECT COUNT(*) FROM tasks WHERE status='pending'"
      targetQueryValue: "10"
      activationTargetQueryValue: "1"
Scaling on a MongoDB document count:

triggers:
  - type: mongodb
    metadata:
      connectionStringFromEnv: MONGODB_CONN_STR
      collection: tasks
      database: taskdb
      query: '{"status": "pending"}'
      queryValue: "10"
Scaling on Redis list length:

triggers:
  - type: redis
    metadata:
      address: redis.default.svc.cluster.local:6379
      listName: pending-tasks
      listLength: "10"
      passwordFromEnv: REDIS_PASSWORD
Using custom Prometheus metrics for scaling:
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
      metricName: http_requests_total
      threshold: '100'
      query: sum(rate(http_requests_total{app="my-app"}[2m]))
Combining event-driven scaling with time-based patterns:
triggers:
  - type: cron
    metadata:
      timezone: UTC
      start: 30 * * * *
      end: 45 * * * *
      desiredReplicas: "10"
KEDA supports multiple triggers on a single ScaledObject, combined with logical OR semantics, so scaling up occurs whenever any trigger activates:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: multi-trigger-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: processing-service
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.default.svc.cluster.local:9092
        consumerGroup: processor
        topic: high-priority
        lagThreshold: '10'
    - type: kafka
      metadata:
        bootstrapServers: kafka.default.svc.cluster.local:9092
        consumerGroup: processor
        topic: standard
        lagThreshold: '100'
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        threshold: '50'
        query: sum(rate(processing_queue_length{service="processor"}[2m]))
Secure authentication for various scalers:
Referencing secrets via environment variables:

triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.svc:9092
      consumerGroup: order-processor
      topic: orders
      lagThreshold: '100'
      sasl: plaintext
      username: user
      passwordFromEnv: KAFKA_PASSWORD
Referencing Kubernetes Secrets directly is handled through a TriggerAuthentication resource (shown in full later in this guide), which a trigger points to via authenticationRef:

triggers:
  - type: postgresql
    metadata:
      connectionFromEnv: POSTGRESQL_CONN_STR
      query: "SELECT COUNT(*) FROM tasks WHERE status='pending'"
      targetQueryValue: "10"
    authenticationRef:
      name: postgresql-trigger-auth
Securing connections with TLS:

triggers:
  - type: rabbitmq
    metadata:
      protocol: amqps
      host: rabbitmq.svc:5671
      queueName: orders
      queueLength: '50'
      tls: "enable"
      ca: "/mnt/certs/ca.crt"
      cert: "/mnt/certs/tls.crt"
      key: "/mnt/certs/tls.key"
One of KEDA's powerful features is scaling from zero, which requires special consideration:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: zero-to-scale
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: event-processor
  minReplicaCount: 0
  maxReplicaCount: 10
  advanced:
    restoreToOriginalReplicaCount: true
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Percent
              value: 100
              periodSeconds: 15
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.svc:9092
        consumerGroup: processor
        topic: events
        lagThreshold: '1'
Key considerations for scaling from zero:
- Activation triggers: Set an appropriate threshold for activation (see the sketch after this list)
- Cold start time: Account for application startup time
- Resource provisioning: Ensure resources are available for rapid scaling
- State management: Handle stateful applications carefully
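Since KEDA 2.8, most scalers expose a separate activation parameter that governs only the zero-to-one transition, distinct from the target used once the workload is running. A minimal sketch for the Kafka scaler, with illustrative values:

triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.svc:9092
      consumerGroup: processor
      topic: events
      lagThreshold: '100'          # target lag per replica once active
      activationLagThreshold: '5'  # lag required before waking from zero

KEDA works alongside the Horizontal Pod Autoscaler, extending its capabilities: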
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: hybrid-scaling
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicaCount: 1
  maxReplicaCount: 20
  advanced:
    horizontalPodAutoscalerConfig:
      name: web-api-hpa
  triggers:
    # CPU utilization, handled by the HPA that KEDA manages
    - type: cpu
      metricType: Utilization
      metadata:
        value: "50"
    # Request rate from Prometheus
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        threshold: '100'
        query: sum(rate(http_requests_total{service="web-api"}[2m]))
KEDA also defines its own custom resources beyond ScaledObject and ScaledJob; the TriggerAuthentication resource decouples credentials from scaling definitions:
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-trigger-auth
spec:
  secretTargetRef:
    - parameter: sasl
      name: kafka-secrets
      key: sasl
    - parameter: username
      name: kafka-secrets
      key: username
    - parameter: password
      name: kafka-secrets
      key: password
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaledobject
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kafka-consumer
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.svc:9092
        consumerGroup: order-processor
        topic: orders
        lagThreshold: '50'
      authenticationRef:
        name: kafka-trigger-auth
Monitoring KEDA operations with Prometheus and Grafana:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: keda-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: keda-operator
  endpoints:
    - port: metrics
Example Prometheus queries for KEDA monitoring:
# Active scaled objects
keda_scaled_object_active_total
# Scale decisions by trigger type
sum(keda_scaled_object_metrics_value) by (type)
# Scaling errors
keda_scaled_object_errors_total
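If the Prometheus Operator is in use, the error counter above can back an alert rule. A minimal sketch, where the alert name, window, and severity are illustrative assumptions:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: keda-alerts
  namespace: monitoring
spec:
  groups:
    - name: keda
      rules:
        - alert: KedaScaledObjectErrors
          # Fire when KEDA reports any new errors for a ScaledObject
          expr: increase(keda_scaled_object_errors_total[5m]) > 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: KEDA is reporting errors for a ScaledObject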
Fine-tuning KEDA for optimal performance:
Polling interval:
- Balance responsiveness against resource consumption
- Default is 30 seconds; reduce for latency-sensitive applications
- Consider scaler-specific limitations (API rate limits)

Cooldown period:
- Prevents scaling thrashing
- Typically 300 seconds for scale down
- Shorter for dynamic workloads, longer for stable patterns

Thresholds:
- Set appropriate thresholds based on workload characteristics
- Consider baseline and peak patterns
- Implement gradual scaling with multiple trigger points

Example of optimized scaling parameters:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: optimized-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: consumer-service
  pollingInterval: 15   # Check every 15 seconds
  cooldownPeriod: 300   # 5 minutes cooldown before scaling down
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 0
          policies:
            - type: Percent
              value: 100
              periodSeconds: 15
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Percent
              value: 20
              periodSeconds: 60
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        threshold: '100'
        query: sum(rate(processing_queue_length{service="consumer"}[2m]))
Ensuring appropriate resources for KEDA components:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keda-operator
  namespace: keda
spec:
  # ... other fields
  template:
    spec:
      containers:
        - name: keda-operator
          image: ghcr.io/kedacore/keda:2.10.1
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 1000m
              memory: 1Gi
Setting appropriate scaling boundaries with namespace quotas:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: pods-high
  namespace: scaling-apps
spec:
  hard:
    pods: "100"
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
Event-driven scaling for a microservices architecture:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processing-service
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.svc:9092
        consumerGroup: order-processor
        topic: orders
        lagThreshold: '10'
Using ScaledJobs for batch processing:
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: data-processor
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: data-processor
            image: data-processor:latest
            env:
              - name: BATCH_SIZE
                value: "100"
        restartPolicy: Never
  pollingInterval: 30
  maxReplicaCount: 50
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 10
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/data-processing-queue
        queueLength: "100"
        awsRegion: us-east-1
        identityOwner: pod
Creating a serverless-like experience with KEDA:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  cooldownPeriod: 300
  minReplicaCount: 0
  maxReplicaCount: 10
  advanced:
    restoreToOriginalReplicaCount: true
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        threshold: '1'
        query: sum(rate(http_requests_total{service="api-gateway",path=~"/api/.*"}[2m]))
Diagnosing and resolving KEDA issues:
When scaling does not occur as expected:
- Check ScaledObject status
- Verify trigger metrics and thresholds
- Confirm connectivity to event sources
- Check authentication credentials
- Review polling intervals
- Check for resource constraints
- Examine scaling behavior configuration
- Verify event source latency

When scaling is too aggressive or unstable:
- Adjust thresholds based on workload
- Implement more granular scaling policies
- Consider adding stabilization windows
- Evaluate multiple triggers for complex scenarios

Example debugging commands:
# Check ScaledObject status
kubectl get scaledobject order-processor -o yaml
# Examine KEDA logs
kubectl logs -n keda -l app=keda-operator
# Check HPA created by KEDA
kubectl get hpa
# Inspect metrics
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/kafka-lag"
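KEDA also records Kubernetes events against its resources, which often surface trigger connection or authentication failures directly; both of these are standard kubectl commands:

# Show conditions and recent events for a ScaledObject
kubectl describe scaledobject order-processor

# List events emitted for ScaledObjects across the namespace
kubectl get events --field-selector involvedObject.kind=ScaledObject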
Enhanced logging configuration for troubleshooting:
apiVersion: v1
kind: ConfigMap
metadata:
  name: keda-logging-config
  namespace: keda
data:
  KEDA_LOG_LEVEL: debug
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keda-operator
  namespace: keda
spec:
  template:
    spec:
      containers:
        - name: keda-operator
          envFrom:
            - configMapRef:
                name: keda-logging-config
The KEDA ecosystem continues to evolve with new capabilities:
- HTTP-based scaling: Direct scaling based on HTTP traffic patterns
- ML-based predictive scaling: Using machine learning to predict scaling needs
- Cross-cluster scaling: Coordinating scaling across multiple Kubernetes clusters
- Composite metrics: Combining multiple metrics with weighted importance
- Scaling profiles: Time- or condition-based scaling profiles for different scenarios

KEDA's integration with emerging cloud-native technologies:
# Example of KEDA with Knative
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: knative-function
spec:
  scaleTargetRef:
    apiVersion: serving.knative.dev/v1
    kind: Service
    name: my-function
  advanced:
    targetPodController: knative-deployment
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        threshold: '1'
        query: sum(rate(function_invocations_total{service="my-function"}[1m]))
Kubernetes Event-Driven Autoscaling represents a significant advancement in the way applications scale in cloud-native environments. By providing a Kubernetes-native way to scale workloads based on actual application demand signals rather than just resource utilization, KEDA enables more responsive, efficient, and cost-effective scaling strategies.
As organizations continue to adopt event-driven architectures and message-based systems, KEDA's ability to scale based on queue depths, custom metrics, and diverse event sources becomes increasingly valuable. The project's growing ecosystem of scalers and integrations ensures that it can adapt to a wide range of application patterns and infrastructures.
By implementing event-driven autoscaling with KEDA, teams can build more resilient applications that efficiently handle variable workloads, scale to zero when idle, and rapidly respond to demand spikes – ultimately delivering better performance and resource utilization in Kubernetes environments.