KEDA is a fantastic fit for this — it’s basically purpose-built for scaling based on external event sources. Here’s a full breakdown.
1. KEDA vs. built-in HPA
The built-in Horizontal Pod Autoscaler only scales on metrics that are already exposed through the Kubernetes metrics APIs (CPU, memory, or custom metrics you have piped in yourself). KEDA acts as a metrics adapter that connects external systems (Redis, Kafka, RabbitMQ, PostgreSQL, AWS SQS, and dozens more) directly to the HPA machinery. Under the hood, KEDA creates and manages an HPA for you.
The killer feature: KEDA can scale to zero and back. The standard HPA has a minimum of 1 replica.
2. Setup
First, install KEDA on your EKS cluster:
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
Then create a ScaledObject that targets your Deployment:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: job-processor-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: job-processor        # your Deployment name
  minReplicaCount: 0           # scale to zero when idle
  maxReplicaCount: 20          # cap during spikes
  pollingInterval: 15          # check Redis every 15s
  cooldownPeriod: 120          # wait 2min after the last active trigger before scaling to zero
  triggers:
    - type: redis-streams
      metadata:
        address: redis.default.svc.cluster.local:6379
        stream: task-queue            # your Redis Stream name
        consumerGroup: processors     # your consumer group
        pendingEntriesCount: "5"      # target of ~5 pending entries per replica
The pendingEntriesCount is the target metric: KEDA (through the HPA it manages) tries to keep roughly 5 pending entries per pod. If there are 50 pending entries, it scales to 10 pods (50 / 5), never going above maxReplicaCount.
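To make the arithmetic concrete, here is a minimal Python sketch of that calculation. It is a simplified model of the HPA average-value formula, not KEDA's actual code: it ignores the HPA tolerance band, stabilisation windows, and scaling policies, and the function and parameter names are made up for illustration.

```python
import math


def desired_replicas(pending_entries, target_per_replica, min_replicas, max_replicas):
    """Simplified replica target: ceil(total / target), clamped to [min, max].

    Hypothetical helper for illustration only; the real HPA also applies a
    tolerance band, stabilisation windows, and scale-up/scale-down policies.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    raw = math.ceil(pending_entries / target_per_replica)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Values taken from the ScaledObject above: target 5, min 0, max 20.
    for pending in (0, 50, 51, 500):
        print(pending, "->", desired_replicas(pending, 5, 0, 20))
```

With the settings from the ScaledObject, 50 pending entries map to 10 pods, 51 round up to 11, and anything from 100 upwards is capped at 20 by maxReplicaCount.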
3. Scale-to-zero behaviour
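Yes, KEDA can scale to zero. With minReplicaCount: 0, once the trigger has been inactive for the duration of cooldownPeriod, KEDA scales the Deployment down to zero pods. When new messages arrive, KEDA detects them at the next pollingInterval and scales back up. One caveat worth testing before you rely on it: pendingEntriesCount measures the consumer group's pending entries list (entries delivered to a consumer but not yet acknowledged), so with no consumers running, newly added messages may not register. If scale-from-zero does not fire in your tests, the redis-streams scaler's streamLength or lagCount modes are the alternatives to look at.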
One thing to know: scale-from-zero latency includes pod scheduling + container startup + your app’s boot time. For jobs where a 15-30 second delay on the first message is acceptable, this is great. If you need sub-second response, keep minReplicaCount: 1.
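A rough way to budget that first-message delay is to add the detection wait to the startup stages listed above. The sketch below is only a back-of-the-envelope model in Python; apart from the 15-second polling interval taken from the ScaledObject, every number is a made-up placeholder that you would replace with measurements from your own cluster.

```python
def worst_case_first_message_delay(polling_interval_s, scheduling_s,
                                   container_start_s, app_boot_s):
    """Worst-case seconds between a message arriving at zero replicas and
    the first pod being able to process it.

    Hypothetical helper: assumes the message lands just after a poll, so the
    full polling interval elapses before KEDA notices it.
    """
    return polling_interval_s + scheduling_s + container_start_s + app_boot_s


if __name__ == "__main__":
    delay = worst_case_first_message_delay(
        polling_interval_s=15,  # pollingInterval from the ScaledObject
        scheduling_s=3,         # placeholder: time to schedule the pod
        container_start_s=5,    # placeholder: image pull + container start
        app_boot_s=7,           # placeholder: your app's boot time
    )
    print(f"worst-case delay: {delay}s")
```

If the total lands above what your jobs can tolerate, lowering pollingInterval or keeping minReplicaCount: 1 are the two levers already mentioned here.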
4. Avoiding thundering herd
A few strategies that work well in production:
- Set maxReplicaCount conservatively: better to process a bit slower than overwhelm your database or downstream services.
- Use advanced.horizontalPodAutoscalerConfig to tune HPA behaviour:
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 30
          policies:
            - type: Pods
              value: 3
              periodSeconds: 60
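This limits the HPA that KEDA manages to adding at most 3 pods per 60 seconds, with a 30-second stabilisation window, so the Deployment cannot jump straight to 20 pods. Note that the initial 0 to 1 activation is handled by KEDA itself; these policies apply from 1 replica upwards.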
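To get a feel for how long a full ramp takes under that policy, here is a small Python sketch. It is a deliberately simplified model, not the HPA algorithm: it assumes the desired replica count stays at the maximum the whole time, that the first step is allowed immediately, and it ignores the 30-second stabilisation window. The function name is hypothetical.

```python
def ramp_schedule(start, target, pods_per_period, period_seconds):
    """Return [(seconds_elapsed, replicas)] for a Pods-type scale-up policy.

    Simplified model: one step of at most `pods_per_period` pods at the start
    of every period, until `target` is reached.
    """
    schedule = []
    elapsed, replicas = 0, start
    while replicas < target:
        replicas = min(target, replicas + pods_per_period)
        schedule.append((elapsed, replicas))
        elapsed += period_seconds
    return schedule


if __name__ == "__main__":
    # Start at 1 replica (KEDA's own 0 -> 1 activation), ramp to maxReplicaCount.
    for elapsed, replicas in ramp_schedule(1, 20, 3, 60):
        print(f"t={elapsed:>3}s  replicas={replicas}")
```

Under these assumptions the Deployment goes 4, 7, 10, 13, 16, 19, 20 and only reaches 20 pods after roughly six minutes, which is worth weighing against how quickly your backlog needs to drain.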
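- Make sure new pods don’t pick up work until they’re genuinely ready. Startup and readiness probes help Kubernetes track pod health, but a stream consumer pulls its own work rather than receiving it through a Service, so start the consumer loop only after initialisation has finished.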
- Add a rate limiter in your consumer code so each pod processes at a controlled rate regardless of how many messages are pending.
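For the rate limiter, one common option is a token bucket. The sketch below is a minimal single-threaded Python version under my own assumptions: the class name, the rate and capacity values, and the injectable clock are all illustrative, and it is not taken from any particular library.

```python
import time


class TokenBucket:
    """Minimal token bucket: refills `rate` tokens per second up to `capacity`.

    Illustrative, single-threaded sketch; add a lock if several threads
    share one bucket.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock            # injectable so the bucket can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self):
        """Take one token if available; return True on success."""
        now = self.clock()
        # Refill based on elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def acquire(self):
        """Block until a token is available."""
        while not self.try_acquire():
            time.sleep(1.0 / self.rate)


if __name__ == "__main__":
    bucket = TokenBucket(rate=10, capacity=10)  # placeholder: ~10 messages/s per pod
    for i in range(3):
        bucket.acquire()   # call this before handling each stream entry
        print("processing message", i)
```

Calling acquire() before each message caps a single pod at about `rate` messages per second, so the worst-case load on downstream services is roughly that rate multiplied by maxReplicaCount.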
We’ve been running this pattern in production for about a year with Redis Streams and it’s been rock solid. Happy to answer follow-up questions about monitoring or alerting around this setup.