Skip to main content

Command Palette

Search for a command to run...

The Thundering Herd Problem in Distributed Systems

Updated
6 min read
J
Passionate about designing robust APIs and scalable backend systems that bring ideas to life Always Learning and exploring new technologies Let's build something amazing together! As a backend web developer, I'm driven by a passion for crafting powerful and efficient server-side systems that deliver seamless digital experiences.

In distributed systems, performance bottlenecks are not easily caused by one request. They normally take place when numerous clients are connecting to a single resource at the same time and flood common infrastructure. This is what is referred to as the Thundering Herd Problem.

This is a design problem that is repeatedly encountered in caching systems, load balancers, databases, message queues and operating systems.

A Simple Analogy: Store Opening and Crowd Rush

Imagine a store that opens at 10:00 AM.

A crowd of 5,000 people is waiting outside.

The moment the shutter opens everyone rushes in simultaneously.

  • The entrance gets blocked

  • The billing counter gets overwhelmed

  • The system slows down

The problem is not the number of people.

The problem is that they moved together.

That is exactly what happens in distributed systems.

What Is the Thundering Herd Problem?

Thundering Herd Problem happens when several processes, threads or even clients are woken up or restarting and contend over a shared and limited resource.

The result is typically:

  • Sudden traffic spikes

  • Resource contention

  • Increased latency

  • Cascading failures

  • System instability

It was named after the operating systems, in which the term is used to mean the simultaneous awakening of multiple processes that are waiting on the same event, then most of them immediately block again.

A Small Architecture Example

Client → Application → Cache → Database

Normal flow:

  1. Client requests data

  2. App checks cache

  3. If found → return

  4. If not → fetch from DB and store in cache

What Happens When Cache TTL Expires?

Let’s assume:

  • A popular key expires

  • 100,000 users request it simultaneously

What happens?

  • All miss the cache

  • All hit the database

  • DB gets overwhelmed

  • Latency increases

  • Clients retry

  • System collapses

This specific case is called a Cache Stampede a practical form of the Thundering Herd Problem.

Canonical Example: Cache Stampede

Take a normal read heavy architecture:

  1. Clients request data.

  2. Application checks (e.g., Redis) cache.

  3. In the event that it is missing, it makes a query to the database.

  4. Response has been stored to be used in the future.

Now assume:

  • A popular key expires.

  • It is ordered by 100,000 clients simultaneously.

  • All of them miss the cache.

  • They all struck the database simultaneously.

The database gets congested. Latency increases. Downstream service deteriorates.

This particular expression is commonly referred to as a Cache Stampede a practical example of the Thundering Herd Problem.

Normal Traffic Spike vs Thundering Herd

Normal Spike Thundering Herd
Gradual increase Sudden burst
Traffic distributed over time Traffic synchronized
System can autoscale System collapses quickly
Predictable Often triggered by event (expiry/recovery)

The difference is correlation.

Other Real-World Manifestations

Retry Storm After Service Recovery

When a service fails, clients may retry aggressively. Once the service recovers, a synchronized retry wave can overwhelm it again.

Lock Contention in Multithreaded Systems

In operating systems, if multiple threads are blocked on a mutex or I/O event and all are awakened simultaneously, CPU contention spikes and context switching increases.

Leader Election and Distributed Coordination

In systems using consensus protocols (e.g., Raft-like patterns), simultaneous retries or re-election attempts can generate herd behavior under network instability.

Why It Is Dangerous

The core issue is synchronized behavior.

Distributed systems are resilient when load is smooth and predictable. They fail when:

  • Load increases non-linearly

  • Requests become correlated

  • Backpressure mechanisms are absent

A thundering herd often leads to cascading failures:

  1. Cache expires

  2. DB overloaded

  3. Response times increase

  4. Clients retry

  5. System collapses

This feedback loop is particularly dangerous in microservices architectures.

Mitigation Strategies

Professional systems rarely rely on a single mitigation. Instead, they apply multiple defensive layers.

Request Coalescing (Single-Flight Pattern)

Only one request rebuilds the missing resource. Other concurrent requests wait for the same result.

Pseudo-logic:

if value exists in cache:
    return value
else:
    acquire per-key lock
    if value still missing:
        rebuild value
    release lock
    return value

This pattern ensures the database receives only one regeneration request.

Probabilistic Expiration (Jitter)

Instead of fixed TTL values:

TTL = 60 seconds

Use:

TTL = 60 + random(0–10 seconds)

This prevents synchronized expiry across multiple keys.

Soft Expiration + Background Refresh

Instead of expiring data immediately:

  • Serve slightly stale data.

  • Trigger asynchronous refresh in background.

  • Avoid blocking user requests.

This is widely used in CDN architectures and large-scale e-commerce platforms.

Exponential Backoff with Jitter

For retries:

1s → 2s → 4s → 8s → 16s (with randomization)

Backoff ensures retries are distributed over time rather than synchronized.

Rate Limiting and Load Shedding

When load exceeds capacity:

  • Reject excess requests early.

  • Use circuit breakers.

  • Provide fallback responses.

Failing fast is often safer than degrading gradually.

Distributed Locks (With Caution)

Using systems like Redis-based locks can prevent multiple rebuilds. However:

  • Lock contention itself can become a bottleneck.

  • Incorrect implementation may cause deadlocks.

  • Distributed locks require strict timeouts.

Architectural Perspective

In well-designed systems:

  • Read spikes are absorbed on the caching layers.

  • Load balancers spread the traffic equally.

  • Rate limiters offer security to downstream services.

  • Circuit breakers do not allow cascading failures.

  • Observability identifies coordinating spikes in time.

It is not aimed at removing spikes it is aimed at de-correlating traffic and adding controlled degradation.

Interview-Ready Explanation

The Thundering Herd Problem arises when several clients are competing on the same resource as it becomes available, causing a sharp increase in the load and resulting in the possible system failure.

It usually occurs on cache expiry, service recovery or a release of lock.

The mitigation strategies are request coalescing, probabilistic expiration, background refresh, exponential backoff and rate limiting.

Key Takeaways

  • The root cause is synchronized behavior.

  • Cache expiration is the most common trigger.

  • Retries without backoff amplify the problem.

  • Mitigation requires layered defense, not a single fix.

  • Observability and proactive load management are critical.