How WhatsApp's Typing Indicator Actually Works
That tiny "typing…" bubble seems trivial. Underneath it is a presence system serving 2 billion users — with sub-100ms latency, mobile reconnect storms, and Redis TTLs doing the heavy lifting.
The tiny bubble hiding a distributed system
You open WhatsApp. A friend starts typing. Within milliseconds, three animated dots appear on your screen — even though your phone and theirs might be on different continents, different networks, different carriers.
That "typing…" indicator feels instant and effortless. But underneath it, WhatsApp has to solve a genuinely hard problem: how do you broadcast a transient, real-time event to a specific person across a system serving 100+ million concurrent connections?
The core challenge: A typing event lasts ~2 seconds. It can't be stored in a database. It can't wait for HTTP polling. It needs to travel from Alice's phone → WhatsApp servers → Bob's phone in under 100ms — and silently expire if Alice stops typing.
What makes this hard at scale
The typing indicator isn't just a notification — it's a presence event. And presence systems are famously one of the hardest things to scale in distributed systems.
Three constraints working against you
Latency: Users expect to see the indicator appear within ~100ms of their contact starting to type. HTTP polling every 100ms for 2 billion users would destroy any backend.
Ephemerality: Typing state isn't permanent. If Alice closes WhatsApp without sending her message, Bob should stop seeing "typing…" within ~3 seconds — even if no explicit "stopped typing" event fires.
Scale: WhatsApp handles 100M+ concurrent users. Each user might be in a dozen active chats. You can't have one server track everyone's presence — that server would be a single point of failure with petabytes of state.
The system that makes it work
WhatsApp's typing indicator runs on a WebSocket + Redis Pub/Sub foundation. Here's the full flow, component by component.
How each component works
WebSockets — the persistent pipe
WhatsApp keeps a persistent WebSocket connection between your phone and the nearest edge server. Unlike HTTP, WebSockets maintain an open TCP connection — meaning the server can push events to your device at any time without you asking.
When Alice types, her phone sends a tiny COMPOSING event over this connection. No HTTP request, no headers overhead — just a few bytes over an already-open socket.
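A minimal sketch of what that client send path might look like — the endpoint, JSON payload, and throttle window below are illustrative assumptions; WhatsApp's real wire format is a compact custom protocol, not JSON:

```typescript
// Sketch: sending a throttled COMPOSING event over an already-open WebSocket.
// Endpoint, payload shape, and throttle interval are assumptions for illustration.
const socket = new WebSocket("wss://gateway.example.com/chat");

let lastSentAt = 0;
const THROTTLE_MS = 2_000; // don't re-announce typing more than once per ~2s

function onKeystroke(conversationId: string): void {
  const now = Date.now();
  if (now - lastSentAt < THROTTLE_MS) return; // keep the event tiny and rare
  lastSentAt = now;

  // A few bytes down the open socket — no HTTP handshake, no header overhead.
  socket.send(JSON.stringify({ type: "composing", conv: conversationId }));
}
```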
Redis Pub/Sub — the broadcast layer
Alice and Bob are almost certainly connected to different gateway servers. So how does Alice's typing event reach Bob's server? Redis Pub/Sub.
Each chat conversation has a Redis channel. When Alice's gateway receives a COMPOSING event, it publishes to that channel. Bob's gateway — subscribed to the same channel — instantly receives it and pushes it to Bob's WebSocket.
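Here's a small sketch of that gateway-to-gateway hop using Node's ioredis client — the channel name follows the chat:{conv_id} convention used later in this article, and the function names are made up for illustration:

```typescript
// Sketch: the gateway-to-gateway hop via Redis Pub/Sub (ioredis).
import Redis from "ioredis";

const pub = new Redis(); // publishing connection
const sub = new Redis(); // a connection in subscribe mode can't run other commands

// Alice's gateway: forward her COMPOSING event to any gateway that's listening.
async function forwardComposing(convId: string, userId: string): Promise<void> {
  await pub.publish(`chat:${convId}`, JSON.stringify({ type: "composing", from: userId }));
}

// Bob's gateway: subscribe to the conversations of its connected users,
// then push incoming events straight down their WebSockets.
async function watchConversation(convId: string, pushToClient: (msg: string) => void): Promise<void> {
  await sub.subscribe(`chat:${convId}`);
  sub.on("message", (channel, message) => {
    if (channel === `chat:${convId}`) pushToClient(message);
  });
}
```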
TTL — the silent expiry
This is the elegant part. When Alice's gateway publishes the composing event, it also sets a Redis key with a ~3 second TTL: presence:{conv_id}:{alice_id}. When Alice sends the message or stops typing, the key is deleted. If she just drops her phone, the key expires automatically — and Bob's client sees the indicator disappear after 3 seconds.
No explicit "stopped typing" event needed. The TTL handles it.
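On the receiving side, Bob's client can mirror that TTL with a local timer — a minimal sketch, assuming the client simply re-arms a ~3-second countdown each time a composing event arrives:

```typescript
// Sketch: the receiving client mirrors the server's ~3s TTL with a local timer,
// so "typing…" disappears even if no explicit "paused" event ever arrives.
const INDICATOR_TTL_MS = 3_000;
const hideTimers = new Map<string, ReturnType<typeof setTimeout>>();

function onComposingReceived(convId: string, showTyping: (on: boolean) => void): void {
  showTyping(true);
  clearTimeout(hideTimers.get(convId)); // re-arm on every event
  hideTimers.set(convId, setTimeout(() => showTyping(false), INDICATOR_TTL_MS));
}
```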
Sharding the presence layer
One Redis instance can't hold presence for 2 billion users. WhatsApp shards by conversation ID — each shard owns a range of chat IDs. The gateway hashes the conversation ID to find the right Redis shard, ensuring no single shard is a bottleneck.
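A minimal sketch of that shard lookup — plain hash-mod is shown for brevity; the interview notes further down point to consistent hashing as the production-grade version:

```typescript
// Sketch: picking the Redis shard that owns a conversation's presence keys.
// Plain hash-mod keeps the example short; consistent hashing avoids remapping
// every conversation when a shard is added or removed.
import { createHash } from "node:crypto";

const shards = ["redis-presence-0:6379", "redis-presence-1:6379", "redis-presence-2:6379"];

function shardFor(convId: string): string {
  const digest = createHash("md5").update(convId).digest();
  return shards[digest.readUInt32BE(0) % shards.length];
}
```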
What breaks — and how it recovers
Here's how the system responds to three common failure modes:
Redis shard failure
If a Redis shard goes down, presence events for conversations on that shard stop propagating. WhatsApp's mitigation: typing indicators simply don't appear during the outage. Message delivery is unaffected — messages use a separate durable store (likely a custom fork of Mnesia or Cassandra). Presence is treated as best-effort.
Reconnect storms
When a mobile network drops and millions of phones reconnect simultaneously (after a subway exit, a concert ending), every device tries to re-establish its WebSocket. This creates a thundering herd against the gateway layer. WhatsApp uses exponential backoff with jitter — each client waits a random delay before reconnecting, spreading the load over 30–60 seconds.
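A sketch of what that client-side backoff might look like — the base delay and cap below are illustrative numbers, not WhatsApp's actual values:

```typescript
// Sketch: reconnect with exponential backoff plus "full jitter", so a million
// phones coming out of the subway don't reconnect in lockstep.
async function reconnectWithBackoff(connect: () => Promise<void>): Promise<void> {
  const BASE_MS = 1_000;
  const CAP_MS = 60_000;
  for (let attempt = 0; ; attempt++) {
    try {
      await connect();
      return; // connected — stop retrying
    } catch {
      const ceiling = Math.min(CAP_MS, BASE_MS * 2 ** attempt);
      const delay = Math.random() * ceiling; // full jitter: uniform in [0, ceiling)
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```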
Mobile network instability
On cellular, connections drop constantly. The client maintains a local "composing" state machine — if the WebSocket drops mid-typing, it re-sends the COMPOSING event on reconnect. The server-side TTL ensures correctness: worst case, Bob sees "typing…" for an extra 3 seconds.
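A sketch of that client-side state machine, assuming a simple idle/composing flag and a re-send hook on reconnect:

```typescript
// Sketch: a tiny "composing" state machine that re-announces typing after a
// reconnect, leaning on the server-side TTL for correctness.
type ComposingState = "idle" | "composing";

let state: ComposingState = "idle";
let activeConv: string | null = null;

function startComposing(convId: string, send: (payload: string) => void): void {
  state = "composing";
  activeConv = convId;
  send(JSON.stringify({ type: "composing", conv: convId }));
}

function onReconnected(send: (payload: string) => void): void {
  // If the socket dropped mid-typing, re-send; worst case the old key simply
  // expires and Bob sees "typing…" for a few extra seconds.
  if (state === "composing" && activeConv) {
    send(JSON.stringify({ type: "composing", conv: activeConv }));
  }
}
```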
How the architecture evolves with scale
The components above get introduced in stages as user count grows: a single gateway and one Redis instance are enough at small scale, the Pub/Sub fan-out layer becomes necessary once users connect through different gateways, and sharding the presence store only matters once one Redis instance can no longer hold everyone's keys.
Why not just use polling?
| Approach | Latency | Server load | Battery impact | Used for |
|---|---|---|---|---|
| Short polling | High (1–5s) | Very high | High | Old-school notifications |
| Long polling | Medium (0.5–2s) | High | Medium | Legacy chat apps |
| SSE | Low | Low | Low | One-way push (news feeds) |
| WebSockets ✓ | Very low (<50ms) | Low | Low | Chat, gaming, realtime |
SSE (Server-Sent Events) is compelling — it's simpler than WebSockets and handles one-way push well. But typing indicators need bidirectional communication: Alice's phone both sends composing events and receives Bob's. SSE is unidirectional, so you'd need a hybrid approach. WebSockets win.
How to ace this in a system design interview
- **Design a typing indicator for WhatsApp at scale.** Start with the data model: a composing event is ephemeral, not persistent — this immediately points you away from a database and toward an in-memory store with TTL. Then reason about the connection model: 2B users need persistent connections, which means WebSockets. Then tackle fan-out: Alice's server ≠ Bob's server, so you need a pub/sub layer between gateways. Redis Pub/Sub per conversation ID is the clean answer. Mention TTL-based expiry as the elegant solution to "stopped typing" detection. Finish with sharding: hash conversation IDs across Redis clusters.
- **How would you scale a presence system to 100M concurrent users?** Key insight: presence data is read-heavy and write-heavy at the same time — every user generates heartbeats, and every update fans out to multiple subscribers. The answer: shard by user ID or conversation ID across a Redis cluster. Use consistent hashing so resharding doesn't require full redistribution (see the sketch after this list). Add a presence aggregation layer that batches heartbeats before writing to Redis. For read scaling, consider local presence caches on gateways with short TTLs (~500ms) to avoid hitting Redis for every query.
- **What happens if the Redis presence shard goes down?** Presence is best-effort by design. If a shard goes down, typing indicators stop working for conversations on that shard — but message delivery is unaffected (messages live in a durable store). Recovery: Redis Sentinel or Redis Cluster handles automatic failover in ~30 seconds. During the outage, clients gracefully degrade — typing indicators just don't appear. This is a deliberate product decision: it's better to silently drop presence events than to delay messages or fail loudly.
- **How do you handle the reconnect storm problem?** When a mass reconnect happens (subway exit, stadium event), millions of devices hit your gateways simultaneously. Three mitigations: (1) Exponential backoff with jitter on the client — each device waits a random 0–30 seconds before reconnecting, spreading load over time. (2) Gateway auto-scaling with pre-warmed instances — the gateway layer should scale horizontally and have capacity headroom for reconnect bursts. (3) Rate limiting per IP at the load balancer — reject excess reconnect attempts with a Retry-After header so clients know when to retry.
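For the consistent-hashing point in the second question, here's a minimal ring sketch — the virtual-node count and hash function are arbitrary choices for illustration:

```typescript
// Sketch: a minimal consistent-hash ring, so adding a presence shard only
// remaps keys between its neighbours instead of reshuffling everything.
import { createHash } from "node:crypto";

class HashRing {
  private ring: { point: number; shard: string }[] = [];

  constructor(shards: string[], private vnodes = 64) {
    for (const shard of shards) {
      for (let i = 0; i < this.vnodes; i++) {
        this.ring.push({ point: this.hash(`${shard}#${i}`), shard });
      }
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  private hash(key: string): number {
    return createHash("md5").update(key).digest().readUInt32BE(0);
  }

  shardFor(convId: string): string {
    const h = this.hash(convId);
    const hit = this.ring.find((entry) => entry.point >= h) ?? this.ring[0];
    return hit.shard;
  }
}

// const ring = new HashRing(["presence-0", "presence-1", "presence-2"]);
// ring.shardFor("conv:alice:bob") stays stable for most keys as shards are added.
```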
What actually gets stored — and what doesn't
The typing indicator is deliberately stateless in the database. Nothing about "Alice is typing" ever touches disk — the only "schema" presence needs is the Redis key structure shown below.
Key insight: The presence_events store is Redis — not a relational table. It lives entirely in memory with a TTL. If Redis restarts, all presence data is gone. That's intentional — stale presence is worse than no presence.
Redis key structure
```
-- Presence key (expires in 3s)
SET presence:{conv_id}:{user_id} "composing" EX 3

-- Gateway subscription (per conversation)
SUBSCRIBE chat:{conv_id}

-- Publishing a composing event
PUBLISH chat:{conv_id} '{"type":"composing","from":"alice_id","ts":1700000000}'

-- Paused / message sent → delete key
DEL presence:{conv_id}:{user_id}
```
The full system at a glance
The high-level design (HLD) shows every major service boundary, how data flows across them, and which components can fail independently.
Inside the gateway — sequence by sequence
The low-level design (LLD) walks through the exact method calls, state transitions, and message formats when a COMPOSING event travels end to end.
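A condensed sketch of that sequence on the sending gateway — the function names, payload shape, and key format follow this article's conventions and are illustrative, not WhatsApp's actual code:

```typescript
// Condensed sketch: one gateway handling a COMPOSING event end to end.
import Redis from "ioredis";

const redis = new Redis();

async function handleComposingFrame(rawFrame: string, senderId: string): Promise<void> {
  // 1. Parse and validate the tiny frame Alice's phone sent over its WebSocket.
  const frame = JSON.parse(rawFrame) as { type: string; conv: string };
  if (frame.type !== "composing") return;

  // 2. Record ephemeral presence with a ~3s TTL — it never touches disk.
  await redis.set(`presence:${frame.conv}:${senderId}`, "composing", "EX", 3);

  // 3. Fan out via Pub/Sub so the gateway holding Bob's socket can push it down.
  await redis.publish(
    `chat:${frame.conv}`,
    JSON.stringify({ type: "composing", from: senderId, ts: Date.now() })
  );
}
```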
Every design decision has a cost
The choices WhatsApp made aren't universally correct — they're optimised for their specific constraints: presence is allowed to be lossy, latency is not, and nothing ephemeral earns a disk write.