How WhatsApp's Typing Indicator Actually Works
That tiny "typing…" bubble seems trivial. Underneath it is a presence system serving 2 billion users — with sub-100ms latency, mobile reconnect storms, and Redis TTLs doing the heavy lifting.
The tiny bubble hiding a distributed system
You open WhatsApp. A friend starts typing. Within milliseconds, three animated dots appear on your screen — even though your phone and theirs might be on different continents, different networks, different carriers.
That "typing…" indicator feels instant and effortless. But underneath it, WhatsApp has to solve a genuinely hard problem: how do you broadcast a transient, real-time event to a specific person across a system serving 100+ million concurrent connections?
The core challenge: A typing event lasts ~2 seconds. It can't be stored in a database. It can't wait for HTTP polling. It needs to travel from Alice's phone → WhatsApp servers → Bob's phone in under 100ms — and silently expire if Alice stops typing.
What makes this hard at scale
The typing indicator isn't just a notification — it's a presence event. And presence systems are famously one of the hardest things to scale in distributed systems.
Three constraints working against you
Latency: Users expect to see the indicator appear within ~100ms of their contact starting to type. HTTP polling every 100ms for 2 billion users would destroy any backend.
Ephemerality: Typing state isn't permanent. If Alice closes WhatsApp without sending her message, Bob should stop seeing "typing…" within ~3 seconds — even if no explicit "stopped typing" event fires.
Scale: WhatsApp handles 100M+ concurrent users. Each user might be in a dozen active chats. You can't have one server track everyone's presence — that server would be a single point of failure with petabytes of state.
The system that makes it work
WhatsApp's typing indicator runs on a WebSocket + Redis Pub/Sub foundation. Here's the full flow, component by component.
How each component works
WebSockets — the persistent pipe
WhatsApp keeps a persistent WebSocket connection between your phone and the nearest edge server. Unlike HTTP, WebSockets maintain an open TCP connection — meaning the server can push events to your device at any time without you asking.
When Alice types, her phone sends a tiny COMPOSING event over this connection. No HTTP request, no headers overhead — just a few bytes over an already-open socket.
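A minimal sketch of what that client send path might look like — the endpoint, JSON payload, and throttle window below are illustrative assumptions; WhatsApp's real wire format is a compact custom protocol, not JSON:

```typescript
// Sketch: sending a throttled COMPOSING event over an already-open WebSocket.
// Endpoint, payload shape, and throttle interval are assumptions for illustration.
const socket = new WebSocket("wss://gateway.example.com/chat");

let lastSentAt = 0;
const THROTTLE_MS = 2_000; // don't re-announce typing more than once per ~2s

function onKeystroke(conversationId: string): void {
  const now = Date.now();
  if (now - lastSentAt < THROTTLE_MS) return; // keep the event tiny and rare
  lastSentAt = now;

  // A few bytes down the open socket — no HTTP handshake, no header overhead.
  socket.send(JSON.stringify({ type: "composing", conv: conversationId }));
}
```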
Redis Pub/Sub — the broadcast layer
Alice and Bob are almost certainly connected to different gateway servers. So how does Alice's typing event reach Bob's server? Redis Pub/Sub.
Each chat conversation has a Redis channel. When Alice's gateway receives a COMPOSING event, it publishes to that channel. Bob's gateway — subscribed to the same channel — instantly receives it and pushes it to Bob's WebSocket.
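Here's a small sketch of that gateway-to-gateway hop using Node's ioredis client — the channel name follows the chat:{conv_id} convention used later in this article, and the function names are made up for illustration:

```typescript
// Sketch: the gateway-to-gateway hop via Redis Pub/Sub (ioredis).
import Redis from "ioredis";

const pub = new Redis(); // publishing connection
const sub = new Redis(); // a connection in subscribe mode can't run other commands

// Alice's gateway: forward her COMPOSING event to any gateway that's listening.
async function forwardComposing(convId: string, userId: string): Promise<void> {
  await pub.publish(`chat:${convId}`, JSON.stringify({ type: "composing", from: userId }));
}

// Bob's gateway: subscribe to the conversations of its connected users,
// then push incoming events straight down their WebSockets.
async function watchConversation(convId: string, pushToClient: (msg: string) => void): Promise<void> {
  await sub.subscribe(`chat:${convId}`);
  sub.on("message", (channel, message) => {
    if (channel === `chat:${convId}`) pushToClient(message);
  });
}
```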
TTL — the silent expiry
This is the elegant part. When Alice's gateway publishes the composing event, it also sets a Redis key with a ~3 second TTL: presence:{conv_id}:{alice_id}. When Alice sends the message or stops typing, the key is deleted. If she just drops her phone, the key expires automatically — and Bob's client sees the indicator disappear after 3 seconds.
No explicit "stopped typing" event needed. The TTL handles it.
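On the receiving side, Bob's client can mirror that TTL with a local timer — a minimal sketch, assuming the client simply re-arms a ~3-second countdown each time a composing event arrives:

```typescript
// Sketch: the receiving client mirrors the server's ~3s TTL with a local timer,
// so "typing…" disappears even if no explicit "paused" event ever arrives.
const INDICATOR_TTL_MS = 3_000;
const hideTimers = new Map<string, ReturnType<typeof setTimeout>>();

function onComposingReceived(convId: string, showTyping: (on: boolean) => void): void {
  showTyping(true);
  clearTimeout(hideTimers.get(convId)); // re-arm on every event
  hideTimers.set(convId, setTimeout(() => showTyping(false), INDICATOR_TTL_MS));
}
```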
Sharding the presence layer
One Redis instance can't hold presence for 2 billion users. WhatsApp shards by conversation ID — each shard owns a range of chat IDs. The gateway hashes the conversation ID to find the right Redis shard, ensuring no single shard is a bottleneck.
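A minimal sketch of that shard lookup — plain hash-mod is shown for brevity; the interview notes further down point to consistent hashing as the production-grade version:

```typescript
// Sketch: picking the Redis shard that owns a conversation's presence keys.
// Plain hash-mod keeps the example short; consistent hashing avoids remapping
// every conversation when a shard is added or removed.
import { createHash } from "node:crypto";

const shards = ["redis-presence-0:6379", "redis-presence-1:6379", "redis-presence-2:6379"];

function shardFor(convId: string): string {
  const digest = createHash("md5").update(convId).digest();
  return shards[digest.readUInt32BE(0) % shards.length];
}
```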
What breaks — and how it recovers
Here's how the system responds to three common failure modes:
Redis shard failure
If a Redis shard goes down, presence events for conversations on that shard stop propagating. WhatsApp's mitigation: typing indicators simply don't appear during the outage. Message delivery is unaffected — messages use a separate durable store (likely a custom fork of Mnesia or Cassandra). Presence is treated as best-effort.
Reconnect storms
When a mobile network drops and millions of phones reconnect simultaneously (after a subway exit, a concert ending), every device tries to re-establish its WebSocket. This creates a thundering herd against the gateway layer. WhatsApp uses exponential backoff with jitter — each client waits a random delay before reconnecting, spreading the load over 30–60 seconds.
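A sketch of what that client-side backoff might look like — the base delay and cap below are illustrative numbers, not WhatsApp's actual values:

```typescript
// Sketch: reconnect with exponential backoff plus "full jitter", so a million
// phones coming out of the subway don't reconnect in lockstep.
async function reconnectWithBackoff(connect: () => Promise<void>): Promise<void> {
  const BASE_MS = 1_000;
  const CAP_MS = 60_000;
  for (let attempt = 0; ; attempt++) {
    try {
      await connect();
      return; // connected — stop retrying
    } catch {
      const ceiling = Math.min(CAP_MS, BASE_MS * 2 ** attempt);
      const delay = Math.random() * ceiling; // full jitter: uniform in [0, ceiling)
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```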
Mobile network instability
On cellular, connections drop constantly. The client maintains a local "composing" state machine — if the WebSocket drops mid-typing, it re-sends the COMPOSING event on reconnect. The server-side TTL ensures correctness: worst case, Bob sees "typing…" for an extra 3 seconds.
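A sketch of that client-side state machine, assuming a simple idle/composing flag and a re-send hook on reconnect:

```typescript
// Sketch: a tiny "composing" state machine that re-announces typing after a
// reconnect, leaning on the server-side TTL for correctness.
type ComposingState = "idle" | "composing";

let state: ComposingState = "idle";
let activeConv: string | null = null;

function startComposing(convId: string, send: (payload: string) => void): void {
  state = "composing";
  activeConv = convId;
  send(JSON.stringify({ type: "composing", conv: convId }));
}

function onReconnected(send: (payload: string) => void): void {
  // If the socket dropped mid-typing, re-send; worst case the old key simply
  // expires and Bob sees "typing…" for a few extra seconds.
  if (state === "composing" && activeConv) {
    send(JSON.stringify({ type: "composing", conv: activeConv }));
  }
}
```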
How the architecture evolves with scale
The components above get introduced in stages as user count grows: a single gateway and one Redis instance are enough at small scale, the Pub/Sub fan-out layer becomes necessary once users connect through different gateways, and sharding the presence store only matters once one Redis instance can no longer hold everyone's keys.
Why not just use polling?
| Approach | Latency | Server load | Battery impact | Used for |
|---|---|---|---|---|
| Short polling | High (1–5s) | Very high | High | Old-school notifications |
| Long polling | Medium (0.5–2s) | High | Medium | Legacy chat apps |
| SSE | Low | Low | Low | One-way push (news feeds) |
| WebSockets ✓ | Very low (<50ms) | Low | Low | Chat, gaming, realtime |
SSE (Server-Sent Events) is compelling — it's simpler than WebSockets and handles one-way push well. But typing indicators need bidirectional communication: Alice's phone both sends composing events and receives Bob's. SSE is unidirectional, so you'd need a hybrid approach. WebSockets win.
How to ace this in a system design interview
- **Design a typing indicator for WhatsApp at scale.** Start with the data model: a composing event is ephemeral, not persistent — this immediately points you away from a database and toward an in-memory store with TTL. Then reason about the connection model: 2B users need persistent connections, which means WebSockets. Then tackle fan-out: Alice's server ≠ Bob's server, so you need a pub/sub layer between gateways. Redis Pub/Sub per conversation ID is the clean answer. Mention TTL-based expiry as the elegant solution to "stopped typing" detection. Finish with sharding: hash conversation IDs across Redis clusters.
- **How would you scale a presence system to 100M concurrent users?** Key insight: presence data is read-heavy and write-heavy at the same time — every user generates heartbeats, and every update fans out to multiple subscribers. The answer: shard by user ID or conversation ID across a Redis cluster. Use consistent hashing so resharding doesn't require full redistribution (see the sketch after this list). Add a presence aggregation layer that batches heartbeats before writing to Redis. For read scaling, consider local presence caches on gateways with short TTLs (~500ms) to avoid hitting Redis for every query.
- **What happens if the Redis presence shard goes down?** Presence is best-effort by design. If a shard goes down, typing indicators stop working for conversations on that shard — but message delivery is unaffected (messages live in a durable store). Recovery: Redis Sentinel or Redis Cluster handles automatic failover in ~30 seconds. During the outage, clients gracefully degrade — typing indicators just don't appear. This is a deliberate product decision: it's better to silently drop presence events than to delay messages or fail loudly.
- **How do you handle the reconnect storm problem?** When a mass reconnect happens (subway exit, stadium event), millions of devices hit your gateways simultaneously. Three mitigations: (1) Exponential backoff with jitter on the client — each device waits a random 0–30 seconds before reconnecting, spreading load over time. (2) Gateway auto-scaling with pre-warmed instances — the gateway layer should scale horizontally and have capacity headroom for reconnect bursts. (3) Rate limiting per IP at the load balancer — reject excess reconnect attempts with a Retry-After header so clients know when to retry.
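For the consistent-hashing point in the second question, here's a minimal ring sketch — the virtual-node count and hash function are arbitrary choices for illustration:

```typescript
// Sketch: a minimal consistent-hash ring, so adding a presence shard only
// remaps keys between its neighbours instead of reshuffling everything.
import { createHash } from "node:crypto";

class HashRing {
  private ring: { point: number; shard: string }[] = [];

  constructor(shards: string[], private vnodes = 64) {
    for (const shard of shards) {
      for (let i = 0; i < this.vnodes; i++) {
        this.ring.push({ point: this.hash(`${shard}#${i}`), shard });
      }
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  private hash(key: string): number {
    return createHash("md5").update(key).digest().readUInt32BE(0);
  }

  shardFor(convId: string): string {
    const h = this.hash(convId);
    const hit = this.ring.find((entry) => entry.point >= h) ?? this.ring[0];
    return hit.shard;
  }
}

// const ring = new HashRing(["presence-0", "presence-1", "presence-2"]);
// ring.shardFor("conv:alice:bob") stays stable for most keys as shards are added.
```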
What actually gets stored — and what doesn't
The typing indicator is deliberately stateless in the database. Nothing about "Alice is typing" ever touches disk — the only "schema" presence needs is the Redis key structure shown below.
Key insight: The presence_events store is Redis — not a relational table. It lives entirely in memory with a TTL. If Redis restarts, all presence data is gone. That's intentional — stale presence is worse than no presence.
Redis key structure
```
-- Presence key (expires in 3s)
SET presence:{conv_id}:{user_id} "composing" EX 3

-- Gateway subscription (per conversation)
SUBSCRIBE chat:{conv_id}

-- Publishing a composing event
PUBLISH chat:{conv_id} '{"type":"composing","from":"alice_id","ts":1700000000}'

-- Paused / message sent → delete key
DEL presence:{conv_id}:{user_id}
```
The full system at a glance
The high-level design (HLD) shows every major service boundary, how data flows across them, and which components can fail independently.
Inside the gateway — sequence by sequence
The low-level design (LLD) walks through the exact method calls, state transitions, and message formats when a COMPOSING event travels end to end.
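A condensed sketch of that sequence on the sending gateway — the function names, payload shape, and key format follow this article's conventions and are illustrative, not WhatsApp's actual code:

```typescript
// Condensed sketch: one gateway handling a COMPOSING event end to end.
import Redis from "ioredis";

const redis = new Redis();

async function handleComposingFrame(rawFrame: string, senderId: string): Promise<void> {
  // 1. Parse and validate the tiny frame Alice's phone sent over its WebSocket.
  const frame = JSON.parse(rawFrame) as { type: string; conv: string };
  if (frame.type !== "composing") return;

  // 2. Record ephemeral presence with a ~3s TTL — it never touches disk.
  await redis.set(`presence:${frame.conv}:${senderId}`, "composing", "EX", 3);

  // 3. Fan out via Pub/Sub so the gateway holding Bob's socket can push it down.
  await redis.publish(
    `chat:${frame.conv}`,
    JSON.stringify({ type: "composing", from: senderId, ts: Date.now() })
  );
}
```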
Every design decision has a cost
The choices WhatsApp made aren't universally correct — they're optimised for their specific constraints: presence is allowed to be lossy, latency is not, and nothing ephemeral earns a disk write.