Realtime Systems · WebSockets · Redis

How WhatsApp's Typing Indicator Actually Works

That tiny "typing…" bubble seems trivial. Underneath it is a presence system serving 2 billion users — with sub-100ms latency, mobile reconnect storms, and Redis TTLs doing the heavy lifting.

8 min read · WebSockets · Redis Pub/Sub · Presence systems · System design depth

The tiny bubble hiding a distributed system

You open WhatsApp. A friend starts typing. Within milliseconds, three animated dots appear on your screen — even though your phone and theirs might be on different continents, different networks, different carriers.

That "typing…" indicator feels instant and effortless. But underneath it, WhatsApp has to solve a genuinely hard problem: how do you broadcast a transient, real-time event to a specific person across a system serving 100+ million concurrent connections?

The core challenge: A typing event lasts ~2 seconds. It can't be stored in a database. It can't wait for HTTP polling. It needs to travel from Alice's phone → WhatsApp servers → Bob's phone in under 100ms — and silently expire if Alice stops typing.

What makes this hard at scale

The typing indicator isn't just a notification — it's a presence event. And presence systems are famously one of the hardest things to scale in distributed systems.

Three constraints working against you

Latency: Users expect to see the indicator appear within ~100ms of their contact starting to type. HTTP polling every 100ms for 2 billion users would destroy any backend.

Ephemerality: Typing state isn't permanent. If Alice closes WhatsApp without sending her message, Bob should stop seeing "typing…" within ~3 seconds — even if no explicit "stopped typing" event fires.

Scale: WhatsApp handles 100M+ concurrent users. Each user might be in a dozen active chats. You can't have one server track everyone's presence — that server would be a single point of failure with petabytes of state.

The system that makes it work

WhatsApp's typing indicator runs on a WebSocket + Redis Pub/Sub foundation. Here's the full flow, component by component.

How each component works

WebSockets — the persistent pipe

WhatsApp keeps a persistent WebSocket connection between your phone and the nearest edge server. Unlike HTTP, WebSockets maintain an open TCP connection — meaning the server can push events to your device at any time without you asking.

When Alice types, her phone sends a tiny COMPOSING event over this connection. No HTTP request, no headers overhead — just a few bytes over an already-open socket.
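To make "just a few bytes" concrete, here is a minimal sketch of what a COMPOSING frame could look like. This uses compact JSON purely for illustration — WhatsApp's real wire format is a proprietary binary protocol, and the field names (`t`, `c`, `f`) are invented for this example.

```python
import json

def encode_composing(conv_id: str, sender: str) -> bytes:
    """Serialize a minimal COMPOSING event for an already-open socket.

    Illustrative only: the real protocol is binary, not JSON, and these
    single-letter field names are an assumption, not the actual schema.
    """
    event = {"t": "composing", "c": conv_id, "f": sender}
    return json.dumps(event, separators=(",", ":")).encode("utf-8")

frame = encode_composing("conv_42", "alice")
print(frame, len(frame), "bytes")
```

Even in verbose JSON the frame is tens of bytes; compare that with an HTTP request, where headers alone typically run to several hundred bytes per poll.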

Redis Pub/Sub — the broadcast layer

Alice and Bob are almost certainly connected to different gateway servers. So how does Alice's typing event reach Bob's server? Redis Pub/Sub.

Each chat conversation has a Redis channel. When Alice's gateway receives a COMPOSING event, it publishes to that channel. Bob's gateway — subscribed to the same channel — instantly receives it and pushes it to Bob's WebSocket.
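The fan-out above can be sketched with an in-process stand-in for Redis Pub/Sub. The channel name follows the `chat:{conv_id}` convention from the article; everything else (class and method names) is invented for illustration.

```python
from collections import defaultdict
from typing import Callable

class MiniPubSub:
    """In-process stand-in for Redis Pub/Sub (illustrative, not Redis itself)."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[str], None]) -> None:
        self.subscribers[channel].append(handler)

    def publish(self, channel: str, message: str) -> int:
        handlers = self.subscribers[channel]
        for handler in handlers:
            handler(message)
        # Like Redis PUBLISH, return how many subscribers received it.
        return len(handlers)

# Bob's gateway subscribes to the conversation channel...
bus = MiniPubSub()
delivered: list[str] = []
bus.subscribe("chat:conv_42", delivered.append)  # would push to Bob's WebSocket

# ...and Alice's gateway publishes her COMPOSING event into it.
bus.publish("chat:conv_42", '{"type":"composing","from":"alice_id"}')
print(delivered)
```

Note what this model shares with real Redis Pub/Sub: delivery is fire-and-forget. If no gateway is subscribed (Bob is offline), the event is simply dropped — which is exactly right for a 2-second typing signal.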

TTL — the silent expiry

This is the elegant part. When Alice's gateway publishes the composing event, it also sets a Redis key with a ~3-second TTL: presence:{conv_id}:alice. When Alice sends the message or stops typing, the key is deleted. If she just drops her phone, the key expires automatically — and Bob's client sees the indicator disappear after 3 seconds.

No explicit "stopped typing" event needed. The TTL handles it.
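The `SET ... EX 3` / `DEL` semantics can be modeled in a few lines. This is a deterministic in-memory sketch (with an injectable clock so the expiry is testable), not how Redis is implemented — Redis tracks expirations internally and also evicts lazily on read.

```python
import time

class PresenceStore:
    """Tiny in-memory model of SET ... EX / DEL semantics for presence keys."""

    def __init__(self, clock=time.monotonic) -> None:
        self.clock = clock
        self.expiry: dict[str, float] = {}  # key -> absolute deadline

    def set_composing(self, conv_id: str, user_id: str, ttl: float = 3.0) -> None:
        self.expiry[f"presence:{conv_id}:{user_id}"] = self.clock() + ttl

    def clear(self, conv_id: str, user_id: str) -> None:
        # Message sent / explicitly stopped typing -> DEL the key.
        self.expiry.pop(f"presence:{conv_id}:{user_id}", None)

    def is_typing(self, conv_id: str, user_id: str) -> bool:
        deadline = self.expiry.get(f"presence:{conv_id}:{user_id}")
        return deadline is not None and self.clock() < deadline

# Fake clock so the expiry is deterministic in this example.
now = [0.0]
store = PresenceStore(clock=lambda: now[0])
store.set_composing("conv_42", "alice")
print(store.is_typing("conv_42", "alice"))  # True
now[0] += 3.5  # Alice drops her phone; nothing is ever deleted explicitly
print(store.is_typing("conv_42", "alice"))  # False — the TTL handled it
```

The key design property: correctness does not depend on the client ever sending a "stopped" event. The worst case is a few seconds of stale "typing…", which is acceptable for a best-effort signal.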

Sharding the presence layer

One Redis instance can't hold presence for 2 billion users. WhatsApp shards by conversation ID — each shard owns a range of chat IDs. The gateway hashes the conversation ID to find the right Redis shard, ensuring no single shard is a bottleneck.
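Hashing a conversation ID to a shard looks roughly like this. The shard count and function names are invented for illustration; the important property is that the hash is stable across processes, so every gateway agrees on which shard owns a conversation.

```python
import hashlib

SHARD_COUNT = 16  # illustrative; real deployments size this to load

def shard_for(conv_id: str, shards: int = SHARD_COUNT) -> int:
    """Map a conversation ID to a Redis shard with a stable hash.

    Uses hashlib rather than Python's built-in hash(), which is
    randomized per process — every gateway must compute the same shard.
    """
    digest = hashlib.sha1(conv_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shards

print(shard_for("conv_42") == shard_for("conv_42"))  # True: stable mapping
```

One caveat worth raising in an interview: plain modulo hashing remaps most keys when the shard count changes, so systems that reshard often tend to use consistent hashing instead.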

What breaks — and how it recovers

Three common failure modes, and how the system recovers:

Redis shard failure

If a Redis shard goes down, presence events for conversations on that shard stop propagating. WhatsApp's mitigation: typing indicators simply don't appear during the outage. Message delivery is unaffected — messages use a separate durable store (likely a custom fork of Mnesia or Cassandra). Presence is treated as best-effort.

Reconnect storms

When a mobile network drops and millions of phones reconnect simultaneously (after a subway exit, a concert ending), every device tries to re-establish its WebSocket. This creates a thundering herd against the gateway layer. WhatsApp uses exponential backoff with jitter — each client waits a random delay before reconnecting, spreading the load over 30–60 seconds.
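"Exponential backoff with jitter" is a one-liner in practice. This sketch uses the "full jitter" variant — pick a uniformly random delay up to an exponentially growing cap — with parameter values chosen for illustration, not taken from WhatsApp's client.

```python
import random

def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter exponential backoff.

    Delay is uniform in [0, min(cap, base * 2^attempt)], so a million
    clients that disconnected together spread their reconnects across
    the window instead of retrying in lockstep.
    """
    return random.uniform(0.0, min(cap, base * (2.0 ** attempt)))

delays = [round(reconnect_delay(a), 2) for a in range(6)]
print(delays)  # random each run, but attempt n never exceeds min(60, 2^n) seconds
```

Without the jitter, exponential backoff alone just turns one thundering herd into a series of smaller, still-synchronized herds at 1s, 2s, 4s, and so on.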

Mobile network instability

On cellular, connections drop constantly. The client maintains a local "composing" state machine — if the WebSocket drops mid-typing, it re-sends the COMPOSING event on reconnect. The server-side TTL ensures correctness: worst case, Bob sees "typing…" for an extra 3 seconds.
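A minimal sketch of that client-side state machine, assuming a single boolean of composing state and an injected send function. All names here are hypothetical; the point is that transient state lost with the old connection gets re-asserted on reconnect.

```python
class ComposingClient:
    """Illustrative client-side composing state machine.

    Remembers whether the user is mid-composition so that a dropped
    WebSocket can replay the COMPOSING event when it reconnects.
    """

    def __init__(self, send) -> None:
        self.send = send        # callable that writes a frame to the WebSocket
        self.composing = False

    def start_typing(self, conv_id: str) -> None:
        self.composing = True
        self.send({"type": "composing", "conv": conv_id})

    def send_message(self, conv_id: str, text: str) -> None:
        self.composing = False  # sending the message ends the composing state
        self.send({"type": "message", "conv": conv_id, "text": text})

    def on_reconnect(self, conv_id: str) -> None:
        # Re-assert transient state that died with the old connection.
        if self.composing:
            self.send({"type": "composing", "conv": conv_id})

sent = []
client = ComposingClient(sent.append)
client.start_typing("conv_42")
client.on_reconnect("conv_42")   # socket dropped mid-typing -> COMPOSING re-sent
print([e["type"] for e in sent])  # ['composing', 'composing']
```

The server-side TTL makes this replay safe: re-sending COMPOSING just refreshes the presence key, and failing to re-send merely lets it expire a few seconds early.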

How the architecture evolves with scale

At around 1K users, the whole system fits on one box: a single server handles the WebSocket connections directly, and Redis runs on the same machine. Fine for a side project. As the user count grows toward WhatsApp's scale, those same responsibilities split apart into the dedicated gateway layer and the sharded Redis presence store described above — each tier scaling horizontally on its own.

Why not just use polling?

Approach        Latency             Server load   Battery impact   Used for
Short polling   High (1–5 s)        Very high     High             Old-school notifications
Long polling    Medium (0.5–2 s)    High          Medium           Legacy chat apps
SSE             Low                 Low           Low              One-way push (news feeds)
WebSockets ✓    Very low (<50 ms)   Low           Low              Chat, gaming, realtime

SSE (Server-Sent Events) is compelling — it's simpler than WebSockets and handles one-way push well. But typing indicators need bidirectional communication: Alice's phone sends composing events AND receives Bob's. SSE is unidirectional — you'd need a hybrid approach. WebSockets win.

How to ace this in a system design interview

What actually gets stored — and what doesn't

The typing indicator is deliberately stateless in the database. Nothing about "Alice is typing" ever touches disk.

Key insight: The presence_events store is Redis — not a relational table. It lives entirely in memory with a TTL. If Redis restarts, all presence data is gone. That's intentional — stale presence is worse than no presence.

Redis key structure

-- Presence key (expires in 3s)
SET presence:{conv_id}:{user_id} "composing" EX 3

-- Gateway subscription (per conversation)
SUBSCRIBE chat:{conv_id}

-- Publishing a composing event
PUBLISH chat:{conv_id} '{"type":"composing","from":"alice_id","ts":1700000000}'

-- Paused / message sent → delete key
DEL presence:{conv_id}:{user_id}

The full system at a glance

The HLD shows every major service boundary, how data flows across them, and which components can fail independently. The tiers: client layer, edge/gateway, presence layer, storage layer, and the async/best-effort paths between them.

Inside the gateway — sequence by sequence

The LLD walks through the exact method calls, state transitions, and message formats as a COMPOSING event travels end to end.

Every design decision has a cost

The choices WhatsApp made aren't universally correct — they're optimised for their specific constraints. Compared with the alternatives (SSE + polling, or pure polling), WebSocket + Redis accepts more operational complexity and stateful connections in exchange for the lowest latency.