WebSocket compression explained (sort of)

In a previous article, I talked about how the WebSocket protocol works and how simple it is. But since no good thing can last forever, some people decided to complicate things a bit.

To be fair, this isn't a part of the base WebSocket protocol, but rather an extension to the protocol. The most common extension by far is permessage-deflate compression.

Let's quickly revisit how the WebSocket protocol works before looking at extensions. If you just want to click around, there's a small demo at the bottom.

Recap

WebSocket communication is structured as messages exchanged between a client and a server. Each message is made up of one or more frames. Each frame starts with a one-byte header (the FIN bit, three reserved bits, and the opcode), followed by information about the length of the payload, and then the payload itself. Here's a simple WebSocket message that says "Hello":

81                header
05                len
48 65 6c 6c 6f    payload ("Hello")
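To make the layout concrete, here's a minimal sketch that pulls the same three pieces out of those bytes. The function name `parse_small_frame` is made up for this article, and it only handles the simplest case: an unmasked frame with a payload of at most 125 bytes. Real frames can also carry a masking key and extended length fields, which this skips.

```python
def parse_small_frame(frame: bytes) -> dict:
    """Parse an unmasked frame whose payload is at most 125 bytes."""
    header, length_byte = frame[0], frame[1]
    masked = bool(length_byte & 0x80)  # top bit of the second byte is the mask flag
    length = length_byte & 0x7F        # low 7 bits hold the payload length
    if masked or length > 125:
        raise ValueError("this sketch only handles short, unmasked frames")
    return {
        "fin": bool(header & 0x80),    # last frame of the message?
        "opcode": header & 0x0F,       # 0x1 means a text frame
        "payload": frame[2:2 + length],
    }

frame = bytes.fromhex("81 05 48 65 6c 6c 6f")
parsed = parse_small_frame(frame)
print(parsed["opcode"], parsed["payload"].decode())
```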

Why bother compressing?

Let's say you're sending JSON messages over WebSockets. Each message follows a similar structure, which makes them highly compressible.

As an example, imagine synchronizing the state of two grids using WebSocket messages. Pick a color and click on a cell in either grid to fill it in. Below the grids you'll see the messages that might be sent/received on each side to achieve synchronization.

That's a lot of repeated bytes each time: the "row", "col", and "color" strings are transmitted every message. Depending on how compressible the data is, the reduced payload sizes may be worth the compression/decompression overhead.
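Here's a rough way to measure that, assuming the grid updates look like `{"row": ..., "col": ..., "color": ...}` (the exact message shape is my guess for illustration). It uses Python's `zlib` to compress each message the way permessage-deflate does: raw DEFLATE, a sync flush, and the trailing `00 00 ff ff` dropped. The helper name `deflate_message` is made up.

```python
import json
import zlib

def deflate_message(compressor, message: bytes) -> bytes:
    """Compress one message and drop the 00 00 ff ff left by the sync flush."""
    data = compressor.compress(message) + compressor.flush(zlib.Z_SYNC_FLUSH)
    return data[:-4]

# Hypothetical grid updates: one message per cell of a 4x4 grid.
updates = [{"row": r, "col": c, "color": "#3366ff"} for r in range(4) for c in range(4)]
messages = [json.dumps(u).encode() for u in updates]

# Negative wbits selects raw DEFLATE; 15 means a 32 KiB sliding window.
compressor = zlib.compressobj(wbits=-15)
raw_total = sum(len(m) for m in messages)
compressed_total = sum(len(deflate_message(compressor, m)) for m in messages)
print(f"raw: {raw_total} bytes, compressed: {compressed_total} bytes")
```

The first message barely shrinks because nothing has been seen yet. The savings come from the later messages, which can point back at the earlier ones.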

permessage-deflate compression

Let's start by looking at a WebSocket handshake since that's where compression is negotiated as a part of the Sec-WebSocket-Extensions header:

Request
GET /websocket HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: U8/Zkk/B5hjr+jx2lXMcCQ==
Sec-WebSocket-Version: 13
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits=15; client_no_context_takeover
Response
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: 0W3rkxpf+uZNg3rYkDWr6Jj8/gU=
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits=15; server_max_window_bits=15; client_no_context_takeover; server_no_context_takeover

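The client and server agree on using the permessage-deflate extension (it's the only one with a published spec for now). The compression algorithm used by permessage-deflate is called DEFLATE. The details of how it works aren't important, but a couple of parameters from the handshake are relevant:

max_window_bits: the base-2 logarithm of the size of the sliding window that DEFLATE uses.
no_context_takeover: if present, the sliding window is reset after every message instead of being carried over to the next one.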

Since WebSockets are bidirectional, both these parameters have client and server variants. That gives us what we actually see in the handshake: client_max_window_bits, client_no_context_takeover, server_max_window_bits, and server_no_context_takeover.
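If you want to read these out of a handshake yourself, here's a small sketch that parses the Sec-WebSocket-Extensions value from the response above. `parse_extensions` is a made-up helper, and it splits naively on `,` and `;`, so it would trip over quoted values containing those characters.

```python
def parse_extensions(header_value: str) -> list:
    """Return a list of (extension_name, params) pairs."""
    extensions = []
    for offer in header_value.split(","):
        name, *params = [part.strip() for part in offer.split(";")]
        parsed = {}
        for param in params:
            if not param:
                continue
            key, _, value = param.partition("=")
            # Flags such as client_no_context_takeover have no value; store True.
            parsed[key.strip()] = value.strip().strip('"') or True
        extensions.append((name, parsed))
    return extensions

response_header = (
    "permessage-deflate; client_max_window_bits=15; server_max_window_bits=15; "
    "client_no_context_takeover; server_no_context_takeover"
)
name, params = parse_extensions(response_header)[0]
# The window size in bytes is 2 to the power of max_window_bits.
window_bytes = 1 << int(params["server_max_window_bits"])
print(name, params, window_bytes)
```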

How is the sliding window used?

When compressing, DEFLATE consumes the input one byte at a time and keeps track of the last b bytes read, where b is the size of the sliding window. If the next l bytes in the stream are a substring[1] of this sliding window, DEFLATE will encode them as the pair (l, d), where d is the distance from the cursor back to the start of the match in the sliding window.

Here's a demo showing this in action. Use the buttons to step through how the input is encoded, one step at a time. At each step you can see the sliding window, what's currently being read, and a match if there is one.

Larger matches are found as the sliding window gets bigger. For example, with a sliding window of size 9, there's a match of length 6 at the end. With shorter sliding windows, only matches of length 3 are present.
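The matching step can be sketched as a toy encoder. This is not how real DEFLATE implementations search (they use hash chains, cap matches at 258 bytes, and Huffman-code the result); `lz77_tokens` is a made-up greedy version that only shows where the (l, d) pairs come from. The input string is the one from the footnote.

```python
def lz77_tokens(data: str, window: int, min_match: int = 3) -> list:
    """Greedy toy encoder: emits literals and (length, distance) pairs."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Try every start position inside the sliding window.
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past the cursor and overlap what it produces.
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append((best_len, best_dist))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens

text = "Developers,developers,developers!"
for size in (4, 11):
    print(size, lz77_tokens(text, window=size))
```

With a window of 4 nothing repeats within reach, so every byte is a literal. With a window of 11 the encoder finds the long match.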

For WebSockets, the max_window_bits parameters MUST lie between 8 and 15 (inclusive), so the actual sliding windows can range from 2^8 to 2^15 bytes long (256 bytes to 32 KiB).

[1] Technically this isn't always true, and the match could be longer than the sliding window! With a sliding window of size 11, the string
Developers,developers,developers!
can be encoded as
D e v e l o p e r s , d (l=20,d=11) !
The idea is that generating the match allows you to reference it as you go, which to me feels a bit like doing this.
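Decoding that example byte by byte shows why it works. Here's a minimal sketch (`lz77_decode` is a made-up name, and real DEFLATE works on Huffman-coded bits rather than a Python list): each copied byte is appended before the next one is read, so by the time the copy runs past the original 11 bytes, the bytes it needs have already been written.

```python
def lz77_decode(tokens) -> str:
    """Expand literals and (length, distance) pairs back into text."""
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            length, distance = token
            for _ in range(length):
                # Copy one byte at a time so the copy can read its own output.
                out.append(out[-distance])
        else:
            out.append(token)
    return "".join(out)

tokens = list("Developers,d") + [(20, 11), "!"]
decoded = lz77_decode(tokens)
print(decoded)
```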

After establishing these parameters, we can finally send a compressed message. Let's send "hahahaha" from the server to the client.

Over the wire

Here's what that looks like as a WebSocket frame. If this doesn't make sense, I explain the structure of WebSocket frames here.
c1                      header
    1       FIN
    1       RSV1
    0       RSV2
    0       RSV3
    0001    opcode
07                      len
ca 48 cc 00 43 00 00    compressed payload ("hahahaha")

The RSV1 bit is set in the header, which indicates that this message is compressed. WebSocket libraries can choose whether to enable compression on a per-message basis (it's called permessage-deflate after all).
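As a quick check, here's a sketch that splits the header byte into its fields with bit masks. `decode_first_byte` is a made-up helper name.

```python
def decode_first_byte(byte: int) -> dict:
    """Split the first byte of a frame into FIN, RSV1-3, and opcode."""
    return {
        "fin": bool(byte & 0x80),
        "rsv1": bool(byte & 0x40),  # set when the message is compressed
        "rsv2": bool(byte & 0x20),
        "rsv3": bool(byte & 0x10),
        "opcode": byte & 0x0F,
    }

compressed = decode_first_byte(0xC1)  # header of the "hahahaha" frame
plain = decode_first_byte(0x81)       # header of the "Hello" frame
print(compressed)
print(plain)
```

The only difference between the two headers is RSV1: 0xc1 is 0x81 with the 0x40 bit turned on.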

As for the payload, the spec says to append 00 00 ff ff (a sort of delimiter) and decompress using DEFLATE. This is a little involved so you'll have to trust me when I say that doing this gives back the string "hahahaha".
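If you'd rather not take my word for it, here's a minimal sketch using Python's `zlib`. The helper name `inflate_message` is made up; the negative `wbits` value tells zlib to expect raw DEFLATE data with a window of up to 2^15 bytes, which is an assumption that matches the handshake above.

```python
import zlib

def inflate_message(decompressor, payload: bytes) -> bytes:
    """Append the 00 00 ff ff tail and run the payload through DEFLATE."""
    return decompressor.decompress(payload + b"\x00\x00\xff\xff")

# Raw DEFLATE (no zlib or gzip wrapper), 32 KiB window.
decompressor = zlib.decompressobj(wbits=-15)
payload = bytes.fromhex("ca 48 cc 00 43 00 00")
message = inflate_message(decompressor, payload)
print(message)
```

Keeping the same decompressor object around for later messages is what lets the sliding window carry over from one message to the next.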

The part that's interactive

This simulates sending 5 WebSocket messages over the wire, with the sliding window reused across messages. Hover or tap on a message to see its plaintext.

Play around with the data distribution, message length and sliding window size to see the effects they have on compression.
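For a rough offline version of the same experiment, this sketch compresses 5 messages twice: once with a single compressor whose sliding window carries over between messages, and once with a fresh compressor per message (which is what no_context_takeover amounts to). The message contents and the helper name `compressed_sizes` are made up for illustration.

```python
import zlib

def compressed_sizes(messages, context_takeover: bool) -> list:
    """Return the compressed payload size of each message."""
    sizes = []
    compressor = zlib.compressobj(wbits=-15)
    for message in messages:
        if not context_takeover:
            # Start over with an empty sliding window for every message.
            compressor = zlib.compressobj(wbits=-15)
        data = compressor.compress(message) + compressor.flush(zlib.Z_SYNC_FLUSH)
        sizes.append(len(data) - 4)  # the trailing 00 00 ff ff is not sent
    return sizes

messages = [b'{"row": 1, "col": %d, "color": "#3366ff"}' % col for col in range(5)]
shared = compressed_sizes(messages, context_takeover=True)
fresh = compressed_sizes(messages, context_takeover=False)
print("window reused:", shared)
print("window reset: ", fresh)
```

The first message costs the same either way. After that, the reused window lets each message be encoded mostly as references to the previous ones.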

I know I didn't explain everything about WebSocket compression, but I think this is a good place to stop before I lose your attention :)

If you're interested in seeing some code, I wrote a WebSocket parser for Subtrace and added support for permessage-deflate compression here.