In a previous article, I talked about how the WebSocket protocol works and how simple it is. But since no good thing can last forever, some people decided to complicate things a bit.
To be fair, this isn't a part of the base WebSocket protocol, but rather an extension to the protocol. The most common extension by far is permessage-deflate compression.
Let's quickly revisit how the WebSocket protocol works before looking at extensions. If you just want to click around, there's a small demo at the bottom.
WebSocket communication is structured as messages exchanged between a client and server. Each message is made up of one or more frames. Each frame has a one-byte header, information about the length of the payload, and the payload itself. Here's a simple WebSocket message that says "Hello":
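To make the frame layout concrete, here's a minimal sketch in Python that builds a single-frame, unmasked text message. The helper name build_text_frame is my own, and it only handles payloads of up to 125 bytes (longer payloads use an extended length field, and frames sent by a client must also be masked, which this sketch skips).

```python
def build_text_frame(payload: bytes) -> bytes:
    """Build one unmasked WebSocket text frame (hypothetical helper)."""
    if len(payload) > 125:
        raise ValueError("this sketch only handles short payloads")
    header = 0x80 | 0x1  # FIN bit set (last frame of the message) + text opcode
    return bytes([header, len(payload)]) + payload


frame = build_text_frame(b"Hello")
print(frame.hex(" "))  # 81 05 48 65 6c 6c 6f
```

The first byte is the header, the second is the payload length, and the remaining five bytes are the ASCII text itself.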
Let's say you're sending JSON messages over WebSockets. Each message follows a similar structure, which makes them highly compressible.
As an example, imagine synchronizing the state of two grids using WebSocket messages. Pick a color and click on a cell in either grid to fill it in. Below the grids you'll see the messages that might be sent/received on each side to achieve synchronization.
That's a lot of repeated bytes each time: the "row", "col", and "color" strings are transmitted in every message. Depending on how compressible the data is, the reduced payload sizes may be worth the compression/decompression overhead.
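As a rough illustration of how much of each message is structure rather than data, here's a small Python sketch. The "row", "col", and "color" keys come from the grid example, but the specific values and the compact JSON encoding are assumptions of mine.

```python
import json

# A hypothetical cell update; only the three values change between messages.
update = {"row": 3, "col": 7, "color": "#ff0000"}
encoded = json.dumps(update, separators=(",", ":")).encode()

# Bytes that carry actual information vs. bytes spent on keys and punctuation.
value_bytes = sum(len(str(v)) for v in update.values())
overhead = len(encoded) - value_bytes

print(encoded.decode())
print(f"{len(encoded)} bytes total, {overhead} of them repeated in every message")
```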
permessage-deflate compression
Let's start by looking at a WebSocket handshake since that's where compression is negotiated as a part of the Sec-WebSocket-Extensions header:
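Here's a sketch of what the two header values might look like, along with a tiny parser for them. The offer and reply strings are illustrative values I made up, not a capture from a real connection, and parse_extensions is a hypothetical helper that ignores edge cases such as the same extension being offered more than once.

```python
def parse_extensions(header_value: str) -> dict:
    """Parse a Sec-WebSocket-Extensions value into {extension: {param: value}}."""
    extensions = {}
    for offer in header_value.split(","):
        name, *params = [part.strip() for part in offer.split(";")]
        parsed = {}
        for param in params:
            key, _, value = param.partition("=")
            # Parameters without a value (flags) are stored as None.
            parsed[key.strip()] = value.strip().strip('"') or None
        extensions[name] = parsed
    return extensions


# What a client might send in its request...
client_offer = "permessage-deflate; client_max_window_bits"
# ...and what a server might send back in its response.
server_reply = "permessage-deflate; client_max_window_bits=12; server_no_context_takeover"

print(parse_extensions(client_offer))
print(parse_extensions(server_reply))
```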
The client and server agree on using the permessage-deflate extension (it's the only one with a published spec for now). The compression algorithm used by permessage-deflate is called DEFLATE.
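The details of how it works aren't important, but a couple of parameters from the handshake are relevant: max_window_bits, which sets the size of the sliding window DEFLATE searches for repeated data, and no_context_takeover, which tells the sender to start each message with a fresh window instead of reusing the previous one.
A larger window could allow for better compression if repeated patterns span a larger distance.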
Since WebSockets are bidirectional, both these parameters have client and server variants. That gives us what we actually see in the handshake: client_max_window_bits, client_no_context_takeover, server_max_window_bits, and server_no_context_takeover.
When compressing, DEFLATE consumes the input one byte at a time and keeps track of the last b bytes read. If the next l bytes in the stream are a substring of this sliding window, DEFLATE will encode it as the pair (l, d), where d is the distance from the cursor to the start of the match in the sliding window.
Here's a demo showing this in action. Use the ← and → buttons to see how the input is encoded, one step at a time. At each step you can see the sliding window, what's currently being read, and a match if there is one.
Larger matches are found as the sliding window gets bigger. For example, with a sliding window of size 9, there's a match of length 6 at the end. With smaller sliding windows, only matches of length 3 are present.
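The matching step described above can be sketched in a few lines of Python. This is a toy version and not what zlib actually does: it does a brute-force search, only accepts matches that lie entirely inside the window (real DEFLATE lets a match run past the cursor), and skips the Huffman coding stage altogether. The function name and the minimum match length of 3 are my choices for the sketch.

```python
def lz77_tokens(data: bytes, window_size: int, min_match: int = 3) -> list:
    """Return a list of literal bytes and (length, distance) pairs."""
    tokens = []
    i = 0
    while i < len(data):
        start = max(0, i - window_size)  # left edge of the sliding window
        best_len, best_dist = 0, 0
        for j in range(start, i):
            length = 0
            # Extend the match while it stays inside the window and keeps matching.
            while (j + length < i and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append((best_len, best_dist))
            i += best_len
        else:
            tokens.append(data[i:i + 1])  # no useful match: emit a literal
            i += 1
    return tokens


print(lz77_tokens(b"hahahaha", window_size=8))  # [b'h', b'a', b'h', b'a', (4, 4)]
print(lz77_tokens(b"hahahaha", window_size=2))  # literals only: the window is too small
```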
For WebSockets, the max_window_bits parameters MUST lie between 8 and 15 (inclusive) so the actual sliding windows can range from 2^8 (256) to 2^15 (32,768) bytes long.
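To get a feel for what the window size buys you, here's a small experiment using Python's zlib module in raw DEFLATE mode (negative wbits). The input is contrived: a 1000-byte block of pseudo-random data repeated four times, so the only repetition sits 1000 bytes back. I use 9 bits as the small case because, as far as I know, some zlib builds don't accept an 8-bit window for raw DEFLATE.

```python
import random
import zlib


def raw_deflate(data: bytes, window_bits: int) -> bytes:
    """Compress with raw DEFLATE using a 2**window_bits byte sliding window."""
    compressor = zlib.compressobj(wbits=-window_bits)
    return compressor.compress(data) + compressor.flush()


rng = random.Random(0)  # fixed seed so the experiment is repeatable
block = bytes(rng.getrandbits(8) for _ in range(1000))
data = block * 4  # the repetition is 1000 bytes away

for bits in (9, 15):
    size = len(raw_deflate(data, bits))
    print(f"window_bits={bits} ({1 << bits} byte window): {size} bytes")
```

With the small window, the earlier copy of the block has already slid out of view by the time it repeats, so there's nothing to match against.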
After establishing these parameters, we can finally send a compressed message. Let's send "hahahaha" from the server to the client.
The RSV1 bit is set in the header, which indicates that compression is enabled for this message. WebSocket libraries can choose whether to enable compression on a per-message basis (it's called permessage-deflate after all).
As for the payload, the spec says to append 00 00 ff ff (a sort of delimiter) and decompress using DEFLATE. This is a little involved so you'll have to trust me when I say that doing this gives back the string "hahahaha".
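If you'd rather not take my word for it, here's a sketch of both directions using Python's zlib module. The sender compresses with a sync flush and drops the trailing 00 00 ff ff; the receiver puts it back before decompressing. The helper names are mine, the frame is unmasked (server to client), and I'm assuming a 15-bit window on both sides. The exact compressed bytes depend on the compressor, so they may differ from other implementations.

```python
import zlib

TAIL = b"\x00\x00\xff\xff"  # the delimiter the spec says to strip and re-append


def compress_message(text: str, compressor) -> bytes:
    data = compressor.compress(text.encode()) + compressor.flush(zlib.Z_SYNC_FLUSH)
    if not data.endswith(TAIL):
        raise RuntimeError("expected a sync flush marker at the end")
    return data[:-len(TAIL)]


def decompress_message(payload: bytes, decompressor) -> str:
    return decompressor.decompress(payload + TAIL).decode()


compressor = zlib.compressobj(wbits=-15)      # raw DEFLATE, 2**15 byte window
decompressor = zlib.decompressobj(wbits=-15)

payload = compress_message("hahahaha", compressor)
# Header byte: FIN (0x80) + RSV1 (0x40) + text opcode (0x1), then the length.
frame = bytes([0x80 | 0x40 | 0x1, len(payload)]) + payload

rsv1 = bool(frame[0] & 0x40)
print(frame.hex(" "), "| RSV1 set:", rsv1)
print(decompress_message(frame[2:], decompressor))  # hahahaha
```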
This simulates sending 5 WebSocket messages over the wire, with the sliding window reused across messages. Hover or tap on a message to see its plaintext.
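The same idea can be sketched in Python: compress a handful of similar JSON messages, once with a single compressor shared across messages (the window is reused) and once with a fresh compressor per message (what no_context_takeover asks for). The messages are made-up grid updates and compressed_sizes is a hypothetical helper.

```python
import json
import zlib

# Five hypothetical grid updates with the same structure.
cells = [(0, 1, "red"), (2, 3, "blue"), (4, 0, "red"), (1, 1, "green"), (3, 2, "blue")]
messages = [json.dumps({"row": r, "col": c, "color": color}).encode()
            for r, c, color in cells]


def compressed_sizes(messages, reuse_window: bool) -> list:
    """Payload size of each message after permessage-deflate style compression."""
    compressor = zlib.compressobj(wbits=-15)
    sizes = []
    for message in messages:
        if not reuse_window:
            compressor = zlib.compressobj(wbits=-15)  # forget earlier messages
        data = compressor.compress(message) + compressor.flush(zlib.Z_SYNC_FLUSH)
        sizes.append(len(data) - 4)  # the trailing 00 00 ff ff is not sent
    return sizes


print("plaintext:      ", [len(m) for m in messages])
print("window reused:  ", compressed_sizes(messages, reuse_window=True))
print("window reset:   ", compressed_sizes(messages, reuse_window=False))
```

The first message costs the same either way. The savings show up from the second message onwards, once the window already contains the repeated keys.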
Play around with the data distribution, message length and sliding window size to see the effects they have on compression.
I know I didn't explain everything about WebSocket compression, but I think this is a good place to stop before I lose your attention :)
If you're interested in seeing some code, I wrote a WebSocket parser for Subtrace and added support for permessage-deflate compression here.