Diving into WebSockets with Go: A Journey of Learning and Rabbit Holes
Introduction
I wanted to learn a new language to add to my toolbox and build my street cred; something that big boy developers use 👀 (it's not Rust or C, but it's a step in the right direction...). This would be my first compiled language, and it had the promise of:
- speed ✅
- productivity ✅
- concurrency and parallelism as core parts of the language ✅
- a cute mascot ✅
I took a course, learned the syntax, and wrote a few toy programs along the way, but I knew that unless I built something "real", I wouldn't retain much.
Then it occurred to me that I also don't know anything about WebSockets. Of course, I had used them extensively before, but mostly ascribed their inner workings to magic 🪄.
They power real-time applications, but I had never looked under the hood to understand how they actually work. What better way to learn than building a WebSocket server as a first project in a language I barely know!
What followed was a roller-coaster of operating on bits, shoddy error handling, and AI-assisted learning.
Learning with AI: No Dumb Questions
One of the invaluable parts of this journey was having AI as my teacher. I could ask it questions without feeling dumb, get instant explanations, and have concepts broken down in different ways until they finally clicked.
Here are a couple of questions I asked, besides the tsunami of questions I had about the WebSocket protocol itself:
What is big-endian?
It's a way of ordering bytes when storing or transmitting data. Big-endian means the most significant byte comes first; little-endian means the least significant byte comes first. For example:

| Big-endian | Little-endian |
|---|---|
| 00010010 00110100 01010110 01111000 | 01111000 01010110 00110100 00010010 |

They both represent the same 32-bit number, but the order of the bytes is different.
In practice, it meant I had to rethink how I read multibyte values when parsing WebSocket frames.
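To make that concrete, here's a minimal sketch (not code from the project) using Go's encoding/binary package to interpret the same two bytes both ways:
package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	b := []byte{0x12, 0x34}
	// big-endian: 0x12 is the most significant byte -> 0x1234 = 4660
	fmt.Println(binary.BigEndian.Uint16(b))
	// little-endian: 0x34 is the most significant byte -> 0x3412 = 13330
	fmt.Println(binary.LittleEndian.Uint16(b))
}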
What is the XOR binary operation?
If you have two bits, the XOR operation will return 1 if the bits are different and 0 if they are the same.

| bit1 | bit2 | XOR-ed bit |
|---|---|---|
| 1 | 1 | 0 |
| 0 | 0 | 0 |
| 1 | 0 | 1 |
| 0 | 1 | 1 |
This means the original bits can be restored by applying XOR again.
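Here's a tiny sketch of that round trip in Go (the byte and the key are arbitrary values I made up):
package main

import "fmt"

func main() {
	data := byte(0b01010110) // original byte
	key := byte(0b10011001)  // arbitrary masking key

	masked := data ^ key     // XOR once to mask
	restored := masked ^ key // XOR again with the same key to restore

	fmt.Printf("%08b -> %08b -> %08b\n", data, masked, restored)
}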
What are these 0x, 0b prefixes?
Hexadecimal (base 16) and binary (base 2) prefixes. 0x indicates a hexadecimal number, while 0b indicates a binary number.
For example, 0xFF is 255 in decimal, and 0b11111111 is also 255 in decimal.
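Go supports both prefixes as literals, so you can verify this directly:
package main

import "fmt"

func main() {
	fmt.Println(0xFF)               // 255
	fmt.Println(0b11111111)         // 255
	fmt.Println(0xFF == 0b11111111) // true
}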
Bits, Bytes, and my barely functioning Brain
Since I never had formal computer science training, working with raw bits didn’t come naturally. WebSockets, unfortunately, require a lot of bitwise operations.
Some of the tricky concepts I ran into:
XOR Masking
WebSocket payloads sent from clients are XOR-masked with a key. The protocol requires this:
- to avoid intermediaries misinterpreting the traffic as HTTP
- for obfuscation, mitigating cross-protocol attacks
Bitwise AND (&) and OR (|)
Used for frame parsing.
Just like XOR, these combine bits, but there are some creative uses. Take, for example: byte2 & 0x80 != 0.
This combines our byte with the hex 0x80 (which is 10000000 in binary) to check if the most significant bit is set.
Note the number we're combining with; that is how we check which bit we're interested in.
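A short sketch (using a made-up byte) of the two sides of this: & to test whether a bit is set, and | to force one on:
package main

import "fmt"

func main() {
	b := byte(0b10000010) // made-up example byte

	// AND with a mask keeps only the bit we're interested in;
	// if the result is non-zero, that bit was set
	fmt.Println(b&0x80 != 0) // true: the most significant bit is set
	fmt.Println(b&0x01 != 0) // false: the least significant bit is not

	// OR with a mask forces a bit to 1 without touching the others
	fmt.Printf("%08b\n", b|0x01) // 10000011
}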
Hexadecimal Conversions – Constantly switching between hex, decimal, and binary.
I had to look up conversions more times than I care to admit. But I did learn some things in the process, such as the fact that going from 16-bit to 8-bit means you lose the most significant bits. While this seems self-evident now, I would not have assumed this when I started.
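For example, converting a 16-bit value to 8 bits in Go simply throws away the high byte:
package main

import "fmt"

func main() {
	wide := uint16(0x1234)
	narrow := uint8(wide)                    // only the least significant byte survives
	fmt.Printf("%#x -> %#x\n", wide, narrow) // 0x1234 -> 0x34
}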
Learning Go
I had little difficulty picking up the syntax given my experience with other languages; however, some parts still haven't fully settled in my mind (see the sketch after this list):
- arrays are passed by value, while slices are passed by reference
- structs are passed by value, but if you pass around a pointer to a struct, that gives you reference semantics
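Here's a minimal sketch (with hypothetical types, not code from the project) of how that plays out:
package main

import "fmt"

type point struct{ x, y int }

func bumpArray(a [2]int)   { a[0] = 99 } // gets a copy of the whole array
func bumpSlice(s []int)    { s[0] = 99 } // gets a copy of the slice header, but shares the backing array
func bumpStruct(p point)   { p.x = 99 }  // gets a copy of the struct
func bumpPointer(p *point) { p.x = 99 }  // gets a pointer, so it mutates the original

func main() {
	arr := [2]int{1, 2}
	sl := []int{1, 2}
	pt := point{1, 2}

	bumpArray(arr)
	bumpSlice(sl)
	bumpStruct(pt)
	bumpPointer(&pt)

	fmt.Println(arr, sl, pt) // [1 2] [99 2] {99 2}
}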
I also found it a little odd that interfaces are implemented implicitly, so I can't tell right away, just by looking at a type, which interfaces it satisfies. But thankfully, with modern IDEs, this is a non-issue. Another thing I must admit is that I definitely need to do more with channels, because I feel there are foot guns there I have yet to discover.
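For the interface point, a small sketch of what "implicitly" means in practice (the types here are made up):
package main

import "fmt"

type greeter interface {
	Greet() string
}

// server never mentions greeter, yet it satisfies the interface
// simply by having a Greet method with the right signature
type server struct{}

func (server) Greet() string { return "hello from the server" }

func main() {
	var g greeter = server{}
	fmt.Println(g.Greet())
}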
Here's a further tidbit I came across: Go doesn't have enums. Coming from languages that have enums, or at least something resembling them, I was surprised. No matter! It's not something I reach for often, and the following stand-in worked just fine.
type Opcode = int

const (
	Continuation = 0x0
	Text         = 0x1
	Binary       = 0x2
	Close        = 0x8
	Ping         = 0x9
	Pong         = 0xA
)
Building the WebSocket Server in Go
Given that this was a learning exercise, I only wanted to use the standard library. And I must admit, I am warming up to it. I was surprised by how much functionality is built in and how powerful it is. For example, the net/http package handles each request in a separate goroutine, which is an instant win for performance.
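As a quick illustration, a throwaway sketch (not the project's actual handler) of how little code that takes:
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// net/http serves each incoming request in its own goroutine
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello")
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}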
I will not go into the nitty-gritty of the implementation, as you can see it for yourself, but I will highlight some of the more interesting parts. My project doesn't implement the full WebSocket protocol as defined in RFC 6455, but rather an MVP.
The Handshake
The first thing that happens when you want to establish a WebSocket connection is a handshake. This is a run-of-the-mill HTTP request with some headers indicating that you want to upgrade to a WebSocket connection. There is a complication here in the form of the Sec-WebSocket-Key header. It serves multiple purposes: preventing caching, tracking connections, and ensuring the server actually understands the assignment. It is also meant to prevent cross-protocol attacks(?), but I'm not going down that rabbit hole. The key is generated by the client, and the server must hash it along with a pre-specified "magic string" before sending it back. If all goes well, the server sends back a rather succinct response:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
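The hashing step is small enough to sketch here; the magic string is the fixed GUID from RFC 6455, and the key below is the RFC's own example value:
package main

import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
)

func main() {
	key := "dGhlIHNhbXBsZSBub25jZQ=="               // Sec-WebSocket-Key sent by the client
	magic := "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" // fixed "magic string" from RFC 6455

	// SHA-1 the concatenation, then base64-encode the digest
	sum := sha1.Sum([]byte(key + magic))
	fmt.Println(base64.StdEncoding.EncodeToString(sum[:])) // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
}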
This is where the crux of the logic comes in. So far we had a nice *http.Request to work with, but now we want to read straight from the connection, so we have to 🏴☠️ hijack the http.ResponseWriter to get to the connection it was abstracting away:
conn, _, err := w.(http.Hijacker).Hijack()
Reading the Connection
Next, we'll break out into a goroutine to handle the connection. (💥 blazingly fast!) In short, we'll continuously read from the connection and handle the result. However, the devil is in the details (the bits, really), and reading the connection is a more involved process. The data we're getting is called a "frame", and it has a specific structure.
First, we read from the connection:
// read only the first two bytes
header := make([]byte, 2)
_, err = io.ReadFull(conn, header)
Reading also advances our position in the stream, so when we next read from the connection, we won't read the same bytes (handy!). To extract all the fields packed into these two bytes, we have to do a bit of bitwise gymnastics.
The first bit of the first byte determines if this is the final fragment (used with fragmented frames; we're not worried about those in this project). Then we have three (RSV) bits, which are reserved for extensions/future use. The next four bits mark the opcode (what kind of frame this is). Simple! We're already done with a byte!
Moving on, the first bit of the second byte is special in a similar fashion to the first. If 1, then the payload is masked (more on that later). The remaining seven bits are the payload length.
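Putting the two header bytes together, the field extraction looks roughly like this (the variable names are mine; a made-up header stands in for real data):
package main

import "fmt"

func main() {
	// a made-up header: FIN set, text opcode (0x1), mask bit set, length 5
	header := []byte{0x81, 0x85}

	fin := header[0]&0x80 != 0          // first bit: is this the final fragment?
	opcode := header[0] & 0x0F          // last four bits: the frame type
	masked := header[1]&0x80 != 0       // first bit of the second byte: is the payload masked?
	payloadLen := int(header[1] & 0x7F) // remaining seven bits: the (initial) payload length

	fmt.Println(fin, opcode, masked, payloadLen) // true 1 true 5
}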
Payload Length
This is a bit of a "nightmare" to read because the payload length value has different meanings.
- If the value is less than 126, it is the true size of the payload in bytes.
- If the value is 126, the next two bytes contain the size of the payload.
- If the value is 127, the next eight bytes are the size of the payload.
Remember big-endian from the Learning with AI: No Dumb Questions section? In either of these special cases, to get the true payload size, we have to combine the next two or eight bytes, byte by byte, into a single number.
// loop over the byte slice, shifting each byte into place according to its
// position from the most significant end, and OR it into the length
for i := 0; i < 8; i++ {
	length |= int(payloadLength[i]) << (56 - i*8)
}
To clarify: first we take the most significant byte's value and shift it left (<<) by 56 bits (7 bytes); this puts seven sets of 0000 0000 after it, creating a 64-bit number. The |= operator is a bitwise OR assignment that combines the shifted value into our running total. The rest of the iterations work the same way, except we shift left by eight bits less each time.
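The two-byte case (length marker 126) works the same way, just with a smaller shift; here's a sketch of doing it by hand and of the shortcut encoding/binary offers (not necessarily what the project does):
package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	// pretend these are the two extended-length bytes read off the wire
	ext := []byte{0x01, 0x90}

	// by hand: shift the most significant byte up by 8 bits, then OR in the other
	length := int(ext[0])<<8 | int(ext[1])
	fmt.Println(length) // 400

	// or let encoding/binary do the big-endian bookkeeping
	fmt.Println(binary.BigEndian.Uint16(ext)) // 400
}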
The Reveal
I promise we're close to reading our payload of "Hello, World!".
If our payload is masked, which it will be from a browser, we have to unmask it.
what if it's from a server? 🚨 rabbit hole alert 🚨
The masking key is the next four bytes after the payload size. With this we can unmask the payload by XOR-ing it with the masking key.
// loop over the bytes
for i := range payload {
// XOR the payload:
// at the index we combine the payload with the mask key at the corresponding index
// modulo ensures that the mask key is repeated
payload[i] ^= maskKey[i%4]
}
To recap, we apply the XOR operation to each byte with a rotating mask key (first byte with first mask, second with second, and so on). The modulo operator ensures that we cycle through the mask key as we read the payload.
Congratulations! You've successfully read a WebSocket frame. We now know it is a final frame, the type of frame, and we have the content. We can now handle it to our heart's content.
Sending a Message
This I won't cover, as it is the reverse of reading a message, except for masking: the server is not expected to mask the payload. If you're still interested, you can check out the code at the link in the Final Notes section.
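For a flavor of it anyway, here's a minimal sketch of writing an unmasked text frame, assuming a payload shorter than 126 bytes (this is not the project's exact code):
package main

import (
	"io"
	"log"
	"os"
)

// writeTextFrame writes a single unmasked text frame; it assumes the
// payload is shorter than 126 bytes so the length fits in the second byte
func writeTextFrame(w io.Writer, payload []byte) error {
	frame := make([]byte, 0, 2+len(payload))
	frame = append(frame, 0x81)               // FIN bit set + text opcode (0x1)
	frame = append(frame, byte(len(payload))) // mask bit clear, 7-bit payload length
	frame = append(frame, payload...)
	_, err := w.Write(frame)
	return err
}

func main() {
	// writing to stdout just to show the bytes; in the server this would be the connection
	if err := writeTextFrame(os.Stdout, []byte("Hello, World!")); err != nil {
		log.Fatal(err)
	}
}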
The Rabbit Holes
Here are a couple of topics that I followed up on until I had some understanding of them.
Malicious Fragmented Frames: The first couple of bytes of a WebSocket frame also specify the payload size. The maximum payload size is 2^63 bytes, around ~9.2 exabytes (exabyte > petabyte > terabyte > gigabyte). What if a malicious client were capable of sending that much or more? Surely some infrastructure would fail or flag you, or perhaps the server would fail to allocate enough memory. It turns out people much cleverer than I am have worked on such things, and the OS handles this quite gracefully. When the client sends the data, the OS reads it from the TCP stream into a buffer (a space managed in the kernel, not directly accessible to applications). This buffer is managed by the OS using ACK (acknowledge) TCP packets, which carry a header field called window size, telling the client how much data it can send: presumably however much space is left in the buffer. If the data sent isn't the specified size, the OS may just drop the packets or close the connection outright. I'm certain there's more to this than my surface-level knowledge (there always is), but you can read more about the Transport Layer of the OSI model if you're interested.
Pulling on Go's Goroutine thread: One of the big draws of Go is its concurrency, achieved with developer-friendly goroutines. To understand goroutines, it's important to know how the OS handles the workload.
The OS scheduler manages software threads on top of the hardware threads of the CPU's physical cores. A thread can be in one of three states:
- runnable (ready to go, need some CPU time assigned),
- waiting (waiting on network, IO-bound tasks, etc.),
- or executing (the sweet spot).
When the OS scheduler assigns a thread to a core, it will run until it is either blocked or the time assigned to it is up. This is a rather complex algorithm called preemptive scheduling, meaning we can't know when a thread is going to be interrupted (many factors here are non-deterministic, like events and network activity, all compounded by thread priorities).
In some ways, Go's scheduler mirrors the OS scheduler. It looks like a preemptive scheduler and acts like a preemptive scheduler, but it isn't (fully) one. It's a cooperative scheduler; however, it will also preempt CPU-bound goroutines after 10ms (read: hogging the CPU). This means that the Go scheduler will switch context when a goroutine is blocked or when it yields (at safe points). But for all intents and purposes, from a developer's perspective, you can think of it as a preemptive scheduler. To keep track of waiting goroutines, Go has two kinds of run queues, both FIFO: a global run queue (GRQ) and a local run queue (LRQ). The global queue is shared by all threads, while each local queue belongs to a single logical processor. Go uses three parts to manage the scheduling of tasks:
- G (Goroutine): Your actual task with its own stack and state (states just like the OS thread)
- P (Logical Processor): Acts as a virtual CPU with its own local run queue
- M (Machine): The actual OS thread that executes code
When you call go myFunc(), Go creates a G, puts it in a run queue (either global or local), and then one of the Ms will eventually pick it up to execute it. What's fascinating is that if a G blocks (like waiting on I/O or a channel), the M can detach from it, grab another P, and continue executing other Gs from the local queue. When the blocked G is ready again, it gets placed back in a run queue.
This elegant dance between Gs, Ms, and Ps is why Go can handle thousands of goroutines with just a few OS threads.
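A trivial sketch of that claim: the program below parks ten thousand goroutines at once, far more than any machine has OS threads, and the scheduler shrugs it off.
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var wg sync.WaitGroup

	// 10,000 goroutines, each spending its life "waiting"; the runtime
	// multiplexes them onto a handful of OS threads
	for i := 0; i < 10000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			time.Sleep(100 * time.Millisecond)
		}()
	}

	wg.Wait()
	fmt.Println("all done")
}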
A couple of other interesting things I learned while falling down this hole:
- the hex numbers at the end of the stack trace are the instruction pointer's offset to the next instruction
- I'm definitely out of my depth here
- OS threads using the same cache can lead to false sharing, which is when two threads work on the same cache line (a small chunk of memory the CPU moves around as a unit), causing them to invalidate each other's caches and leading to performance issues.
Final Notes
All in all, I found this to be an invaluable exercise, and I think I have a base understanding of WebSockets now. As for Go, I think I will be using it more in the future and will continue to challenge myself with it. This also gives me a newfound appreciation for how much work goes into open source projects.
Here are some of the things you should take away from this:
- If you're in a similar position to where I was, I suggest you check out the project on GitHub. It's well commented and has more substance than this post.
- Most of the time you're better off using well-established libraries instead of hand-rolling your own. Don't be that person.
- Without AI, I would have spent hours wading through dense RFC documents. Instead, I got fast, to-the-point explanations and moved on. Given this isn't a novel subject, I see no issue using AI to help you learn.