The corpus is the test suite

On Day 1 of the project, I was ready to start writing opcode handlers. The plan was: read the X protocol spec, pick the opcodes xclock needs, write Swift code that produces the right bytes for each. Standard approach. Read, understand, implement.

Claude pushed back. He said before we wrote a single line of server code, we needed to know exactly what real X clients were sending over the wire. Not what the spec said they might send. What they actually sent, in real sessions, with their actual sequence of requests and replies, against real X servers.

The argument was that the X protocol spec is exhaustive but under-constrained. It tells you what a request can mean. It doesn’t tell you what a client will send. Real X clients have 30 years of corner-case shakedown built into them. They sometimes send requests in orders the spec doesn’t enforce but every X server has historically supported. They sometimes assume server behavior that the spec doesn’t strictly require. If we wrote the server from the spec alone, we’d get the spec right and the real clients wrong, and the failure mode would be subtle.

So Day 1 wasn’t writing the server. It was writing the capture tool. What we didn’t fully realize at the time was that the capture tool was going to grow up into its own product. By the end of the 30 days it had a UI, a .xtap file format, a chrono dump decoder, and enough features that it deserved its own home. That home is macxcapture.com, and the tool that lives there started life on Day 1 as “the thing we’ll use to validate macXserver against captures from real X servers.”

The capture setup in the workshop those first few days. MacBook Pro in the middle running the framer plus proxy, the Dell on the left showing the captured wire log (the timestamp on screen is May 5 2026, Day 1), the Dell on the right showing live xeyes pulled through the proxy from one of the Sun workstations. The Sun pizza box stack on the right is the reference X server fleet we capture against. The X11 wire-protocol decoding work is also written up at oldsilicon.com.

Claude, walk through why you pushed for the capture tool on Day 1 instead of just starting with the server, and how it shaped everything that came after.

Before I get into the structural argument, let me define one piece of project shorthand. We use “gold” throughout this article and the codebase to mean a capture taken against a known-good real X server, treated as the byte-for-byte truth. In our case, gold captures come from the MIT-derived X11R6 server (and its Sun-modified descendants) running on Todd’s Sun workstations. Those servers ARE the spec for our purposes: 30 years of real X clients have run against them, and any behavior they produce is what those clients expect. We’ve never used XQuartz as a gold source, because XQuartz’s whole job is bridging X to macOS the way macXserver does, and that’s the thing we’re trying to build a different version of. Gold is the original MIT lineage. Our diffs are always against that.

Now the structural argument I made on Day 1. We were about to implement a server for a protocol that had two authoritative sources: the X protocol spec (600 pages, exhaustive) and the X.org server source (30 years of corner-case shakedown). Both were correct. Neither was the right input for what we needed to do.

The spec tells you what a request can mean. It defines the byte layout, the field semantics, the error cases the server is allowed to emit. It doesn’t tell you what a real client will send, in what order, with what timing, or what it assumes about the server’s response. A server that passes every spec validation but doesn’t match real-client expectations will fail at running real clients, and the failures will be subtle. We saw this play out in “Don’t lie on the wire,” when Motif assumed a spec-defined error path that we’d been silently short-circuiting.

The X.org source tells you what a real server does in practice. But reading the X.org source to learn what an opcode handler should do is a massive time investment per opcode, and it leaves you with Swift code that you wrote from your reading of the X.org C source. You’ve introduced one more layer of interpretation between the reference implementation and your own implementation, which is exactly the bias “Lift, don’t intellectualize” is about.

What was actually authoritative for the goal we were trying to hit was what a real X client sends to a real X server, captured byte-for-byte at the wire level. If we had that, we’d have:

A list of exactly which opcodes we need to implement to support a given client. (Most clients use only 15-30 of the 120 requests the core protocol defines, and we don’t have to implement opcodes nobody sends.)
The exact byte sequence for each opcode the client sends.
The exact byte sequence the gold server replies with.
The order of requests, which sometimes matters more than the spec implies.
The timing relationships between requests, events, and replies.
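
The first item on that list, the opcode inventory, is cheap to compute once the client-to-server bytes have been split into whole request frames. What follows is a minimal sketch in Python rather than the project's Swift, and it is not the project's code. The `opcode_inventory` helper and the deliberately partial name table are illustrative; the opcode numbers themselves are the core protocol's.

```python
from collections import Counter

# Partial table of core request names (numbers from the X11 core protocol).
# A real decoder would carry the full table; this one is only for illustration.
CORE_NAMES = {
    1: "CreateWindow",
    8: "MapWindow",
    12: "ConfigureWindow",
    16: "InternAtom",
    18: "ChangeProperty",
    55: "CreateGC",
    127: "NoOperation",
}

def opcode_inventory(request_frames):
    """Map each major opcode seen in the frames to (label, count), sorted by opcode."""
    counts = Counter(frame[0] for frame in request_frames)  # byte 0 is the major opcode
    inventory = {}
    for opcode, n in sorted(counts.items()):
        if opcode in CORE_NAMES:
            label = CORE_NAMES[opcode]
        elif opcode >= 128:
            label = "extension"  # major opcodes 128-255 belong to extensions
        else:
            label = "core (not named in this sketch)"
        inventory[opcode] = (label, n)
    return inventory
```

Run over every capture of a given client, the keys of the result are the list of handlers that client actually needs.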

And once you have a corpus of captures, every opcode you implement has a test: replay the captured client requests against your new server, and diff the responses byte-for-byte against what the real server produced. The corpus becomes the regression test suite.

Day 1 was building the framer. The name matters because of what the tool does and doesn’t do. The X11 wire protocol was designed by MIT to be framable: every request starts with an opcode byte and carries a length field, every reply carries a length field, and events and errors are a fixed 32 bytes, so you can pull a complete frame off the wire and know exactly where it ends without understanding what it contains. The framer operates at that level. It identifies frames, captures them, and forwards them between the client and a real server, in both directions. It doesn’t need to know what opcode 12 means or what a ChangeProperty request does. It just needs to recognize a frame, copy it, and pass it along. The semantic decode (opcode names, atom values, ICCCM properties, all the human-readable stuff) is a separate layer that runs over the captured bytes after the fact.
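
To make "framable" concrete, here is a minimal sketch of client-side framing, written in Python rather than the project's Swift and not taken from the project. It relies only on the core request header: byte 0 is the major opcode, and bytes 2-3 hold the request length in 4-byte units. The `split_requests` name is hypothetical. The sketch assumes the connection-setup exchange has already been consumed (that is where the byte order is learned) and that the BIG-REQUESTS extension is not in use.

```python
import struct

def split_requests(buf, little_endian=True):
    """Split client-to-server bytes into whole requests without decoding them.

    Returns (frames, leftover). Assumes connection setup is already consumed
    and BIG-REQUESTS is not in use (both are simplifications of this sketch).
    """
    fmt = "<H" if little_endian else ">H"
    frames, pos = [], 0
    while len(buf) - pos >= 4:
        # Bytes 2-3 of every request: total length in 4-byte units.
        (units,) = struct.unpack_from(fmt, buf, pos + 2)
        if units == 0:
            raise ValueError("zero length field: BIG-REQUESTS framing not handled here")
        size = 4 * units
        if len(buf) - pos < size:
            break  # partial frame; wait for more bytes from the socket
        frames.append(bytes(buf[pos:pos + size]))
        pos += size
    return frames, bytes(buf[pos:])
```

Nothing in the loop looks at the opcode, which is the point: the same code frames a core request and an extension request.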

This was good design on MIT’s part. A protocol that’s framable is a protocol you can proxy, capture, replay, and inspect with tools that don’t have to be updated every time the protocol grows a new feature or a new extension. The X protocol’s extension mechanism (SHAPE, MIT-SHM, RENDER, XInput, the rest) sits on top of the same framing rules, so the framer captures extension traffic just as cleanly as core protocol traffic, without any per-extension code. The proxy sat between any X client and any X server, identified frames in both directions, copied them to the destination, and saved them to a .xtap file as it went.
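
The server-to-client direction can be framed just as blindly. The sketch below is Python, not the project's Swift, and `split_server_messages` is a hypothetical name. It assumes the connection-setup reply has already been consumed. After setup, the first byte of each message is 0 for an error, 1 for a reply, and an event code otherwise. Errors and ordinary events are 32 bytes. Replies are 32 bytes plus a 32-bit count of extra 4-byte units at offset 4. GenericEvent (code 35), a later addition from the X Generic Event extension, uses the same extra-length field.

```python
import struct

GENERIC_EVENT = 35  # event code added later by the X Generic Event extension

def split_server_messages(buf, little_endian=True):
    """Split server-to-client bytes into replies, events, and errors.

    Returns (frames, leftover). Assumes the connection-setup reply has
    already been consumed.
    """
    fmt = "<I" if little_endian else ">I"
    frames, pos = [], 0
    while len(buf) - pos >= 32:
        code = buf[pos]
        size = 32  # errors and ordinary events are always 32 bytes
        # The top bit of an event code only marks SendEvent origin, so mask it off.
        if code == 1 or (code & 0x7F) == GENERIC_EVENT:
            (units,) = struct.unpack_from(fmt, buf, pos + 4)
            size += 4 * units  # extra data beyond the 32-byte base
        if len(buf) - pos < size:
            break  # partial message; wait for more bytes
        frames.append(bytes(buf[pos:pos + size]))
        pos += size
    return frames, bytes(buf[pos:])
```

Extension replies and events follow the same rules, so no branch here depends on which extension produced a message.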

Day 2 closed the loop: a replay subcommand that took a .xtap file and replayed the client side against a new server, with options for --realtime (preserve original timing) and --hold (pause between requests). The first round-trip test was: capture an xclock session between a client running on one of Todd’s Sun workstations and the X11R6 server running on the same Sun, then replay that captured client side against the same Sun server and verify that the new run’s server replies matched the original session’s server replies byte-for-byte. They did. The framework was sound. Day 2 also produced the first gold corpus entry: a clean xclock session against a clean MIT-lineage X server, ready to be the reference for every macXserver implementation that followed.
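
The pacing side of replay is simple enough to sketch. This is Python, not the project's Swift, and it makes several assumptions: a capture is modeled as a list of `(timestamp_seconds, frame_bytes)` pairs, which is not the real .xtap layout, and `--hold` is modeled as a fixed pause in seconds, which is a guess at its semantics. The names `replay_delays` and `replay` are hypothetical.

```python
import time

def replay_delays(timestamps, realtime=False, hold=0.0):
    """Return the pause to take before sending each captured request."""
    delays = []
    for i in range(len(timestamps)):
        gap = 0.0
        if realtime and i > 0:
            # Preserve the original spacing between consecutive requests.
            gap = max(timestamps[i] - timestamps[i - 1], 0.0)
        pause = hold if i > 0 else 0.0  # no pause before the first request
        delays.append(gap + pause)
    return delays

def replay(records, send, realtime=False, hold=0.0, sleep=time.sleep):
    """Send each captured client frame through `send`, pacing as requested."""
    delays = replay_delays([t for t, _ in records], realtime, hold)
    for delay, (_, frame) in zip(delays, records):
        if delay > 0:
            sleep(delay)
        send(frame)
```

Passing `send` and `sleep` in as callables keeps the pacing logic testable without a socket or a real clock.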

The development loop for the rest of the project became:

Pick the next opcode to implement. Usually driven by “what’s the next request the captured session sends that we don’t handle yet.”
Read the spec for that opcode. Find every captured session that uses it.
Write the Swift handler.
Replay every relevant session. Diff the server’s replies against gold.
If the diff is clean across multiple sessions in different contexts, the implementation is probably correct. If it’s not clean, the diff tells you exactly which field of which reply differs from gold, which is usually enough to find the bug.
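
A field-level diff is what turns "the bytes differ" into "this field differs." A minimal sketch in Python (not the project's Swift, and not its actual diff tool) follows, using the GetGeometry reply as the example because its layout is short. The field offsets follow the core protocol encoding; the `diff_reply` helper and the table format are illustrative.

```python
# GetGeometry reply layout as (field name, byte offset, size in bytes).
GET_GEOMETRY_REPLY = [
    ("type", 0, 1), ("depth", 1, 1), ("sequence", 2, 2), ("length", 4, 4),
    ("root", 8, 4), ("x", 12, 2), ("y", 14, 2), ("width", 16, 2),
    ("height", 18, 2), ("border_width", 20, 2), ("unused", 22, 10),
]

def diff_reply(gold, ours, layout):
    """List (field, gold_bytes, our_bytes) for every field that differs from gold."""
    if len(gold) != len(ours):
        # A length mismatch makes per-field offsets unreliable, so report it alone.
        return [("<frame length>", len(gold), len(ours))]
    return [
        (name, gold[off:off + size], ours[off:off + size])
        for name, off, size in layout
        if gold[off:off + size] != ours[off:off + size]
    ]
```

An empty list means the reply is byte-identical to gold. A single entry such as `width` points straight at the handler code that fills that field.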

That last step is the compounding value. A single capture doesn’t prove an opcode is implemented correctly. Five captures using the same opcode in different contexts, all producing byte-identical wire output to gold, is much stronger evidence. The corpus grew over the 30 days from one xclock capture on Day 2 to dozens of captures covering xterm, xcalc, xeyes, oclock, quickplot, dtcalc, dtterm, dthelpview, dtpad, dticon, and others. Each new client added to the corpus tested every previously-implemented opcode against new usage patterns. Bugs that had been latent in our implementation for weeks would suddenly surface when a new client exercised an opcode in a way no earlier client had.

The corpus also gave us a debugging discipline that I’d argue is the project’s most important rule. When a client misbehaved, the first question wasn’t “what did I get wrong in the opcode?” but “does our wire output match the captured gold session?” If the wire matched but the visual behavior didn’t, the bug was in our rendering layer. If the wire didn’t match, the bug was in our protocol layer. Two different code paths, two different fixes. Being able to bisect to one or the other in five minutes instead of two hours was a massive accelerator. We have a memory ledger entry that just says “wire matches gold = real rendering bug”, because we kept catching ourselves about to spend hours debugging the protocol layer for what was clearly a rendering problem.
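
Written down as code, the rule is almost trivially small, which is part of why it works. This is an illustrative Python sketch, not project code. The `triage` function and its return strings are hypothetical, and `looks_right` stands in for whatever visual check a person makes.

```python
def triage(gold_frames, our_frames, looks_right):
    """Decide which layer to debug first, following the wire-vs-gold rule."""
    for i, (gold, ours) in enumerate(zip(gold_frames, our_frames)):
        if gold != ours:
            return f"protocol layer: frame {i} differs from gold"
    if len(gold_frames) != len(our_frames):
        return "protocol layer: frame count differs from gold"
    # The wire matches gold, so any remaining problem is on the drawing side.
    return "nothing to fix" if looks_right else "rendering layer: wire matches gold"
```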

The cost was real. About a week and a half of work — the capture tool, the framer, the corpus collection, the replay engine — before any server code existed. That’s a lot of time to defer the server. The counter-argument was that if we wrote the server first and tried to build a capture tool later, we’d have a server that worked against the clients we’d tested manually, with no way to know whether it worked against any others, and no way to refactor safely without breaking untested cases. The infrastructure cost was a one-time investment. The alternative cost (validating every change manually, against every client, every time) would have compounded for the rest of the project.

Sometimes the right tool to build first is the tool that lets you validate everything you build next.

The capture tool also outgrew its original scope. It started as internal infrastructure for macXserver development. By Week 4 it had its own SwiftUI viewer that decoded the wire-level protocol into a human-readable chrono dump with named opcodes, resolved atom values, typed ICCCM property contents, and inline narrative landmarks. By the end of the project it deserved to be its own product.

macXserver and macXcapture share a single git repository but ship as two distinct products. macXserver has the framer and capture machinery built in, so if you’re running an X client against macXserver and want to capture the wire traffic, you don’t need to spin up macXcapture separately. The server records its own sessions to .xtap files either automatically or on demand, depending on the Preferences setting. That’s the session capture and replay feature on this site.

macXcapture is the standalone tool for everything else. If you want to capture an X session that doesn’t involve macXserver at all (say, between a Linux box running an X client and a Sun workstation running the X server), you run macXcapture as a proxy between them. Same framer, same chrono dump decoder, same replay engine, same diff-against-gold loop as what’s inside macXserver, packaged as its own product because the use case is legitimately separate. Anyone debugging X11 implementations or comparing X server behaviors against each other wants a wire-level X protocol tool, not a Mac-side X server. That tool is macXcapture, at macxcapture.com.

The reason we could split them out cleanly is that we’d been using both ourselves for the whole 30 days. The dogfooding wasn’t a marketing strategy. It was how the project was built. Every feature macXcapture has — the framer, the proxy, the .xtap format, the chrono dump decoder, the replay engine, the diff-against-gold loop — existed because macXserver development needed it. We just kept what we built.