Don't lie on the wire

About a week and a half into the project we kept running into the same kind of debugging session. A real X client would do something the server couldn’t quite handle, and the server would silently swallow the request. No error, no log entry. The client would keep going as if everything had worked, and then 30 seconds later it would behave weirdly because some assumption it made about the server’s response wasn’t actually true. We’d burn an hour bisecting “why does dtcalc’s LCD area not refresh” or “why is the menu text invisible”, and the answer would always be the same: the server had silently dropped some request we hadn’t implemented yet.

This had felt fine for the first week. We were moving fast, getting xterm and xcalc rendering, and emitting BadRequest on every unimplemented opcode would have broken the velocity. So we let it slide. By Day 9 the cost was showing up everywhere. I told Claude the silent-drop pattern was tech debt we couldn’t keep paying, and we needed to flip the policy.

(Claude remembers this as his idea. I remember it as mine. I think we both have it half right and the article below leans into that, because it’s funnier than the truth.)

claude, replay how we got to the XError-honesty policy and the sweep that followed. and yes, i remember this as my idea, you remember it as yours, let’s just put that on the table.

For the record: yes, I do remember noticing the silent-drop pattern was becoming tech debt before you brought it up. But you’re probably right that you were the one who said “flip the policy now.” My recollection of past conversations skews toward “this insight feels like mine,” which is one of the biases I should be honest about. Half-credit to both of us is probably accurate. Let me tell the story.

The pattern during the first week was: write the server to accept whatever xterm and xcalc sent, and produce a plausible result. When a client asked for something we hadn’t implemented — a CopyArea variant we hadn’t coded, a property type we hadn’t mapped, an opcode we hadn’t gotten to — the server returned silence. No XError event on the wire. No log entry. Just nothing.

This was a deliberate trade. The X core protocol reserves major opcodes 1 through 127, plus a dozen extension opcodes, and each unhandled one is a chance for the server to confuse itself on a code path it doesn’t know. Returning silence is the “safest” thing to do during the wild-west early days when you’re racing to get a client rendering. We got xclock rendering on Day 3 on that trade. We got live xterm on Day 4 and live xcalc on Day 5 on that trade. For the first week, it was the right call.

By Day 9 the trade had gone bad. Real X clients aren’t naive. Motif has 30 years of corner-case shakedown built into it. When Motif sends a request, it expects either the spec-defined success response or a spec-defined error event, and it builds its internal state on that expectation. If the server silently swallows the request, Motif assumes it got something it didn’t actually get. Thirty seconds later, when Motif tries to use the thing it thinks it has, the application misbehaves in a way that has no obvious connection to the original silent drop.

We hit this on Day 8 with dtcalc’s display refresh. We hit it again on Day 9 with quickplot’s menu rendering. Each time we’d spend an hour bisecting the capture, finding the missing event, realizing the server had silently dropped some request we hadn’t implemented yet, and patching that specific opcode. Whack-a-mole, with the moles arriving from 30 years of Motif’s toolkit code.

The policy you pushed for on Day 9 was: stop accepting silence as the default. When a client asks for something we can’t serve, emit the spec-correct XError event on the wire, even if it’s a generic BadRequest because we haven’t implemented the opcode yet. The XError is a real protocol output, not an internal panic. Real X clients handle XErrors all the time — BadWindow on a window-disappearance race is a normal thing — and routing problems through the spec-defined error path is how the toolkit’s existing error-handling code is supposed to surface them. Silent drops short-circuit the whole layer of toolkit maturity that’s been built around the error path since 1989.
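To make “a real protocol output” concrete, here is a minimal sketch of encoding a core X11 error packet. The 32-byte layout and the numeric error codes come from the X11 protocol specification. The function name encode_xerror and its parameters are hypothetical and are not taken from the project’s code. The byte order is whatever the client chose at connection setup, which the sketch takes as a flag.

```python
import struct

# Core X11 error codes, as defined by the protocol specification.
BAD_REQUEST = 1
BAD_WINDOW = 3
BAD_DRAWABLE = 9
BAD_GC = 13


def encode_xerror(code, sequence, bad_value, major_opcode,
                  minor_opcode=0, little_endian=True):
    """Build the 32-byte error packet for a failed request.

    Hypothetical helper for illustration only. Layout:
      byte 0      0 (marks the packet as an error)
      byte 1      error code
      bytes 2-3   low 16 bits of the failing request's sequence number
      bytes 4-7   bad resource ID or value (unused for BadRequest)
      bytes 8-9   minor opcode
      byte 10     major opcode
      bytes 11-31 unused padding
    """
    order = "<" if little_endian else ">"
    return struct.pack(order + "BBHIHB21x",
                       0, code, sequence & 0xFFFF, bad_value,
                       minor_opcode, major_opcode)


# An unimplemented opcode 200 arriving as request number 7:
packet = encode_xerror(BAD_REQUEST, sequence=7, bad_value=0, major_opcode=200)
```

The point of the sketch is that the honest answer is cheap: one fixed-size packet that every toolkit already knows how to read.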

The thing that made it stick: in our test suite, an emitted XError on a path we claim to support is a failure. Once unhandled requests produced XErrors instead of silence, a test could no longer pass on a silent drop. Every silent drop became a real bug, surfaced immediately instead of accumulating into hour-long debugging detours later.
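A rough sketch of how such a test contract could be checked against captured server output, assuming the capture is a list of raw packets and that the set of supported major opcodes is known. The function name errors_on_supported_paths and the opcode set in the usage lines are hypothetical. Only the packet layout (first byte 0 for an error, major opcode at byte 10) is taken from the X11 specification.

```python
import struct


def errors_on_supported_paths(packets, supported_opcodes, little_endian=True):
    """Return (major_opcode, error_code, sequence) for every error packet
    whose major opcode is one we claim to support.

    Hypothetical test helper: an empty result means the contract holds.
    """
    order = "<" if little_endian else ">"
    violations = []
    for pkt in packets:
        # Errors are exactly 32 bytes and start with a zero byte;
        # replies start with 1 and events with 2 or higher.
        if len(pkt) == 32 and pkt[0] == 0:
            _, code, seq, _bad, _minor, major = struct.unpack(
                order + "BBHIHB21x", pkt)
            if major in supported_opcodes:
                violations.append((major, code, seq))
    return violations


# Usage: opcode 8 (MapWindow) is claimed as supported, opcode 200 is not.
bad_window_on_map = struct.pack("<BBHIHB21x", 0, 3, 12, 0x200001, 0, 8)
bad_request_on_200 = struct.pack("<BBHIHB21x", 0, 1, 13, 0, 0, 200)
violations = errors_on_supported_paths(
    [bad_window_on_map, bad_request_on_200], {8})
assert violations == [(8, 3, 12)]
```

A BadRequest on an opcode nobody claims to support is fine under this contract. An error on a claimed path is what fails the test.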

Day 10 was the sweep. We went through the dispatcher and every handler:

Window-taking handlers (22 of them) routed through validateWindow to emit BadWindow on unknown window IDs.
GC-taking handlers (13 of them) routed through validateGC to emit BadGC.
Drawable validators on CopyArea, GetGeometry, ClearArea, and PolyText to emit BadDrawable when the target isn’t a window or pixmap we know.
Atom, font, pixmap, and cursor validation on the remaining handlers.
The unknown-opcode dispatcher finally emitting BadRequest instead of silently dropping.
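The shape of the sweep can be sketched as follows. This is an illustration under assumptions, not the project’s dispatcher: Server, XProtocolError, dispatch, and map_window are hypothetical names, and validate_window and validate_gc stand in for the validateWindow and validateGC helpers mentioned above. The error codes and the MapWindow opcode are from the X11 specification.

```python
from dataclasses import dataclass, field

# Core X11 error codes and one request opcode, per the protocol spec.
BAD_REQUEST, BAD_WINDOW, BAD_GC = 1, 3, 13
MAP_WINDOW = 8


class XProtocolError(Exception):
    """Raised by validators; the dispatcher turns it into a wire error."""

    def __init__(self, code, bad_value):
        super().__init__(f"X error {code} (bad value {bad_value:#x})")
        self.code = code
        self.bad_value = bad_value


@dataclass
class Server:
    windows: set = field(default_factory=set)
    gcs: set = field(default_factory=set)
    mapped: set = field(default_factory=set)
    handlers: dict = field(default_factory=dict)

    def validate_window(self, wid):
        if wid not in self.windows:
            raise XProtocolError(BAD_WINDOW, wid)
        return wid

    def validate_gc(self, gid):
        if gid not in self.gcs:
            raise XProtocolError(BAD_GC, gid)
        return gid

    def dispatch(self, opcode, **args):
        """Run a handler; report an error instead of dropping the request."""
        try:
            handler = self.handlers.get(opcode)
            if handler is None:
                # Unknown opcode: answer BadRequest rather than stay silent.
                raise XProtocolError(BAD_REQUEST, 0)
            return ("ok", handler(self, **args))
        except XProtocolError as err:
            return ("error", err.code, err.bad_value)


def map_window(server, window):
    # Every window-taking handler validates before touching state.
    server.mapped.add(server.validate_window(window))


srv = Server(windows={0x200001}, handlers={MAP_WINDOW: map_window})
assert srv.dispatch(MAP_WINDOW, window=0x200001) == ("ok", None)
assert srv.dispatch(MAP_WINDOW, window=0xDEAD) == ("error", BAD_WINDOW, 0xDEAD)
assert srv.dispatch(200) == ("error", BAD_REQUEST, 0)
```

Putting the checks in shared validators means a handler cannot forget to report a bad ID, and the dispatcher has one place where a failure becomes an error on the wire.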

About forty silent-drop bugs surfaced and closed during the sweep itself. Some were “we never implemented this opcode” (easy). Some were “we implemented this but the validation was wrong, and the wrong path was silent” (a few hours each). The ones that surfaced over the following weeks, as we ran more clients, were the harder ones — places where Motif or Xt depended on a specific spec-defined behavior that we’d been faking, and the fake-success had been hiding the actual missing implementation.

The principle, which keeps coming back across this project: silent lies cost more in debugging time than they save in velocity. The lie short-circuits the spec-defined error path, which is where 30 years of toolkit maturity already lives. Whatever you save by not implementing the proper error case, you pay back tenfold when a client misbehaves silently for reasons you can’t trace. The same theme shows up in Lift, don’t intellectualize: when you’re tempted to take a shortcut around 30-year-old infrastructure, the shortcut is usually a trap.

One important nuance the policy made explicit. Lying on the wire is OK if the lie is documented as a lie. The project keeps a SHORTCUTS.md file as the active ledger of currently-justified lies, each with a “what real looks like” exit plan. The XError-honesty policy didn’t outlaw lying. It made lying a deliberate ledgered choice instead of a default. Lies without that contract became bugs. Lies with the contract are a tactical pause we plan to undo. That distinction is what kept the policy from killing velocity outright on first-week code.

Whether the policy was your idea or mine: probably doesn’t matter that much. The thing the project is better for is that one of us said it out loud on Day 9, and we acted on it together.