Two window trees, one NSView

Around Day 8 or 9 of the project we hit a decision point that I didn’t realize at the time was going to inform basically everything we built afterward. The X protocol treats every visible thing as a window. Top-level windows are what the user thinks of as an application window. Underneath each one, the application creates a tree of child X windows: every Motif button, every text-input field, every menu item, every pulldown is its own X window with its own ID, its own coordinate space relative to its parent, its own input event mask, its own visibility state. Quickplot maps a top-level and creates 50 or 100 X windows underneath it. Each one has to behave like a real, addressable, drawable X window.

The question was how to model all of that inside AppKit. XQuartz’s approach, which we could read in the xpr source, is to make each X window its own NSView nested under the top-level. So a Motif dialog with 30 widgets gets 30 NSViews in a parallel hierarchy. AppKit handles clipping, AppKit routes mouse events, AppKit does a lot of the heavy lifting that the X server would otherwise have to do itself. Clean mapping.

We didn’t go that way. Every X window inside the top-level lives entirely inside the top-level’s NSView as a virtual surface, with its own clipList computed by our own region algebra. There’s exactly one NSView per top-level, regardless of how many X windows the client creates inside it.
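One consequence of the single-view model is that AppKit can no longer tell us which X window sits under the pointer, so the server has to hit-test its own tree. Here is a minimal sketch of that lookup. The `XWindow` type, its fields, and `windowAt` are hypothetical names for illustration, not macXserver's actual code, and the sketch ignores borders, SHAPE, and InputOnly windows.

```swift
// Hypothetical stand-in for an X window record. Geometry is relative to the parent.
final class XWindow {
    let id: Int
    var x: Int
    var y: Int
    var width: Int
    var height: Int
    var mapped = true
    var children: [XWindow] = []   // assumed bottom-to-top stacking order

    init(id: Int, x: Int, y: Int, width: Int, height: Int) {
        self.id = id
        self.x = x
        self.y = y
        self.width = width
        self.height = height
    }
}

// Returns the deepest mapped window containing (px, py), plus the point
// translated into that window's own coordinate space. (px, py) is given in
// the coordinate space of the top-level's parent.
func windowAt(px: Int, py: Int, in win: XWindow,
              originX: Int = 0, originY: Int = 0) -> (id: Int, localX: Int, localY: Int)? {
    guard win.mapped else { return nil }
    let ax = originX + win.x
    let ay = originY + win.y
    guard px >= ax, px < ax + win.width, py >= ay, py < ay + win.height else { return nil }
    // Topmost child wins, so walk the stacking order from the top down.
    for child in win.children.reversed() {
        if let hit = windowAt(px: px, py: py, in: child, originX: ax, originY: ay) {
            return hit
        }
    }
    return (win.id, px - ax, py - ay)
}
```

The single NSView's mouse handlers would call something like this once per event and then deliver the X event to the returned window ID.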

claude, lay out why we went with one NSView per top-level instead of one NSView per X window like XQuartz does.

This is one of those architectural choices that looks neutral when you make it and then quietly shapes everything you build on top. Let me walk through the trade-off.

The X clipping model is specific. Every X window has a clipList, which is the region of its content that’s actually visible after accounting for ancestors clipping it, descendants opaquely covering parts of it, siblings drawn above it, and the window’s own bounds. The X protocol defines exactly how clipList updates when a window moves, resizes, gets mapped, gets unmapped, gets a child mapped on top of it, or gets a sibling raised above it. It also defines exactly how Expose events get emitted when regions change. Real X clients depend on those rules. Motif’s widget realization order, Xt’s geometry-management cascade, the XSetWMNormalHints size-hints flow in Xlib: all of them assume the X server is producing clipList-derived behavior to spec.
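To make the clipList definition concrete, here is a toy sketch of the computation. It uses a region that is just a list of disjoint rectangles rather than the banded representation in miregion.c. All names (`Rect`, `Region`, `XWindow`, `computeClipLists`) are hypothetical, and the sketch ignores borders, SHAPE, and unviewable ancestors.

```swift
// Integer rectangle in top-level coordinates (half-open on the right and bottom).
struct Rect: Equatable {
    var x: Int, y: Int, w: Int, h: Int
    var maxX: Int { x + w }
    var maxY: Int { y + h }
    var isEmpty: Bool { w <= 0 || h <= 0 }

    func intersection(_ o: Rect) -> Rect? {
        let x0 = max(x, o.x), y0 = max(y, o.y)
        let x1 = min(maxX, o.maxX), y1 = min(maxY, o.maxY)
        return (x1 > x0 && y1 > y0) ? Rect(x: x0, y: y0, w: x1 - x0, h: y1 - y0) : nil
    }

    // The parts of self not covered by o, as up to four disjoint pieces.
    func subtracting(_ o: Rect) -> [Rect] {
        guard let i = intersection(o) else { return [self] }
        var out: [Rect] = []
        if i.y > y { out.append(Rect(x: x, y: y, w: w, h: i.y - y)) }                       // above
        if i.maxY < maxY { out.append(Rect(x: x, y: i.maxY, w: w, h: maxY - i.maxY)) }      // below
        if i.x > x { out.append(Rect(x: x, y: i.y, w: i.x - x, h: i.h)) }                   // left
        if i.maxX < maxX { out.append(Rect(x: i.maxX, y: i.y, w: maxX - i.maxX, h: i.h)) }  // right
        return out
    }
}

// Toy region: a list of disjoint rectangles.
struct Region {
    var rects: [Rect]
    init(_ rects: [Rect] = []) { self.rects = rects.filter { !$0.isEmpty } }
    func intersecting(_ r: Rect) -> Region { Region(rects.compactMap { $0.intersection(r) }) }
    func subtracting(_ r: Rect) -> Region { Region(rects.flatMap { $0.subtracting(r) }) }
    var area: Int { rects.reduce(0) { $0 + $1.w * $1.h } }
}

final class XWindow {
    let id: Int
    var frame: Rect                 // relative to the parent's origin
    var mapped = true
    var children: [XWindow] = []    // assumed bottom-to-top stacking order
    init(id: Int, frame: Rect) { self.id = id; self.frame = frame }
}

// `available` is what the ancestors and higher siblings have left for this window.
// Unmapped windows (and their subtrees) get no entry in `result`.
func computeClipLists(_ win: XWindow, originX: Int, originY: Int,
                      available: Region, into result: inout [Int: Region]) {
    guard win.mapped else { return }
    let abs = Rect(x: originX + win.frame.x, y: originY + win.frame.y,
                   w: win.frame.w, h: win.frame.h)
    var remaining = available.intersecting(abs)   // ancestor clip and own bounds
    // Visit children top-down: each child sees what higher siblings left over,
    // then removes its own rectangle from what lower siblings and the parent see.
    for child in win.children.reversed() where child.mapped {
        computeClipLists(child, originX: abs.x, originY: abs.y,
                         available: remaining, into: &result)
        remaining = remaining.subtracting(
            Rect(x: abs.x + child.frame.x, y: abs.y + child.frame.y,
                 w: child.frame.w, h: child.frame.h))
    }
    result[win.id] = remaining
}
```

For a 100×100 parent with two overlapping 50×50 children at (10,10) and (40,40), the upper child keeps all 2500 pixels, the lower one keeps 2100, and the parent keeps 5400.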

AppKit’s NSView clipping is related but distinct. AppKit clips a view’s drawing to its bounds. It clips child views to their parent’s bounds. It clips sibling views based on Z-order. Each of those rules is similar to the X rule it parallels, but the edges are different. Sibling-overlap visibility in X is computed bottom-up from the X window tree; AppKit’s sibling visibility is computed from the view ordering in NSView.subviews. The order of setFrame: calls during a resize cascade doesn’t match the order of X ConfigureNotify emission. The relationship between window border and content area is different. The rule for what counts as “visible” when a child is partially covered by a sibling, and how that interacts with drawRect:, isn’t quite what X says it should be.

XQuartz handles those differences in xpr by adding translation code on top of the NSView mapping. The X server pushes a ConfigureNotify into AppKit, AppKit emits a viewDidMoveToWindow or setFrame: callback, XQuartz catches it and reconciles with what X expected to happen. There’s a lot of “what X said should be visible vs. what AppKit computed to be visible” reconciliation code. It works, but it’s a layer.

macXserver’s choice is to skip the layer. We don’t try to map X clipping onto AppKit’s view clipping. We keep AppKit’s hands off everything below the top-level. Inside that NSView, the X server runs its own region algebra: a Region.swift type ported from miregion.c (see Lift, don’t intellectualize), a ClipListEngine that walks the X window tree and computes each window’s clipList from ancestor / sibling / descendant geometry, and a delta cascade that emits Expose events according to the X protocol’s rules. AppKit sees one big view. We draw into it with our own clip math.
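The core of the delta idea can be sketched in a few lines: the area that needs Expose events is whatever is in the new clipList but was not in the old one. The code below is an illustration under assumed names (`Rect`, `ExposeEvent`, `exposeEvents`), not the shipped cascade, and it models regions as plain rectangle lists. The one protocol detail it does follow is that each Expose event carries window-relative coordinates and a `count` of how many more Expose events follow for the same window.

```swift
struct Rect: Equatable {
    var x: Int, y: Int, w: Int, h: Int
    var maxX: Int { x + w }
    var maxY: Int { y + h }

    func intersection(_ o: Rect) -> Rect? {
        let x0 = max(x, o.x), y0 = max(y, o.y)
        let x1 = min(maxX, o.maxX), y1 = min(maxY, o.maxY)
        return (x1 > x0 && y1 > y0) ? Rect(x: x0, y: y0, w: x1 - x0, h: y1 - y0) : nil
    }

    // The parts of self not covered by o, as up to four disjoint pieces.
    func subtracting(_ o: Rect) -> [Rect] {
        guard let i = intersection(o) else { return [self] }
        var out: [Rect] = []
        if i.y > y { out.append(Rect(x: x, y: y, w: w, h: i.y - y)) }
        if i.maxY < maxY { out.append(Rect(x: x, y: i.maxY, w: w, h: maxY - i.maxY)) }
        if i.x > x { out.append(Rect(x: x, y: i.y, w: i.x - x, h: i.h)) }
        if i.maxX < maxX { out.append(Rect(x: i.maxX, y: i.y, w: maxX - i.maxX, h: i.h)) }
        return out
    }
}

// Mirrors the fields of the X Expose event that matter here.
struct ExposeEvent: Equatable {
    let window: Int
    let x: Int, y: Int, width: Int, height: Int
    let count: Int   // number of Expose events still to come for this window
}

// region minus every rectangle in holes
func subtract(_ region: [Rect], _ holes: [Rect]) -> [Rect] {
    holes.reduce(region) { remaining, hole in
        remaining.flatMap { $0.subtracting(hole) }
    }
}

// oldClip and newClip are in top-level coordinates; (originX, originY) is the
// window's own origin in that space, used to make the events window-relative.
func exposeEvents(window: Int, originX: Int, originY: Int,
                  oldClip: [Rect], newClip: [Rect]) -> [ExposeEvent] {
    let exposed = subtract(newClip, oldClip)
    return exposed.enumerated().map { index, r in
        ExposeEvent(window: window,
                    x: r.x - originX, y: r.y - originY,
                    width: r.w, height: r.h,
                    count: exposed.count - 1 - index)
    }
}
```

If a sibling that covered a window's lower-right corner is unmapped, the old clip is an L-shape, the new clip is the full rectangle, and the only event emitted is for the corner that just became visible.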

The cost is real, and I want to be honest about it. We’ve written:

The region algebra (Region.swift, the miregion port).
The clipList computation walking the X window tree.
The resize cascade logic, including the delta-cascade rule we eventually shipped on Day 21.
The per-window background paint logic, including the ParentRelative ancestor-chain walk.
The sibling delta cascade (when a sibling’s clipList grows because another sibling got unmapped or moved).
The borderClip logic and the paint-parent-bg-over-uncovered-region path.
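As one example from that list, here is a rough sketch of the ParentRelative ancestor-chain walk. In X, a window whose background is ParentRelative paints with its parent's background, with the tile origin aligned to the parent's, and the parent may itself be ParentRelative. The `Background` enum, the `XWindow` fields, and `resolveBackground` are hypothetical names. The sketch ignores border widths and treats a chain that runs off the top of the tree as having no background, which is a simplification.

```swift
enum Background: Equatable {
    case none
    case pixel(UInt32)
    case parentRelative
}

final class XWindow {
    let id: Int
    var x: Int                      // position relative to the parent's origin
    var y: Int
    var background: Background
    weak var parent: XWindow?

    init(id: Int, x: Int, y: Int, background: Background, parent: XWindow? = nil) {
        self.id = id
        self.x = x
        self.y = y
        self.background = background
        self.parent = parent
    }
}

// Walks up the ancestor chain until it finds a window whose background is not
// ParentRelative. Returns that background plus the tile origin expressed in
// the coordinate space of `win`, i.e. where the source ancestor's origin falls
// relative to the window being painted.
func resolveBackground(for win: XWindow)
    -> (background: Background, tileOriginX: Int, tileOriginY: Int) {
    var current = win
    var originX = 0
    var originY = 0
    while current.background == .parentRelative {
        guard let parent = current.parent else {
            // Simplification for this sketch: nothing left to inherit from.
            return (.none, originX, originY)
        }
        originX -= current.x
        originY -= current.y
        current = parent
    }
    return (current.background, originX, originY)
}
```

A label nested two ParentRelative levels below a gray form resolves to the form's gray, with the tile origin offset by the sum of the intermediate window positions.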

All of that AppKit gives you for free if you go the XQuartz route. We don’t get any of it for free. We have it because we wrote it.

What we win is that the X semantics are pure. When a Motif widget expects its clipList to update in a particular order during a resize, we match the spec. When an Xt widget expects an Expose event on a specific sub-region after an ancestor’s unmap, we emit it. When a 1990s-era X11R6 program expects pixel-exact draw-to-erase semantics inside its window, we provide them. The bugs we ship come from our region code being wrong about an X rule (and we fix them by going back to miregion.c), not from AppKit and X disagreeing about what should happen.

There’s also a related win on the SHAPE side that we hit later (see Smooth at device scale). When we apply a shape mask to a top-level, we apply it to the one NSWindow’s compositing layer. The descendant X windows inside don’t have their own AppKit clipping to fight with; the top-level’s mask defines the outline and that’s it. If each X descendant had its own NSView, each one would need a mask synced to the top-level’s shape, which sounds simple until you realize Motif’s widget realization order means most descendants exist before the SHAPE request even arrives.

The general pattern, which keeps coming back across the project: when two layers (the X protocol and AppKit) have similar but distinct semantics for the same concept, it’s usually cleaner to keep them separated and translate at one well-defined boundary than to try to map one onto the other and patch the seams. The boundary in this case is the top-level NSView. Above it, AppKit owns the window: drag, resize, Mission Control, Cmd-Tab, all the macOS features that make X clients feel like first-class Mac windows. Below it, the X server owns the regions: every clipList computation, every Expose event, every draw-to-erase guarantee. Neither layer is fighting the other because neither layer is inside the other.

If we did the project again I’d still make this call. The only thing I’d change is to port miregion.c from Day 1 instead of trying to derive the region algebra from the spec until Day 9.