If you’re writing a rootless X server on a Mac, your server is also acting as the window manager. Kind of, anyway. You get part of the window-manager behavior from the Mac’s native window manager and the rest you have to implement yourself.
By Day 15 the server was running real X clients across xterm, xcalc, quickplot, oclock, xeyes, and the CDE dt-apps. Most things worked. One class of bug refused to die: Motif menu clicks would silently drop.
Menus in general are windows themselves, something you might not think about initially. When you click a menu title, a new top-level window pops up with no title bar, just a region of the screen showing the menu items. This means you have to report coordinates back to the client in the root coordinate system, so it knows where to position the pulldown. We had a day or so of menus not popping up under the menu title but rather all over the screen.
claude, explain to me how menus work?
Yeah, menus in X are weirder than they look at first. Let me walk through how it actually works, because the structure is the answer to why we kept having this bug.
X menus aren’t a single drawing on the top-level window. Each menu is its own
X window. When a user clicks “File” in the menu bar, the application creates
a new top-level X window that becomes the pulldown: no title bar, no border,
just a region of the screen with the menu items drawn in it. From the X
server’s perspective, it’s just another CreateWindow followed by
MapWindow.
The interesting part is where that menu window ends up. The application has to position it directly under the menu-bar text, which means computing a root-coordinate position: the menu-bar item’s screen position plus an offset down. To do that, the application needs to know its own top-level’s root coordinates, the position of the top-level relative to the root window of the display.
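The arithmetic is small enough to write down. This is only a sketch of the calculation described above, not Motif's code: the `Point` type, the function name, and all of the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


def pulldown_root_position(toplevel_root: Point,
                           item_in_toplevel: Point,
                           item_height: int) -> Point:
    """Root-coordinate origin for a pulldown placed directly under a menu-bar item.

    toplevel_root    -- where the client believes its top-level sits on the root
    item_in_toplevel -- the menu-bar item's origin relative to the top-level
    item_height      -- the offset down, so the pulldown starts below the item
    """
    return Point(
        toplevel_root.x + item_in_toplevel.x,
        toplevel_root.y + item_in_toplevel.y + item_height,
    )


# Hypothetical numbers: top-level at (200, 150), "File" at (4, 2), 24 px tall.
print(pulldown_root_position(Point(200, 150), Point(4, 2), 24))
```

Every term except `toplevel_root` is something the client knows locally. If that one value is wrong, the whole result is wrong by the same amount.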
Toolkits like Motif, which is built on Xt, cache that root-coordinate value.
They have to: every time a widget is realized, Xt records its root-relative
position so that when it needs to compute things like menu positions, it
doesn’t have to round-trip a TranslateCoordinates request to the server
every time.
The cache stays valid until the toolkit gets a ConfigureNotify event for
the top-level. But there’s a protocol detail you need to know: there are TWO
kinds of ConfigureNotify: the “real” event the server emits when the
geometry actually changes, and a “synthetic” version with the send_event
bit set to 1. The toolkit only invalidates its root-coord cache on the
synthetic event.
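Here is a toy model of that caching rule. It is not Xt's actual code: the class and field names are invented, and it assumes the synthetic event carries the new root position in its x and y fields.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ConfigureNotify:
    x: int
    y: int
    send_event: bool  # True for the synthetic flavour


class RootCoordCache:
    """Hypothetical stand-in for a toolkit's cached top-level root position."""

    def __init__(self, x: int, y: int) -> None:
        self.root_xy: Tuple[int, int] = (x, y)

    def handle(self, ev: ConfigureNotify) -> None:
        if ev.send_event:
            # Synthetic event: trust x, y as the new root position.
            self.root_xy = (ev.x, ev.y)
        # Real event: leave the cached root position alone.


cache = RootCoordCache(100, 100)
cache.handle(ConfigureNotify(0, 22, send_event=False))    # cache unchanged
cache.handle(ConfigureNotify(400, 300, send_event=True))  # cache updated
print(cache.root_xy)
```

If the server only ever sends the real flavour, `root_xy` never changes after the first value, no matter how far the window has moved.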
Why two kinds? For a reparented top-level (a window with a window-manager
frame around it), the real ConfigureNotify reports the window’s position
relative to its parent. And the parent is the WM frame, not the root. The
application can’t use that to compute a menu position because it’s relative
to the frame, not the screen. The ICCCM solves the ambiguity by reserving
the synthetic event for “this is where you are in root coordinates,
definitively.” That’s spelled out in section 4.1.5.
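On the wire, the two flavours differ by a single bit. In the core protocol, ConfigureNotify is event code 22, and an event generated by SendEvent has the high bit (0x80) set in its code byte. A sketch of packing the 32-byte event follows; the function name and defaults are my own, and it assumes a little-endian client.

```python
import struct

CONFIGURE_NOTIFY = 22   # core protocol event code
SEND_EVENT_BIT = 0x80   # set in the code byte of SendEvent-generated events

# code, pad, sequence, event window, window, above-sibling,
# x, y, width, height, border-width, override-redirect, 5 pad bytes
_FMT = "<BxHIIIhhHHHB5x"


def encode_configure_notify(window: int, x: int, y: int,
                            width: int, height: int, *,
                            border_width: int = 0,
                            above_sibling: int = 0,
                            override_redirect: bool = False,
                            sequence: int = 0,
                            synthetic: bool = False) -> bytes:
    """Pack a ConfigureNotify event; x, y are root-relative when synthetic."""
    code = CONFIGURE_NOTIFY | (SEND_EVENT_BIT if synthetic else 0)
    return struct.pack(_FMT, code, sequence & 0xFFFF,
                       window, window, above_sibling,
                       x, y, width, height, border_width,
                       int(override_redirect))


# Hypothetical window id and geometry.
ev = encode_configure_notify(0x400001, 300, 200, 640, 480, synthetic=True)
print(len(ev), hex(ev[0]))
```

A client library strips the high bit to recover the event type and exposes it as the send_event field.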
For a rootless X server on a Mac, macXserver is the window manager from
the client’s perspective. AppKit owns the window placement and the frame,
but the X client doesn’t know that. The server has to pretend to be the WM
and emit synthetic ConfigureNotify events after every position change.
Otherwise Motif’s coordinate cache stays stale, every menu computation lands
at “where the top-level used to be,” and the pulldown pops up in the wrong
place. Which is exactly what you were seeing.
The fix is to emit a synthetic event after every code path that moves the window:
- After MapNotify on a top-level. The initial position needs to be reported.
- After the user drags the window. AppKit’s window-move callback.
- After a programmatic ConfigureWindow from the client.
- After AppKit’s zoom or unminimize transitions, which move the window without going through the user-drag path.
I missed two of these on the first pass and the menu clicks worked sometimes but not after a window move. Took an hour to find the missing emit sites.
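One way to make a missing emit site harder to write is to funnel every path that changes position through a single helper. This is a sketch of that idea, not macXserver's actual code: the class, the hook names, and the `send_synthetic` callback are all hypothetical.

```python
from typing import Callable, List, Tuple

SendFn = Callable[[int, int, int], None]  # (window_id, root_x, root_y)


class TopLevel:
    """Hypothetical server-side record for one client top-level window."""

    def __init__(self, window_id: int, send_synthetic: SendFn) -> None:
        self.window_id = window_id
        self.root_xy: Tuple[int, int] = (0, 0)
        self._send = send_synthetic

    def _position_changed(self, x: int, y: int) -> None:
        # The only place root_xy is written, so the emit cannot be forgotten.
        self.root_xy = (x, y)
        self._send(self.window_id, x, y)

    # The four emit sites from the list above.
    def on_map_notify(self, x: int, y: int) -> None:
        self._position_changed(x, y)

    def on_user_drag(self, x: int, y: int) -> None:
        self._position_changed(x, y)

    def on_client_configure_window(self, x: int, y: int) -> None:
        self._position_changed(x, y)

    def on_zoom_or_unminimize(self, x: int, y: int) -> None:
        self._position_changed(x, y)


sent: List[Tuple[int, int, int]] = []
win = TopLevel(0x400001, lambda w, x, y: sent.append((w, x, y)))
win.on_map_notify(100, 100)
win.on_user_drag(250, 180)
print(sent)
```

With this shape, a new path that moves the window has to call `_position_changed` to update the stored position at all, and the synthetic event comes along with it.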
This was a multi-week mystery whose cause was one protocol line. The symptom was broad (“Motif is flaky”) and the cause was narrow (one missing event with one bit set). The lesson I want to remember: when a 30-year-old toolkit looks like it’s doing something weird, the toolkit is doing exactly what the spec says, and the bug is that I’m not.