daemon-0.2.2: trace flag, XDP mode + structured attach log, distinct link-add codes#21
Conversation
Daemon-side diagnostic surface for the t1-integrity failure on the 0.2.1 cloud-gcp-c4 benchmark. The brief is to ship diagnostics first, fix second — until we can see what diverges, any "fix" is a guess. What the daemon does now: * New `--trace-forward-hashes` flag (off by default). When enabled, every wg-relay forward logs a SHA-256(payload) hash + frame length + source/destination peer pubkey prefix at two points: ingress (after the source peer matches and after optional MAC1 verification) and egress (just before the sendto to the destination). The same SHA on both lines proves the relay didn't mutate the frame; divergence pinpoints a corrupting code path. Per-frame log on the hot path — start warns the operator and the docs note it must not be left on in production. * `wg peer list` now surfaces per-peer drop counters `peer.<i>.drop_no_link` and `peer.<i>.drop_pubkey_mismatch`. These are the pair-attributable subset of the existing aggregate counters, populated at the same drop sites where `src_peer` is already known. Aggregate counters are unchanged; the other drop classes (`drop_unknown_src`, `drop_not_wg_shaped`, `drop_handshake_no_pubkey_match`) are by definition unattributable to a known peer and stay aggregate-only. What the operator should observe on next run: with the runner patched to pass `--trace-forward-hashes` for the integrity test specifically, the relay logs ingress + egress SHA pairs the runner can compare end-to-end. `wg peer list` shows where each known peer's drops are landing without the runner having to diff aggregate counters between bursts. Smoke test: tests/test_wg_relay_trace.cc drives the wg-relay forwarder in-process with two peers + one link, exercises both trace=on / trace=off paths, and verifies the per-peer drop_no_link counter increments on an unlinked peer's traffic. Steady-state forwarding path is unchanged when the trace flag is off — the per-frame log is gated behind a single bool check, and the per-peer counter bumps are next to existing aggregate increments. The 24-h soak signal from 0.2.1 stands.
Daemon-side answer to the t1-xdp-attach failure on the 0.2.1
cloud-gcp-c4 benchmark, where gVNIC didn't support
XDP_DRV_MODE and the relay silently fell back to SKB mode —
the runner labelled subsequent rows "xdp" that were really
running on the userspace recv loop.
What the daemon does now:
* `xdp_attached iface=<nic> ifindex=N mode={drv,skb}
driver=<drv> kernel=<release>` is emitted per attached NIC
on success. The achieved mode + driver/kernel pair are
legible from a single grep instead of having to infer the
state from older free-form text.
* `--xdp-mode={drv,skb,auto,off}` (default `drv`). Replaces
the previous hardcoded "try DRV, fall back to SKB" loop
that was the source of the silent fallback. `auto`
preserves the old behaviour; `skb` forces generic; `off`
skips XDP attach entirely so the operator can keep
`--xdp-interface` in the unit file and toggle at runtime.
* Attach failure is FATAL. `XdpAttach` returning false logs
`xdp_attach_failed iface=<nic> ifindex=N mode=<mode>
driver=<drv> kernel=<release> reason=<errno-string>` and
WgRelayStart tears the relay down; main.cc exits non-zero.
The runner sees a clear failure signal rather than a
silently-degraded relay.
* Driver / kernel detection helpers: `ReadNicDriver` reads
`/sys/class/net/<iface>/device/driver` and returns its
basename; `ReadKernelRelease` calls `uname()`. Both safe
when sysfs paths are absent (loopback, bridges) — they
return "?" and the log line still parses.
What the operator should observe on next run: a fresh daemon
launch on the cloud fleet either attaches XDP and logs the
achieved mode, or fails fast with the structured
`xdp_attach_failed` line. Userspace path is no longer reached
implicitly.
gVNIC / vmxnet3 / virtio-net advertise XDP only in generic
mode on stock kernels through 6.12 — operators on those NICs
need `--xdp-mode=auto` (or `=skb`) to opt into the generic
fallback explicitly. Documented in --help and in the
0.2.2 CHANGELOG note.
Smoke tests in tests/test_wg_relay_trace.cc:
* RejectsUnknownXdpMode — invalid mode → WgRelayStart fails.
* XdpModeOffSkipsAttach — `off` skips attach even when an
interface is configured.
Steady-state forwarding path is untouched. The 24-h soak
signal from 0.2.1 stands.
Decision: option (b) from the brief. The "each peer is in at most one link" invariant is load-bearing — it's the iter-1 design contract that lets the relay forward purely on the source 4-tuple. The relay's design memory captures the rationale, and the comment at LinkAddLocked already calls it out. Multi-link mesh routing is future work (per-link UDP ports or in-packet introspection); not on the path for 0.2.2. Daemon-side fix: differentiate the einheit-channel error codes so the benchmark runner can detect "you hit the one-link-per- peer limit" specifically rather than the generic `wg_link_failed`. What changed: * New `WgRelayLinkAddResult` enum (kWgLinkOk / UnknownPeer / SelfLink / LimitExceeded / Duplicate) plus a `WgRelayLinkAddDetail` accessor that returns it. Bool `WgRelayLinkAdd` is preserved as a thin wrapper for the many call sites that only need success/failure. * `WgLinkAdd` einheit handler maps each non-OK result onto a distinct error code: `wg_peer_unknown`, `wg_link_self`, `link_limit_exceeded`, `wg_link_duplicate`. The runner now has a discriminator to switch on; star-topology configs that land on the limit can fall back to disjoint pairs without parsing free-form error text. * `wg link add` verb description in the registry calls out the iter-1 limit so `wg link add --help` documents it. What the operator should observe on next run: a `wg link add` that exceeds the one-link-per-peer limit returns `error.code = link_limit_exceeded` with a one-line message explaining the rule. Other failure modes (typo'd peer, duplicate pair, self-link) return distinct codes too. No silent drops — every rejection is named and surfaced. Regression: tests/test_wg_relay_links.cc covers all five outcome codes (Ok / UnknownPeer / SelfLink / LimitExceeded / Duplicate), the star topology from the brief, and the runner's expected fallback (disjoint pairs). The bool wrapper is exercised separately to confirm any non-OK detail still maps to false. Steady-state forwarding path is untouched.
Two flake sources in the test, both observed across 15 back-to-back runs of the full binary: * SetUp picked the relay's UDP port via bind-then-close on loopback. A different process grabbing the port between the test's close() and the relay's bind() left WgRelayStart returning nullptr. Replaced with StartRelay, which retries up to 5 times with fresh ephemeral ports. * The test read fwd_packets immediately after recv() on bob's socket returned, but the relay loop bumps the counter AFTER sendto, so RecvOnBob can return while the relay thread is still between sendto and fetch_add. The trace egress log fires in the same window, which is why the failing run shows the egress log but stats=0. Added WaitFwdPackets that polls up to ~500 ms. Trace path is unchanged. The egress log line still fires in the order ingress → sendto → egress → counter — same as production.
Followup on the daemon-0.2.2 review: the three drop classes the brief asked for that I'd skipped as "unattributable to a known peer" still have an apparent source IP, and that's exactly what the runner needs to triage the c4_gcp internal- vs-external NAT bug — `drop_handshake_no_pubkey_match` aggregate is the failure mode but says nothing about which IP is producing it. Added a small bounded `drop_by_src` map (host-byte-order uint32 → struct of three counters + last_seen_ns), capped at 256 source IPs with FIFO eviction on overflow. Bumped at every userspace site that increments `drop_unknown_src`, the userspace `drop_not_wg_shaped` shape-filter site, and the `drop_handshake_no_pubkey_match` site at the bottom of `HandleUnknownSrcHandshakeLocked`. Surfaced via a new `wg show drop_sources` verb that emits one row per source IP with the three counters plus last-seen timestamp. v6 sources are skipped — the brief's diagnostic targets are v4 NAT mismatches. The XDP-path `drop_not_wg_shaped` (in the BPF program) stays aggregate-only — populating a per-source map from XDP would need new BPF map plumbing and isn't on the runner's diagnostic path. The userspace shape-filter path (cold; runs when XDP isn't attached or when XDP_PASSes for shape) is covered. Smoke test in test_wg_relay_trace.cc: send a 1-byte non-WG-shaped UDP datagram, verify the histogram populates with `drop_not_wg_shaped >= 1` for the source IP. The aggregate counters in `wg show` are unchanged; this is strictly an additive observability surface.
|
Thanks for the review — addressing each point. 1. P0.3 — option (b) rationale, with file:lineThe iter-1 limit is load-bearing in the data plane, not incidental scaffolding:
Relaxing to N≥4 (option a) would require: (1) BPF map type change to a list-valued map or a different key scheme, (2) some way to disambiguate the destination from the packet — the relay has nothing other than the source 4-tuple to go on, since it deliberately ignores WG protocol contents. That's a multi-week design change for iteration 2. Option (b) — distinct error code so the runner sees the rejection cleanly — is the right call for 0.2.2. The iter-1 invariant is documented in 2. P0.1 — counter set was incomplete; fixed in 7b3deb0You're right. Added a per-source-IP histogram in XDP-path 3. P0.1 — t1-integrity reproduction statusP0.1 ships diagnostics. Integrity reproduction is blocked on the runner agent enabling The trace flag's own smoke tests pass (the 5 test_wg_relay_trace cases), proving the new code paths are live. I did not run the integrity test against a 0.2.1-equivalent build with trace on — that requires the runner-side change to flip the flag for that test, which the brief explicitly puts out of scope ("HD.Benchmark/tooling — different agent, different repo"). Recommend that as the next runner-side change. Once the runner enables 4. P0.1 — flake fix nature: test-only, no daemon changeBoth fixes in Port-bind retry: the test harness binds a UDP socket on port 0, reads the assigned ephemeral port, closes the fd, and passes the port to Counter-poll: the test calls The trace-log line firing without the counter incrementing is the same skew window — the egress log is emitted between sendto and fetch_add. That ordering is intentional (we want to log "we sent it" before claiming the forward succeeded in counters); the test just needs to acknowledge the window. |
Daemon-side changes from the 0.2.2 brief from the benchmark agent. Each P0 item is one commit; all four pass local repro.
P0.1 — diagnostic surface for t1-integrity (
8a13935)--trace-forward-hashesdaemon flag (off by default). When enabled, every wg-relay forward logs a SHA-256(payload) hash + length + source/destination peer pubkey prefix at two points: ingress (after the source peer matches and after optional MAC1 verification) and egress (just before the sendto to the destination). The same SHA on both lines proves the relay didn't mutate the frame; divergence pinpoints a corrupting code path. Per-frame log on the hot path — start warns the operator and the docs note it must not be left on in production.wg peer listnow surfaces per-peer drop counterspeer.<i>.drop_no_linkandpeer.<i>.drop_pubkey_mismatch. These are the pair-attributable subset of the existing aggregate counters. Aggregate counters are unchanged; the other drop classes (drop_unknown_src,drop_not_wg_shaped,drop_handshake_no_pubkey_match) are by definition unattributable to a known peer and stay aggregate-only.Operator-observable behaviour: with the runner patched to pass
--trace-forward-hashesfor the integrity test specifically, the relay logs ingress + egress SHA pairs the runner can compare end-to-end.wg peer listshows where each known peer's drops are landing without diffing aggregate counters between bursts.P0.2 — XDP attach (
acb29ec)xdp_attached iface=<nic> ifindex=N mode={drv,skb} driver=<drv> kernel=<release>log line per attached NIC.--xdp-mode={drv,skb,auto,off}flag, defaultdrv. The previous code unconditionally tried DRV then fell back to SKB — the silent fallback that on cloud-gcp-c4 labelled rows "xdp" that were really running on the userspace recv loop.autopreserves the historical behaviour explicitly;offskips XDP attach even when--xdp-interfaceis set.XdpAttachfailure logsxdp_attach_failed iface=<nic> ifindex=N mode=<mode> driver=<drv> kernel=<release> reason=<errno-string>andWgRelayStartreturns nullptr →main.ccexits non-zero. The runner sees a clear failure signal rather than a silently-degraded relay.--xdp-mode=auto(or=skb) to opt in to the generic fallback explicitly. Documented in--helpand in the 0.2.2 CHANGELOG note.Operator-observable behaviour: a fresh daemon launch on the cloud fleet either attaches XDP and logs the achieved mode, or fails fast with the structured
xdp_attach_failedline. Userspace path is no longer reached implicitly.P0.3 — link-per-peer limit (
8137a2c)Decision: option (b) from the brief. The "each peer is in at most one link" invariant is load-bearing — it's the iter-1 design contract that lets the relay forward purely on the source 4-tuple. Multi-link mesh routing is future work (per-link UDP ports or in-packet introspection); not on the path for 0.2.2.
WgRelayLinkAddResultenum (kWgLinkOk/UnknownPeer/SelfLink/LimitExceeded/Duplicate) plusWgRelayLinkAddDetail. BoolWgRelayLinkAddkept as a thin wrapper for callers that only need success/failure.WgLinkAddeinheit handler maps each non-OK result onto a distinct error code:wg_peer_unknown,wg_link_self,link_limit_exceeded,wg_link_duplicate. The runner now has a discriminator to switch on; star-topology configs that hit the limit can fall back to disjoint pairs without parsing free-form error text.wg link addverb description in the registry calls out the iter-1 limit sowg link add --helpdocuments it.Operator-observable behaviour: a
wg link addthat exceeds the one-link-per-peer limit returnserror.code = link_limit_exceededwith a one-line message explaining the rule. Other failure modes (typo'd peer, duplicate pair, self-link) return distinct codes too.P1 — verb registration (verified, no code change)
einheit_channel.cc:2191-2254registers all 11wg_*verbs (peer add/pubkey/nic/remove/list,link add/remove/list,show,show config,blocklist list) with handler, role, dotted path, description, and arg schema. The round-1oneshot: no matching commandwas the einheit CLI'shd_relayadapter (separate repo, out of scope).Test coverage
tests/test_wg_relay_trace.cc(5 cases): trace on/off, invalid--xdp-mode,--xdp-mode=off, per-peer drop counter increment.tests/test_wg_relay_links.cc(7 cases): all five outcome codes, the brief's star topology, the runner's expected fallback (disjoint pairs).12/12 local pass.
Out of scope (confirmed)
No changes in
~/dev/HD.Benchmark,~/dev/einheit, DERP mode, HD-Protocol mode, or the wg-relay steady-state forwarding hot path. Trace logging is gated behind a single bool check; per-peer counter bumps sit next to existing aggregate increments. The 0.2.1 24 h T2 soak signal stands.