wg-relay hardening: shape filter, MAC1, automatic roaming, blocklist by KRuskowski · Pull Request #19 · hyper-derp/Hyper-DERP

KRuskowski · 2026-04-29T20:06:11Z

Summary

Hardens mode: wireguard against malformed/forged traffic on the public UDP port. Bumps the project to 0.2.1 with a curated CHANGELOG + GitHub release notes.

The hardening is layered — each layer narrows what reaches the next:

WG-shape filter at XDP (740701e): drops packets whose first byte isn't a WireGuard message type (1/2/3/4) or whose length doesn't match the type. Non-WG noise on the relay's port stops at the NIC. New counter drop_not_wg_shaped.
MAC1 verification (7698846): when both ends of a link have a stamped pubkey, every handshake init/response from a registered peer is verified against the partner's pubkey via Blake2s-keyed MAC1. Mismatch → drop_handshake_pubkey_mismatch. Engages only when partner pubkey is set, so existing operators keep today's behaviour.
Automatic peer roaming (eaa0c2d, 8f0d28b, ada485a, 0c2951e): a peer's endpoint auto-updates when their IP changes. Handshakes from unknown sources are matched against every partner's pubkey via MAC1; a candidate endpoint is registered. The committed endpoint stays put until the partner's response (with matching receiver_index) confirms — closing the forge-init-then-bounce hijack vector. Fuzz-driven follow-ups: don't refresh contested slot, refresh sender_index on legitimate retry. New per-peer counter endpoint_relearn.
Dynamic source-IP blocklist (14f9ce2): source IPs that produce repeated failed-confirm strikes escalate onto a BPF blocklist. Defaults: 2/60s → 60s; 5/1h → 1h; 10/24h → 24h. Blocked sources drop at the top of XDP. Closes the relay-as-anonymizer attack against the partner. New verb wg blocklist list, new counters xdp_drop_blocklisted, per-IP strike records.
Fuzz-found amplifier fixes (edfa8bb, cfecbcd): drop type-2 (response) from unknown source outright (free unauthenticated amplifier surface), rate-limit retry-init forwards from an unconfirmed candidate to 1/s, and sweep stale strike entries during candidate expiry so spoofed-source one-shot strikes can't grow the table without bound.

Crypto

Standalone Blake2s in src/crypto/ — RFC 7693 reference port. libsodium ships Blake2b only and WireGuard's MAC1 uses Blake2s. Verified against the published RFC test vector + structural invariants (tests/test_hd_blake2s.cc).

Verification

Tested live on the libvirt fleet (hd-r2 relay + hd-c1 alice + hd-c2 bob) and a 25 GbE Haswell bare-metal relay:

Legitimate wg-quick clients keep roaming successfully (3/3 ping at ~1 ms after each redeploy).
Fuzz battery covering empty, short, long, wrong-type, wrong-pubkey-MAC1, valid-MAC1-forge, type-2 unknown-src, retry churn — all caught by the appropriate counter; alice's endpoint stays put under sustained forge.
Blocklist policy (2 strikes / 60s → 60s block) fires correctly; XDP drops subsequent packets from blocklisted source while legitimate peers continue uninterrupted.

First piece of the 0.2.1 hardening series (see docs/design/wg_relay_hardening.md). Inspect the first byte of the UDP payload at XDP and confirm it's a WireGuard message type (1 init / 2 response / 3 cookie / 4 transport) with the expected length (148 / 92 / 64 / >=32 respectively); drop otherwise. Userspace forwarder mirrors the same check so the cold-start path can't be used to bypass the BPF filter.
New counter drop_not_wg_shaped surfaced in `wg show`, summed across the two paths so the operator sees one number.
Verified on the libvirt fleet:
* fleet ping test still PASS (4/4, XDP fast-path)
* sending random garbage / type 0x05 forgeries from a registered peer increments drop_not_wg_shaped, doesn't tick drop_unknown_src (filter fires before peer lookup)
* real WG handshake + transport packets all pass through
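The type/length check described above can be sketched in a few lines of C++. This is only an illustration of the rule as stated in the commit message; `IsWgShaped` and the `WgType` enum are hypothetical names, not identifiers taken from the branch, and the real BPF program may be structured differently.

```cpp
#include <cstddef>
#include <cstdint>

// WireGuard message types and their wire lengths, as listed in the
// commit message. Names here are illustrative, not from the branch.
enum WgType : uint8_t {
  kWgInit = 1,       // handshake initiation, exactly 148 bytes
  kWgResponse = 2,   // handshake response, exactly 92 bytes
  kWgCookie = 3,     // cookie reply, exactly 64 bytes
  kWgTransport = 4,  // transport data, at least 32 bytes
};

// Hypothetical helper: true when the UDP payload starts with a valid
// WireGuard type byte and its length is consistent with that type.
inline bool IsWgShaped(const uint8_t* payload, size_t len) {
  if (payload == nullptr || len == 0) return false;
  switch (payload[0]) {
    case kWgInit:
      return len == 148;
    case kWgResponse:
      return len == 92;
    case kWgCookie:
      return len == 64;
    case kWgTransport:
      return len >= 32;
    default:
      return false;  // e.g. the type 0x05 forgeries used in the test
  }
}
```

A caller would drop the packet and bump drop_not_wg_shaped whenever this returns false, before any peer lookup.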

Second piece of the 0.2.1 hardening series. When the operator has stamped a pubkey on a peer's link partner, every handshake init/response from that peer is verified: MAC1 is computed as Blake2s_keyed(Blake2s(LABEL_MAC1 || partner_pubkey), msg[0..len-32], 16) and compared against the MAC1 field at [len-32..len-16]. Mismatch -> drop, drop_handshake_pubkey_mismatch++. Engages only when the partner has a stamped pubkey, so existing operators retain today's behaviour exactly until they opt in.
* New self-contained Blake2s in src/crypto/. libsodium ships Blake2b only; WG uses Blake2s. RFC 7693 reference port, smoke-tested against published vectors.
* BPF XDP_PASS for type 1/2 so userspace owns MAC1 verification. Cookie (3) and transport data (4) keep the XDP fast path.
* Verified on libvirt fleet: baseline + with-real-pubkeys both PASS; with-fake-pubkey drops handshakes correctly; restore recovers cleanly.
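To make the offsets concrete, here is a minimal sketch of the comparison step only. The keyed Blake2s itself is injected as a callback (`Mac1Fn`, an assumed signature) so the example stays self-contained; `VerifyMac1` and `Mac16` are hypothetical names, and the constant-time comparison is a choice made for this sketch rather than something the commit message states.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>

using Mac16 = std::array<uint8_t, 16>;

// Assumed callback: returns the 16-byte keyed Blake2s MAC over
// msg[0..len), keyed with Blake2s(LABEL_MAC1 || partner_pubkey).
// The real code would call the Blake2s in src/crypto/ here.
using Mac1Fn = std::function<Mac16(const uint8_t* msg, size_t len)>;

// Hypothetical helper: checks the MAC1 field at [len-32 .. len-16)
// against the MAC computed over msg[0 .. len-32).
inline bool VerifyMac1(const uint8_t* msg, size_t len, const Mac1Fn& mac1) {
  if (msg == nullptr || len < 32) return false;
  const size_t covered = len - 32;  // bytes covered by MAC1
  const Mac16 expected = mac1(msg, covered);
  // Accumulate differences instead of returning early, so timing
  // does not reveal how many leading bytes matched.
  uint8_t diff = 0;
  for (size_t i = 0; i < expected.size(); ++i) {
    diff |= static_cast<uint8_t>(expected[i] ^ msg[covered + i]);
  }
  return diff == 0;
}
```

For a 148-byte init this reads the MAC1 field at bytes 116..132; for a 92-byte response, at bytes 60..76.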

Third piece of the 0.2.1 hardening series, the headline feature. A peer's endpoint auto-updates when their IP changes, without any client-side script and without operator intervention.
Flow:
* Handshake init/response from an unknown source is matched against every registered peer's pubkey via MAC1. Peer whose pubkey verifies is the destination; the actual sender is that peer's link partner.
* Sender's candidate_endpoint is set; committed endpoint stays untouched. Handshake is forwarded to the destination.
* On forward of a handshake response where dst has a pending candidate, response is mirrored to BOTH committed + candidate. Real-alice at the new IP receives it and completes the handshake; stale-alice at the old IP keeps working.
* Transport data from an unknown source is checked against every peer's candidate. Match commits the candidate as the live endpoint, refreshes the BPF wg_peers map so XDP picks up the new endpoint, persists the roster, increments endpoint_relearn.
* Candidate slots time out after 30 s. drop_relearn_unconfirmed ticks per expiry, a strong signal of a forged handshake (the source has the pubkey but can't progress without the private key).
* Per-peer 5 s cooldown prevents flapping.
Verified on libvirt fleet: stale-then-real-alice scenario relearns correctly, ping 5/5 at 1.3 ms, XDP fast path active post-relearn.
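The response-mirroring step in the flow above can be illustrated as follows. This is a sketch under assumed types: `Endpoint`, `PeerEndpoints`, and `ResponseTargets` are hypothetical names, and the actual forwarder presumably sends directly rather than building a list.

```cpp
#include <cstdint>
#include <vector>

// Illustrative types; field layout is an assumption for this sketch.
struct Endpoint {
  uint32_t ip;    // IPv4 address
  uint16_t port;  // UDP port
};

struct PeerEndpoints {
  Endpoint committed;  // live endpoint, untouched until confirm
  bool has_candidate;  // true while a roam is pending
  Endpoint candidate;  // provisional endpoint learned via MAC1 match
};

// Hypothetical helper: where a handshake response for `dst` should go.
// The committed endpoint always receives it; a pending candidate gets
// a mirrored copy so a roamed peer can complete its handshake.
inline std::vector<Endpoint> ResponseTargets(const PeerEndpoints& dst) {
  std::vector<Endpoint> out;
  out.push_back(dst.committed);
  if (dst.has_candidate) out.push_back(dst.candidate);
  return out;
}
```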

Fourth piece of the 0.2.1 hardening series. When a candidate endpoint expires without transport-data confirming it, the source IP earns a strike. Once thresholds are crossed, the source goes onto a BPF blocklist and every packet from it gets dropped at the top of XDP.
Strike escalation policy (hardcoded for now; configurable later via wg blocklist policy):
* 2 strikes / 60 s → block 60 s
* 5 strikes / 1 h → block 1 h
* 10 strikes / 24 h → block 24 h
Why it matters: without it, a forger who knows bob's pubkey can keep blasting handshake inits. The candidate-confirm gate stops the endpoint hijack, but the relay still forwards every forgery to bob, laundering the attacker's traffic, masking their source IP, and burning bob's CPU/bandwidth. The blocklist shuts that down for single-source attackers and gives operators a clean audit signal (wg blocklist list, drop_blocklisted, drop_relearn_unconfirmed).
Implementation:
* New BPF map wg_blocklist (HASH key=u32 src IPv4 NBO, value=u64 expiry_ns). Drop check at top of XDP after the port gate.
* New BPF stat STAT_DROP_BLOCKLISTED + WgXdpStats counter drop_blocklisted; surfaced as xdp_drop_blocklisted in wg show.
* WgRelay gains strikes (per-IP record) and blocklist (per-IP expiry) maps. RecordStrikeLocked walks the policy table on every failed candidate expiry.
* Blocklist sweep on every HandlePacket call (cheap; rare for active blocks to outlive their window).
* Successful confirm (transport-data matched) clears the source IP's strike record: this is a legitimate roamer, not a forger.
* Userspace HandlePacket also gates on the userspace blocklist copy, so cold-start packets before XDP attaches still drop.
* New einheit verb: wg blocklist list. Returns ip, seconds_left, total_strikes per active block.
Verified on libvirt fleet:
* baseline (no forgeries): blocklist=empty, fleet test PASS, no false-positive blocks for legitimate roaming.
* xdp_drop_blocklisted counter wired and reads as expected.
Manual add/remove + policy tunables left as a follow-up; the defaults are conservative enough that operators won't need to touch them in the common case.
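The escalation table can be sketched as a small policy walk. The record layout (`first_strike_ns` plus a count) follows the comparison quoted elsewhere in this PR (`now - first_strike_ns <= window`); picking the longest matching block is an assumption of this sketch, and `PolicyRule`, `StrikeRecord`, and `BlockNsFor` are hypothetical names rather than the ones used by RecordStrikeLocked.

```cpp
#include <cstdint>

constexpr uint64_t kNsPerSec = 1000000000ULL;

// One row of the escalation table: N strikes within a window earns a
// block of the given duration.
struct PolicyRule {
  uint32_t strikes;
  uint64_t window_ns;
  uint64_t block_ns;
};

// Defaults quoted in the commit message.
constexpr PolicyRule kPolicy[] = {
    {2, 60 * kNsPerSec, 60 * kNsPerSec},
    {5, 3600 * kNsPerSec, 3600 * kNsPerSec},
    {10, 86400 * kNsPerSec, 86400 * kNsPerSec},
};

// Assumed per-IP record shape.
struct StrikeRecord {
  uint64_t first_strike_ns;
  uint32_t count;
};

// Returns the block duration earned by this record, or 0 for none.
// When several rules match, the longest block wins (an assumption).
inline uint64_t BlockNsFor(const StrikeRecord& rec, uint64_t now_ns) {
  if (now_ns < rec.first_strike_ns) return 0;  // clock went backwards
  uint64_t block_ns = 0;
  for (const PolicyRule& rule : kPolicy) {
    const bool in_window = now_ns - rec.first_strike_ns <= rule.window_ns;
    if (in_window && rec.count >= rule.strikes && rule.block_ns > block_ns) {
      block_ns = rule.block_ns;
    }
  }
  return block_ns;
}
```

With these defaults, two strikes 30 s apart earn a 60 s block, while two strikes 61 s apart earn nothing until the count reaches five within the hour.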

… tests
Final piece of the wg-relay-hardening branch.
* CMakeLists.txt: VERSION 0.2.0 → 0.2.1.
* CHANGELOG.md: 0.2.1 entry covering shape filter, MAC1 verification, automatic roaming, dynamic blocklist, and the standalone Blake2s under src/crypto/.
* dist/release-notes/v0.2.1.md: curated GitHub release body, same shape as v0.2.0. It leads with the roaming feature (the operator-visible win) and reframes the blocklist as the relay-as-anonymizer fix.
* tests/test_hd_blake2s.cc: 5 unit tests for the new Blake2s implementation. Two RFC 7693 published vectors (abc, empty) plus three structural assertions (output-length influences digest, block-boundary off-by-one, keyed mode determinism + key sensitivity).
* tests/CMakeLists.txt: filter test_hd_blake2s out of the unit-test glob so it can link its own crypto sources without the libderp transitive surface.

Two bugs surfaced during the active blocklist trigger test on the libvirt fleet, both fixed here.
(1) Idle relay never expired candidates. ExpireCandidatesLocked only ran inside HandlePacket. A relay with no live traffic would hold a candidate slot open indefinitely: the strike that should have fired at 30 s never landed, so the blocklist could never escalate on a slow-cadence forger. Fix: 1 s periodic sweep in RecvLoop's poll-timeout branch. Cheap (locks peers_mu briefly, walks the small peer table, walks the small blocklist).
(2) Back-to-back forgeries from the same source pinned the candidate forever. HandleUnknownSrcHandshakeLocked unconditionally overwrote candidate_endpoint and candidate_set_ns on every matching handshake. An attacker spamming handshakes (or just a flaky client retransmitting) kept bumping set_ns to NowNs(), so the 30 s expiry window never closed. Fix: split the path. If the new handshake source matches the current candidate, no-op forward (don't refresh the timer). If it's a different source contesting an active candidate, drop with drop_unknown_src. Either way, the original candidate's timer keeps running and either confirms or expires on schedule.
Verified on libvirt:
* forge 1 → candidate registered.
* 30 s later: strike #1 (drop_relearn_unconfirmed=1).
* forge 2 (different source port) within the 60 s window: candidate registered again with a fresh timer.
* 30 s later: strike #2 → policy match (2/60s) → blocklist entry for 192.168.122.1, 60 s remaining.
* forge 3 from same source: dropped at top of XDP, xdp_drop_blocklisted=1, never reaches the forward path.
Verified on Haswell 25 GbE (sanity, no regression):
* 4/4 ping over the relay, 0.73 ms avg.
* iperf3 single-stream TCP: 10.9 Gbit/s (vs 10.7 baseline).
* xdp_fwd_packets ~= 10.5 M with only 1 cold-start xdp_pass_no_mac; the new sweep + contest checks add zero measurable cost on the data plane.
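The split path from fix (2), together with the 30 s expiry, can be sketched like this. All names (`CandidateSlot`, `Action`, `OnUnknownSrcHandshake`, `ExpireIfStale`) are hypothetical, and the sketch models a single peer's slot without locking.

```cpp
#include <cstdint>

constexpr uint64_t kNsPerSec = 1000000000ULL;
constexpr uint64_t kCandidateTtlNs = 30 * kNsPerSec;  // 30 s window

struct Endpoint {
  uint32_t ip;
  uint16_t port;
};

inline bool operator==(const Endpoint& a, const Endpoint& b) {
  return a.ip == b.ip && a.port == b.port;
}

// Illustrative per-peer candidate slot.
struct CandidateSlot {
  bool active;
  Endpoint ep;
  uint64_t set_ns;  // when the candidate was registered
};

enum class Action { kRegister, kForwardNoRefresh, kDropContested };

// Decide what to do with a MAC1-valid handshake from an unknown source.
inline Action OnUnknownSrcHandshake(CandidateSlot& slot, Endpoint src,
                                    uint64_t now_ns) {
  if (!slot.active) {
    slot.active = true;
    slot.ep = src;
    slot.set_ns = now_ns;
    return Action::kRegister;
  }
  // Same source again: forward, but leave set_ns alone so the expiry
  // window cannot be pinned open by retransmits or spam.
  if (slot.ep == src) return Action::kForwardNoRefresh;
  // A different source contesting an active slot is dropped.
  return Action::kDropContested;
}

// Called from a periodic sweep; returns true when the slot expired
// (the caller would then record a strike against slot.ep.ip).
inline bool ExpireIfStale(CandidateSlot& slot, uint64_t now_ns) {
  if (slot.active && now_ns - slot.set_ns >= kCandidateTtlNs) {
    slot.active = false;
    return true;
  }
  return false;
}
```

Calling `ExpireIfStale` from a timer rather than only from the packet path is what lets an idle relay still land the strike, as described in fix (1).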

P0 bug found by fuzzing the new roaming flow: an attacker who knows bob's pubkey could send a forged handshake init followed by *any* 32-byte UDP starting with 0x04 from the same source and the relay would commit the candidate as the new endpoint. Hijack confirmed in fuzz: alice's endpoint moved to the forger's IP.
Root cause: ConfirmCandidateLocked treated any transport-data-shaped packet from the candidate as proof the handshake completed. The shape check was load-bearing for security but couldn't actually distinguish a real session's transport from a forged 0x04 byte.
Fix: walk the WG protocol enough to attribute responses to specific inits. WireGuard's handshake response echoes the initiator's `sender_index` in its `receiver_index` field. We now:
* save the candidate init's sender_index when registering the candidate (bytes 4..8 of the type-1 packet);
* on forwarding a partner's response (type 2), read its receiver_index (bytes 8..12) and only set candidate_partner_responded if it matches the saved sender_index, proving this response is for THIS candidate's init, not a concurrent legitimate handshake from the peer at the committed endpoint;
* gate ConfirmCandidateLocked on candidate_partner_responded.
Bob silently drops a forged init (the encrypted static field is garbage, no decryption matches a known peer pubkey), so no matching response ever flows. The candidate's partner_responded stays false. Transport-data from the candidate doesn't confirm. Hijack closed.
Verified on libvirt fleet:
* Forge sequence (init + type-4 same source): endpoint stays at the registered value, drop_unknown_src catches the forged transport attempts.
* Legitimate roam (alice's IP changed): handshake registers candidate, bob's response receiver_idx matches the stored init sender_idx, partner_responded fires, transport from new IP confirms, endpoint commits, ping 5/5 at ~1 ms.
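The index matching described in the fix can be sketched as below, using the byte offsets given in the commit message and little-endian decoding as in the WireGuard wire format. `Candidate`, `OnCandidateInit`, `OnPartnerResponse`, and `MayConfirm` are hypothetical names; the real state lives in the relay's peer table.

```cpp
#include <cstddef>
#include <cstdint>

// Decode a little-endian u32 from a byte buffer.
inline uint32_t LoadLe32(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Illustrative candidate state for one peer.
struct Candidate {
  bool active;
  uint32_t init_sender_index;  // from bytes 4..8 of the type-1 init
  bool partner_responded;      // set only by an attributable response
};

// Register a candidate from a type-1 init (148 bytes).
inline void OnCandidateInit(Candidate& c, const uint8_t* pkt, size_t len) {
  if (pkt == nullptr || len != 148 || pkt[0] != 1) return;
  c.active = true;
  c.init_sender_index = LoadLe32(pkt + 4);
  c.partner_responded = false;
}

// Inspect a partner's type-2 response (92 bytes) being forwarded.
// receiver_index at bytes 8..12 must echo the saved sender_index.
inline void OnPartnerResponse(Candidate& c, const uint8_t* pkt, size_t len) {
  if (pkt == nullptr || !c.active || len != 92 || pkt[0] != 2) return;
  if (LoadLe32(pkt + 8) == c.init_sender_index) c.partner_responded = true;
}

// Transport data from the candidate may only confirm after this.
inline bool MayConfirm(const Candidate& c) {
  return c.active && c.partner_responded;
}
```

A forged init never elicits a response from bob, so `partner_responded` stays false and a bare 0x04 packet from the forger cannot commit the endpoint.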

When a peer roams and its first handshake init is in flight, wg.ko may retry from the same endpoint before bob's response lands. Each retry picks a fresh sender_index, but the no-op-forward branch (same source as registered candidate) wasn't updating the saved candidate_init_sender_index. If bob ended up responding to a later retry init, his response's receiver_index wouldn't match the saved (first) sender_index, the partner-response check would fail, and the candidate would never confirm — breaking legitimate roams that involve retries. Refresh candidate_init_sender_index on every type-1 retry from the same source. The hijack defense is unaffected: the match still requires a partner-attributable response, and only the legitimate sender's wg.ko knows which index its own init used.

Two related amplifier surfaces in the unknown-source handshake handler, both surfaced by fuzzing:
1. Type-2 (response) from an unknown source has no place in the protocol: legitimate responses come from the committed responder endpoint and hit the regular forward path. The old code accepted the type-2, registered a (never-confirmable) candidate, and forwarded it to the partner. A forger could use this single accepted packet to claim alice's candidate slot, then bounce arbitrary WG-shaped packets at bob via the same-source no-op-forward branch, all without any auth. Drop type-2 outright at the entry of the unknown-source handler.
2. Even with type-2 closed, a forger holding the responder's public mac1 key can craft unlimited valid type-1 inits and bounce them through the no-op-forward branch at line rate for the 30 s candidate window. Cap retry forwards at one per second per unconfirmed candidate. wg.ko's own retry cadence is 5 s, so legit clients are unaffected; a flood of forged retries gets clamped to ~1 pps and then strikes into the blocklist.
Drops above the rate-limit count as drop_unknown_src so a probing forger can't distinguish "you're rate-limited" from "your packet was malformed."
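The one-per-second cap on retry forwards amounts to a minimum-gap check per unconfirmed candidate. A minimal sketch, with `RetryLimiter` and `AllowRetryForward` as hypothetical names:

```cpp
#include <cstdint>

constexpr uint64_t kNsPerSec = 1000000000ULL;
constexpr uint64_t kMinRetryGapNs = 1 * kNsPerSec;  // 1 forward per second

// Illustrative per-candidate limiter state.
struct RetryLimiter {
  bool has_forwarded;
  uint64_t last_forward_ns;
};

// Returns true when a retry init from the unconfirmed candidate may be
// forwarded now; false means the caller drops it (counted as
// drop_unknown_src, per the commit message).
inline bool AllowRetryForward(RetryLimiter& rl, uint64_t now_ns) {
  if (rl.has_forwarded && now_ns - rl.last_forward_ns < kMinRetryGapNs) {
    return false;
  }
  rl.has_forwarded = true;
  rl.last_forward_ns = now_ns;
  return true;
}
```

Because wg.ko retries every 5 s, a legitimate client never hits the limiter, while a line-rate flood is reduced to roughly one forwarded packet per second.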

A strike entry whose first_strike_ns is older than the widest policy window (24 h) can no longer escalate anything: every policy check uses `now - first_strike_ns <= window`, and at that age all comparisons fail. The entry just sits in the std::map taking up memory. Without this sweep, a forger spraying from spoofed source IPs (each striking once and never returning) would grow the strike map without bound — a slow leak proportional to the rate of distinct attack sources. The candidate-slot contention already makes this a marginal attack, but the leak is a separate correctness issue worth closing. Sweep stale entries on each ExpireCandidatesLocked tick, alongside the existing blocklist sweep.
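A minimal sketch of the sweep, assuming the strike table is a `std::map` keyed by source IPv4 address as the message implies. `StrikeEntry` and `SweepStaleStrikes` are hypothetical names, and the entry layout is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Widest policy window (24 h) in nanoseconds.
constexpr uint64_t kWidestWindowNs = 24ULL * 3600ULL * 1000000000ULL;

// Illustrative per-IP strike record.
struct StrikeEntry {
  uint64_t first_strike_ns;
  uint32_t count;
};

// Erase entries too old to satisfy any policy window. Returns the
// number of entries removed.
inline size_t SweepStaleStrikes(std::map<uint32_t, StrikeEntry>& strikes,
                                uint64_t now_ns) {
  size_t removed = 0;
  for (auto it = strikes.begin(); it != strikes.end();) {
    const uint64_t first = it->second.first_strike_ns;
    // Guard against unsigned underflow if now_ns precedes first.
    if (now_ns > first && now_ns - first > kWidestWindowNs) {
      it = strikes.erase(it);  // erase returns the next valid iterator
      ++removed;
    } else {
      ++it;
    }
  }
  return removed;
}
```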

Fold the four fuzz-driven hardening commits (partner-attributable confirm, type-2 unknown-src drop, retry rate-limit, strike sweep) into the 0.2.1 unreleased entry so the release notes match what's on the branch.

KRuskowski added 13 commits April 29, 2026 18:13

CHANGELOG: extend 0.2.1 wg-relay hardening with fuzz fixes

55d19b4

CHANGELOG: unwrap fuzz-hardening bullets

093228d

wg-relay: split if-clause body across lines for cpplint

d34a84c

KRuskowski merged commit abb9d27 into master Apr 29, 2026
8 checks passed

KRuskowski merged 13 commits into master from wg-relay-hardening
