wg-relay hardening: shape filter, MAC1, automatic roaming, blocklist #19

Merged: KRuskowski merged 13 commits into master from wg-relay-hardening on Apr 29, 2026

@KRuskowski

Summary

Hardens `mode: wireguard` against malformed/forged traffic on the public UDP port. Bumps the project to 0.2.1 with a curated CHANGELOG + GitHub release notes.

The hardening is layered — each layer narrows what reaches the next:

  1. WG-shape filter at XDP (740701e): drops packets whose first byte isn't a WireGuard message type (1/2/3/4) or whose length doesn't match the type. Non-WG noise on the relay's port stops at the NIC. New counter drop_not_wg_shaped.

  2. MAC1 verification (7698846): when both ends of a link have a stamped pubkey, every handshake init/response from a registered peer is verified against the partner's pubkey via Blake2s-keyed MAC1. Mismatch → drop_handshake_pubkey_mismatch. Engages only when partner pubkey is set, so existing operators keep today's behaviour.

  3. Automatic peer roaming (eaa0c2d, 8f0d28b, ada485a, 0c2951e): a peer's endpoint auto-updates when its IP changes. Handshakes from unknown sources are matched against every partner's pubkey via MAC1; on a match, a candidate endpoint is registered. The committed endpoint stays put until the partner's response (with matching receiver_index) confirms, which closes the forge-init-then-bounce hijack vector. Fuzz-driven follow-ups: don't refresh a contested slot, refresh sender_index on a legitimate retry. New per-peer counter endpoint_relearn.

  4. Dynamic source-IP blocklist (14f9ce2): source IPs that produce repeated failed-confirm strikes escalate onto a BPF blocklist. Defaults: 2/60s → 60s; 5/1h → 1h; 10/24h → 24h. Blocked sources drop at the top of XDP. Closes the relay-as-anonymizer attack against the partner. New verb wg blocklist list, new counter xdp_drop_blocklisted, and per-IP strike records.

  5. Fuzz-found amplifier fixes (edfa8bb, cfecbcd): drop type-2 (response) from unknown source outright (free unauthenticated amplifier surface), rate-limit retry-init forwards from an unconfirmed candidate to 1/s, and sweep stale strike entries during candidate expiry so spoofed-source one-shot strikes can't grow the table without bound.

Crypto

Standalone Blake2s in src/crypto/ — RFC 7693 reference port. libsodium ships Blake2b only and WireGuard's MAC1 uses Blake2s. Verified against the published RFC test vector + structural invariants (tests/test_hd_blake2s.cc).

Verification

Tested live on the libvirt fleet (hd-r2 relay + hd-c1 alice + hd-c2 bob) and a 25 GbE Haswell bare-metal relay:

  • Legitimate wg-quick clients keep roaming successfully (3/3 ping at ~1 ms after each redeploy).
  • Fuzz battery covering empty, short, long, wrong-type, wrong-pubkey-MAC1, valid-MAC1-forge, type-2 unknown-src, retry churn — all caught by the appropriate counter; alice's endpoint stays put under sustained forge.
  • Blocklist policy (2 strikes / 60s → 60s block) fires correctly; XDP drops subsequent packets from blocklisted source while legitimate peers continue uninterrupted.

First piece of the 0.2.1 hardening series (see
docs/design/wg_relay_hardening.md).  Inspect the first byte of
the UDP payload at XDP and confirm it's a WireGuard message
type (1 init / 2 response / 3 cookie / 4 transport) with the
expected length (148 / 92 / 64 / >=32 respectively); drop
otherwise.
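
A minimal sketch of the type/length check described above, written as plain C++ rather than BPF. The helper name `IsWgShaped` and the enum constants are hypothetical and are not taken from the relay's source; only the type values and lengths come from this commit message.

```cpp
#include <cstddef>
#include <cstdint>

// WireGuard message types as listed in the commit message.
enum : uint8_t { kWgInit = 1, kWgResponse = 2, kWgCookie = 3, kWgTransport = 4 };

// Returns true when the first payload byte is a known WireGuard type and the
// payload length is consistent with that type. Hypothetical helper name.
inline bool IsWgShaped(const uint8_t* payload, std::size_t len) {
  if (payload == nullptr || len == 0) return false;
  switch (payload[0]) {
    case kWgInit:      return len == 148;  // handshake initiation
    case kWgResponse:  return len == 92;   // handshake response
    case kWgCookie:    return len == 64;   // cookie reply
    case kWgTransport: return len >= 32;   // transport data, variable length
    default:           return false;       // e.g. a 0x05 forgery
  }
}
```

Keeping the check a pure function of the payload is what would let the XDP program and the userspace forwarder apply the same rule.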

Userspace forwarder mirrors the same check so the cold-start
path can't be used to bypass the BPF filter.  New counter
drop_not_wg_shaped surfaced in `wg show`, summed across the
two paths so the operator sees one number.

Verified on the libvirt fleet:
* fleet ping test still PASS (4/4, XDP fast-path)
* sending random garbage / type 0x05 forgeries from a
  registered peer increments drop_not_wg_shaped, doesn't tick
  drop_unknown_src (filter fires before peer lookup)
* real WG handshake + transport packets all pass through

Second piece of the 0.2.1 hardening series. When the operator has
stamped a pubkey on a peer's link partner, every handshake
init/response from that peer is verified: MAC1 is computed as
Blake2s_keyed(Blake2s(LABEL_MAC1 || partner_pubkey),
msg[0..len-32], 16) and compared against the MAC1 field at
[len-32..len-16]. Mismatch -> drop, drop_handshake_pubkey_mismatch++.
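
The offsets in that formula can be sketched as follows. This is an illustration only: the Blake2s primitive is injected as a callable with an assumed signature (`Blake2sFn`), since the real interface in src/crypto/ is not shown here, and `VerifyMac1` and `ConstTimeEq` are hypothetical names. The label value `"mac1----"` is the 8-byte MAC1 label from the WireGuard protocol.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Bytes = std::vector<uint8_t>;
// Assumed signature: hash(key, data, out_len); an empty key means unkeyed.
using Blake2sFn =
    std::function<Bytes(const Bytes& key, const Bytes& data, std::size_t out_len)>;

// Compares n bytes without an early exit, so timing does not reveal
// how many leading bytes matched.
inline bool ConstTimeEq(const uint8_t* a, const uint8_t* b, std::size_t n) {
  uint8_t diff = 0;
  for (std::size_t i = 0; i < n; ++i) diff = static_cast<uint8_t>(diff | (a[i] ^ b[i]));
  return diff == 0;
}

// Recomputes MAC1 over msg[0..len-32) and compares it with msg[len-32..len-16).
inline bool VerifyMac1(const Bytes& msg,
                       const std::array<uint8_t, 32>& partner_pubkey,
                       const Blake2sFn& blake2s) {
  if (msg.size() < 32) return false;  // no room for MAC1 + MAC2
  static const char kLabel[] = "mac1----";
  Bytes label_and_key(kLabel, kLabel + 8);
  label_and_key.insert(label_and_key.end(), partner_pubkey.begin(), partner_pubkey.end());
  const Bytes mac_key = blake2s(Bytes{}, label_and_key, 32);  // unkeyed hash
  const std::size_t mac1_off = msg.size() - 32;
  const Bytes covered(msg.begin(), msg.begin() + static_cast<std::ptrdiff_t>(mac1_off));
  const Bytes expected = blake2s(mac_key, covered, 16);       // keyed, 16-byte tag
  return expected.size() == 16 &&
         ConstTimeEq(expected.data(), msg.data() + mac1_off, 16);
}
```

For a 148-byte init this places MAC1 at bytes 116..132, and for a 92-byte response at bytes 60..76.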

Engages only when the partner has a stamped pubkey, so existing
operators retain today's behaviour exactly until they opt in.

* New self-contained Blake2s in src/crypto/. libsodium ships
  Blake2b only; WG uses Blake2s. RFC 7693 reference port,
  smoke-tested against published vectors.
* BPF returns XDP_PASS for types 1/2 so userspace owns MAC1
  verification. Cookie (3) and transport data (4) keep the XDP
  fast path.
* Verified on libvirt fleet: baseline + with-real-pubkeys both
  PASS; with-fake-pubkey drops handshakes correctly; restore
  recovers cleanly.

Third piece of the 0.2.1 hardening series, the headline feature.
A peer's endpoint auto-updates when their IP changes, without
any client-side script and without operator intervention.

Flow:
* Handshake init/response from an unknown source is matched
  against every registered peer's pubkey via MAC1. Peer whose
  pubkey verifies is the destination; the actual sender is
  that peer's link partner.
* Sender's candidate_endpoint is set; committed endpoint stays
  untouched. Handshake is forwarded to the destination.
* On forward of a handshake response where dst has a pending
  candidate, response is mirrored to BOTH committed + candidate.
  Real-alice at the new IP receives it and completes the
  handshake; stale-alice at the old IP keeps working.
* Transport data from an unknown source is checked against
  every peer's candidate. Match commits the candidate as the
  live endpoint, refreshes the BPF wg_peers map so XDP picks
  up the new endpoint, persists the roster, increments
  endpoint_relearn.
* Candidate slots time out after 30 s. drop_relearn_unconfirmed
  ticks per expiry — strong signal of a forged handshake (the
  source has the pubkey but can't progress without the private
  key).
* Per-peer 5 s cooldown prevents flapping.
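
The candidate lifecycle in this flow (register, confirm on transport data, expire after 30 s) can be sketched roughly as below. All type and function names are hypothetical, and applying the 5 s cooldown at commit time is an assumption; the message does not say where the relay enforces it.

```cpp
#include <cstdint>

struct Endpoint {
  uint32_t ip = 0;
  uint16_t port = 0;
  bool operator==(const Endpoint& o) const { return ip == o.ip && port == o.port; }
};

constexpr uint64_t kSecNs = 1'000'000'000ull;
constexpr uint64_t kCandidateTtlNs = 30 * kSecNs;    // candidate slot timeout
constexpr uint64_t kRelearnCooldownNs = 5 * kSecNs;  // per-peer cooldown

struct PeerSlot {
  Endpoint committed;
  bool has_candidate = false;
  Endpoint candidate;
  uint64_t candidate_set_ns = 0;
  bool relearned_before = false;
  uint64_t last_relearn_ns = 0;
  uint64_t endpoint_relearn = 0;          // counter
  uint64_t drop_relearn_unconfirmed = 0;  // counter
};

// A MAC1-verified handshake from an unknown source sets the candidate only.
inline void RegisterCandidate(PeerSlot& p, Endpoint src, uint64_t now_ns) {
  p.has_candidate = true;
  p.candidate = src;
  p.candidate_set_ns = now_ns;
}

// Transport data from the candidate source commits it as the live endpoint.
inline bool ConfirmOnTransport(PeerSlot& p, Endpoint src, uint64_t now_ns) {
  if (!p.has_candidate || !(p.candidate == src)) return false;
  if (p.relearned_before && now_ns - p.last_relearn_ns < kRelearnCooldownNs) return false;
  p.committed = src;
  p.has_candidate = false;
  p.relearned_before = true;
  p.last_relearn_ns = now_ns;
  ++p.endpoint_relearn;
  return true;
}

// A candidate that is still unconfirmed after 30 s is dropped and counted.
inline bool ExpireCandidate(PeerSlot& p, uint64_t now_ns) {
  if (!p.has_candidate || now_ns - p.candidate_set_ns < kCandidateTtlNs) return false;
  p.has_candidate = false;
  ++p.drop_relearn_unconfirmed;
  return true;
}
```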

Verified on libvirt fleet: stale-then-real-alice scenario
relearns correctly, ping 5/5 at 1.3 ms, XDP fast path active
post-relearn.

Fourth piece of the 0.2.1 hardening series.  When a candidate
endpoint expires without transport-data confirming it, the
source IP earns a strike.  Once thresholds are crossed, the
source goes onto a BPF blocklist and every packet from it gets
dropped at the top of XDP.

Strike escalation policy (hardcoded for now; configurable
later via wg blocklist policy):
  * 2 strikes / 60 s   → block 60 s
  * 5 strikes / 1 h    → block 1 h
  * 10 strikes / 24 h  → block 24 h
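
As an illustration, the table above could be walked like this. The sketch assumes a per-IP record that keeps the first strike time and a running total, and picks the longest block among the matching rows; the record layout and the names `StrikePolicy`, `StrikeRecord` and `RecordStrike` are hypothetical.

```cpp
#include <array>
#include <cstdint>

constexpr uint64_t kSec = 1'000'000'000ull;  // nanoseconds per second

struct StrikePolicy {
  uint32_t strikes;    // threshold
  uint64_t window_ns;  // strikes must fall inside this window
  uint64_t block_ns;   // resulting block duration
};

// The three hardcoded rows from the commit message.
constexpr std::array<StrikePolicy, 3> kPolicies{{
    {2, 60 * kSec, 60 * kSec},
    {5, 3600 * kSec, 3600 * kSec},
    {10, 86400 * kSec, 86400 * kSec},
}};

struct StrikeRecord {
  uint64_t first_strike_ns = 0;
  uint32_t total = 0;
};

// Records one strike and returns the block duration in ns (0 = no block).
inline uint64_t RecordStrike(StrikeRecord& r, uint64_t now_ns) {
  if (r.total == 0) r.first_strike_ns = now_ns;
  ++r.total;
  uint64_t block_ns = 0;
  for (const StrikePolicy& p : kPolicies) {
    const bool in_window = now_ns - r.first_strike_ns <= p.window_ns;
    if (r.total >= p.strikes && in_window && p.block_ns > block_ns) block_ns = p.block_ns;
  }
  return block_ns;
}
```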

Why it matters: without it, a forger who knows bob's pubkey can
keep blasting handshake inits.  The candidate-confirm gate stops
the endpoint hijack, but the relay still forwards every forgery
to bob — laundering the attacker's traffic, masking their
source IP, and burning bob's CPU/bandwidth.  The blocklist
shuts that down for single-source attackers and gives operators
a clean audit signal (wg blocklist list, drop_blocklisted,
drop_relearn_unconfirmed).

Implementation:

* New BPF map wg_blocklist (HASH key=u32 src IPv4 NBO,
  value=u64 expiry_ns).  Drop check at top of XDP after the
  port gate.
* New BPF stat STAT_DROP_BLOCKLISTED + WgXdpStats counter
  drop_blocklisted; surfaced as xdp_drop_blocklisted in
  wg show.
* WgRelay gains strikes (per-IP record) and blocklist (per-IP
  expiry) maps.  RecordStrikeLocked walks the policy table on
  every failed candidate expiry.
* Blocklist sweep on every HandlePacket call (cheap; rare for
  active blocks to outlive their window).
* Successful confirm (transport-data matched) clears the
  source IP's strike record — legitimate roamer, not a forger.
* Userspace HandlePacket also gates on the userspace blocklist
  copy, so cold-start packets before XDP attaches still drop.
* New einheit verb: wg blocklist list.  Returns ip,
  seconds_left, total_strikes per active block.
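
A rough userspace model of the blocklist semantics listed above (key = IPv4 source in network byte order, value = expiry timestamp). The class and method names are hypothetical; the real BPF map and the WgRelay members are not reproduced here.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

class Blocklist {
 public:
  // Blocks ip_nbo until now_ns + dur_ns; never shortens an existing block.
  void Block(uint32_t ip_nbo, uint64_t now_ns, uint64_t dur_ns) {
    uint64_t& expiry = expiry_ns_[ip_nbo];  // value-initialized to 0 if new
    if (now_ns + dur_ns > expiry) expiry = now_ns + dur_ns;
  }

  // The gate applied before any other packet processing.
  bool IsBlocked(uint32_t ip_nbo, uint64_t now_ns) const {
    auto it = expiry_ns_.find(ip_nbo);
    return it != expiry_ns_.end() && now_ns < it->second;
  }

  // Removes expired entries and returns how many were removed.
  std::size_t Sweep(uint64_t now_ns) {
    std::size_t removed = 0;
    for (auto it = expiry_ns_.begin(); it != expiry_ns_.end();) {
      if (it->second <= now_ns) {
        it = expiry_ns_.erase(it);
        ++removed;
      } else {
        ++it;
      }
    }
    return removed;
  }

  std::size_t size() const { return expiry_ns_.size(); }

 private:
  std::unordered_map<uint32_t, uint64_t> expiry_ns_;
};
```

Storing an absolute expiry rather than a duration keeps the per-packet check to a single comparison.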

Verified on libvirt fleet:
* baseline (no forgeries): blocklist=empty, fleet test PASS,
  no false-positive blocks for legitimate roaming.
* xdp_drop_blocklisted counter wired and reads as expected.

Manual add/remove + policy tunables left as a follow-up; the
defaults are conservative enough that operators won't need to
touch them in the common case.

… tests

Final piece of the wg-relay-hardening branch.

* CMakeLists.txt: VERSION 0.2.0 → 0.2.1.
* CHANGELOG.md: 0.2.1 entry covering shape filter, MAC1
  verification, automatic roaming, dynamic blocklist, and
  the standalone Blake2s under src/crypto/.
* dist/release-notes/v0.2.1.md: curated GitHub release body,
  same shape as v0.2.0 — leads with the roaming feature
  (the operator-visible win) and reframes the blocklist as
  the relay-as-anonymizer fix.
* tests/test_hd_blake2s.cc: 5 unit tests for the new
  Blake2s implementation. Two RFC 7693 published vectors
  (abc, empty) plus three structural assertions (output-
  length influences digest, block-boundary off-by-one,
  keyed mode determinism + key sensitivity).
* tests/CMakeLists.txt: filter test_hd_blake2s out of the
  unit-test glob so it can link its own crypto sources
  without the libderp transitive surface.

Two bugs surfaced during the active blocklist trigger test on
the libvirt fleet, both fixed here.

(1) Idle relay never expired candidates.
    ExpireCandidatesLocked only ran inside HandlePacket. A
    relay with no live traffic would hold a candidate slot
    open indefinitely — the strike that should have fired at
    30 s never landed, so the blocklist could never escalate
    on a slow-cadence forger.
    Fix: 1 s periodic sweep in RecvLoop's poll-timeout
    branch. Cheap (locks peers_mu briefly, walks the small
    peer table, walks the small blocklist).

(2) Back-to-back forgeries from the same source pinned the
    candidate forever.
    HandleUnknownSrcHandshakeLocked unconditionally
    overwrote candidate_endpoint and candidate_set_ns on
    every matching handshake. An attacker spamming
    handshakes (or just a flaky client retransmitting) kept
    bumping set_ns to NowNs(), so the 30 s expiry window
    never closed.
    Fix: split the path. If the new handshake source
    matches the current candidate, no-op forward (don't
    refresh the timer). If it's a different source
    contesting an active candidate, drop with
    drop_unknown_src. Either way, the original candidate's
    timer keeps running and either confirms or expires on
    schedule.
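
The split in fix (2) amounts to a three-way decision. A minimal sketch, with hypothetical names (`CandidateSlot`, `CandidateAction`, `OnUnknownSrcHandshake`) standing in for the relay's internals:

```cpp
#include <cstdint>

enum class CandidateAction {
  kRegister,          // no active candidate: take the slot, start the timer
  kForwardNoRefresh,  // same source as the candidate: forward, keep the timer
  kDropContested,     // different source: drop (counted as drop_unknown_src)
};

struct CandidateSlot {
  bool active = false;
  uint32_t ip = 0;
  uint16_t port = 0;
  uint64_t set_ns = 0;
};

inline CandidateAction OnUnknownSrcHandshake(CandidateSlot& slot, uint32_t ip,
                                             uint16_t port, uint64_t now_ns) {
  if (!slot.active) {
    slot.active = true;
    slot.ip = ip;
    slot.port = port;
    slot.set_ns = now_ns;
    return CandidateAction::kRegister;
  }
  // set_ns is deliberately left alone on both remaining paths, so the
  // expiry window cannot be pushed out by repeated handshakes.
  if (slot.ip == ip && slot.port == port) return CandidateAction::kForwardNoRefresh;
  return CandidateAction::kDropContested;
}
```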

Verified on libvirt:
* forge 1 → candidate registered.
* 30 s later: strike #1 (drop_relearn_unconfirmed=1).
* forge 2 (different source port) within the 60 s window:
  candidate registered again with a fresh timer.
* 30 s later: strike #2 → policy match (2/60s) → blocklist
  entry for 192.168.122.1, 60 s remaining.
* forge 3 from same source: dropped at top of XDP,
  xdp_drop_blocklisted=1, never reaches the forward path.

Verified on Haswell 25 GbE (sanity, no regression):
* 4/4 ping over the relay, 0.73 ms avg.
* iperf3 single-stream TCP: 10.9 Gbit/s (vs 10.7 baseline).
* xdp_fwd_packets ~= 10.5 M with only 1 cold-start
  xdp_pass_no_mac — the new sweep + contest checks add
  zero measurable cost on the data plane.

P0 bug found by fuzzing the new roaming flow: an attacker who
knows bob's pubkey could send a forged handshake init followed
by *any* 32-byte UDP starting with 0x04 from the same source
and the relay would commit the candidate as the new endpoint.
Hijack confirmed in fuzz — alice's endpoint moved to the
forger's IP.

Root cause: ConfirmCandidateLocked treated any transport-data-
shaped packet from the candidate as proof the handshake
completed. The shape check was load-bearing for security but
couldn't actually distinguish a real session's transport from
a forged 0x04 byte.

Fix: walk the WG protocol enough to attribute responses to
specific inits. WireGuard's handshake response echoes the
initiator's `sender_index` in its `receiver_index` field. We
now:

* save the candidate init's sender_index when registering
  the candidate (bytes 4..8 of the type-1 packet);
* on forwarding a partner's response (type 2), read its
  receiver_index (bytes 8..12) and only set
  candidate_partner_responded if it matches the saved
  sender_index — proving this response is for THIS
  candidate's init, not a concurrent legitimate handshake
  from the peer at the committed endpoint;
* gate ConfirmCandidateLocked on
  candidate_partner_responded.

Bob silently drops a forged init (the encrypted static field
is garbage, no decryption matches a known peer pubkey), so no
matching response ever flows. The candidate's
partner_responded stays false. Transport-data from the
candidate doesn't confirm. Hijack closed.
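
A sketch of the index attribution, using the byte offsets given above. WireGuard encodes these indices as little-endian 32-bit integers. The struct and function names are hypothetical; only the offsets and the matching rule come from the commit message.

```cpp
#include <cstddef>
#include <cstdint>

// Reads a little-endian u32 from an unaligned buffer.
inline uint32_t LoadLe32(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

struct Candidate {
  bool active = false;
  uint32_t init_sender_index = 0;
  bool partner_responded = false;
};

// Type-1 init from the candidate source: remember sender_index (bytes 4..8).
inline void OnCandidateInit(Candidate& c, const uint8_t* pkt, std::size_t len) {
  if (len != 148 || pkt[0] != 1) return;
  c.active = true;
  c.init_sender_index = LoadLe32(pkt + 4);
  c.partner_responded = false;
}

// Type-2 response from the partner: receiver_index (bytes 8..12) must echo
// the saved sender_index before the candidate becomes confirmable.
inline void OnPartnerResponse(Candidate& c, const uint8_t* pkt, std::size_t len) {
  if (!c.active || len != 92 || pkt[0] != 2) return;
  if (LoadLe32(pkt + 8) == c.init_sender_index) c.partner_responded = true;
}

// Gate for committing the candidate on transport data.
inline bool MayConfirm(const Candidate& c) { return c.active && c.partner_responded; }
```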

Verified on libvirt fleet:

* Forge sequence (init + type-4 same source): endpoint stays
  at the registered value, drop_unknown_src catches the
  forged transport attempts.
* Legitimate roam (alice's IP changed): handshake registers
  candidate, bob's response receiver_idx matches the stored
  init sender_idx, partner_responded fires, transport from
  new IP confirms, endpoint commits, ping 5/5 at ~1 ms.

When a peer roams and its first handshake init is in flight,
wg.ko may retry from the same endpoint before bob's response
lands. Each retry picks a fresh sender_index, but the
no-op-forward branch (same source as registered candidate)
wasn't updating the saved candidate_init_sender_index. If bob
ended up responding to a later retry init, his response's
receiver_index wouldn't match the saved (first) sender_index,
the partner-response check would fail, and the candidate
would never confirm — breaking legitimate roams that involve
retries.

Refresh candidate_init_sender_index on every type-1 retry
from the same source. The hijack defense is unaffected: the
match still requires a partner-attributable response, and
only the legitimate sender's wg.ko knows which index its own
init used.

Two related amplifier surfaces in the unknown-source handshake
handler, both surfaced by fuzzing:

1. Type-2 (response) from an unknown source has no place in
   the protocol — legitimate responses come from the committed
   responder endpoint and hit the regular forward path. The
   old code accepted the type-2, registered a (never-confirmable)
   candidate, and forwarded it to the partner. A forger
   could use this single accepted packet to claim alice's
   candidate slot, then bounce arbitrary WG-shaped packets at
   bob via the same-source no-op-forward branch, all without
   any auth. Drop type-2 outright at the entry of the unknown-
   source handler.

2. Even with type-2 closed, a forger holding the responder's
   public mac1 key can craft unlimited valid type-1 inits and
   bounce them through the no-op-forward branch at line rate
   for the 30 s candidate window. Cap retry forwards at one
   per second per unconfirmed candidate. wg.ko's own retry
   cadence is 5 s, so legit clients are unaffected; a flood
   of forged retries gets clamped to ~1 pps and then strikes
   into the blocklist.
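
The one-per-second cap in item 2 can be sketched as a minimum-gap limiter kept alongside each unconfirmed candidate. The names below are hypothetical and the relay may well implement the cap differently.

```cpp
#include <cstdint>

constexpr uint64_t kRetryMinGapNs = 1'000'000'000ull;  // one forward per second

struct RetryLimiter {
  bool forwarded_once = false;
  uint64_t last_forward_ns = 0;
};

// Returns true when a retry init from the candidate source may be forwarded.
// A false result corresponds to a drop counted as drop_unknown_src.
inline bool AllowRetryForward(RetryLimiter& l, uint64_t now_ns) {
  if (l.forwarded_once && now_ns - l.last_forward_ns < kRetryMinGapNs) return false;
  l.forwarded_once = true;
  l.last_forward_ns = now_ns;
  return true;
}
```

With a 5 s client retry cadence, a legitimate peer should never hit the false branch, while a line-rate flood is reduced to roughly one forwarded packet per second.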

Drops above the rate-limit count as drop_unknown_src so a
probing forger can't distinguish "you're rate-limited" from
"your packet was malformed."

A strike entry whose first_strike_ns is older than the widest
policy window (24 h) can no longer escalate anything: every
policy check uses `now - first_strike_ns <= window`, and at
that age all comparisons fail. The entry just sits in the
std::map taking up memory.
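
The staleness condition can be sketched as a sweep over the strike map. The record layout and the function name are assumptions; the 24 h bound and the `now - first_strike_ns <= window` comparison come from the text above.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

constexpr uint64_t kWidestWindowNs = 24ull * 3600ull * 1'000'000'000ull;  // 24 h

struct StrikeRecord {
  uint64_t first_strike_ns = 0;
  uint32_t total = 0;
};

// Erases entries that can no longer satisfy any policy window and returns
// the number of entries removed. Keyed by source IPv4 address.
inline std::size_t SweepStaleStrikes(std::map<uint32_t, StrikeRecord>& strikes,
                                     uint64_t now_ns) {
  std::size_t removed = 0;
  for (auto it = strikes.begin(); it != strikes.end();) {
    const uint64_t first = it->second.first_strike_ns;
    const bool stale = now_ns > first && now_ns - first > kWidestWindowNs;
    if (stale) {
      it = strikes.erase(it);  // erase returns the next valid iterator
      ++removed;
    } else {
      ++it;
    }
  }
  return removed;
}
```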

Without this sweep, a forger spraying from spoofed source IPs
(each striking once and never returning) would grow the strike
map without bound — a slow leak proportional to the rate of
distinct attack sources. The candidate-slot contention already
makes this a marginal attack, but the leak is a separate
correctness issue worth closing.

Sweep stale entries on each ExpireCandidatesLocked tick,
alongside the existing blocklist sweep.

Fold the four fuzz-driven hardening commits (partner-
attributable confirm, type-2 unknown-src drop, retry rate-
limit, strike sweep) into the 0.2.1 unreleased entry so the
release notes match what's on the branch.

@KRuskowski merged commit abb9d27 into master on Apr 29, 2026
8 checks passed