
STUN prober fans out past MAX_STUN_IN_FLIGHT in one round — trailing derpmap servers always starved (cosmetic) #37

@GeiserX

Description


Severity: low / cosmetic. This is not a connectivity bug. It was flagged by a consumer-lane field observation and an earlier review pass, then confirmed against the code.

Symptom

On a node probing the full default derpmap, ts_magicsock logs "STUN in-flight set full, dropping new request (fail-safe)" repeatedly, and the trailing STUN servers in a probe round never get a request.

Cause

run_stun_prober → probe_stun_servers_once (ts_runtime/src/direct.rs) fires send_stun_request at every server in the derpmap, sequentially, in a single round. The default controlplane.tailscale.com derpmap has ~25–30 FixedAddr-v4 STUN servers, but MAX_STUN_IN_FLIGHT = 16 (ts_magicsock/src/sock.rs). The first ~16 requests fill the in-flight set and the rest are dropped fail-safe every round. Because the iteration order is stable, the same trailing servers are starved every time.
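A minimal model of the starvation, assuming each round starts with an empty in-flight set (i.e. the previous round's entries have already aged out) and a hypothetical 28-server derpmap. Only MAX_STUN_IN_FLIGHT comes from the issue; `try_send` and `probe_all_once` are hypothetical stand-ins for the cap check in send_stun_request and the loop in probe_stun_servers_once, not the real code.

```rust
use std::collections::HashSet;

/// Cap value from the issue (ts_magicsock/src/sock.rs).
const MAX_STUN_IN_FLIGHT: usize = 16;

/// Hypothetical stand-in for the cap check: returns false (fail-safe drop)
/// when the in-flight set is already full.
fn try_send(in_flight: &mut HashSet<usize>, server: usize) -> bool {
    if in_flight.len() >= MAX_STUN_IN_FLIGHT {
        return false; // "STUN in-flight set full, dropping new request (fail-safe)"
    }
    in_flight.insert(server)
}

/// Hypothetical round that fires at every server in a fixed order.
/// Returns the indices of servers whose request was dropped.
fn probe_all_once(server_count: usize) -> Vec<usize> {
    let mut in_flight = HashSet::new(); // assumed empty at the start of each round
    let mut dropped = Vec::new();
    for server in 0..server_count {
        if !try_send(&mut in_flight, server) {
            dropped.push(server);
        }
    }
    dropped
}
```

With 28 servers, indices 16..28 are dropped, and because the order never changes, a second round drops exactly the same ones.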

Why it's NOT a defect (the set drains fine)

send_stun_request prunes the in-flight set by STUN_TX_TTL (5s) before the cap check, so entries self-evict ≤5s after being sent whether or not a response arrives; the set can never permanently wedge. STUN also only feeds direct-path discovery (learning our reflexive address), and the DERP relay floor carries traffic regardless, so a starved STUN round never blocks connectivity. (A consumer saw the log spam while a peer was absent from the netmap, and it vanished once the peer became reachable: the spam was a symptom, not the cause.)
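The "prune before cap check" ordering is what keeps the set from wedging. A minimal sketch, assuming the in-flight set is keyed by a transaction id and stores the send time; `InFlight`, its fields, and its methods are hypothetical, while STUN_TX_TTL (5s) and MAX_STUN_IN_FLIGHT (16) are the values given in the issue.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

const MAX_STUN_IN_FLIGHT: usize = 16;
const STUN_TX_TTL: Duration = Duration::from_secs(5);

/// Hypothetical in-flight set: transaction id -> time the request was sent.
struct InFlight {
    sent_at: HashMap<u64, Instant>,
}

impl InFlight {
    fn new() -> Self {
        Self { sent_at: HashMap::new() }
    }

    fn len(&self) -> usize {
        self.sent_at.len()
    }

    /// Returns false if the request is dropped fail-safe.
    fn send(&mut self, tx_id: u64, now: Instant) -> bool {
        // 1. Prune first: evict anything older than the TTL, answered or not.
        self.sent_at.retain(|_, t| now.duration_since(*t) < STUN_TX_TTL);
        // 2. Only then apply the cap, so a full set always drains within the TTL.
        if self.sent_at.len() >= MAX_STUN_IN_FLIGHT {
            return false;
        }
        self.sent_at.insert(tx_id, now);
        true
    }
}
```

Filling the set and sending again 1s later is refused, but a send 6s later succeeds because every earlier entry has aged out, even though no responses were ever recorded.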

Suggested fix (low priority)

Make probe_stun_servers_once respect the cap rather than over-fire: e.g. probe at most MAX_STUN_IN_FLIGHT (or a small N) servers per round, round-robin across the derpmap on successive ticks so no server is permanently starved, and/or shuffle the order. A reflexive address is learned from any one server, so probing all ~30 every round is unnecessary anyway. Optionally downgrade the "in-flight full" log from the hot path to once-per-round.
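One way the round-robin part of the suggested fix could look, sketched under the assumption that the prober keeps a persistent cursor across ticks. `RoundRobin` and `next_batch` are hypothetical names, not existing code; the batch size would be MAX_STUN_IN_FLIGHT or a smaller N.

```rust
/// Hypothetical cursor over the derpmap's STUN server list, kept across ticks.
struct RoundRobin {
    next: usize,
}

impl RoundRobin {
    /// Indices to probe this round: at most `batch` servers, continuing from
    /// where the previous round stopped and wrapping around the list.
    fn next_batch(&mut self, server_count: usize, batch: usize) -> Vec<usize> {
        if server_count == 0 {
            return Vec::new();
        }
        let n = batch.min(server_count);
        // Re-wrap in case the derpmap shrank since the last round.
        let start = self.next % server_count;
        let picked: Vec<usize> = (0..n).map(|i| (start + i) % server_count).collect();
        self.next = (start + n) % server_count;
        picked
    }
}
```

With 28 servers and a batch of 16, the first tick probes 0..16 and the second probes 16..28 followed by 0..4, so every server is probed within two ticks. A batch smaller than MAX_STUN_IN_FLIGHT would also leave headroom in the in-flight set for requests sent outside the prober.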

Found via: nk8s consumer-lane field report (raised at low confidence as "probably a symptom") + review-code! pass 1.

Metadata


Assignees

No one assigned

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
