Severity: low / cosmetic. Not a connectivity bug — flagged by a consumer-lane field observation + an earlier review pass, confirmed against the code.
Symptom
On a node probing the full default derpmap, ts_magicsock logs "STUN in-flight set full, dropping new request (fail-safe)" repeatedly, and the trailing STUN servers in a probe round never get a request.
Cause
run_stun_prober → probe_stun_servers_once (ts_runtime/src/direct.rs) fires send_stun_request at every server in the derpmap sequentially in one round. The default controlplane.tailscale.com derpmap has ~25–30 FixedAddr-v4 STUN servers, but MAX_STUN_IN_FLIGHT = 16 (ts_magicsock/src/sock.rs). So the first ~16 fill the in-flight set and the rest are dropped fail-safe every round — and because the iteration order is stable, it's always the same trailing servers that are starved.
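The starvation pattern can be reproduced with a small counting model. This is only a sketch, not the code in direct.rs or sock.rs: dropped_in_round is a hypothetical helper, and it assumes the in-flight set is empty at the start of the round and that no entry drains (by response or TTL) while the round is being sent.

```rust
/// Hypothetical model of one probe round (not the real prober).
/// Walks the servers in a stable order against a fixed in-flight cap and
/// returns the indices whose request would be dropped.
/// Assumption: the set starts empty and nothing drains mid-round.
pub fn dropped_in_round(server_count: usize, cap: usize) -> Vec<usize> {
    let mut in_flight = 0usize;
    let mut dropped = Vec::new();
    for idx in 0..server_count {
        if in_flight < cap {
            // Room left: the request is sent and tracked.
            in_flight += 1;
        } else {
            // Cap reached: the request is dropped fail-safe.
            dropped.push(idx);
        }
    }
    dropped
}
```

With 28 servers and a cap of 16, the model drops indices 16 through 27, and it drops the same indices on every round because the iteration order does not change.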
Why it's NOT a defect (the set drains fine)
send_stun_request prunes the in-flight set by STUN_TX_TTL (5s) before the cap check, so entries self-evict ≤5s after being sent whether or not a response arrives — the set can never permanently wedge. And STUN only feeds direct-path discovery (learning our reflexive address); the DERP relay floor carries traffic regardless, so a starved STUN round never blocks connectivity. (A consumer saw the log spam while a peer was absent from the netmap; it vanished once the peer became reachable — symptom, not cause.)
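The prune-before-cap ordering is what keeps the set from wedging. A minimal sketch follows, assuming the set is keyed by a transaction id and stores the send time. The InFlight type, its field, and try_insert are hypothetical names for illustration; only the two constants and their values come from this report.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Values as quoted in this report.
pub const STUN_TX_TTL: Duration = Duration::from_secs(5);
pub const MAX_STUN_IN_FLIGHT: usize = 16;

/// Hypothetical stand-in for the in-flight set (not the real type).
pub struct InFlight {
    /// Assumed layout: transaction id -> time the request was sent.
    pub sent_at: HashMap<u64, Instant>,
}

impl InFlight {
    pub fn new() -> Self {
        InFlight { sent_at: HashMap::new() }
    }

    /// Returns true if the request is tracked, false if it is dropped.
    pub fn try_insert(&mut self, tx_id: u64, now: Instant) -> bool {
        // 1. Prune first: evict everything that has reached STUN_TX_TTL,
        //    whether or not a response ever arrived.
        self.sent_at
            .retain(|_, sent| now.saturating_duration_since(*sent) < STUN_TX_TTL);
        // 2. Only then apply the cap, so a full set always drains with time.
        if self.sent_at.len() >= MAX_STUN_IN_FLIGHT {
            return false;
        }
        self.sent_at.insert(tx_id, now);
        true
    }
}
```

In this model a request arriving 4s after the set filled is still dropped, while one arriving at 5s is accepted because the prune step has already evicted the expired entries.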
Suggested fix (low priority)
Make probe_stun_servers_once respect the cap instead of over-firing. For example, probe at most MAX_STUN_IN_FLIGHT (or a small N) servers per round and round-robin across the derpmap on successive ticks so no server is permanently starved, and/or shuffle the order. A reflexive address is learned from any one server, so probing all ~30 every round is unnecessary anyway. Optionally, move the "in-flight full" log off the hot path so it fires once per round.
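One way the round-robin option could look is a cursor that persists across ticks and hands out a bounded batch of server indices each round. This is a sketch under assumptions, not a patch: ProbeCursor and next_batch are hypothetical names, and the prober is assumed to address servers by index into a stable derpmap ordering.

```rust
/// Hypothetical cursor that remembers where the previous round stopped.
pub struct ProbeCursor {
    next: usize,
}

impl ProbeCursor {
    pub fn new() -> Self {
        ProbeCursor { next: 0 }
    }

    /// Returns up to `batch` server indices for this round, wrapping
    /// around the list so every server is reached on successive ticks.
    pub fn next_batch(&mut self, server_count: usize, batch: usize) -> Vec<usize> {
        if server_count == 0 {
            return Vec::new();
        }
        // Never probe the same server twice within one round.
        let take = batch.min(server_count);
        // Re-normalise in case the derpmap shrank since the last tick.
        let start = self.next % server_count;
        let picked = (0..take).map(|i| (start + i) % server_count).collect();
        self.next = (start + take) % server_count;
        picked
    }
}
```

With 28 servers and a batch of 16, the first round covers indices 0 to 15 and the second covers 16 to 27 followed by 0 to 3, so no server is skipped for more than one round. If ticks are closer together than STUN_TX_TTL, entries from the previous batch may still occupy the set, which is one argument for the "small N" variant over a batch of exactly MAX_STUN_IN_FLIGHT.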
Found via: nk8s consumer-lane field report (raised at low confidence as "probably a symptom") + review-code! pass 1.