fix: poll for admin approval instead of dying when no auth URL#262
Conversation
A registration that comes back not-authorized with no interactive URL means the node holds a valid key and is awaiting admin approval on an approval-gated tailnet, a transient and recoverable state. It was being mapped to the generic Internal MachineAuthorization error, which the control runner check_auth loop caught in its catch-all terminal arm and turned into a permanent Failed AuthRejected that stopped the runner for good. On an approval-gated tailnet a freshly-registered node whose key control had not yet approved would die where Go comes up automatically once the admin approves. Surface the no-URL await-approval case as a distinct typed error and poll for approval, matching Go's NeedsMachineAuth to Starting auto transition with no re-registration. - ts_control Error gains a NeedsMachineAuth variant, distinct from Internal MachineAuthorization and from the URL-carrying MachineNotAuthorized. register.rs maps MachineNotAuthorized None to it. - ts_runtime DeviceState gains a transient NeedsMachineAuth variant treated like Connecting by wait_for_running and carrying no browse_to_url. The From conversion maps the control error to a non-permanent NetworkUnreachable. - The check_auth loop and the connect bring-up loop both gain a transient NeedsMachineAuth arm that publishes the state, polls 5s, and retries instead of stopping. A genuine Rejected bad-key still routes to the terminal Failed arm, so real auth failures are not turned into infinite polls. Signed-off-by: Sergio Fernández <9169332+GeiserX@users.noreply.github.com>
|
Warning Review limit reached
More reviews will be available in 17 minutes and 9 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Problem (P1 interop break)
On an approval-gated tailnet, a freshly-registered node whose key control has not yet approved — and for which control offers no interactive URL — was permanently killed. Registration returned
MachineNotAuthorized(None), which mapped to a genericError::Internal(MachineAuthorization, _), fell into the control runner's catch-all error arm, and published terminalDeviceState::Failed(AuthRejected)+ stopped the runner.Go's
nextStateLockedinstead returnsipn.State::NeedsMachineAuthhere — a transient, recoverable state: the node holds a valid key, waits for admin approval, then auto-transitions to Running with no re-registration when approved. The fork died where Go comes up automatically.Fix
Treat no-URL machine-auth as a transient poll-for-approval, matching Go:
ts_control::Error::NeedsMachineAuth, distinct from the genericInternal(MachineAuthorization)and from the URL-carryingMachineNotAuthorized(url).register.rs'sFrom<RegistrationError>now mapsMachineNotAuthorized(None)to it. The classifier is unchanged — it already separates a hardRejected(msg)(bad key) from the await-approvalMachineNotAuthorized(None).DeviceState::NeedsMachineAuth(no field — there is no URL; that is the distinction fromNeedsLogin).wait_for_runningtreats it likeConnecting/Reauthenticating(keep waiting), and the IPN bus derives nobrowse_to_urlfor it.check_authloop and theconnectbring-up loop now publishNeedsMachineAuth, sleep, and poll (retry) — neverFailed/stop. When the admin approves, the next attempt returnsOkand the node comes up with no re-registration.Rejected(msg)(bad/expired/unknown key) maps toError::Registration(notNeedsMachineAuth) and still hits the terminalFailedarm — pinned by a regression test.Tests
FrommapsMachineNotAuthorized(None)toNeedsMachineAuth(and explicitly not the generic Internal); the URL variant is unchanged; a hardRejectedstays terminal (regression).wait_for_runningdoes not settle onNeedsMachineAuth; the fullNeedsMachineAuth -> Runningapproval recovery resolvesOk.Verification
Needs a live approval-gated tailnet (headscale) for the full e2e; the state-machine + mapping logic is unit-tested. Gates (local, all green):
cargo test -p geiserx_ts_control -p geiserx_ts_runtime(247 + 349) · clippy-D warnings(default +tunlane) ·cargo fmt --check·cargo doc(broken_intra_doc_links=deny) ·cargo run -p checks.Sibling of the auto-reauth fix (key-expiry recovery) — same control-runner auth-state machine, the other half of "recover instead of dying" parity.
Signed-off-by: Sergio sergio@geiser.cloud
Created using Claude Code (Opus 4.8)