fix(auth): coordinate credential refresh across processes and guard logins#235
Merged
…ogins

Multiple panda servers on one host share a single credentials file (the path is keyed only by issuer+client+resource). Each runs a background refresh, and because the provider rotates refresh tokens (issuing a new one and revoking the old), concurrent refreshes revoke each other and produce invalid_grant storms.

Separately, a login that returns no refresh token (e.g. an outdated device flow without offline_access) silently overwrote a working credential, and the restarted server then expired at the next access-token expiry.

- store.refresh now coordinates across processes with a non-blocking flock: the winner reloads and refreshes; losers reuse the on-disk token and never replay a revoked one.
- Save/Clear take the lock too (blocking up to WriteLockWait, fail-closed with ErrCredentialBusy on contention) so logins/logouts serialize with refreshes.
- Save refuses to replace a refreshable credential with one that cannot refresh (ErrCredentialDowngrade), failing closed on read errors.
- Credentials are written atomically (temp + sync + rename).
- Added structured logging for refresh/rotation/contention/invalid_grant and a login warning when no refresh token is issued.
🐼 Smoke eval

| question | result | tokens | tools |
|---|---|---|---|
| forky_node_coverage | ✅ | 14,876 | 6 |
| tracoor_node_coverage | ✅ | 19,684 | 4 |
| mainnet_block_arrival_p50 | ✅ | 17,270 | 13 |
| list_datasources | ✅ | 11,496 | 1 |
| block_count_24h | ✅ | 17,580 | 16 |
| missed_slots_24h | ✅ | 14,789 | 6 |
🔭 Langfuse traces (6 runs; ⚠️ = failed)
The report walks this branch's commits against the master baseline and the most recent release. A self-contained copy is in the run's eval-smoke-* artifact.
…tail in status

The 'upgrade panda' advice on a missing refresh token was wrong: this version already requests offline_access, so a missing refresh token means the provider did not grant one. Reword the login warning and downgrade error accordingly.

panda auth status now reports whether a refresh token is present and when it was last rotated, and a new --verify flag actively tests the refresh token against the provider (rotating it).
…meout

A refresh holds the file lock across its token request (capped at 30s by the auth client), so a 5s wait made login/logout fail spuriously during a slow refresh. Wait 35s so an interactive write rides through a slow-but-valid refresh and only fails when one is genuinely stuck.
Several panda servers on one host share a single credentials file (the path is keyed only by issuer+client+resource), and concurrent background refreshes against a rotating provider revoke each other's refresh tokens, producing `invalid_grant` storms.

- `store.refresh` takes a non-blocking file lock (flock) so one process drives each rotation; the others reload and reuse its token instead of replaying a revoked one.
- `Save`/`Clear` take the same lock (waiting up to `WriteLockWait`, failing closed with `ErrCredentialBusy` under contention) so logins/logouts serialize with refreshes.
- `Save` refuses to replace a refreshable credential with one that has no refresh token (`ErrCredentialDowngrade`), failing closed on read errors.
- `invalid_grant` events are logged, and login warns when no refresh token is issued.