
fix(auth): coordinate credential refresh across processes and guard logins#235

Merged: samcm merged 3 commits into master from worktree-auth-refresh-filelock on Jun 19, 2026
Conversation

samcm (Member) commented Jun 19, 2026

Several panda servers on one host share a single credentials file (the path is keyed only by issuer+client+resource). Each runs a background refresh, and because the provider rotates refresh tokens, concurrent refreshes revoke each other's refresh tokens and produce invalid_grant storms.

  • store.refresh takes a non-blocking file lock (flock) so one process drives each rotation; the others reload and reuse its token instead of replaying a revoked one.
  • Save/Clear take the same lock (waiting up to WriteLockWait, failing closed with ErrCredentialBusy under contention) so logins/logouts serialize with refreshes.
  • Save refuses to replace a refreshable credential with one that has no refresh token (ErrCredentialDowngrade), failing closed on read errors.
  • Credentials are written atomically (temp file + sync + rename).
  • Refresh, rotation, contention and invalid_grant events are logged, and login warns when no refresh token is issued.
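The first bullet's coordination can be sketched in Go as below. This is a minimal illustration under stated assumptions, not the code from this PR: `tryLock`, `refresh`, `rotate`, the `.lock` sidecar path, and the plain-string token file are all hypothetical names and simplifications. It relies on `syscall.Flock`, which is available on Unix-like systems only.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// tryLock takes an exclusive flock without blocking.
// It returns (nil, nil) when another holder already owns the lock.
func tryLock(path string) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return nil, err
	}
	err = syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB)
	if err == nil {
		return f, nil
	}
	f.Close()
	if errors.Is(err, syscall.EWOULDBLOCK) {
		return nil, nil // contended: someone else is driving the rotation
	}
	return nil, err
}

// refresh lets exactly one caller rotate the token. A caller that loses the
// lock reuses whatever token is on disk instead of replaying its own copy.
// (Hypothetical shape: the token is stored as a plain string for brevity.)
func refresh(credPath string, rotate func(old string) (string, error)) (string, error) {
	lock, err := tryLock(credPath + ".lock")
	if err != nil {
		return "", err
	}
	if lock == nil {
		cur, err := os.ReadFile(credPath)
		return string(cur), err
	}
	defer lock.Close() // closing the descriptor releases the flock

	// Winner: reload first, because another process may have rotated already.
	cur, err := os.ReadFile(credPath)
	if err != nil {
		return "", err
	}
	next, err := rotate(string(cur))
	if err != nil {
		return "", err
	}
	if err := os.WriteFile(credPath, []byte(next), 0o600); err != nil {
		return "", err
	}
	return next, nil
}

// demo simulates a concurrent refresher by holding the lock during one call.
func demo() (loser, winner string, err error) {
	dir, err := os.MkdirTemp("", "cred-demo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)
	credPath := filepath.Join(dir, "credentials")
	if err = os.WriteFile(credPath, []byte("rt-1"), 0o600); err != nil {
		return "", "", err
	}
	rotate := func(old string) (string, error) { return "rt-2", nil }

	held, err := tryLock(credPath + ".lock")
	if err != nil || held == nil {
		return "", "", fmt.Errorf("could not take demo lock: %v", err)
	}
	if loser, err = refresh(credPath, rotate); err != nil {
		held.Close()
		return "", "", err
	}
	held.Close()
	winner, err = refresh(credPath, rotate)
	return loser, winner, err
}

func main() {
	loser, winner, err := demo()
	fmt.Println(loser, winner, err)
}
```

The demo works inside one process because flock locks belong to the open file description, so a second `open` of the same lock file contends with the first. The sketch writes the token with `os.WriteFile` for brevity; the PR itself writes credentials atomically.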

…ogins

Multiple panda servers on one host share a single credentials file (the
path is keyed only by issuer+client+resource). Each runs a background
refresh, and because the provider rotates refresh tokens (issuing a new
one and revoking the old), concurrent refreshes revoke each other and
produce invalid_grant storms. Separately, a login that returned no refresh
token (e.g. an outdated device flow without offline_access) silently
overwrote a working credential, so the restarted server could no longer
refresh once its access token expired.

- store.refresh now coordinates across processes with a non-blocking
  flock: the winner reloads and refreshes; losers reuse the on-disk token
  and never replay a revoked one.
- Save/Clear take the lock too (blocking up to WriteLockWait, fail-closed
  with ErrCredentialBusy on contention) so logins/logouts serialize with
  refreshes.
- Save refuses to replace a refreshable credential with one that cannot
  refresh (ErrCredentialDowngrade), failing closed on read errors.
- Credentials are written atomically (temp + sync + rename).
- Added structured logging for refresh/rotation/contention/invalid_grant
  and a login warning when no refresh token is issued.
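The downgrade guard and the atomic write from the list above might look roughly like this in Go. It is a sketch under assumptions rather than the PR's implementation: the `Credential` fields, the JSON encoding, and the `save`/`writeAtomic` helpers are hypothetical, and treating an unparsable existing file as a read error is a guess at what "failing closed" covers. Only the `ErrCredentialDowngrade` name comes from the PR.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// Credential is a hypothetical on-disk shape used only for this sketch.
type Credential struct {
	AccessToken  string `json:"access_token"`
	RefreshToken string `json:"refresh_token,omitempty"`
}

var ErrCredentialDowngrade = errors.New("refusing to replace a refreshable credential with one that cannot refresh")

// save rejects a downgrade, then writes the new credential atomically.
func save(path string, next Credential) error {
	prev, err := os.ReadFile(path)
	switch {
	case errors.Is(err, os.ErrNotExist):
		// Nothing on disk yet, so there is nothing to downgrade.
	case err != nil:
		// Fail closed: if the current state is unknown, do not overwrite it.
		return fmt.Errorf("read existing credential: %w", err)
	default:
		var cur Credential
		if err := json.Unmarshal(prev, &cur); err != nil {
			return fmt.Errorf("parse existing credential: %w", err)
		}
		if cur.RefreshToken != "" && next.RefreshToken == "" {
			return ErrCredentialDowngrade
		}
	}
	data, err := json.Marshal(next)
	if err != nil {
		return err
	}
	return writeAtomic(path, data)
}

// writeAtomic writes to a temp file in the same directory, syncs it, and
// renames it over the target so readers never observe a partial file.
func writeAtomic(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".credentials-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless once the rename has succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// demo saves a refreshable credential, then attempts a downgrade.
func demo() (blocked bool, kept string, err error) {
	dir, err := os.MkdirTemp("", "cred-demo")
	if err != nil {
		return false, "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "credentials.json")
	if err = save(path, Credential{AccessToken: "a1", RefreshToken: "r1"}); err != nil {
		return false, "", err
	}
	blocked = errors.Is(save(path, Credential{AccessToken: "a2"}), ErrCredentialDowngrade)
	raw, err := os.ReadFile(path)
	if err != nil {
		return blocked, "", err
	}
	var cur Credential
	err = json.Unmarshal(raw, &cur)
	return blocked, cur.RefreshToken, err
}

func main() {
	blocked, kept, err := demo()
	fmt.Println(blocked, kept, err)
}
```

The temp file is created in the target's directory so the rename stays on one filesystem, which is what makes the replacement atomic on POSIX systems.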

github-actions (Bot, Contributor) commented Jun 19, 2026

🐼 Smoke eval — 6748b52: ✅ 6/6 pass

📊 Interactive report — tokens p50 16,073 · tokens/solve 15,949.

Reference points: master@57f865d 100% · worktree-auth-refresh-filelock@5ded2fc 100% · worktree-auth-refresh-filelock@e664b97 100%.

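| question                  | result | tokens | tools |
|---------------------------|--------|--------|-------|
| forky_node_coverage       | ✅     | 14,876 | 6     |
| tracoor_node_coverage     | ✅     | 19,684 | 4     |
| mainnet_block_arrival_p50 | ✅     | 17,270 | 13    |
| list_datasources          | ✅     | 11,496 | 1     |
| block_count_24h           | ✅     | 17,580 | 16    |
| missed_slots_24h          | ✅     | 14,789 | 6     |
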
🔭 Langfuse traces (6 runs; ⚠️ = failed)

The report walks this branch's commits against the master baseline and the most recent release. A self-contained copy is in the run's eval-smoke-* artifact.

samcm added 2 commits June 19, 2026 14:20
…tail in status

The 'upgrade panda' advice on a missing refresh token was wrong: this
version already requests offline_access, so a missing refresh token is
the provider not granting it. Reword the login warning and downgrade
error accordingly.

panda auth status now reports whether a refresh token is present and when
it was last rotated, and a new --verify flag actively tests the refresh
token against the provider (rotating it).
…meout

A refresh holds the file lock across its token request (capped at 30s by
the auth client), so a 5s wait made login/logout fail spuriously during a
slow refresh. Wait 35s so an interactive write rides through a slow-but-
valid refresh and only fails when one is genuinely stuck.
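The bounded wait described here can be approximated by polling a non-blocking flock until a deadline. The Go sketch below is an assumption about the shape, not the PR's code: `lockWithin` and the 10ms poll interval are hypothetical, while `WriteLockWait` (35s) and `ErrCredentialBusy` are the names used in this PR. The demo uses much shorter waits so it finishes quickly.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
	"time"
)

// WriteLockWait mirrors the 35s value chosen in this commit: long enough to
// outlast a refresh whose token request is capped at 30s.
const WriteLockWait = 35 * time.Second

var ErrCredentialBusy = errors.New("credential file is busy")

// lockWithin polls a non-blocking flock until it succeeds or wait elapses,
// failing closed with ErrCredentialBusy on timeout.
func lockWithin(path string, wait time.Duration) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return nil, err
	}
	deadline := time.Now().Add(wait)
	for {
		err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB)
		if err == nil {
			return f, nil
		}
		if !errors.Is(err, syscall.EWOULDBLOCK) {
			f.Close()
			return nil, err
		}
		if time.Now().After(deadline) {
			f.Close()
			return nil, ErrCredentialBusy
		}
		time.Sleep(10 * time.Millisecond) // hypothetical poll interval
	}
}

// demo holds the lock like a slow refresh would, then releases it.
func demo() (busy, acquired bool, err error) {
	dir, err := os.MkdirTemp("", "cred-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	lockPath := filepath.Join(dir, "credentials.lock")

	held, err := lockWithin(lockPath, time.Second)
	if err != nil {
		return false, false, err
	}

	// A wait shorter than the holder's work fails closed.
	_, err = lockWithin(lockPath, 50*time.Millisecond)
	busy = errors.Is(err, ErrCredentialBusy)

	// A wait longer than the holder's work rides through it.
	go func() {
		time.Sleep(30 * time.Millisecond)
		held.Close() // closing the descriptor releases the flock
	}()
	f, err := lockWithin(lockPath, 2*time.Second)
	if err != nil {
		return busy, false, err
	}
	f.Close()
	return busy, true, nil
}

func main() {
	busy, acquired, err := demo()
	fmt.Println(busy, acquired, err)
}
```

In real use a login or logout would call `lockWithin(lockPath, WriteLockWait)`; the design point is that the wait must exceed the longest time a healthy lock holder can take.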
@samcm samcm merged commit 3d7dfaa into master Jun 19, 2026
12 checks passed