Add static stability to credential refresh#3733
Conversation
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## feature/credential-refresh #3733 +/- ##
=============================================================
Coverage ? 92.40%
=============================================================
Files ? 68
Lines ? 15883
Branches ? 0
=============================================================
Hits ? 14677
Misses ? 1206
Partials ? 0 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
|
||
| def _refresh(self): | ||
| if DEFAULT_NEW_CREDENTIAL_REFRESH: | ||
| return self._refresh_with_backoff() |
There was a problem hiding this comment.
[nit, non-blocking]
This is a bit confusing to me since we have two return statements that both will always return None. The return keywords on this line and the next don't do anything, but so it's a stylistic choice.
Just removing them would cause a double refresh, but I'd find this more readable as an if/else statement:
if DEFAULT_NEW_CREDENTIAL_REFRESH:
self._refresh_with_backoff()
else:
self._refresh_legacy()
It's not a functional issue, so if you disagree and you like this pattern better, just resolve this comment.
| # In the common case where we don't need a refresh, we | ||
| # can immediately exit and not require acquiring the | ||
| # refresh lock. | ||
| if not self.refresh_needed(self._advisory_refresh_timeout): |
There was a problem hiding this comment.
[non-actionable, just here for posterity]
The PR description mentions that the entire legacy path will be removed. If we would allow an opt-out instead and allow customers to control this behavior, I'd rather see us not violate DRY here and find a way to merge this new codepath with the legacy.
Since the entire legacy path will be deleted (or this new one will), IMO it's alright for us to keep these code paths a little bit repetitive. It keeps one clean, centralized gate for the new behavior.
| provider=self.method, error_msg=str(error) | ||
| ) | ||
| backoff = self._enter_refresh_backoff() | ||
| logger.warning( |
There was a problem hiding this comment.
[Discussion]
This is good for a single call and I'm glad we are warning here. Since we may still make repeated calls for up to ten minutes, it may make sense for us to add a per-call debug log as well. This way we link each call to the failed refresh and someone won't have to scroll through up to ten minutes of logs to figure out why they got a 403.
What do you think?
| return None | ||
|
|
||
| def _handle_refresh_failure(self, error): | ||
| if self._frozen_credentials is None: |
There was a problem hiding this comment.
[question, maybe blocking?]
This code path should be dead since we never set it to None in the __init__ method. Shouldn't this be on the DeferredRefreshableCredentials class, or am I missing something?
Issue #, if available:
Python-8911
Overview
This PR adds static stability to credential refresh in
RefreshableCredentials. When a credential source fails during refresh (outage, exception, missing keys, or expired response), the SDK now falls back to the most recently cached credentials and enters a jittered backoff window (5-10 minutes) instead of raising or hammering the source. If no cached credentials exist yet (e.g.DeferredRefreshableCredentialson first fetch), the error is still surfaced.The new behavior is gated behind
DEFAULT_NEW_CREDENTIAL_REFRESH, a flag that defaults toFalsein the public distribution. External users are unaffected until GA, at which point the flag and legacy path will be removed.Implementation
The new behavior lives in
_refresh_with_backoffand its helpers. The legacy path (_protected_refresh,_set_from_data) is unchanged and kept under_refresh_legacy. A flag check in_refreshroutes between the two. Credentials are validated before being cached so we never store missing or expired data.Testing
Following the same pattern as test_standard_retry_v2_1.py from the retries internal release.
The file contains full copies of
TestRefreshableCredentialsandTestRefreshLogicwith the flag forced on. Tests that change behavior have updated assertions (refresh failures return cached creds instead of raising). Unchanged tests are included so the full class can be swapped in at GA without cherry-picking individual methods.The standalone pytest functions at the bottom cover backoff-specific behavior that does not exist in the legacy path (retry timing, not re-hitting the source immediately after failure, first-fetch failure raising).
At GA this file gets folded into
test_credentials.pyand removed.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.