Skip to content

Add static stability to credential refresh#3733

Open
alexgromero wants to merge 1 commit into
boto:feature/credential-refreshfrom
alexgromero:cred-refresh/static-stability
Open

Add static stability to credential refresh#3733
alexgromero wants to merge 1 commit into
boto:feature/credential-refreshfrom
alexgromero:cred-refresh/static-stability

Conversation

@alexgromero

Copy link
Copy Markdown
Contributor

Issue #, if available:
Python-8911

Overview

This PR adds static stability to credential refresh in RefreshableCredentials. When a credential source fails during refresh (outage, exception, missing keys, or expired response), the SDK now falls back to the most recently cached credentials and enters a jittered backoff window (5-10 minutes) instead of raising or hammering the source. If no cached credentials exist yet (e.g. DeferredRefreshableCredentials on first fetch), the error is still surfaced.

The new behavior is gated behind DEFAULT_NEW_CREDENTIAL_REFRESH, a flag that defaults to False in the public distribution. External users are unaffected until GA, at which point the flag and legacy path will be removed.

Implementation

The new behavior lives in _refresh_with_backoff and its helpers. The legacy path (_protected_refresh, _set_from_data) is unchanged and kept under _refresh_legacy. A flag check in _refresh routes between the two. Credentials are validated before being cached so we never store missing or expired data.

Testing

Following the same pattern as test_standard_retry_v2_1.py from the retries internal release.

The file contains full copies of TestRefreshableCredentials and TestRefreshLogic with the flag forced on. Tests that change behavior have updated assertions (refresh failures return cached creds instead of raising). Unchanged tests are included so the full class can be swapped in at GA without cherry-picking individual methods.

The standalone pytest functions at the bottom cover backoff-specific behavior that does not exist in the legacy path (retry timing, not re-hitting the source immediately after failure, first-fetch failure raising).

At GA this file gets folded into test_credentials.py and removed.


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@codecov-commenter

codecov-commenter commented Jun 23, 2026

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 90.66667% with 7 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (feature/credential-refresh@fb482b1). Learn more about missing BASE report.

Files with missing lines Patch % Lines
botocore/credentials.py 90.66% 7 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@                      Coverage Diff                      @@
##             feature/credential-refresh    #3733   +/-   ##
=============================================================
  Coverage                              ?   92.40%           
=============================================================
  Files                                 ?       68           
  Lines                                 ?    15883           
  Branches                              ?        0           
=============================================================
  Hits                                  ?    14677           
  Misses                                ?     1206           
  Partials                              ?        0           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread botocore/credentials.py

def _refresh(self):
if DEFAULT_NEW_CREDENTIAL_REFRESH:
return self._refresh_with_backoff()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit, non-blocking]

This is a bit confusing to me since we have two return statements that both will always return None. The return keywords on this line and the next don't do anything, but so it's a stylistic choice.

Just removing them would cause a double refresh, but I'd find this more readable as an if/else statement:

    if DEFAULT_NEW_CREDENTIAL_REFRESH:
        self._refresh_with_backoff()
    else:
        self._refresh_legacy()

It's not a functional issue, so if you disagree and you like this pattern better, just resolve this comment.

Comment thread botocore/credentials.py
# In the common case where we don't need a refresh, we
# can immediately exit and not require acquiring the
# refresh lock.
if not self.refresh_needed(self._advisory_refresh_timeout):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[non-actionable, just here for posterity]

The PR description mentions that the entire legacy path will be removed. If we would allow an opt-out instead and allow customers to control this behavior, I'd rather see us not violate DRY here and find a way to merge this new codepath with the legacy.

Since the entire legacy path will be deleted (or this new one will), IMO it's alright for us to keep these code paths a little bit repetitive. It keeps one clean, centralized gate for the new behavior.

Comment thread botocore/credentials.py
provider=self.method, error_msg=str(error)
)
backoff = self._enter_refresh_backoff()
logger.warning(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Discussion]

This is good for a single call and I'm glad we are warning here. Since we may still make repeated calls for up to ten minutes, it may make sense for us to add a per-call debug log as well. This way we link each call to the failed refresh and someone won't have to scroll through up to ten minutes of logs to figure out why they got a 403.

What do you think?

Comment thread botocore/credentials.py
return None

def _handle_refresh_failure(self, error):
if self._frozen_credentials is None:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question, maybe blocking?]

This code path should be dead since we never set it to None in the __init__ method. Shouldn't this be on the DeferredRefreshableCredentials class, or am I missing something?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants