Skip to content

Let state machine handle CheckAuthorizationStatus#7379

Open
skhamis wants to merge 2 commits into
mozilla:mainfrom
skhamis:fix-fenix-pw-notif
Open

Let state machine handle CheckAuthorizationStatus#7379
skhamis wants to merge 2 commits into
mozilla:mainfrom
skhamis:fix-fenix-pw-notif

Conversation

@skhamis
Copy link
Copy Markdown
Contributor

@skhamis skhamis commented May 19, 2026

When I was adding password change notification in Fenix via: https://phabricator.services.mozilla.com/D301414

I noticed in app-services we were potentially dropping push events on transient errors. The old version used ? on the introspect call, so I think any network/server issues propagated as Err out of handle_push_message. The Kotlin side's handleFxaExceptions would log it and ignore. So no event reached the FSM, no UI change leading to -> https://bugzilla.mozilla.org/show_bug.cgi?id=1915765

So we should lean into the state machine and let it be the source of truth imo. The state machine can decide whether the token is valid/retry/etc instead of push.

Pull Request checklist

  • Breaking changes: This PR follows our breaking change policy
    • This PR follows the breaking change policy:
      • This PR has no breaking API changes, or
      • There are corresponding PRs for our consumer applications that resolve the breaking changes and have been approved
  • Quality: This PR builds and tests run cleanly
    • Note:
      • For changes that need extra cross-platform testing, consider adding [ci full] to the PR title.
      • If this pull request includes a breaking change, consider cutting a new release after merging.
  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Changelog: This PR includes a changelog entry in CHANGELOG.md or an explanation of why it does not need one
    • Any breaking changes to Swift or Kotlin binding APIs are noted explicitly
  • Dependencies: This PR follows our dependency management guidelines
    • Any new dependencies are accompanied by a summary of the due diligence applied in selecting them.

@skhamis skhamis requested review from bendk and mhammond May 19, 2026 21:55
account.disconnect();
Ok(S::Disconnected)
}
(S::AuthIssues, FxaEvent::CheckAuthorizationStatus) => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea in general, however I'm not sure about this - in most cases when we get one of those push notifications we are in a "good" state, and the event is to tell us that we should almost certainly move to a "bad" state. Unlike the web channel, the push message never gives us the new tokens - the password needs to be re-entered.

In fact, once we are in an AuthIssues state I'm not sure we'd be able to get the push message anyway?

I think I'm saying that the above changes are correct, but this isn't needed (but also wont hurt), because we seems to already correctly handle this message in the (S::Connected, FxaEvent::CheckAuthorizationStatus) transition. What does seem to be missing though is S::Authenticating, which would be unusual but possible, especially given https://bugzilla.mozilla.org/show_bug.cgi?id=2030890.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking a little more - a downside of this approach is that an error checking the auth status means the account would be likely remain connected. For that push notification specifically, that would be a bad outcome - the default there should be to move to AuthIssues. I'm not sure about the (S::Connected, FxaEvent::CheckAuthorizationStatus) case in general though - should that assume a good or bad default outcome?

IOW, I wonder if the existing code but not propagating the error and instead assuming AuthIssues is the way to go? Or maybe stick with this but have (S::Connected, FxaEvent::CheckAuthorizationStatus) assume the worst case? I'm not sure what existing cases trigger CheckAuthorizationStatus - if it's used for things like access tokens expiring it would be bad.

Really not sure - wdyt?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about a couple things here, but maybe that's just because I'm missing the bigger picture.

What does seem to be missing though is S::Authenticating, which would be unusual but possible

S::Authenticating mean we're in the middle of an oauth flow, IIRC. If that's true it seems like we should stay in that state when a password changes and in general we should stay there regardless of if our auth tokens are valid or not.

Thinking a little more - a downside of this approach is that an error checking the auth status means the account would be likely remain connected. For that push notification specifically, that would be a bad outcome - the default there should be to move to AuthIssues

I'm not sure we'd remain connected, since the code block below will only transition to AuthIssues unless FxA tells us our auth token is valid. That shouldn't happen in the password change scenario. That said, maybe this is a wasted check and we should just unconditionally move to AuthIssues when we see a password change/reset.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S::Authenticating mean we're in the middle of an oauth flow, IIRC. If that's true it seems like we should stay in that state when a password changes and in general we should stay there regardless of if our auth tokens are valid or not.

I guess so, yeah - although if they really are authenticating it's probably going to end in tears. If the reason they are in that state is bug 2030890, then staying in Authenticating seems fine, but only because they aren't really authenticating.

I'm not sure we'd remain connected, since the code block below will only transition to AuthIssues unless

oops, yes, I misread that. It is still "odd" that a network error is going to force AuthIssues, but that's mitigated by the fact that AuthIssues is where we are going to end up anyway for this scenario. It's more of a concern in the general case - if we are checking the auth state when we do not have a signal that the account is in a bad state, a network error putting the account into a bad state is quite bad.

That said, maybe this is a wasted check and we should just unconditionally move to AuthIssues when we see a password change/reset.

Yeah - which is roughly the old code, just without that check. However, desktop does check the status after the message, so that would likely suffer from the same problem.

I'm still a little torn about how to move ahead here though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm I've been going back and forth on this a little. I think kinda landed on:

In handle_push_message, clear the refresh token when we get PasswordChanged/PasswordReset. Since the FxA (server) has already invalidated it we just force the local state to be that? Then I guess we don't need a new FxaEvent at all. The existing (Connected, CheckAuthorizationStatus) flow that clients already trigger after the push lands at AuthIssues via Err(NoRefreshToken). That way we kinda sidestep this whole "where do we go", kinda towards what ben was saying about just moving to AuthIssues when we see a pw change/reset.

Copy link
Copy Markdown
Member

@mhammond mhammond May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure how I feel about skipping the account check entirely though - eg, let's imagine the push notification was delayed - a real possibility then is (a) user changes password elsewhere, (b) this device tries to sync, sees the 401, moves to AuthIssues, (c) user resolves by re-entering their password, (d) delayed push notification arrives, we kick the account back out.

It seems like assuming a worst-case if the check fails is justifiable, not sure skipping the check entirely is, especially given desktop has always done it

Copy link
Copy Markdown
Contributor Author

@skhamis skhamis May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely agree we should not skip the check for all scenarios, sorry to untangle this in my brain a lil:

  1. Remote password change: Push -> introspect (we got an old token) -> AuthIssues.
  2. Local pw change -> FxA gives us the new token already -> Connected.
  3. Network error during introspect: Straight to AuthIssues (not much we can do really about this)
  4. (your scenario) delayed push, user has already re-entered pw: Push notif -> introspect (we have the new token) -> stays Connected

Is this the mental modal ya'll are thinking about this?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stepping back a little, the question we are asking is "how do we handle the introspect failing?".

IIUC, currently:

  1. When getting those push messages, we pass that error back up the stack. This seems clearly bad.
  2. In the current code, on getting a CheckAuthorizationStatus event while connected handles this better - it assumes a "NeedsReauth" state.
  3. Your new code would work the same - however as I mentioned, I don't think your new code actually runs in that scenario, as the most likely state to be in when we get that push is "Connected" and that code already exists - but I do agree your new code should exist for the general case (ie, that CheckAuthorizationStatus while already in a "needs reauth" state seems like it should be fine and might even get us out of the bad state?)

I ended up going on a tangent about (2) suggesting that assuming "needs reauth" in that failure might be the wrong thing to do, because if it is being called without a clear signal that the account might be bad, entering a needs reauth state would be quite a bad outcome. But that's unrelated to your patch, which does have a clear signal the state is probably bad. So I'm back to thinking your patch is perfectly fine as it stands 😅 I still kinda think my thoughts on (2) are correct, but we don't need to fix that here, and it might turn out that after more investigation we will determine I'm simply wrong - that in almost all cases assuming reauth would be correct anyway.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh - also, I only just noticed that not having a refresh token will cause check_authorization_status() to return Err(Error::NoRefreshToken) - so I expect that CheckAuthorizationStatus while in a "needs reauth" state can't possibly work - we've dropped the token at that point, right? So maybe that new state you added doesn't make sense? Or maybe check_authorization_status() should just return false in that case?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah no thanks I think i was getting my lines crossed there, however based on the conversations it did give me an idea to slightly refactor a lil. I made the webchannel into its own HandleWebChannelPasswordChange event (which is the only thing that can actually recover from AuthIssues since it has the new session token), left the push going through the existing CheckAuthorizationStatus, and dropped the (AuthIssues, CheckAuthorizationStatus) arm entirely (your NoRefreshToken catch made it clear it couldn't do anything useful). This seems to cover both situations imo and feels like it simplifies things on the fenix side, which i'd prefer the rust side doing as much as possible.

Copy link
Copy Markdown
Member

@mhammond mhammond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this lgtm, thanks - am a little concerned about iOS now re-registering twice though?

}
// A WebChannel password change while an OAuth flow is in progress
// is a no-op; let the flow finish.
(s @ S::Authenticating { .. }, FxaEvent::HandleWebChannelPasswordChange { .. }) => Ok(s),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it's still "unexpected" though - so maybe log a warning? I agree doing nothing is correct though.

.to_state_machine_err(|| S::AuthIssues)?;
// Token swap succeeded; auth is valid.
let dc = account.device_config().clone();
if let Err(e) = account.initialize_device(&dc.name, dc.device_type, &dc.capabilities) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense, but doesn't this mean ios will be doing the re-registration twice?

(also slightly confused why the oauth code explicitly fetches the current device config at https://github.com/mozilla/application-services/blob/main/components/fxa-client/src/internal/oauth.rs#L382 before re-registering it, when it seems like what you do here is easier and less error prone?)

.handle_web_channel_password_change(&json_payload)
.to_state_machine_err(|| S::AuthIssues)?;
// Token swap succeeded; auth is valid.
let dc = account.device_config().clone();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like this clone shouldn't be necessary?

///
/// This event is valid for the `Connected` and `AuthIssues` states. In `Authenticating` it
/// is a no-op so the in-progress OAuth flow is not disrupted.
HandleWebChannelPasswordChange { json_payload: String },
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

total nit, but it seems the "Handle" part doesn't make sense here, even though it kinda made sense on the function.

And I like the idea of this becoming an event, but think handle_web_channel_login should also become an event for consistency, then the functions can be removed in a followup?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants