Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

block refetching when reauthenticating after an error #23

Closed
wants to merge 1 commit into from

Conversation

erquhart
Copy link

@erquhart erquhart commented Jan 22, 2025

Bug

The websocket connection can quietly drop without recovery. This happens often when a native app is brought from background after token has expired, but can happen outside of this context as well.

Walkthrough

  1. Once token authenticate succeeds, a refetch is scheduled for just before the new token expires.
  2. Around the time of the scheduled refetch, the user initiates a query or mutation with the near-expired token.
  3. The server finds the token to be expired and returns an auth error.
  4. Reauthentication is attempted, which stops the websocket connection and fetches a new token.
  5. In a race condition, the scheduled refetch runs at this point, rendering the still-running reauthentication attempt outdated.
  6. The reauthentication attempt returns early due to being outdated, never restarts the connection (which would happen here).
  7. The scheduled refetch resolves, but no authentication call can be made as the connection is stopped.
  8. The scheduled refetch ends with a socket resume attempt, which does not work on stopped connections.

The socket remains closed. Authentication state is unchanged, client appears authenticated to the user.

Root cause

tryToReauthenticate() only runs in response to auth errors, refetchToken() is general purpose - for initialization and preemptive reauth. tryToReauthenticate() stops the connection and restarts at the end, refetchToken() does not. Both functions wait for token refetch and will bail if another token refetch (from either function) has started during the wait. If refetchToken() runs while tryToReauthenticate() is fetching, this bug occurs, as tryToReauthenticate() has stopped the connection and will bail before restarting, and refetchToken() does not restart the connection (nor should it).

This solution

This solution blocks any calls to refetchToken() from running if tryToReauthenticate() has started and has not yet succeeded.

Other solutions considered

We could schedule refetches sooner to help close the race condition gap here, but there's still no guarantee it won't occur. The chosen solution seemed closer to root cause.

@erquhart erquhart force-pushed the fix-dropped-connections branch 2 times, most recently from f3fb710 to ef6428a Compare January 23, 2025 01:35
@erquhart erquhart marked this pull request as draft January 23, 2025 01:37
@erquhart erquhart force-pushed the fix-dropped-connections branch from ef6428a to 0cb7372 Compare January 23, 2025 01:38
@erquhart erquhart changed the title replace single remaining connection stop/restart with pause/resume block refetching when reauthenticating after an error Jan 23, 2025
@erquhart
Copy link
Author

Each solution attempt here has complicated the auth/connect state machines and introduced new issues. I've also found a historical root cause that may inform the path forward. Moved findings to #22 for futher discussion.

@erquhart erquhart closed this Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant