fix(coderd/externalauth): detect concurrent refresh race to prevent cache poisoning by jasonwbarnett · Pull Request #24228 · coder/coder

jasonwbarnett · 2026-04-09T21:28:09Z

Builds on #24332 and #24334, which addressed token persistence and rate-limit handling.

Problem

When multiple concurrent requests race to refresh an expiring external auth token, providers with single-use refresh tokens (e.g., GitHub Apps) reject all but the first refresh attempt with bad_refresh_token. The losing request caches this transient error in the oauth_refresh_failure_reason database column and clears the refresh token, blocking all subsequent refresh attempts until the user manually re-authenticates.

This is common for users with multiple terminals, IDE connections, or workspaces open, all of which poll the external auth endpoint and trigger concurrent refreshes when the token nears expiry. Database analysis showed 5 of 7 affected users failed within 5-10 seconds of token expiry, matching the Go oauth2 library's expiryDelta window.

Fix

Before caching a bad_refresh_token failure, re-read the external auth link from the database. If the refresh token has changed (indicating a concurrent caller already refreshed successfully), return the winner's updated link instead of writing a failure. An empty-string guard ensures a token cleared by another loser isn't mistaken for a winner's successful refresh.

github-actions · 2026-04-09T21:28:24Z

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

jasonwbarnett · 2026-04-10T19:33:21Z

We built a custom release binary from a fork of v2.31.9 with these two fix commits cherry-picked on top and deployed it to our production environment. Release: https://github.com/altana-ai/coder/releases/tag/v2.31.9-fix

We reproduced the race condition using the same method as before (setting oauth_expiry = NOW() + interval '5 seconds' and firing 10 concurrent requests). With the fix applied, the database remains clean after the race — no cached bad_refresh_token failure, and the refresh token is preserved. Before the fix, the same test consistently poisoned the database.

| Check | Before fix | After fix |
|---|---|---|
| `oauth_refresh_failure_reason` | `bad_refresh_token` (poisoned) | empty (clean) |
| `refresh_token_cleared` | `true` (lost forever) | `false` (preserved) |
| Token usable after race | No, stuck until manual re-auth | Yes, works immediately |

geokat

Great PR - thank you for submitting!

…rom request context

Address review feedback from geokat on coder#24228:

1. Add empty-string check to concurrent refresh detection: a refresh token cleared by another loser (empty string) should not be treated as a successful concurrent refresh by a winner.
2. Use context.WithoutCancel for post-refresh work: once a single-use refresh token has been consumed, the validation and DB persistence must complete even if the caller's HTTP request context is cancelled. Use a 10-second timeout detached from the parent context.
3. Persist the new token to the database BEFORE validation: the refresh token has already been consumed by the provider, so if validation fails (network error, rate limit, context cancellation), the new token would be lost forever. Persist first, then validate.

Refs coder#17069

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

jasonwbarnett · 2026-04-15T22:38:06Z

Cut a new release (v2.31.9-fix2) incorporating the review feedback and shipped it to production. This includes the empty-string guard, context.WithoutCancel, and persist-before-validate changes. Will report back if there are any issues.

gorangasic · 2026-04-22T18:23:59Z

Hi @jasonwbarnett, we're also seeing intermittent GitHub auth failures at my workplace and your PR looks quite promising. I see that you've already rebased on top of #24332. Do you think it's ready for another review?

@geokat would you be the best person to review it? Or should we pull in @johnstcn and @hugodutka?

jasonwbarnett · 2026-04-22T22:31:54Z

> Hi @jasonwbarnett, we're also seeing intermittent GitHub auth failures at my workplace and your PR looks quite promising. I see that you've already rebased on top of #24332. Do you think it's ready for another review?

@gorangasic Yes, I think it's ready for another review.

I'm not sure if they want to merge #24334 first or not; it's still in draft.

mafredri · 2026-04-28T15:10:16Z

Hey @jasonwbarnett, really appreciate the contribution and sorry about your PR getting stuck in limbo (I independently discovered this issue because it was starting to block my work). Both #24332 and #24334 are now merged, but the concurrent race detection in your PR was a nice find and not covered by mine. Are you still up for working on this and rebasing on main? Happy to take a look if you get the chance. 👍🏻

jasonwbarnett · 2026-04-29T12:14:53Z

@mafredri rebased and pushed. Let me know if there are any issues. Great work in #24332 and #24334 🎉

mafredri

We should revert the ctx change, but otherwise LGTM! Thanks!

…ache poisoning

When multiple concurrent requests race to refresh an expiring external auth token, providers with single-use refresh tokens (e.g., GitHub Apps) reject all but the first refresh attempt with "bad_refresh_token". The losing request was caching this transient error in the database, which cleared the refresh token and blocked all subsequent refresh attempts until the user manually re-authenticated.

Before caching a refresh failure, re-read the external auth link from the database. If the refresh token has changed (indicating a concurrent caller already refreshed successfully), return the winner's updated token instead of writing a failure. An empty-string guard ensures a token cleared by another loser is not mistaken for a winner's successful refresh.

Fixes coder#17069

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

jasonwbarnett · 2026-04-29T13:01:41Z

@mafredri let's go! 🚀

mafredri

🎉

github-actions Bot added the community label Apr 9, 2026

github-actions Bot assigned jasonwbarnett Apr 9, 2026

jasonwbarnett mentioned this pull request Apr 9, 2026

bug: github external auth intermittently fails to refresh token #17069

Closed

1 task

jasonwbarnett force-pushed the fix/external-auth-refresh-race branch from 8845515 to f89a424 April 9, 2026 21:33

jasonwbarnett marked this pull request as ready for review April 9, 2026 21:34

geokat reviewed Apr 13, 2026


Comment thread coderd/externalauth/externalauth.go

Comment thread coderd/externalauth/externalauth.go Outdated

jasonwbarnett force-pushed the fix/external-auth-refresh-race branch from 94cb3cf to f984ef3 April 15, 2026 22:23

jasonwbarnett requested a review from geokat April 15, 2026 22:25

geokat mentioned this pull request Apr 15, 2026

fix(coderd/externalauth): save refreshed token before validation #24332

Merged

jasonwbarnett force-pushed the fix/external-auth-refresh-race branch from f984ef3 to f7a1de2 April 20, 2026 22:31

jasonwbarnett force-pushed the fix/external-auth-refresh-race branch from f7a1de2 to 7789460 April 29, 2026 12:13

jasonwbarnett changed the title ~~fix(coderd/externalauth): prevent concurrent token refresh from poisoning cache~~ fix(coderd/externalauth): detect concurrent refresh race to prevent cache poisoning Apr 29, 2026

mafredri reviewed Apr 29, 2026


Comment thread coderd/externalauth/externalauth.go Outdated

jasonwbarnett force-pushed the fix/external-auth-refresh-race branch from 2723885 to e3e87b5 April 29, 2026 13:01

mafredri approved these changes Apr 29, 2026


f0ssel added the cherry-pick label May 4, 2026

Merge branch 'main' into fix/external-auth-refresh-race

17e2956

f0ssel merged commit da6e708 into coder:main May 4, 2026
26 checks passed

github-actions Bot locked and limited conversation to collaborators May 4, 2026

f0ssel added the backport label May 19, 2026
