Skip to content

Break poll loop out into subroutine guaranteed to quit before producer stop#941

Merged
brandur merged 1 commit into
masterfrom
brandur-fix-fetch-poll-loop
Jun 4, 2025
Merged

Break poll loop out into subroutine guaranteed to quit before producer stop#941
brandur merged 1 commit into
masterfrom
brandur-fix-fetch-poll-loop

Conversation

@brandur

@brandur brandur commented Jun 4, 2025

Copy link
Copy Markdown
Contributor

This one's aimed at fixing the race detector failure as observed in [1].
Honestly, I couldn't repro this locally even at 1,000 iteration, but
reading the code, I'm pretty sure I know what the problem is.

The fetchAndRunLoop would start a separate goroutine that'd use a
timer to trigger a poll at FetchPollInterval so that fetches are
occasionally performed even if listen/notify missed something. This
inner goroutine would stop on context cancellation, but it wasn't
guaranteed to return before a stop finished. Therefore, as the stress
test called Start again, fetchLimiter could be modified while the
goroutine was still accessing it from the last run of the producer.

Fix this by making the fetch loop a "subroutine", which is a convention
used already for other goroutines in Start. Start waits on all these
goroutines to finish before completing a stop, therefore guaranteeing
that the next iteration of Start can do so safely.

[1] https://github.com/riverqueue/river/actions/runs/15419505042/job/43390545084

@brandur brandur requested a review from bgentry June 4, 2025 00:24

@bgentry bgentry left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯, this definitely looks like the cause. Nice fix! 🎉

…r stop

This one's aimed at fixing the race detector failure as observed in [1].
Honestly, I couldn't repro this locally even at 1,000 iteration, but
reading the code, I'm pretty sure I know what the problem is.

The `fetchAndRunLoop` would start a separate goroutine that'd use a
timer to trigger a poll at `FetchPollInterval` so that fetches are
occasionally performed even if listen/notify missed something. This
inner goroutine would stop on context cancellation, but it wasn't
guaranteed to return before a stop finished. Therefore, as the stress
test called `Start` again, `fetchLimiter` could be modified while the
goroutine was still accessing it from the last run of the producer.

Fix this by making the fetch loop a "subroutine", which is a convention
used already for other goroutines in `Start`. `Start` waits on all these
goroutines to finish before completing a stop, therefore guaranteeing
that the next iteration of `Start` can do so safely.

[1] https://github.com/riverqueue/river/actions/runs/15419505042/job/43390545084
@brandur brandur force-pushed the brandur-fix-fetch-poll-loop branch from 6cf8b8c to 38bd0b1 Compare June 4, 2025 04:41
@brandur brandur changed the title Break poll loop out into subroutine guaranteed to quite before producer stop Break poll loop out into subroutine guaranteed to quit before producer stop Jun 4, 2025
@brandur

brandur commented Jun 4, 2025

Copy link
Copy Markdown
Contributor Author

Thanks!

And oops — a few typos in titles/description that are fixed now.

@brandur brandur merged commit 89c7b62 into master Jun 4, 2025
10 checks passed
@brandur brandur deleted the brandur-fix-fetch-poll-loop branch June 4, 2025 04:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants