Skip to content

Add ORDER BY to JobSetStateIfRunningMany to try and fix intermittency#920

Merged
brandur merged 1 commit into
masterfrom
brandur-set-state-if-running-intermittency
May 24, 2025
Merged

Add ORDER BY to JobSetStateIfRunningMany to try and fix intermittency#920
brandur merged 1 commit into
masterfrom
brandur-set-state-if-running-intermittency

Conversation

@brandur

@brandur brandur commented May 23, 2025

Copy link
Copy Markdown
Contributor

Here, try to fix a test intermittency problem that I ran into while
trying to fix a different intermittency problem. Here's the error:

--- FAIL: TestDriverRiverDatabaseSQLPgx (0.01s)
driver_test.go:72: Reusing idle postgres schema "river_2025_05_22t23_56_07_schema_07" [user facing: "river_2025_05_22t23_56_07_schema_07"] after cleaning in 12.716292ms [25 generated] [506 reused]
driver_test.go:72: Driver does not support listener; skipping listener tests
--- FAIL: TestDriverRiverDatabaseSQLPgx/JobSetStateIfRunningMany_MultipleJobsAtOnce (0.00s)
    riverdrivertest.go:2663: TestTx using postgres schema: river_2025_05_22t23_56_07_schema_14
    riverdrivertest.go:2679:
        Error Trace:    /Users/brandur/Documents/projects/river/internal/riverinternaltest/riverdrivertest/riverdrivertest.go:2679
        Error:          Not equal:
                expected: "completed"
                actual  : "retryable"

                Diff:
                --- Expected
                +++ Actual
                @@ -1,2 +1,2 @@
                -(rivertype.JobState) (len=9) "completed"
                +(rivertype.JobState) (len=9) "retryable"

        Test:           TestDriverRiverDatabaseSQLPgx/JobSetStateIfRunningMany_MultipleJobsAtOnce
riverdbtest.go:289: Checked in postgres schema "river_2025_05_22t23_56_07_schema_07"; 8 idle schema(s) [30 generated] [1000 reused]
FAIL
FAIL    github.com/riverqueue/river     22.974s
FAIL

It stems from these lines in the driver tests:

jobsAfter, err := exec.JobSetStateIfRunningMany(ctx, setStateManyParams(
    riverdriver.JobSetStateCompleted(job1.ID, now, []byte(`{"a":"b"}`)),
    riverdriver.JobSetStateErrorRetryable(job2.ID, future, makeErrPayload(t, now), nil),
    riverdriver.JobSetStateCancelled(job3.ID, now, makeErrPayload(t, now), nil),
))
require.NoError(t, err)
completedJob := jobsAfter[0]
require.Equal(t, rivertype.JobStateCompleted, completedJob.State)

I think the problem is that when selecting rows to return from
JobSetStateIfRunningMany. Usually these return in input order, but SQL
doesn't guarantee it, so occasionally jobs come back in the wrong order
and we get the retryable job before the completed job.

Here, we add an ORDER BY on the final SELECT. I only add it on one
side of the UNION which if I'm reading this correctly [1], should be
enough.

[1] https://www.postgresql.org/message-id/16814.1280268424%40sss.pgh.pa.us

@brandur brandur force-pushed the brandur-set-state-if-running-intermittency branch from bf81230 to c712cab Compare May 23, 2025 07:16
…ency

Here, try to fix a test intermittency problem that I ran into while
trying to fix a different intermittency problem. Here's the error:

    --- FAIL: TestDriverRiverDatabaseSQLPgx (0.01s)
    driver_test.go:72: Reusing idle postgres schema "river_2025_05_22t23_56_07_schema_07" [user facing: "river_2025_05_22t23_56_07_schema_07"] after cleaning in 12.716292ms [25 generated] [506 reused]
    driver_test.go:72: Driver does not support listener; skipping listener tests
    --- FAIL: TestDriverRiverDatabaseSQLPgx/JobSetStateIfRunningMany_MultipleJobsAtOnce (0.00s)
        riverdrivertest.go:2663: TestTx using postgres schema: river_2025_05_22t23_56_07_schema_14
        riverdrivertest.go:2679:
            Error Trace:    /Users/brandur/Documents/projects/river/internal/riverinternaltest/riverdrivertest/riverdrivertest.go:2679
            Error:          Not equal:
                    expected: "completed"
                    actual  : "retryable"

                    Diff:
                    --- Expected
                    +++ Actual
                    @@ -1,2 +1,2 @@
                    -(rivertype.JobState) (len=9) "completed"
                    +(rivertype.JobState) (len=9) "retryable"

            Test:           TestDriverRiverDatabaseSQLPgx/JobSetStateIfRunningMany_MultipleJobsAtOnce
    riverdbtest.go:289: Checked in postgres schema "river_2025_05_22t23_56_07_schema_07"; 8 idle schema(s) [30 generated] [1000 reused]
    FAIL
    FAIL    github.com/riverqueue/river     22.974s
    FAIL

It stems from these lines in the driver tests:

    jobsAfter, err := exec.JobSetStateIfRunningMany(ctx, setStateManyParams(
        riverdriver.JobSetStateCompleted(job1.ID, now, []byte(`{"a":"b"}`)),
        riverdriver.JobSetStateErrorRetryable(job2.ID, future, makeErrPayload(t, now), nil),
        riverdriver.JobSetStateCancelled(job3.ID, now, makeErrPayload(t, now), nil),
    ))
    require.NoError(t, err)
    completedJob := jobsAfter[0]
    require.Equal(t, rivertype.JobStateCompleted, completedJob.State)

I think the problem is that when selecting rows to return from
`JobSetStateIfRunningMany`. Usually these return in input order, but SQL
doesn't guarantee it, so occasionally jobs come back in the wrong order
and we get the `retryable` job before the `completed` job.

Here, we add an `ORDER BY` on the final `SELECT`. I only add it on one
side of the `UNION` which if I'm reading this correctly [1], should be
enough.

[1] https://www.postgresql.org/message-id/16814.1280268424%40sss.pgh.pa.us
@brandur brandur force-pushed the brandur-set-state-if-running-intermittency branch from c712cab to 18da744 Compare May 23, 2025 07:17
@brandur brandur requested a review from bgentry May 23, 2025 07:19
@brandur brandur merged commit 0edd616 into master May 24, 2025
10 checks passed
@brandur brandur deleted the brandur-set-state-if-running-intermittency branch May 24, 2025 00:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants