From c079afd39965f59adb39cc2b7660cc04dfb120a0 Mon Sep 17 00:00:00 2001
From: Vic van Gool
Date: Wed, 20 May 2026 07:40:51 +0100
Subject: [PATCH] Fix Sidekiq stop_sequence signal and add long-running-jobs guide

The procfile_metadata example used TTIN, which in Sidekiq 5+ only dumps
thread backtraces and does not start quiet mode, so workers kept fetching
new jobs throughout the shutdown wait window. Switch the example to TSTP
(the correct quiet-mode signal since Sidekiq 5.0) and add a Callout
explaining the TSTP/TTIN/USR1 semantics and version compatibility.

Also adds a dedicated how-to page so the pattern is discoverable by the
terms customers search ("long-running sidekiq across deploys") rather than
only under the signal-centric "Process Signal Options" heading.

Linear: SUP-975 (absorbs SUP-979)
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../long-running-sidekiq-across-deploys.mdx   | 50 +++++++++++++++++++
 manifest/_processes.mdx                       | 10 +++-
 2 files changed, 58 insertions(+), 2 deletions(-)
 create mode 100644 build-and-config/long-running-sidekiq-across-deploys.mdx

diff --git a/build-and-config/long-running-sidekiq-across-deploys.mdx b/build-and-config/long-running-sidekiq-across-deploys.mdx
new file mode 100644
index 0000000..00ba776
--- /dev/null
+++ b/build-and-config/long-running-sidekiq-across-deploys.mdx
@@ -0,0 +1,50 @@
+---
+title: Long-running Sidekiq jobs across deploys
+products: ['rails', 'deploy']
+---
+
+## Overview
+
+When you deploy, Cloud 66 restarts your background processes, including Sidekiq workers. If a Sidekiq job is mid-execution when its worker is restarted, the job is interrupted and (depending on its retry settings) may be retried, abandoned, or lost. For long-running jobs this is a real risk: a job that takes 90 seconds will rarely finish in the few seconds between a `TERM` signal and a forced `KILL`.
+
+The fix is to tell Cloud 66's process manager how to drain Sidekiq gracefully: first signal Sidekiq to stop fetching new jobs but keep working on the ones already in flight, then give it enough time to finish, and only then send `TERM` and `KILL`. This is configured via `procfile_metadata.stop_sequence` in your [manifest file](/:product/:version?/manifest/manifest).
+
+## Which signal does what
+
+Sidekiq's signal semantics changed across versions. The signals that matter for graceful drains are:
+
+| Signal | Sidekiq 5.0+ behaviour | Notes |
+|--------|------------------------|-------|
+| `TSTP` | Quiet mode: finish current jobs, stop fetching new ones | The signal you want for graceful drain |
+| `TERM` | Graceful shutdown: wait up to `-t` timeout, then exit | Final shutdown signal |
+| `TTIN` | Dumps thread backtraces to the log | Diagnostic only; does **not** stop fetching |
+| `USR1` | Legacy quiet-mode signal | Deprecated since Sidekiq 5.0; use `TSTP` instead |
+| `KILL` | Immediate termination | Last resort |
+
+If you use `TTIN` in a `stop_sequence`, as some older examples show, Sidekiq keeps pulling new jobs throughout the wait window, which defeats the point of waiting.
+
+## Worked example
+
+Suppose your worker jobs can take up to two minutes to finish. In your manifest:
+
+```yaml
+procfile_metadata:
+  worker:
+    stop_sequence: tstp, 120, term, 30, kill
+```
+
+This sequence:
+
+1. Sends `TSTP`: Sidekiq enters quiet mode and stops fetching new jobs.
+2. Waits 120 seconds for in-flight jobs to finish.
+3. Sends `TERM`: Sidekiq begins graceful shutdown, giving any remaining jobs up to its own `-t` timeout to finish.
+4. Waits 30 seconds.
+5. Sends `KILL` if the process is still running.
+
+Adjust the wait values to match your worst-case job duration. Note that Sidekiq's own `-t` (timeout) setting also bounds how long it keeps working after `TERM`; the wait that follows `term` in `stop_sequence` should be at least as long as Sidekiq's `-t`, otherwise `KILL` will fire before Sidekiq finishes its graceful shutdown.
+
+## Related
+
+- [Processes Configuration](/:product/:version?/manifest/_processes): full `procfile_metadata` reference
+- [Running background processes](/:product/:version?/build-and-config/proc-files): Procfile basics
+- [Sidekiq Signals (upstream wiki)](https://github.com/sidekiq/sidekiq/wiki/Signals)
diff --git a/manifest/_processes.mdx b/manifest/_processes.mdx
index 1347436..57459d5 100644
--- a/manifest/_processes.mdx
+++ b/manifest/_processes.mdx
@@ -13,7 +13,7 @@ If you would like more flexibility over the signals used to control the processe
 ```yaml
 procfile_metadata:
   worker:
-    stop_sequence: ttin, 120, term, 5, kill
+    stop_sequence: tstp, 120, term, 5, kill
   web:
     restart_signal: usr1
     stop_sequence: usr1, 30, kill
@@ -21,7 +21,13 @@ procfile_metadata:
     restart_on_deploy: false
 ```
 
-In this example, a process called `worker` is stopped using a `TTIN` signal first. After waiting for 120 seconds, if the process is still running, a `TERM` signal will be sent. If it is still running after 5 seconds, it will be killed.
+In this example, a process called `worker` is stopped using a `TSTP` signal first. After waiting for 120 seconds, if the process is still running, a `TERM` signal will be sent. If it is still running after 5 seconds, it will be killed.
+
+<Callout>
+If your worker process is Sidekiq, the signal that puts it into "quiet mode" (finish current jobs, stop fetching new ones) is `TSTP`, supported since Sidekiq 5.0. `TTIN` only logs thread backtraces and does **not** stop job fetching, so using `TTIN` in a `stop_sequence` leaves Sidekiq pulling new jobs throughout the wait window. `USR1` was the legacy quiet-mode signal and has been deprecated since Sidekiq 5.0.
+
+For a worked example see [Long-running Sidekiq jobs across deploys](/:product/:version?/build-and-config/long-running-sidekiq-across-deploys).
+</Callout>
 
 As for `web` or `custom_web` processes, you can specify a `restart_signal` which will be sent to the process serving web. This is useful for web servers that can do "phased" or zero-downtime restarts.