50 changes: 50 additions & 0 deletions build-and-config/long-running-sidekiq-across-deploys.mdx
@@ -0,0 +1,50 @@
---
title: Long-running Sidekiq jobs across deploys
products: ['rails', 'deploy']
---

## Overview

When you deploy, Cloud 66 restarts your background processes, including Sidekiq workers. If a Sidekiq job is mid-execution when its worker is restarted, the job is interrupted and, depending on its retry settings, may be retried, abandoned, or lost. Long-running jobs are especially exposed: a job that takes 90 seconds will rarely finish in the few seconds between a `TERM` signal and a forced `KILL`.

The fix is to tell Cloud 66's process manager how to drain Sidekiq gracefully: first signal Sidekiq to stop fetching new jobs while it keeps working on the ones already in flight, then give it enough time to finish, and only then send `TERM` and, if the process is still running, `KILL`. This is configured via `procfile_metadata.stop_sequence` in your [manifest file](/:product/:version?/manifest/manifest).

## Which signal does what

Sidekiq's signal semantics changed across versions. The signals that matter for graceful drains are:

| Signal | Sidekiq 5.0+ behaviour | Notes |
|--------|------------------------|-------|
| `TSTP` | Quiet mode — finish current jobs, stop fetching new ones | The signal you want for graceful drain |
| `TERM` | Graceful shutdown — wait up to `-t` timeout, then exit | Final shutdown signal |
| `TTIN` | Dumps thread backtraces to the log | Diagnostic only; does **not** stop fetching |
| `USR1` | Legacy quiet-mode signal (pre-5.0) | Deprecated since Sidekiq 5.0; use `TSTP` instead |
| `KILL` | Immediate termination | Last resort |

If you use `TTIN` in a `stop_sequence` — as some older examples show — Sidekiq keeps pulling new jobs throughout the wait window, which defeats the point of waiting.
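The difference between `TSTP` and `TTIN` can be illustrated with a toy model. This is a minimal sketch, not Sidekiq's implementation: `ToyWorker` and its attributes are hypothetical names, and the transitions simply mirror the table above.

```ruby
# Toy model of a job-fetching worker. NOT Sidekiq's code; the class and
# attribute names are hypothetical and only mirror the signal table above.
class ToyWorker
  attr_reader :fetching, :running, :log

  def initialize
    @fetching = true  # still pulling new jobs from the queue
    @running = true   # process is alive
    @log = []
  end

  # Apply one signal and return self so calls can be chained.
  def signal(name)
    case name.to_s.upcase
    when "TSTP"
      @fetching = false            # quiet mode: in-flight jobs continue
    when "TTIN"
      @log << "thread backtraces"  # diagnostic only, fetching is unchanged
    when "TERM", "KILL"
      @fetching = false            # shutdown (modelled here as immediate)
      @running = false
    else
      raise ArgumentError, "unmodelled signal: #{name}"
    end
    self
  end
end

ttin = ToyWorker.new.signal(:ttin)
tstp = ToyWorker.new.signal(:tstp)
puts "after TTIN: fetching=#{ttin.fetching}"
puts "after TSTP: fetching=#{tstp.fetching}"
```

In this model a worker that receives `TTIN` is still fetching, so any wait that follows it gives new jobs time to start rather than giving existing jobs time to finish.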

## Worked example

Suppose your worker jobs can take up to two minutes to finish. In your manifest:

```yaml
procfile_metadata:
  worker:
    stop_sequence: tstp, 120, term, 30, kill
```

This sequence:

1. Sends `TSTP` — Sidekiq enters quiet mode and stops fetching new jobs.
2. Waits 120 seconds for in-flight jobs to finish.
3. Sends `TERM`: Sidekiq begins graceful shutdown, giving any remaining jobs up to its own `-t` timeout to finish.
4. Waits 30 seconds.
5. Sends `KILL` if the process is still running.
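The steps above can be sketched as a worst-case timeline. This is an illustrative reading of the `stop_sequence` string, not Cloud 66's actual parser: `Step`, `parse_stop_sequence`, and `timeline` are hypothetical helpers, and the sketch assumes numeric tokens are waits in seconds and every other token is a signal name.

```ruby
# Hypothetical helpers: interpret a stop_sequence string as alternating
# signals and waits. Assumes numeric tokens are seconds to wait.
Step = Struct.new(:kind, :value)

def parse_stop_sequence(text)
  text.split(",").map(&:strip).reject(&:empty?).map do |token|
    if token.match?(/\A\d+\z/)
      Step.new(:wait, token.to_i)
    else
      Step.new(:signal, token.upcase)
    end
  end
end

# Worst-case offsets (in seconds) at which each signal would be sent,
# assuming the process never exits early.
def timeline(steps)
  elapsed = 0
  steps.each_with_object([]) do |step, out|
    if step.kind == :wait
      elapsed += step.value
    else
      out << [elapsed, step.value]
    end
  end
end

timeline(parse_stop_sequence("tstp, 120, term, 30, kill")).each do |at, sig|
  puts "t+#{at}s: send #{sig}"
end
```

For the example sequence this places `KILL` at most 150 seconds after the stop begins, which is the upper bound a deploy may spend waiting on this process.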

Adjust the wait values to match your worst-case job duration. Sidekiq's own `-t` (timeout) setting also bounds how long it keeps working after `TERM`, so the wait between `term` and `kill` (30 seconds in this example) should be at least as long as `-t`; otherwise `KILL` fires before Sidekiq finishes its graceful shutdown.
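The timeout constraint in the paragraph above can be checked mechanically. The following is a rough sketch: `term_grace_seconds` and `kill_before_timeout?` are hypothetical helpers, and the 25-second value stands in for whatever `-t` your Sidekiq process is actually started with.

```ruby
# Hypothetical check: how many seconds does a stop_sequence allow between
# the last TERM and the following KILL?
def term_grace_seconds(stop_sequence)
  tokens = stop_sequence.split(",").map { |t| t.strip.downcase }
  term_at = tokens.rindex("term")
  return nil if term_at.nil?

  after_term = tokens.drop(term_at + 1)
  kill_at = after_term.index("kill")
  return nil if kill_at.nil?

  # Sum every numeric wait that sits between TERM and KILL.
  after_term.first(kill_at).sum { |t| t.match?(/\A\d+\z/) ? t.to_i : 0 }
end

# True when KILL would fire before the assumed Sidekiq timeout has elapsed.
def kill_before_timeout?(stop_sequence, sidekiq_timeout)
  grace = term_grace_seconds(stop_sequence)
  !grace.nil? && grace < sidekiq_timeout
end

assumed_timeout = 25 # assumption: replace with your own -t value
["tstp, 120, term, 30, kill", "tstp, 120, term, 5, kill"].each do |seq|
  verdict = kill_before_timeout?(seq, assumed_timeout) ? "too short" : "ok"
  puts "#{seq} => #{verdict}"
end
```

With an assumed timeout of 25 seconds, a 30-second wait after `term` passes the check, while a 5-second wait is flagged because `KILL` could arrive while Sidekiq is still shutting down.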

## Related

- [Processes Configuration](/:product/:version?/manifest/_processes) — full `procfile_metadata` reference
- [Running background processes](/:product/:version?/build-and-config/proc-files) — Procfile basics
- [Sidekiq Signals (upstream wiki)](https://github.com/sidekiq/sidekiq/wiki/Signals)
10 changes: 8 additions & 2 deletions manifest/_processes.mdx
@@ -13,15 +13,21 @@ If you would like more flexibility over the signals used to control the processe
```yaml
procfile_metadata:
  worker:
-    stop_sequence: ttin, 120, term, 5, kill
+    stop_sequence: tstp, 120, term, 5, kill
  web:
    restart_signal: usr1
    stop_sequence: usr1, 30, kill
  nsq:
    restart_on_deploy: false
```

-In this example, a process called `worker` is stopped using a `TTIN` signal first. After waiting for 120 seconds, if the process is still running, a `TERM` signal will be sent. If it is still running after 5 seconds, it will be killed.
+In this example, a process called `worker` is stopped using a `TSTP` signal first. After waiting for 120 seconds, if the process is still running, a `TERM` signal will be sent. If it is still running after 5 seconds, it will be killed.

<Callout type="warning" title="Use TSTP, not TTIN, for graceful Sidekiq drains">
If your worker process is Sidekiq, the signal that puts it into "quiet mode" (finish current jobs, stop fetching new ones) is `TSTP`, supported since Sidekiq 5.0. `TTIN` only logs thread backtraces and does **not** stop job fetching, so using `TTIN` in a `stop_sequence` leaves Sidekiq pulling new jobs throughout the wait window. `USR1` was the legacy quiet-mode signal and has been deprecated since Sidekiq 5.0 in favor of `TSTP`.

For a worked example see [Long-running Sidekiq jobs across deploys](/:product/:version?/build-and-config/long-running-sidekiq-across-deploys).
</Callout>

For `web` or `custom_web` processes, you can specify a `restart_signal`, which is sent to the process serving web requests. This is useful for web servers that can do "phased" or zero-downtime restarts.
