Follow-ups deliberately left out of #61 (the #53 zombie-reaping fix) to keep that change focused. Each is independent.
1. Bound a hung (not disconnected) served git with a timeout
#61 tears down git on client disconnect, but a wedged upload-pack/pack-objects that neither finishes nor disconnects can still block the request indefinitely and hold a PID. Wrap the wait_with_output() in run_git_service (crates/gitlawb-node/src/git/smart_http.rs) in a tokio::time::timeout. On expiry the existing KillGroupOnDrop guard already fires when the future is dropped, so the process group gets cleaned up; the work is choosing a sane bound (large repos need headroom) and surfacing a clear error.
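The shape of that bound can be sketched without the async runtime. The block below is a synchronous, std-only stand-in, not the tokio wrap itself: it polls `try_wait()` up to a deadline, then kills and reaps the child and returns a distinct timeout error. `ServeError`, `wait_bounded`, and the 10 ms poll interval are hypothetical names and values chosen for illustration; in run_git_service the equivalent would presumably be a `tokio::time::timeout` around `wait_with_output()`, with KillGroupOnDrop doing the kill.

```rust
use std::io;
use std::process::{Command, ExitStatus, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical error type: keeps "timed out" distinct from ordinary I/O failure
/// so the handler can surface a clear message.
#[derive(Debug)]
enum ServeError {
    Io(io::Error),
    TimedOut(Duration),
}

/// Spawns `cmd` and waits at most `limit` for it to exit.
/// On expiry the child is killed and then waited on, so no zombie is left behind.
fn wait_bounded(mut cmd: Command, limit: Duration) -> Result<ExitStatus, ServeError> {
    let mut child = cmd
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()
        .map_err(ServeError::Io)?;
    let deadline = Instant::now() + limit;
    loop {
        // try_wait() returns Ok(Some(status)) once the child has exited.
        if let Some(status) = child.try_wait().map_err(ServeError::Io)? {
            return Ok(status);
        }
        if Instant::now() >= deadline {
            // kill() can fail if the child exited in the meantime; wait() reaps either way.
            let _ = child.kill();
            let _ = child.wait();
            return Err(ServeError::TimedOut(limit));
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

Note that `Child::kill` here only signals the direct child, whereas the real path relies on the guard to take down the whole process group.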
2. Cross-environment process cap, not just compose
pids_limit: 1024 lives in docker-compose.yml only, and the cgroup pids controller counts threads, so per-request fan-out (git + pack-objects + helper threads) eats into it faster than one PID per request suggests. It also does not apply on Fly. Two parts:
- An application-level concurrency cap (a semaphore in front of command.spawn() in the serve path) so the node sheds load with a clean 503 instead of every concurrent git op failing with a 500 mid-stream once the table is exhausted.
- A matching limit wherever the node runs outside compose (e.g. Fly), so the safety net is not compose-only.
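As a rough illustration of the first part, the sketch below shows the try-acquire-or-shed pattern using only std atomics. `SpawnLimiter`, `SpawnPermit`, and `serve_git` are hypothetical names, and the `u16` return stands in for whatever response type the handler really uses. In the async serve path the same shape would likely map onto `tokio::sync::Semaphore::try_acquire_owned`, with the permit held until the git child has been reaped.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical limiter: counts git services currently in flight.
struct SpawnLimiter {
    in_flight: AtomicUsize,
    max: usize,
}

/// RAII permit: the slot is released when the permit is dropped.
struct SpawnPermit {
    limiter: Arc<SpawnLimiter>,
}

impl SpawnLimiter {
    fn new(max: usize) -> Arc<Self> {
        Arc::new(Self { in_flight: AtomicUsize::new(0), max })
    }

    /// Non-blocking acquire: returns None instead of queueing when the cap is reached.
    fn try_acquire(self: &Arc<Self>) -> Option<SpawnPermit> {
        let mut cur = self.in_flight.load(Ordering::Acquire);
        loop {
            if cur >= self.max {
                return None;
            }
            match self.in_flight.compare_exchange_weak(
                cur,
                cur + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Some(SpawnPermit { limiter: Arc::clone(self) }),
                Err(actual) => cur = actual,
            }
        }
    }
}

impl Drop for SpawnPermit {
    fn drop(&mut self) {
        self.limiter.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

/// Stand-in for the serve path: take a permit before spawning, or shed with 503.
fn serve_git(limiter: &Arc<SpawnLimiter>) -> Result<SpawnPermit, u16> {
    limiter.try_acquire().ok_or(503)
}
```

Because the pids controller counts threads, the cap would need to sit well below pids_limit divided by the per-request fan-out, not at pids_limit itself.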
3. Wiring test for the disconnect/reap path
The new tests in #61 exercise the KillGroupOnDrop guard in isolation. There is no test asserting that run_git_service actually applies process_group(0) and arms the guard, so a refactor that dropped either would pass CI. A fake git on PATH (a script that sleeps, or exits immediately to force the stdin-write-error path) makes both the disconnect-teardown and the reap-on-error paths deterministically testable end to end.
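One possible shape for the process_group(0) half of that test, as a std-only sketch. `install_fake_git`, `spawn_served_git`, `pgid_of`, and `leads_own_group` are hypothetical helpers; `spawn_served_git` stands in for the spawn step inside run_git_service, which the real test would call instead. Reading the process group from `/proc/<pid>/stat` is a Linux-only assumption made to avoid extra dependencies.

```rust
use std::env;
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::os::unix::process::CommandExt;
use std::path::{Path, PathBuf};
use std::process::{Child, Command, Stdio};

/// Writes an executable fake `git` that just sleeps, and returns its directory.
fn install_fake_git(tag: &str) -> io::Result<PathBuf> {
    let dir = env::temp_dir().join(format!("fake-git-{}-{}", tag, std::process::id()));
    fs::create_dir_all(&dir)?;
    let script = dir.join("git");
    // `exec` replaces the shell, so killing the child leaves no orphaned sleep.
    fs::write(&script, "#!/bin/sh\nexec sleep 30\n")?;
    fs::set_permissions(&script, fs::Permissions::from_mode(0o755))?;
    Ok(dir)
}

/// Hypothetical stand-in for the spawn step under test: resolves `git` via a PATH
/// with the fake directory prepended, and puts the child in its own process group.
fn spawn_served_git(fake_dir: &Path) -> io::Result<Child> {
    let mut dirs = vec![fake_dir.to_path_buf()];
    dirs.extend(env::split_paths(&env::var_os("PATH").unwrap_or_default()));
    let path = env::join_paths(dirs)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidInput, e.to_string()))?;
    Command::new("git")
        .arg("upload-pack")
        .env("PATH", path)
        .process_group(0)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()
}

/// Linux-only: reads the process group id from /proc/<pid>/stat.
fn pgid_of(pid: u32) -> io::Result<u32> {
    let stat = fs::read_to_string(format!("/proc/{}/stat", pid))?;
    // The comm field is parenthesised and may contain spaces, so parse after the last ')'.
    let rest = stat.rsplit(')').next().unwrap_or("");
    // Fields after comm are: state, ppid, pgrp, ...
    rest.split_whitespace()
        .nth(2)
        .and_then(|s| s.parse().ok())
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "unexpected /proc stat format"))
}

/// True if the spawned fake git is the leader of its own process group.
fn leads_own_group(fake_dir: &Path) -> io::Result<bool> {
    let mut child = spawn_served_git(fake_dir)?;
    let pgid = pgid_of(child.id());
    // Always kill and reap the fake so the check itself leaks nothing.
    let _ = child.kill();
    let _ = child.wait();
    Ok(pgid? == child.id())
}
```

Without process_group(0) the child would inherit the test process's group and the check would fail, which is the regression this is meant to catch. The guard half would additionally drop the serving future and assert that the fake git is gone afterwards.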
Refs #53, #61.