Add --pause to space out file downloads by a random delay#438

Open
xroche wants to merge 1 commit into master from feat/pause-files-185
@xroche xroche commented Jun 27, 2026

Owner

A new --pause MIN[:MAX] option (seconds, -%G) waits a random delay between MIN and MAX before each file, so a crawl looks less like a bot and is gentler on the server. A single value (--pause 8) gives a fixed delay. The option is off by default. Issue #185.

It reuses the existing non-blocking launch gate, back_pluggable_sockets_strict, the same one that enforces --connection-per-second. A blocking Sleep() would be wrong here: downloads run through one select() pump over several sockets, so sleeping would freeze the in-flight transfers on the other sockets. The gate instead just withholds new file launches until the delay elapses, one file per gap, while the existing transfers keep draining.
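As a rough illustration of the gating idea, here is a minimal sketch of a non-blocking launch gate. The `launch_gate` struct and both function names are hypothetical and are not HTTrack's actual code; the real logic lives in back_pluggable_sockets_strict, whose internals are not shown in this PR description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for illustration only; not HTTrack's structures. */
typedef struct {
  int64_t last_launch_ms; /* timestamp of the most recent file launch */
  int64_t pause_ms;       /* delay required before the next launch */
} launch_gate;

/* Returns true when a new download may start. It never blocks: the caller
   keeps servicing its select() loop and simply asks again on the next pass,
   so transfers already in flight continue to drain. */
static bool gate_may_launch(const launch_gate *g, int64_t now_ms) {
  assert(g->pause_ms >= 0);
  return now_ms - g->last_launch_ms >= g->pause_ms;
}

/* Records a launch so that the next file has to wait out a full gap. */
static void gate_on_launch(launch_gate *g, int64_t now_ms) {
  g->last_launch_ms = now_ms;
}
```

The point of the shape is that the gate only answers a yes/no question. Nothing in it sleeps, so a closed gate costs one comparison per pass of the pump.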

The per-gap delay is derived from the last-request timestamp rather than from a fresh rand() per call. The gate is evaluated many times within one gap, so sampling on every pass would open the gate as soon as any single roll fell below the elapsed time. The realized delay would then be the minimum over many draws, which biases it toward MIN. A value that stays constant between launches keeps the target stable within a gap and rerolls it on the next launch, with no scratch state to store.
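One way to get a delay that is stable within a gap yet varies between gaps is to hash the timestamp. The sketch below is an assumption about how that could look, not the function in this PR: `pause_delay_ms` and `mix64` are hypothetical names, and the mixer uses the well-known splitmix64 finalizer constants so that consecutive millisecond timestamps map to unrelated delays.

```c
#include <assert.h>
#include <stdint.h>

/* splitmix64-style finalizer: scrambles the bits so that neighbouring
   inputs (e.g. timestamps 1 ms apart) produce unrelated outputs. */
static uint64_t mix64(uint64_t x) {
  x ^= x >> 30;
  x *= 0xbf58476d1ce4e5b9ULL;
  x ^= x >> 27;
  x *= 0x94d049bb133111ebULL;
  x ^= x >> 31;
  return x;
}

/* Hypothetical helper: maps the last-request timestamp to a delay in
   [min_ms, max_ms]. The same timestamp always yields the same delay, so
   re-evaluating the gate within one gap cannot lower the target. */
static int64_t pause_delay_ms(int64_t last_request_ms, int min_ms, int max_ms) {
  assert(min_ms >= 0 && min_ms <= max_ms);
  const uint64_t span = (uint64_t) max_ms - (uint64_t) min_ms + 1;
  return min_ms + (int64_t) (mix64((uint64_t) last_request_ms) % span);
}
```

Because the input changes only when a file is launched, the "reroll" happens exactly once per gap. The modulo introduces a tiny bias for spans that do not divide 2^64, which is negligible for delays measured in milliseconds.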

Two int fields are appended at the httrackp tail, so the exported ABI is unchanged and there's no soname bump (same shape as the --cookies-file change). The pause is global across the crawl, matching how --connection-per-second already behaves.
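The tail-append argument can be checked mechanically. The following sketch uses two made-up structs (`opts_before`, `opts_after`, and all their field names are hypothetical stand-ins, since the real httrackp has many more members) to show that appending fields leaves every existing offset untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the options struct before and after the change. */
typedef struct {
  int retry;
  int maxconn;
} opts_before;

typedef struct {
  int retry;
  int maxconn;
  int pause_min_ms; /* new field, appended at the tail */
  int pause_max_ms; /* new field, appended at the tail */
} opts_after;

/* Existing members keep their offsets, so code compiled against the old
   layout still reads and writes the right bytes. */
static_assert(offsetof(opts_before, retry) == offsetof(opts_after, retry),
              "retry must not move");
static_assert(offsetof(opts_before, maxconn) == offsetof(opts_after, maxconn),
              "maxconn must not move");
static_assert(sizeof(opts_after) >= sizeof(opts_before),
              "struct may only grow");
```

This only holds when the library itself allocates the struct, so that older callers never rely on its size.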

Tests cover the delay function directly (range and spread, with teeth against the bias-toward-MIN bug, seeded with consecutive-millisecond values like real launch timestamps), a copy_htsopt round-trip, command-line validation including non-finite input such as nan, and a local-server crawl that asserts the pause measurably slows a multi-file mirror.
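For the command-line validation case, a minimal sketch of a MIN[:MAX] parser that rejects non-finite input might look like this. `parse_pause` and `parse_seconds` are hypothetical names, and rejecting MIN > MAX and negative values is an assumption about the intended rules rather than something stated in this PR.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parses one non-negative, finite number of seconds. On return, *end
   points just past the parsed number. */
static bool parse_seconds(const char *s, char **end, double *out) {
  const double v = strtod(s, end);
  /* strtod accepts "nan" and "inf", so finiteness is checked explicitly. */
  if (*end == s || !isfinite(v) || v < 0.0)
    return false;
  *out = v;
  return true;
}

/* Parses "MIN" or "MIN:MAX" (seconds). Returns false on malformed input;
   the outputs are unspecified in that case. */
static bool parse_pause(const char *arg, double *min_s, double *max_s) {
  char *end;
  assert(arg != NULL && min_s != NULL && max_s != NULL);
  if (!parse_seconds(arg, &end, min_s))
    return false;
  if (*end == '\0') { /* single value: fixed delay */
    *max_s = *min_s;
    return true;
  }
  if (*end != ':' || !parse_seconds(end + 1, &end, max_s) || *end != '\0')
    return false;
  return *min_s <= *max_s;
}
```

The explicit isfinite() check matters because a NaN compares false against everything, so a plain range check such as `v < 0.0` would let it through.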

Closes #185

A new --pause MIN[:MAX] (seconds, -%G) waits a random MIN..MAX between
files so a crawl looks less like a bot and is gentler on the server; a
single value is a fixed delay. Disabled by default.

It reuses the existing non-blocking launch gate
(back_pluggable_sockets_strict): rather than Sleep() -- which would freeze
the single select() pump and stall the other in-flight transfers -- the
gate just withholds new launches until the delay elapses, one file per
gap. The per-gap target is derived from the last-request timestamp so it
stays stable across the many gate evaluations within a gap yet rerolls on
each launch; sampling rand() per evaluation would instead bias the
realized delay toward MIN.

Two int fields appended at the httrackp tail (ABI-stable, no soname bump).
Covered by a pure-function self-test (range + spread, with teeth against
the min-bias bug) and a local-server crawl that asserts the pause slows a
multi-file mirror.

Closes #185

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
@xroche xroche force-pushed the feat/pause-files-185 branch from 733849c to 896a589 Compare June 27, 2026 21:55
Development

Successfully merging this pull request may close these issues.

Add option to pause X to Y seconds between files downloaded

1 participant