Add --pause to space out file downloads by a random delay #438
Open
xroche wants to merge 1 commit into
A new --pause MIN[:MAX] (seconds, -%G) waits a random MIN..MAX between files so a crawl looks less like a bot and is gentler on the server; a single value is a fixed delay. Disabled by default.

It reuses the existing non-blocking launch gate (back_pluggable_sockets_strict): rather than Sleep() -- which would freeze the single select() pump and stall the other in-flight transfers -- the gate just withholds new launches until the delay elapses, one file per gap. The per-gap target is derived from the last-request timestamp so it stays stable across the many gate evaluations within a gap yet rerolls on each launch; sampling rand() per evaluation would instead bias the realized delay toward MIN.

Two int fields appended at the httrackp tail (ABI-stable, no soname bump). Covered by a pure-function self-test (range + spread, with teeth against the min-bias bug) and a local-server crawl that asserts the pause slows a multi-file mirror.

Closes #185

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
A new `--pause MIN[:MAX]` (seconds, `-%G`) waits a random delay between MIN and MAX before each file, so a crawl looks less like a bot and is gentler on the server. A single value (`--pause 8`) is a fixed delay; off by default. Issue #185.

It reuses the existing non-blocking launch gate, `back_pluggable_sockets_strict`, the same one that enforces `--connection-per-second`. A blocking `Sleep()` would be wrong here: downloads run through one `select()` pump over several sockets, so sleeping would freeze the in-flight transfers on the other sockets. The gate instead just withholds new file launches until the delay elapses, one file per gap, while the existing transfers keep draining.

The per-gap delay is derived from the last-request timestamp rather than a fresh `rand()` per call. The gate is evaluated many times within one gap, so sampling on every pass would fire as soon as the smallest roll dipped below the elapsed time, biasing the real delay toward MIN. A value that stays constant between launches keeps the target stable within a gap and rerolls it on the next launch, with no scratch state to store.

Two `int` fields are appended at the `httrackp` tail, so the exported ABI is unchanged and there's no soname bump (same shape as the `--cookies-file` change). The pause is global across the crawl, matching how `--connection-per-second` already behaves.

Tests cover the delay function directly (range and spread, with teeth against the bias-toward-MIN bug, seeded with consecutive-millisecond values like real launch timestamps), a `copy_htsopt` round-trip, command-line validation including non-finite input such as `nan`, and a local-server crawl that asserts the pause measurably slows a multi-file mirror.

Closes #185
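
For readers who want the stable-target idea in concrete form, here is a minimal sketch in C. It is an illustration only, not the code from this PR: `mix64`, `pause_target_ms`, and `pause_gate_open` are hypothetical names, the millisecond units and the splitmix64-style mixer are assumptions, and the real check lives inside `back_pluggable_sockets_strict`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scramble a 64-bit value (splitmix64 finalizer) so
   that consecutive millisecond timestamps give well-spread outputs. */
uint64_t mix64(uint64_t x) {
  x += 0x9E3779B97F4A7C15ULL;
  x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
  x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
  return x ^ (x >> 31);
}

/* Hypothetical: delay target for the current gap, in [min_ms, max_ms].
   It is a pure function of the last-launch timestamp, so every gate
   evaluation within one gap sees the same target, and the target changes
   only when a launch updates that timestamp. No state is stored. */
int64_t pause_target_ms(int64_t last_launch_ms, int64_t min_ms,
                        int64_t max_ms) {
  uint64_t span;
  assert(min_ms >= 0 && min_ms <= max_ms);
  span = (uint64_t)(max_ms - min_ms) + 1; /* inclusive range width */
  return min_ms + (int64_t)(mix64((uint64_t)last_launch_ms) % span);
}

/* Hypothetical non-blocking gate: returns 1 if a new file may be launched
   now, 0 if the caller should keep servicing existing sockets and ask
   again on the next pass. It never sleeps. */
int pause_gate_open(int64_t now_ms, int64_t last_launch_ms, int64_t min_ms,
                    int64_t max_ms) {
  if (max_ms <= 0)
    return 1; /* pause disabled (assumed convention for this sketch) */
  return now_ms - last_launch_ms >=
         pause_target_ms(last_launch_ms, min_ms, max_ms);
}
```

In this sketch the caller would evaluate `pause_gate_open` on every pass of its `select()` loop and record a new `last_launch_ms` only when it actually starts a file. Calling `rand()` inside the gate instead would redraw the target on each pass, which is the bias-toward-MIN behavior the description warns about.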