
Keep size-based filter rules neutral until the file size is known (#143) #431

Merged
xroche merged 1 commit into master from fix/filters-143-144 on Jun 26, 2026

xroche (Owner) commented Jun 26, 2026

A scan rule like `-*.jpg*[<10]` should fetch every JPG and then drop the ones under 10KB once their size is known. Instead it could forbid all of them before anything was downloaded, logging `(wizard) explicit forbidden (-*.jpg*[<10])`. At scan time the size is not known yet, so the wizard calls `fa_strjoker` with no size, but `fa_strjoker` always handed `strjoker` the address of an uninitialized local `sz`. The `*[<10]` predicate then compared against stack garbage, and whenever that garbage fell in [0,10) the rule matched and the link was dropped up front. The size-aware second pass (after download) already worked.

The fix passes no size pointer when the size is unknown, reusing `strjoker`'s existing "test impossible, no match" path so size rules stay neutral at scan time and only fire once the real size is in. The size-known path is untouched. A new `filtersize` engine self-test drives `fa_strjoker` through both phases, and a block in `01_engine-filter.test` pins the scenario (scan time keeps the JPG, under 10KB cancels it, 10KB or more keeps it); forcing the old uninitialized read into the [0,10) regime makes the scan-time case forbid and the test fail.
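To make the two phases concrete, here is a minimal sketch of the idea in C. It is not the HTTrack code: `size_below`, `apply_forbid_if_small`, and `verdict_t` are hypothetical names invented for this illustration, and the only behavior assumed is what the description above states, namely that a missing size pointer means "test impossible, no match".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a "[<N]" size predicate; sizes are in KB.
   A NULL size_kb models "size not known yet" (scan time). */
bool size_below(const long *size_kb, long limit_kb) {
    if (size_kb == NULL)
        return false; /* test impossible -> no match */
    return *size_kb >= 0 && *size_kb < limit_kb;
}

typedef enum { LINK_KEEP, LINK_FORBID } verdict_t;

/* Hypothetical model of a "-pattern[<limit]" rule, applied to a link
   whose name already matched the pattern part of the rule. */
verdict_t apply_forbid_if_small(const long *size_kb, long limit_kb) {
    return size_below(size_kb, limit_kb) ? LINK_FORBID : LINK_KEEP;
}
```

In this model, `apply_forbid_if_small(NULL, 10)` keeps the link at scan time, a pointer to `4` forbids it after download, and a pointer to `10` keeps it. The old behavior corresponds to always passing a pointer to a value that was never set; if that value happened to be, say, `7`, the scan-time call would have forbidden the link.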

This also locks #144 as working-as-intended. The `*[name]`/`*[file]`/`*[path]` classes never span `?` mid-string; the query string the reporter saw is tolerated by the same global rule that lets `*.aspx` match `page.aspx?y=2`, not by the class. Tests pin that too.
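The global rule can be pictured with a toy matcher. This is a sketch under assumptions, not the engine's implementation: `toy_match` is a hypothetical function that only understands a plain `*` wildcard and literal characters, and it assumes the rule amounts to "once the pattern is exhausted, a remaining `?...` tail is accepted". It does not model the `*[name]`/`*[file]`/`*[path]` classes at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy matcher: '*' matches any run of characters
   (including an empty one); every other character is a literal. */
bool toy_match(const char *s, const char *pat) {
    if (*pat == '\0') {
        /* Pattern exhausted: accept the end of the string,
           or a trailing query string starting with '?'. */
        return *s == '\0' || *s == '?';
    }
    if (*pat == '*') {
        /* Try every possible length for the wildcard run. */
        for (const char *p = s;; p++) {
            if (toy_match(p, pat + 1))
                return true;
            if (*p == '\0')
                return false;
        }
    }
    /* Literal character: must be equal, then continue. */
    return *s == *pat && toy_match(s + 1, pat + 1);
}
```

With this model, `*.aspx` matches both `page.aspx` and `page.aspx?y=2` because of the end-of-pattern check, while `page.aspxfoo` is rejected. The tolerance comes from that single check rather than from anything a wildcard class does.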

Closes #143

A rule such as -*.jpg*[<10] is meant to fetch every JPG, then delete the
ones under 10KB once their size is known. Instead it could forbid all of
them up front: at scan time the wizard calls fa_strjoker with no size, but
fa_strjoker always handed strjoker the address of an uninitialized local sz,
so the *[<10] predicate ran against stack garbage. When that garbage fell in
[0,10) the rule "matched" and the link was dropped before it was ever
downloaded ("(wizard) explicit forbidden (-*.jpg*[<10])").

Pass no size pointer when the size is unknown, routing into strjoker's
existing "test impossible -> no match" path so size rules stay neutral at
scan time and only fire once the real size is in. The size-known path is
unchanged.

Add a filtersize engine self-test that drives fa_strjoker through both
phases and a tests/01_engine-filter.test block locking the scenario.

Also lock #144: the *[name]/*[file]/*[path] classes do not span '?'; a
trailing query is tolerated by the same global rule that lets *.aspx match
page.aspx?y=2, not by the class. Working as intended.

Closes #143

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
@xroche xroche merged commit 3de4743 into master Jun 26, 2026
13 checks passed

Successfully merging this pull request may close these issues.

Scan rules based on size not working as expected
