Keep size-based filter rules neutral until the file size is known (#143) #431
Merged
A rule such as -*.jpg*[<10] is meant to fetch every JPG, then delete the
ones under 10KB once their size is known. Instead it could forbid all of
them up front: at scan time the wizard calls fa_strjoker with no size, but
fa_strjoker always handed strjoker the address of an uninitialized local sz,
so the *[<10] predicate ran against stack garbage. When that garbage fell in
[0,10) the rule "matched" and the link was dropped before it was ever
downloaded ("(wizard) explicit forbidden (-*.jpg*[<10])").
Pass no size pointer when the size is unknown, routing into strjoker's
existing "test impossible -> no match" path so size rules stay neutral at
scan time and only fire once the real size is in. The size-known path is
unchanged.
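
As a rough illustration of the "no size pointer means no match" idea, here is a minimal C sketch. It is not the HTTrack code: `size_rule_matches` and `rule_forbids` are hypothetical stand-ins for the size predicate and its caller, and the size is assumed to be expressed in whole KB.

```c
#include <stddef.h>

/* Hypothetical stand-in for a "[<N]" size predicate (not the real strjoker).
   size_kb == NULL means "size unknown". Returns 1 on match, 0 otherwise. */
static int size_rule_matches(const long *size_kb, long limit_kb) {
  if (size_kb == NULL)
    return 0; /* test impossible -> no match: the rule stays neutral */
  return *size_kb < limit_kb;
}

/* Hypothetical caller for a "-...[<10]" rule. It passes a pointer only when
   the size is really known, instead of always passing the address of a
   local that may never have been initialized. */
static int rule_forbids(int size_known, long size_kb) {
  const long *size_ptr = size_known ? &size_kb : NULL;
  return size_rule_matches(size_ptr, 10);
}
```

Under these assumptions, `rule_forbids(0, 0)` returns 0 (scan time, size unknown, link kept), `rule_forbids(1, 4)` returns 1 (4KB, cancelled), and `rule_forbids(1, 10)` returns 0 (10KB, kept).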
Add a filtersize engine self-test that drives fa_strjoker through both
phases and a tests/01_engine-filter.test block locking the scenario.
Also lock #144: the *[name]/*[file]/*[path] classes do not span '?'; a
trailing query is tolerated by the same global rule that lets *.aspx match
page.aspx?y=2, not by the class. Working as intended.
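
The distinction between "the class spans '?'" and "a trailing query is tolerated at the end of the pattern" can be shown with a toy matcher. This is a simplified sketch, not strjoker: `toy_match` is a hypothetical function, it only understands literals, a plain `*`, and `*[name]`, and it models the class as "any run of characters other than '?'".

```c
#include <string.h>

/* Toy matcher for illustration only (hypothetical, not HTTrack's strjoker). */
static int toy_match(const char *pat, const char *s) {
  if (*pat == '\0') {
    /* Assumed global rule: once the pattern is exhausted, the string may
       end here or continue with a query string. */
    return *s == '\0' || *s == '?';
  }
  if (strncmp(pat, "*[name]", 7) == 0) {
    /* Class: consume zero or more characters, but never step over '?'. */
    const char *p = s;
    for (;;) {
      if (toy_match(pat + 7, p))
        return 1;
      if (*p == '\0' || *p == '?')
        return 0;
      p++;
    }
  }
  if (*pat == '*') {
    /* Plain wildcard: consume zero or more characters of any kind. */
    const char *p = s;
    for (;;) {
      if (toy_match(pat + 1, p))
        return 1;
      if (*p == '\0')
        return 0;
      p++;
    }
  }
  /* Literal character. */
  return *pat == *s && toy_match(pat + 1, s + 1);
}
```

In this toy model, `toy_match("*.aspx", "page.aspx?y=2")` and `toy_match("*[name].jpg", "photo.jpg?v=1")` both succeed because of the end-of-pattern rule, while `toy_match("*[name].jpg", "a?b.jpg")` fails because the class stops at the '?'.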
Closes #143
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
A scan rule like `-*.jpg*[<10]` should fetch every JPG and then drop the ones under 10KB once their size is known. Instead it could forbid all of them before anything was downloaded, logging `(wizard) explicit forbidden (-*.jpg*[<10])`. At scan time the size is not known yet, so the wizard calls `fa_strjoker` with no size, but `fa_strjoker` always handed `strjoker` the address of an uninitialized local `sz`. The `*[<10]` predicate then compared against stack garbage, and whenever that garbage fell in `[0,10)` the rule matched and the link was dropped up front. The size-aware second pass (after download) already worked.

The fix passes no size pointer when the size is unknown, reusing `strjoker`'s existing "test impossible, no match" path so size rules stay neutral at scan time and only fire once the real size is in. The size-known path is untouched. A new `filtersize` engine self-test drives `fa_strjoker` through both phases, and a block in `01_engine-filter.test` pins the scenario (scan time keeps the JPG, under 10KB cancels it, 10KB or more keeps it); forcing the old uninitialized read into the `[0,10)` regime makes the scan-time case forbid and the test fail.

This also locks #144 as working-as-intended. The `*[name]`/`*[file]`/`*[path]` classes never span `?` mid-string; the query string the reporter saw is tolerated by the same global rule that lets `*.aspx` match `page.aspx?y=2`, not by the class. Tests pin that too.

Closes #143