Warn when NSRL Bloom filter actual count exceeds estimate #164

Merged
steffenfritz merged 1 commit into main from fix/nsrl-bloom-oversize-warning
Apr 7, 2026

@steffenfritz
Owner

Summary

  • Prints Bloom filter stats (estimated items, actual items inserted, target FPR) after CreateNSRLBloom completes
  • Emits a clear WARNING with the recommended --nsrl-estimate value when actual item count exceeds the estimate
  • The size mismatch is the primary cause of the elevated false positive rate reported in production
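
The reporting behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `bloomReport`, the warning wording, and the choice to recommend the actual item count as the new `--nsrl-estimate` value are all assumptions. Only the stats line format is taken from the output posted later in this thread.

```go
package main

import "fmt"

// bloomReport is a hypothetical helper (not FileTrove's actual code).
// It returns the stats line and, if more items were inserted than
// estimated, a warning that suggests a larger estimate.
func bloomReport(estimate, actual uint64, targetFPR float64) (stats, warning string) {
	stats = fmt.Sprintf(
		"Bloom filter stats: estimated items: %d, actual items inserted: %d, target FPR: %f",
		estimate, actual, targetFPR)
	if actual > estimate {
		// Assumption: the recommended value is simply the actual count.
		warning = fmt.Sprintf(
			"WARNING: actual item count %d exceeds estimate %d; re-run with --nsrl-estimate %d",
			actual, estimate, actual)
	}
	return stats, warning
}

func main() {
	stats, warning := bloomReport(40000000, 4, 0.0001)
	fmt.Println(stats)
	if warning != "" {
		fmt.Println(warning)
	}
}
```

With a correctly sized estimate the warning string stays empty, so only the stats line is printed.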

Test plan

  • Build admftrove and run --create-nsrl with a hash file larger than the estimate — verify warning is printed
  • Run with a correctly sized estimate — verify only the stats line is printed, no warning

🤖 Generated with Claude Code

Print stats (estimated, actual, target FPR) after building the filter,
and emit a warning with the recommended --nsrl-estimate value when the
actual item count exceeds the estimate — the primary cause of elevated
false positive rates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
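
To illustrate why an undersized estimate raises the false positive rate, here is a small sketch based on the textbook Bloom filter formulas. It assumes the filter is sized with m = -n·ln(p)/(ln 2)² bits and k = (m/n)·ln 2 hash functions; the Bloom library used by FileTrove may size its filters differently. The function names `sizeFor` and `expectedFPR` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// sizeFor returns the bit count m and hash count k for n items at target
// rate p, using m = -n*ln(p)/(ln 2)^2 and k = (m/n)*ln 2, both rounded up.
func sizeFor(n uint64, p float64) (m, k uint64) {
	m = uint64(math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2)))
	k = uint64(math.Ceil(math.Ln2 * float64(m) / float64(n)))
	return m, k
}

// expectedFPR approximates the false positive rate after inserting n items
// into a filter with m bits and k hash functions: (1 - e^(-k*n/m))^k.
func expectedFPR(m, k, n uint64) float64 {
	return math.Pow(1-math.Exp(-float64(k)*float64(n)/float64(m)), float64(k))
}

func main() {
	const estimate = 40_000_000
	m, k := sizeFor(estimate, 0.0001)
	// Compare the expected rate at 1x, 2x and 4x the estimated item count.
	for _, actual := range []uint64{estimate, 2 * estimate, 4 * estimate} {
		fmt.Printf("items: %d, expected FPR: %.6f\n", actual, expectedFPR(m, k, actual))
	}
}
```

Under these assumptions the expected rate stays close to the target at the estimated count and climbs steeply once the filter holds more items than it was sized for.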
@steffenfritz steffenfritz self-assigned this Apr 3, 2026
@steffenfritz
Owner Author

Fixed filewalk_test.go: added the missing entries "testdata/noaccess.rtf" and "testdata/testpipe" to the want lists.

@steffenfritz
Owner Author

PASS
ok github.com/steffenfritz/FileTrove 0.906s

@steffenfritz
Owner Author

steffen@ceres ~/g/s/g/s/F/c/admftrove (pr-164)> ./admftrove_arm64_darwin_static --creatensrl nsrlinputtest.txt
Bloom filter stats: estimated items: 40000000, actual items inserted: 4, target FPR: 0.000100

@steffenfritz
Owner Author

I fed a text file with four lines as input to admftrove to create an nsrl.bloom file. The resulting file is 91 MB: a short header followed by approx. 90.9 MB of (padding?) zeros. This is strange behaviour, and I think the Bloom filter is not working as expected. -> Issue
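
As a rough cross-check of the observed file size, the sketch below computes how large a Bloom filter would be if its bit array were allocated from the estimate (40000000 items, target FPR 0.0001) rather than from the number of lines actually inserted. This assumes the textbook sizing formula m = -n·ln(p)/(ln 2)²; whether FileTrove's Bloom library uses exactly this formula and writes the full bit array to disk is not confirmed here. The function name `optimalBits` is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// optimalBits returns the textbook Bloom filter bit count for n items at
// target false positive rate p: m = ceil(-n * ln(p) / (ln 2)^2).
func optimalBits(n uint64, p float64) uint64 {
	return uint64(math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2)))
}

func main() {
	bits := optimalBits(40_000_000, 0.0001)
	sizeBytes := float64(bits) / 8
	// Print the size in both decimal megabytes and binary mebibytes.
	fmt.Printf("bits: %d, size: %.1f MB (%.1f MiB)\n", bits, sizeBytes/1e6, sizeBytes/(1<<20))
}
```

Under that assumption the bit array alone comes to roughly 96 million bytes, or about 91 MiB, and with only four items inserted almost all of those bits would still be zero. That could account for the observation, but it should be verified against the library's actual sizing and serialization code in the follow-up issue.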

@steffenfritz steffenfritz merged commit a4c9d3a into main Apr 7, 2026
7 checks passed