Summary
apply_filter cannot tell the difference between "the LLM looked at this finding and rejected it" and "the LLM never saw this finding because its batch failed". Both end up dropped. Today that gap is masked by the all-or-nothing fallback (a failed batch aborts the whole pass and everything is kept), but the moment you add any per-batch resilience, this turns into a silent under-report: real findings vanish from the report because of an infrastructure hiccup, not because of a verdict.
I am filing this separately from the abort bug on purpose, because fixing the abort path without fixing this one would quietly make the scanner less safe.
Where it happens
Commit 8c9f5cc, v2.0.0, src/skillspector/nodes/meta_analyzer.py, the keep/drop loop in apply_filter (around :314-:326):
for f in findings:
exact_key = (f.file, f.rule_id, f.start_line, f.end_line)
start_only_key = (f.file, f.rule_id, f.start_line, None)
coarse_key = (f.file, f.rule_id)
if exact_key in confirmed_granular:
...
elif start_only_key in confirmed_granular:
...
elif coarse_key in confirmed_coarse:
...
else:
continue # <- finding dropped
A finding is kept only if it appears in confirmed_*, which is built solely from batches that returned. A finding whose batch never returned (timeout, 429, an oversized-chunk 400) falls through to the else: continue and is silently removed. The function has no notion of "this file was not analysed, so I have no verdict and should keep the finding rather than delete it".
Why this matters
For a security scanner the two failure directions are not equal. Dropping a finding the LLM explicitly rejected is the intended filter behaviour. Dropping a finding the LLM never evaluated is a false negative manufactured by an infra error, and a false negative in a tool whose whole job is "is this skill safe to install" is the dangerous direction to fail.
Right now this is latent because of the coupled abort behaviour (see the related issue): any batch failure aborts the pass and _fallback_filtered keeps everything, so you never reach a state where some batches succeeded and others were skipped. As soon as the abort is fixed (which it should be), partial success becomes the normal case and this drop becomes live.
Suggested direction
Make the keep/drop decision distinguish rejection from non-analysis:
- Track which files (or batches) actually came back from the LLM. Call that the analysed set.
- For findings whose file is in the analysed set, keep the current confirm-or-drop logic; that is the real filter doing its job.
- For findings whose file is not in the analysed set (their batch failed or was skipped), keep them unfiltered via the existing
_fallback_filtered path rather than dropping them. No verdict means conservative keep, not delete.
- Log a count of how many findings were kept unfiltered because their file could not be analysed, so the gap is visible rather than silent.
Net effect: the filter still removes what the LLM rejects, but an infrastructure failure can only ever cost you enrichment on a file, never the finding itself.
Related: this is the third of three coupled resilience issues in the Stage 2 path (the abort-everything fallback, the missing retry/backoff, and this drop). They are individually fixable but want to be considered together, because fixing one in isolation can move the failure mode rather than remove it.
Summary
apply_filtercannot tell the difference between "the LLM looked at this finding and rejected it" and "the LLM never saw this finding because its batch failed". Both end up dropped. Today that gap is masked by the all-or-nothing fallback (a failed batch aborts the whole pass and everything is kept), but the moment you add any per-batch resilience, this turns into a silent under-report: real findings vanish from the report because of an infrastructure hiccup, not because of a verdict.I am filing this separately from the abort bug on purpose, because fixing the abort path without fixing this one would quietly make the scanner less safe.
Where it happens
Commit
8c9f5cc, v2.0.0,src/skillspector/nodes/meta_analyzer.py, the keep/drop loop inapply_filter(around:314-:326):A finding is kept only if it appears in
confirmed_*, which is built solely from batches that returned. A finding whose batch never returned (timeout, 429, an oversized-chunk 400) falls through to theelse: continueand is silently removed. The function has no notion of "this file was not analysed, so I have no verdict and should keep the finding rather than delete it".Why this matters
For a security scanner the two failure directions are not equal. Dropping a finding the LLM explicitly rejected is the intended filter behaviour. Dropping a finding the LLM never evaluated is a false negative manufactured by an infra error, and a false negative in a tool whose whole job is "is this skill safe to install" is the dangerous direction to fail.
Right now this is latent because of the coupled abort behaviour (see the related issue): any batch failure aborts the pass and
_fallback_filteredkeeps everything, so you never reach a state where some batches succeeded and others were skipped. As soon as the abort is fixed (which it should be), partial success becomes the normal case and this drop becomes live.Suggested direction
Make the keep/drop decision distinguish rejection from non-analysis:
_fallback_filteredpath rather than dropping them. No verdict means conservative keep, not delete.Net effect: the filter still removes what the LLM rejects, but an infrastructure failure can only ever cost you enrichment on a file, never the finding itself.
Related: this is the third of three coupled resilience issues in the Stage 2 path (the abort-everything fallback, the missing retry/backoff, and this drop). They are individually fixable but want to be considered together, because fixing one in isolation can move the failure mode rather than remove it.