Stage 2: apply_filter silently drops findings for files the LLM never analyzed (false negatives)

## Summary

`apply_filter` cannot tell the difference between "the LLM looked at this finding and rejected it" and "the LLM never saw this finding because its batch failed". Both end up dropped. Today that gap is masked by the all-or-nothing fallback (a failed batch aborts the whole pass and everything is kept), but the moment you add any per-batch resilience, this turns into a silent under-report: real findings vanish from the report because of an infrastructure hiccup, not because of a verdict.

I am filing this separately from the abort bug on purpose, because fixing the abort path without fixing this one would quietly make the scanner less safe.

## Where it happens

Commit `8c9f5cc`, v2.0.0, `src/skillspector/nodes/meta_analyzer.py`, the keep/drop loop in `apply_filter` (around `:314`-`:326`):

```python
for f in findings:
    exact_key = (f.file, f.rule_id, f.start_line, f.end_line)
    start_only_key = (f.file, f.rule_id, f.start_line, None)
    coarse_key = (f.file, f.rule_id)
    if exact_key in confirmed_granular:
        ...
    elif start_only_key in confirmed_granular:
        ...
    elif coarse_key in confirmed_coarse:
        ...
    else:
        continue   # <- finding dropped
```

A finding is kept only if it appears in `confirmed_*`, which is built solely from batches that returned. A finding whose batch never returned (timeout, 429, an oversized-chunk 400) falls through to the `else: continue` and is silently removed. The function has no notion of "this file was not analysed, so I have no verdict and should keep the finding rather than delete it".

## Why this matters

For a security scanner the two failure directions are not equal. Dropping a finding the LLM explicitly rejected is the intended filter behaviour. Dropping a finding the LLM never evaluated is a false negative manufactured by an infra error, and a false negative in a tool whose whole job is "is this skill safe to install" is the dangerous direction to fail.

Right now this is latent because of the coupled abort behaviour (see the related issue): any batch failure aborts the pass and `_fallback_filtered` keeps everything, so you never reach a state where some batches succeeded and others were skipped. As soon as the abort is fixed (which it should be), partial success becomes the normal case and this drop becomes live.

## Suggested direction

Make the keep/drop decision distinguish rejection from non-analysis:

- Track which files (or batches) actually came back from the LLM. Call that the analysed set.
- For findings whose file is in the analysed set, keep the current confirm-or-drop logic; that is the real filter doing its job.
- For findings whose file is not in the analysed set (their batch failed or was skipped), keep them unfiltered via the existing `_fallback_filtered` path rather than dropping them. No verdict means conservative keep, not delete.
- Log a count of how many findings were kept unfiltered because their file could not be analysed, so the gap is visible rather than silent.

Net effect: the filter still removes what the LLM rejects, but an infrastructure failure can only ever cost you enrichment on a file, never the finding itself.

Related: this is the third of three coupled resilience issues in the Stage 2 path (the abort-everything fallback, the missing retry/backoff, and this drop). They are individually fixable but want to be considered together, because fixing one in isolation can move the failure mode rather than remove it.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stage 2: apply_filter silently drops findings for files the LLM never analyzed (false negatives) #11

Summary

Where it happens

Why this matters

Suggested direction

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Stage 2: apply_filter silently drops findings for files the LLM never analyzed (false negatives) #11

Description

Summary

Where it happens

Why this matters

Suggested direction

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions