Stage 2: one failed batch aborts the entire meta-analyzer pass and silently falls back to static #9

@wernerkasselman-au

Summary

I was running the Anthropic provider against a large skill tree and noticed that a single failed LLM call on one file quietly turned off the semantic filter for every file in the scan, not just the file that failed. The scan still exits 0 and prints a normal report, so the degradation is invisible unless you read the WARNING log.

The cause is that the meta-analyzer has no per-batch isolation. One exception anywhere in the batch fan-out aborts the whole Stage 2 pass and falls back to static-only results.

Where it happens

Two coupled spots (commit 8c9f5cc, v2.0.0):

  1. src/skillspector/llm_analyzer_base.py:397

    return list(await asyncio.gather(*[_process(b) for b in batches]))

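    asyncio.gather is called without return_exceptions=True, so the first _process coroutine that raises (a 429, a request timeout, a 400 on an oversized chunk) propagates straight out of arun_batches. gather does not cancel the sibling coroutines itself, but their results are never collected, and asyncio.run cancels whatever is still pending when it shuts the loop down.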

  2. src/skillspector/nodes/meta_analyzer.py:394 then :405

    batch_results = asyncio.run(analyzer.arun_batches(batches, metadata_text=metadata_text))
    ...
    except Exception as e:
        logger.warning("LLM call failed, using fallback: %s", e)
        return {"filtered_findings": _fallback_filtered(findings)}

    The whole fan-out sits under one try/except, so the propagated exception from a single batch lands here and _fallback_filtered returns every finding unfiltered.
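The coupling between the two spots can be reproduced without any provider. The sketch below is a minimal stand-in, not the project's code: _process and arun_batches only mimic the shape of the quoted lines, while run_stage2, the integer batches, and the simulated 429 are hypothetical names and values chosen for illustration.

```python
import asyncio


async def _process(batch: int) -> str:
    # Hypothetical stand-in for one LLM call; batch 2 simulates a 429.
    await asyncio.sleep(0.01 * batch)
    if batch == 2:
        raise RuntimeError("429 Too Many Requests")
    return f"filtered-{batch}"


async def arun_batches(batches: list[int]) -> list[str]:
    # Same shape as the quoted line: gather without return_exceptions=True,
    # so the first exception is raised to the caller.
    return list(await asyncio.gather(*[_process(b) for b in batches]))


def run_stage2(batches: list[int]) -> dict:
    # Hypothetical wrapper mirroring the single try/except around the fan-out.
    try:
        results = asyncio.run(arun_batches(batches))
        return {"mode": "llm", "results": results}
    except Exception as e:
        # One failed batch lands here; batch 1's finished result is discarded
        # and batches 3 and 4 are cancelled when asyncio.run closes the loop.
        return {"mode": "static-fallback", "error": str(e), "results": []}


outcome = run_stage2([1, 2, 3, 4])
print(outcome)
```

With four batches and one simulated failure, the outcome reports static-fallback mode with no results, even though batch 1 had already completed and batches 3 and 4 would have succeeded.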

Why this matters

The blast radius is wrong. If file A's batch times out, I lose the semantic filtering for files B through Z as well, even though their calls would have succeeded. On a 190-file scan I watched one bad batch discard the enrichment for all 190 (0 files actually filtered), and the printed risk score was identical to a --no-llm run while the report still claimed the LLM pass had run. A user reasonably believes they are getting the precision pass when they are getting static-only.

To be fair, an all-or-nothing fallback is a defensible first cut, and on a tiny single-file skill it is harmless. On anything large it is not: the probability that at least one of N batches hits a transient error climbs with N, so the larger and more interesting the skill, the more likely the filter silently switches itself off.
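To put a rough number on that scaling, here is a small calculation. It assumes each batch fails independently with the same probability p, which is a simplification (rate-limit errors tend to be correlated), and the 1% figure is illustrative rather than measured. p_any_failure is a hypothetical helper name.

```python
def p_any_failure(p: float, n: int) -> float:
    # Probability that at least one of n independent batches fails,
    # given a per-batch failure probability p.
    return 1.0 - (1.0 - p) ** n


# Illustrative per-batch failure rate of 1% across growing batch counts.
for n in (1, 10, 50, 190):
    print(n, round(p_any_failure(0.01, n), 3))
```

Under these assumptions a 1% per-batch failure rate gives roughly an 85% chance that at least one of 190 batches fails, and with the current behaviour that means the whole pass falls back.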

Reproduce

  1. Point at any skill tree with more than ~50 files that have findings.
  2. Use a provider/tier where at least one call will 429 or time out (the NVIDIA build tier rate-caps readily; a low Anthropic concurrency limit does too).
  3. Run skillspector scan <dir> --verbose.
  4. Observe a single "LLM call failed, using fallback" WARNING line, "0 analyzed" in the meta-analyzer debug line, and a final report that is byte-identical to a --no-llm run while the exit code stays 0.

Suggested direction

Isolate each batch so one failure cannot cancel the others, and reserve the static fallback for the batches that actually failed rather than the whole set:

  • In arun_batches, either pass return_exceptions=True to gather and drop the failures, or wrap _process in its own try/except that logs and returns a sentinel; then filter the sentinels out before returning.
  • In meta_analyzer, stop treating "one batch raised" as "the whole pass failed". Apply the filter to the batches that came back and handle the missing ones separately (see the related issue on apply_filter dropping unanalyzed findings).
  • Separately, surface the degradation rather than only logging it at WARNING; a static-only fallback that is invisible at the default log level is the part that actually burns people.
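One way the three points above could fit together is sketched below. This is not a patch against the repository: arun_batches_isolated, merge_results, the (name, findings) batch shape, the fake_process stub, and the failed_batches / total_batches keys are all hypothetical, and the real analyzer's batch and finding types will differ.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger("stage2_sketch")  # hypothetical logger name


async def arun_batches_isolated(
    process: Callable[[Any], Awaitable[Any]], batches: list[Any]
) -> list[Any]:
    # return_exceptions=True makes gather return each exception in place of
    # that batch's result instead of raising the first one.
    return list(
        await asyncio.gather(*[process(b) for b in batches], return_exceptions=True)
    )


def merge_results(batches: list[Any], raw: list[Any], static_fallback) -> dict:
    filtered, failed = [], 0
    for batch, result in zip(batches, raw):
        if isinstance(result, BaseException):
            # Only the failed batch degrades to its static findings.
            failed += 1
            logger.warning("batch failed, static fallback for it only: %r", result)
            filtered.extend(static_fallback(batch))
        else:
            filtered.extend(result)
    return {
        "filtered_findings": filtered,
        "failed_batches": failed,  # hypothetical keys that let the report
        "total_batches": len(batches),  # show the degradation explicitly
    }


async def fake_process(batch):
    # Hypothetical stand-in for one LLM call; the batch for b.py times out.
    name, findings = batch
    if name == "b.py":
        raise TimeoutError("simulated request timeout")
    return [f for f in findings if f.startswith("real")]


batches = [
    ("a.py", ["real-1", "noise-1"]),
    ("b.py", ["real-2", "noise-2"]),
    ("c.py", ["noise-3", "real-3"]),
]
raw = asyncio.run(arun_batches_isolated(fake_process, batches))
report = merge_results(batches, raw, static_fallback=lambda b: list(b[1]))
print(report)
```

In this run the batches for a.py and c.py keep their filtered findings, b.py keeps its unfiltered static findings, and the report carries a count of failed batches that a caller could print or turn into a non-zero exit code.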

Related: the schema-400 report (#4) is one trigger for this same fallback, but the abort-everything behaviour is the deeper bug and would still bite on any transient 429 or timeout even after the schema issue is fixed.
