Skip to content

feat(cache): incremental persistence + full-cache-hit short-circuit (Plan A)#16

Merged
codeninja merged 6 commits into
mainfrom
feat/plan-a-cache-persistence
May 4, 2026
Merged

feat(cache): incremental persistence + full-cache-hit short-circuit (Plan A)#16
codeninja merged 6 commits into
mainfrom
feat/plan-a-cache-persistence

Conversation

@codeninja
Copy link
Copy Markdown
Owner

Summary

  • Stages 3 & 4 now persist their cache entries incrementally, so a Ctrl-C / OOM mid-walk no longer rewinds all aggregation work to the end of stage 2.
  • Stage 4 gained its own cache layer (derivation.json) with a revised flag so --review walks don't silently inherit non-revised bodies.
  • A wikifi walk against an unchanged repo with a healthy cache now skips stages 3 & 4 entirely — no LLM calls, no full-section rewrites, walk reports "No changes detected. Wiki is current."
  • Cache schema bumped v1 → v2; v1 entries load to empty (one-time re-extraction on upgrade).

Plan: docs/plan-a-cache-persistence.md

Why

Re-walking the wikifi repo with no changes used to regenerate every primary section even though the per-file extraction cache was 100% hit, because (a) stage 3's cache updates were only persisted at end-of-walk so any interruption lost them, and (b) stage 4 had no cache at all. CI/iterative-walk use cases were burning LLM calls on a no-op.

What's new

  • Cache schema v2: CachedDerivation + CachedIntrospection dataclasses; new derivation.json and introspection.json files alongside the existing extraction.json and aggregation.json.
  • Per-stage save helpers: save_extraction, save_aggregation, save_derivation, save_introspection so each stage's persist callback only writes its own scope.
  • aggregate_all / derive_all accept a persist_cache callback invoked after every successful section update.
  • Short-circuit predicate (orchestrator._evaluate_short_circuit) requires five conditions:
    1. cache on
    2. non-empty repo
    3. every stage-2 file was cache-hit or specialized
    4. introspection scope hash matches prior walk
    5. every primary section's aggregation cache + every derivative section's derivation cache is fresh against current notes / upstream bodies
  • aggregation_fully_cached(layout, cache) / derivation_fully_cached(layout, cache, *, review) helpers encapsulate condition (5). They guard the partial-prior-walk case where extraction is fully populated but section bodies on disk are stale from a crashed stage 3.
  • CLI walk report collapses to "skipped — every file cache-hit and introspection scope unchanged" rows when fully_cached.

Test plan

  • make test — 212 passed, 93% coverage
  • make lint — clean
  • New tests cover:
    • per-stage save helpers don't touch other scope files
    • CachedDerivation round-trip; revised-flag asymmetry
    • CachedIntrospection round-trip; scope-hash stability across list order
    • aggregate_all / derive_all persist_cache callbacks fire per section
    • aggregation cache survives mid-stage-3 simulated crash (sections that completed are still on disk)
    • derive_all cache hit / miss / revised-mismatch
    • full-cache-hit second walk → 0 LLM calls in stages 3 & 4 (fully_cached=True)
    • introspection scope change defeats the short-circuit
    • missing aggregation cache entry defeats the short-circuit (the partial-prior-walk bug the predicate strengthening guards)
    • stale aggregation notes_hash defeats the short-circuit
    • missing derivation cache entry defeats the short-circuit
    • first walk on a fresh wiki never short-circuits
  • Manual: make walk twice on this repo — second walk should report "No changes detected"

🤖 Generated with Claude Code

When `wikifi walk` runs twice with no source changes, stages 3 and 4
now skip entirely — no LLM calls, no full-section rewrites, and the
on-disk wiki is treated as already current. Restores cache integrity
across interrupted walks.

Two distinct bugs fixed together:

1. Aggregation cache updates were only persisted at the end of stage 4.
   A walk that crashed mid-stage-3 (Ctrl-C, OOM, deriver LLM failure)
   lost every cache entry the aggregator had computed, so the next walk
   re-aggregated from scratch even on unchanged content. Fixed by
   threading per-section `persist_cache` callbacks through `aggregate_all`
   and `derive_all`, mirroring stage 2's existing `_persist` pattern.

2. Stage 4 had no cache layer at all. Derivation ran unconditionally on
   every walk regardless of whether upstream sections had changed. Added
   a `CachedDerivation` scope keyed on a hash of upstream bodies, with a
   `revised` flag so `--review` walks don't silently inherit a body
   produced without the critic loop.

Cache schema bumped to v2 (adds `derivation.json` + `introspection.json`
files; v1 entries load to empty so upgraded wikis re-extract on first
walk). Short-circuit predicate requires all five conditions: caching on,
non-empty repo, every file cache-hit or specialized-extracted,
introspection scope unchanged, *and* every primary/derivative section
covered by a fresh cache entry. The last condition is what keeps a
prior-walk crash from freezing stale section bodies on disk forever —
without it the predicate would short-circuit past sections that never
got aggregated.

Plan documented in `docs/plan-a-cache-persistence.md`.

Tests: 212 passing, 93% coverage. New coverage spans the per-stage save
helpers, derivation cache hit/miss/revised-mismatch, mid-stage-3 crash
survival, scope-change defeating the early-out, and partial/stale
aggregation/derivation caches defeating the early-out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds stage-scoped cache persistence and a stage-3/4 short-circuit so unchanged walks can avoid downstream LLM work, extending the existing cache pipeline in wikifi.

Changes:

  • Split cache persistence into per-stage save helpers and added new derivation/introspection cache scopes with schema version v2.
  • Added short-circuit evaluation in the orchestrator to skip aggregation/derivation when extraction is fully reusable and the prior walk scope still matches.
  • Updated CLI reporting and expanded tests around cache round-trips, per-stage persistence, and short-circuit behavior.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
wikifi/orchestrator.py Wires new cache save helpers, introspection scope hashing, and full-cache short-circuit into the walk pipeline.
wikifi/deriver.py Adds derivation cache lookup/recording, cache-aware stats, and derivative freshness checks.
wikifi/cli.py Changes walk output to show skipped downstream stages and a “No changes detected” message.
wikifi/cache.py Introduces derivation/introspection cache models, new cache files, per-scope save helpers, and hash helpers.
wikifi/aggregator.py Adds per-section cache persistence callback support and helper logic for aggregation-cache freshness checks.
tests/test_orchestrator.py Adds short-circuit and cache freshness regression coverage for the orchestrator.
tests/test_deriver.py Adds derivation cache hit/miss, review-mode, and persist-callback tests.
tests/test_cache.py Adds cache round-trip, per-scope save, reset, hashing, and version-bump tests.
tests/test_aggregator.py Adds per-section persistence and crash-resume tests for aggregation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread wikifi/aggregator.py Outdated
Comment on lines +303 to +306
if not notes:
# Empty-note sections render the same placeholder every walk
# — no aggregation work to short-circuit around.
continue
Comment thread wikifi/deriver.py Outdated
Comment on lines +169 to +172
for section in DERIVATIVE_SECTIONS:
upstream_bodies = _collect_upstream(layout, section)
if not upstream_bodies:
continue
Comment thread wikifi/orchestrator.py
Comment on lines +277 to +279
no_llm_files = extraction.cache_hits + extraction.specialized_files
if no_llm_files != extraction.files_seen:
return False
Comment thread wikifi/orchestrator.py
Comment on lines +277 to +279
no_llm_files = extraction.cache_hits + extraction.specialized_files
if no_llm_files != extraction.files_seen:
return False
Comment thread wikifi/deriver.py Outdated
Comment on lines +146 to +150
cache.record_derivation(
section.id,
upstream_hash=upstream_hash,
body=body,
revised=revised,
Comment thread wikifi/cli.py Outdated
if report.fully_cached:
table.add_row(
"3. Aggregation",
"[dim]skipped — every file cache-hit and introspection scope unchanged[/dim]",
Comment thread wikifi/deriver.py Outdated
Comment on lines +137 to +140
stats.review_outcomes.append(outcome)
if outcome.revised:
stats.sections_revised += 1
revised = True
Comment thread wikifi/cache.py Outdated
Comment on lines +117 to +120
records whether the critic + reviser loop ran on this body, so a
cached body produced under ``--review`` is not silently reused on a
subsequent walk that turns review off (the bodies are similar but
the contract differs).
codeninja and others added 5 commits May 4, 2026 14:56
Closes 8 review comments on PR #16 covering correctness gaps in the
stage-3/4 short-circuit and the derivation/extraction caches.

Cache short-circuit predicate
- Aggregator and deriver now record an empty-state cache entry when
  notes / upstream bodies are empty, and the ``fully_cached`` checks
  no longer auto-skip those sections. Without this, deleting a
  contributing file between walks would let the next walk skip stage 3
  while leaving the prior walk's prose on disk indefinitely. (#1, #2)
- Extractor stops recording an extraction cache entry when any chunk
  failed — partial findings would otherwise satisfy the short-circuit
  predicate while masking lost work. (#3)
- ``specialized_files`` now increments only after the specialized
  extractor returns successfully, so files whose extractor raised no
  longer pad the "no LLM-side work" total used by the predicate. (#4)

Derivation cache hygiene
- Deriver no longer records a cache entry when ``provider.complete_json``
  fell back to the error body — a transient stage-4 outage would
  otherwise be replayed (or short-circuited around) on later walks. (#5)
- ``revised`` semantic is now "review loop ran on this body" rather
  than "reviser changed text." A ``--review`` walk whose critic accepts
  unchanged still records ``revised=True`` so the next ``--review``
  walk's freshness check accepts the cache entry. ``CachedDerivation``
  docstring rewritten to match the actual ``lookup_derivation``
  asymmetry. ``sections_revised`` on the cache-replay path no longer
  inherits the cached flag — it counts reviser work performed in this
  walk only. (#7, #8)

CLI
- Skip-reason row mentions both cache-hit and specialized files so the
  message stays accurate on repos dominated by SQL/OpenAPI/etc. (#6)

Regression coverage
- 10 new tests pin each fix: deletion-between-walks scenario,
  partial-chunk-failure cache hygiene, failed-specialized counter
  ordering, derivation error-body cache exclusion, ``--review`` walk
  with critic accepting unchanged, empty-state predicate behavior on
  both stages, and ``sections_revised`` staying zero on cache replay.
- 222 tests pass, 93% coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Building on Plan A's cache layer, the aggregator gains a third path
between full cache hit and full rewrite. When a section's finding set
overlaps the cached set within a configurable threshold, the LLM is
asked to *edit* the cached body around the added/removed delta rather
than re-derive the section from scratch. Preserves established prose,
addresses the "potentially omitting or changing key details" concern,
and cuts tokens on partially-changed walks.

Per-section decision tree (in priority order):

1. **Cache hit** — notes_hash matches cached entry exactly. Re-render
   cached bundle, no LLM call. (Plan A behavior, unchanged.)
2. **Unchanged** — finding ids match but notes_hash differs (a source
   line shifted, summary changed, etc.). Refresh the cache key, re-
   render the cached body, no LLM call.
3. **Surgical** — symmetric difference of finding ids is at or below
   `surgical_edit_threshold` (default 0.3). Send cached body + delta
   to LLM, merge edit with cached claims (drop removed, add new).
4. **Rewrite** — too much churn (or no prior body to anchor on). Run
   the original from-scratch aggregation prompt. (Plan A behavior.)

Identity: each note gets a stable `finding_id =
sha256(file::section_id::finding_text)[:16]`. The cache stores
`finding_ids` aligned with note order so claim source_indices map
back to ids. Reword of finding text → new id (semantically a
delete-plus-insert, since prose written around the old wording can no
longer be assumed grounded).

Citation re-anchoring is mechanical: cached claims minus
`removed_claim_indices` plus `new_claims` resolved against the added
notes only. Contradictions are fully replaced from the model output
since they're a small set and cleaner to re-emit than diff.

New module `wikifi/surgical.py`:
- `compute_finding_id` / `note_finding_ids` (in cache.py — adjacent to
  the other hash helpers)
- `SectionChange` dataclass + `classify_section_change` for the diff
- `SurgicalEdit` Pydantic schema + `SURGICAL_SYSTEM_PROMPT`
- `surgical_aggregate` runs the LLM call and merges the result

Cache schema bumped v2 → v3 to add `finding_ids: list[str]` to
`CachedSection`. v2 entries load to empty so an upgraded wiki
re-aggregates on the first walk; subsequent walks light up surgical.

Configuration:
- `Settings.use_surgical_edits: bool = True`
- `Settings.surgical_edit_threshold: float = 0.3`
- `wikifi walk --no-surgical` opt-out flag

Walk report adds `sections_edited` and `sections_rewritten` columns
alongside the existing `sections_cached`.

Tests: 231 passing, 93% coverage. New `tests/test_surgical.py` covers:
- `compute_finding_id` / `note_finding_ids` determinism
- classifier truth table (no-cache, legacy v2, identical sets, small
  delta, large delta, threshold exactly equal, disabled threshold)
- `surgical_aggregate` claim merging (drop removed, resolve new
  against added notes, replace contradictions, ignore bad indices)
- aggregator decision tree records `finding_ids` on every cache write
- unchanged-finding-ids path takes no LLM call and refreshes notes_hash
- surgical path selected for small delta; rewrite for large delta
- `use_surgical_edits=False` collapses to Plan A's two-path tree
- surgical-path LLM failure falls back to rewrite cleanly
- stability: when LLM honors the contract, unchanged paragraphs are
  byte-preserved through the merge

Plan documented in `docs/plan-b-surgical-aggregation.md`. Stacked on
Plan A (#16).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First walk under cache schema v3 — the v2 cache from prior walks
invalidates, so every section took the rewrite path
(sections_rewritten=8, sections_edited=0). Subsequent walks with
small per-file changes will exercise the surgical edit path.

Walk stats:
- Stage 1 introspection: include=1 exclude=12 langs=Python
- Stage 2 extraction: 31 files seen, 29 contributed, 277 findings
- Stage 3 aggregation: 8 sections written (all rewrite path)
- Stage 4 derivation: personas + user_stories OK; diagrams hit a
  json_invalid response from the model (long mermaid output truncated
  by the SDK's parse layer) and fell back to upstream-bodies-preserved
  output — the deriver's per-section catch behaving as designed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses every distinct concern raised by the AI review on PR #17.
Comment IDs in parentheses; many were duplicated across the two review
passes (pre-rebase + post-rebase).

Identity / cache schema
- Align finding_id length contract with the 12-char implementation. The
  Hard Specifications, the `compute_finding_id` docstring, and the
  Cache Entities table all said 16 hex chars but the function delegates
  to `fingerprint.hash_text` which truncates to FINGERPRINT_LENGTH=12.
  Updated the docs (cache.py, hard_specifications.md) to match what the
  code actually does — the wider hash isn't needed at typical scale and
  reusing one constant keeps the system coherent. (3183975867,
  3184733330, 3184733407)
- `note_finding_ids` now returns "" when a note is missing `file` or
  `finding`. Hashing empty inputs would otherwise produce a stable
  non-empty id and let two malformed notes from different walks compare
  equal as "unchanged", routing the section onto the cache/surgical
  paths despite having no real identity to anchor on. The classifier
  picks up the slack: any "" id in either set forces "rewrite" so two
  malformed notes can't accidentally cancel out via set equality.
  (3183975908, 3184733368)
- Rewrote `CachedSection.finding_ids` docstring. The old text claimed
  the list aligned with `claim.source_indices`, but cached claims store
  fully-resolved SourceRefs, not indices. Documented the actual
  consumer: section-delta classification by the surgical classifier.
  (3184733384)

Aggregator decision tree
- Fixed Path 2's cache lie: stopped refreshing `notes_hash` to match
  live notes when the unchanged-finding-ids fast path fires. The cached
  claims still carry SourceRefs from the prior walk's notes (file,
  lines, fingerprint), and updating the cache key without refreshing
  those refs would let `aggregation_fully_cached` flag the section as
  fresh and let the orchestrator's stage-3 short-circuit lock the
  drifting citations in place. Path 2 still skips the LLM call this
  walk; the cache key only refreshes when a real Path 4 rewrite runs.
  (3183975957)
- Clarified `use_surgical_edits` docs in both `aggregator.aggregate_all`
  and `Settings`. The flag only gates Path 3 (the LLM-side surgical
  edit); the no-LLM cache-reuse paths (full cache hit and unchanged-
  finding-ids re-render) still fire regardless. The old text claimed
  the flag collapsed the tree to Plan A's two paths, which never
  matched the actual code. (3183975934, 3184733492, 3184733517)

Surgical helpers
- Renamed `_resolve_removed_claim_texts` to `_removed_finding_descriptors`
  and rewrote the docstring to match the implementation: it builds the
  removed-notes payload as `{finding_id: ...}` dicts and does not use
  the `cached` parameter (the cache doesn't store original note bodies,
  only ids). The old name promised richer reconstruction. (3184733469)
- `SectionChange.churn_ratio` now returns 1.0 when `total_live == 0`
  and there are removed ids, mirroring the classifier's
  `max(total_live, 1)` guard. The old 0.0 return for the
  remove-everything case would mislead any downstream caller checking
  `churn_ratio` directly. (3184733532)

Wiki artifact
- Restored `.wikifi/diagrams.md` to its pre-chore-commit version. The
  chore commit (refresh wiki via Plan B walk) committed an LLM JSON
  truncation as the diagrams body — `Derivation failed for **Diagrams**
  (... Invalid JSON: EOF while parsing ...)` rather than actual
  Mermaid. Forward-fix approach (revert just the broken file) rather
  than amending the chore commit; the next walk against this repo will
  regenerate it cleanly. (3183975985, 3184733450)

Regression tests added
- `note_finding_ids` returns "" for malformed notes (missing/blank
  file or finding, including null `finding`).
- `classify_section_change` force-rewrites when any finding_id is
  empty in either set, even when the sets look identical.
- `SectionChange.churn_ratio` is 1.0 for the remove-everything edge
  case and 0.0 for the truly-empty case.
- Aggregator's Path 2 leaves the cache entry's `notes_hash` alone and
  `aggregation_fully_cached` correctly refuses to flag the section as
  fresh.

Gate: 245 passed, 94% coverage, lint clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(aggregator): surgical edits when only some findings change (Plan B)
@codeninja codeninja merged commit faa7bf3 into main May 4, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants