feat(cache): incremental persistence + full-cache-hit short-circuit (Plan A)#16
Merged
Conversation
When `wikifi walk` runs twice with no source changes, stages 3 and 4 now skip entirely — no LLM calls, no full-section rewrites, and the on-disk wiki is treated as already current. Restores cache integrity across interrupted walks. Two distinct bugs fixed together: 1. Aggregation cache updates were only persisted at the end of stage 4. A walk that crashed mid-stage-3 (Ctrl-C, OOM, deriver LLM failure) lost every cache entry the aggregator had computed, so the next walk re-aggregated from scratch even on unchanged content. Fixed by threading per-section `persist_cache` callbacks through `aggregate_all` and `derive_all`, mirroring stage 2's existing `_persist` pattern. 2. Stage 4 had no cache layer at all. Derivation ran unconditionally on every walk regardless of whether upstream sections had changed. Added a `CachedDerivation` scope keyed on a hash of upstream bodies, with a `revised` flag so `--review` walks don't silently inherit a body produced without the critic loop. Cache schema bumped to v2 (adds `derivation.json` + `introspection.json` files; v1 entries load to empty so upgraded wikis re-extract on first walk). Short-circuit predicate requires all five conditions: caching on, non-empty repo, every file cache-hit or specialized-extracted, introspection scope unchanged, *and* every primary/derivative section covered by a fresh cache entry. The last condition is what keeps a prior-walk crash from freezing stale section bodies on disk forever — without it the predicate would short-circuit past sections that never got aggregated. Plan documented in `docs/plan-a-cache-persistence.md`. Tests: 212 passing, 93% coverage. New coverage spans the per-stage save helpers, derivation cache hit/miss/revised-mismatch, mid-stage-3 crash survival, scope-change defeating the early-out, and partial/stale aggregation/derivation caches defeating the early-out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
There was a problem hiding this comment.
Pull request overview
Adds stage-scoped cache persistence and a stage-3/4 short-circuit so unchanged walks can avoid downstream LLM work, extending the existing cache pipeline in wikifi.
Changes:
- Split cache persistence into per-stage save helpers and added new derivation/introspection cache scopes with schema version
v2. - Added short-circuit evaluation in the orchestrator to skip aggregation/derivation when extraction is fully reusable and the prior walk scope still matches.
- Updated CLI reporting and expanded tests around cache round-trips, per-stage persistence, and short-circuit behavior.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
wikifi/orchestrator.py |
Wires new cache save helpers, introspection scope hashing, and full-cache short-circuit into the walk pipeline. |
wikifi/deriver.py |
Adds derivation cache lookup/recording, cache-aware stats, and derivative freshness checks. |
wikifi/cli.py |
Changes walk output to show skipped downstream stages and a “No changes detected” message. |
wikifi/cache.py |
Introduces derivation/introspection cache models, new cache files, per-scope save helpers, and hash helpers. |
wikifi/aggregator.py |
Adds per-section cache persistence callback support and helper logic for aggregation-cache freshness checks. |
tests/test_orchestrator.py |
Adds short-circuit and cache freshness regression coverage for the orchestrator. |
tests/test_deriver.py |
Adds derivation cache hit/miss, review-mode, and persist-callback tests. |
tests/test_cache.py |
Adds cache round-trip, per-scope save, reset, hashing, and version-bump tests. |
tests/test_aggregator.py |
Adds per-section persistence and crash-resume tests for aggregation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+303
to
+306
| if not notes: | ||
| # Empty-note sections render the same placeholder every walk | ||
| # — no aggregation work to short-circuit around. | ||
| continue |
Comment on lines
+169
to
+172
| for section in DERIVATIVE_SECTIONS: | ||
| upstream_bodies = _collect_upstream(layout, section) | ||
| if not upstream_bodies: | ||
| continue |
Comment on lines
+277
to
+279
| no_llm_files = extraction.cache_hits + extraction.specialized_files | ||
| if no_llm_files != extraction.files_seen: | ||
| return False |
Comment on lines
+277
to
+279
| no_llm_files = extraction.cache_hits + extraction.specialized_files | ||
| if no_llm_files != extraction.files_seen: | ||
| return False |
Comment on lines
+146
to
+150
| cache.record_derivation( | ||
| section.id, | ||
| upstream_hash=upstream_hash, | ||
| body=body, | ||
| revised=revised, |
| if report.fully_cached: | ||
| table.add_row( | ||
| "3. Aggregation", | ||
| "[dim]skipped — every file cache-hit and introspection scope unchanged[/dim]", |
Comment on lines
+137
to
+140
| stats.review_outcomes.append(outcome) | ||
| if outcome.revised: | ||
| stats.sections_revised += 1 | ||
| revised = True |
Comment on lines
+117
to
+120
| records whether the critic + reviser loop ran on this body, so a | ||
| cached body produced under ``--review`` is not silently reused on a | ||
| subsequent walk that turns review off (the bodies are similar but | ||
| the contract differs). |
Closes 8 review comments on PR #16 covering correctness gaps in the stage-3/4 short-circuit and the derivation/extraction caches. Cache short-circuit predicate - Aggregator and deriver now record an empty-state cache entry when notes / upstream bodies are empty, and the ``fully_cached`` checks no longer auto-skip those sections. Without this, deleting a contributing file between walks would let the next walk skip stage 3 while leaving the prior walk's prose on disk indefinitely. (#1, #2) - Extractor stops recording an extraction cache entry when any chunk failed — partial findings would otherwise satisfy the short-circuit predicate while masking lost work. (#3) - ``specialized_files`` now increments only after the specialized extractor returns successfully, so files whose extractor raised no longer pad the "no LLM-side work" total used by the predicate. (#4) Derivation cache hygiene - Deriver no longer records a cache entry when ``provider.complete_json`` fell back to the error body — a transient stage-4 outage would otherwise be replayed (or short-circuited around) on later walks. (#5) - ``revised`` semantic is now "review loop ran on this body" rather than "reviser changed text." A ``--review`` walk whose critic accepts unchanged still records ``revised=True`` so the next ``--review`` walk's freshness check accepts the cache entry. ``CachedDerivation`` docstring rewritten to match the actual ``lookup_derivation`` asymmetry. ``sections_revised`` on the cache-replay path no longer inherits the cached flag — it counts reviser work performed in this walk only. (#7, #8) CLI - Skip-reason row mentions both cache-hit and specialized files so the message stays accurate on repos dominated by SQL/OpenAPI/etc. (#6) Regression coverage - 10 new tests pin each fix: deletion-between-walks scenario, partial-chunk-failure cache hygiene, failed-specialized counter ordering, derivation error-body cache exclusion, ``--review`` walk with critic accepting unchanged, empty-state predicate behavior on both stages, and ``sections_revised`` staying zero on cache replay. - 222 tests pass, 93% coverage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Building on Plan A's cache layer, the aggregator gains a third path between full cache hit and full rewrite. When a section's finding set overlaps the cached set within a configurable threshold, the LLM is asked to *edit* the cached body around the added/removed delta rather than re-derive the section from scratch. Preserves established prose, addresses the "potentially omitting or changing key details" concern, and cuts tokens on partially-changed walks. Per-section decision tree (in priority order): 1. **Cache hit** — notes_hash matches cached entry exactly. Re-render cached bundle, no LLM call. (Plan A behavior, unchanged.) 2. **Unchanged** — finding ids match but notes_hash differs (a source line shifted, summary changed, etc.). Refresh the cache key, re- render the cached body, no LLM call. 3. **Surgical** — symmetric difference of finding ids is at or below `surgical_edit_threshold` (default 0.3). Send cached body + delta to LLM, merge edit with cached claims (drop removed, add new). 4. **Rewrite** — too much churn (or no prior body to anchor on). Run the original from-scratch aggregation prompt. (Plan A behavior.) Identity: each note gets a stable `finding_id = sha256(file::section_id::finding_text)[:16]`. The cache stores `finding_ids` aligned with note order so claim source_indices map back to ids. Reword of finding text → new id (semantically a delete-plus-insert, since prose written around the old wording can no longer be assumed grounded). Citation re-anchoring is mechanical: cached claims minus `removed_claim_indices` plus `new_claims` resolved against the added notes only. Contradictions are fully replaced from the model output since they're a small set and cleaner to re-emit than diff. New module `wikifi/surgical.py`: - `compute_finding_id` / `note_finding_ids` (in cache.py — adjacent to the other hash helpers) - `SectionChange` dataclass + `classify_section_change` for the diff - `SurgicalEdit` Pydantic schema + `SURGICAL_SYSTEM_PROMPT` - `surgical_aggregate` runs the LLM call and merges the result Cache schema bumped v2 → v3 to add `finding_ids: list[str]` to `CachedSection`. v2 entries load to empty so an upgraded wiki re-aggregates on the first walk; subsequent walks light up surgical. Configuration: - `Settings.use_surgical_edits: bool = True` - `Settings.surgical_edit_threshold: float = 0.3` - `wikifi walk --no-surgical` opt-out flag Walk report adds `sections_edited` and `sections_rewritten` columns alongside the existing `sections_cached`. Tests: 231 passing, 93% coverage. New `tests/test_surgical.py` covers: - `compute_finding_id` / `note_finding_ids` determinism - classifier truth table (no-cache, legacy v2, identical sets, small delta, large delta, threshold exactly equal, disabled threshold) - `surgical_aggregate` claim merging (drop removed, resolve new against added notes, replace contradictions, ignore bad indices) - aggregator decision tree records `finding_ids` on every cache write - unchanged-finding-ids path takes no LLM call and refreshes notes_hash - surgical path selected for small delta; rewrite for large delta - `use_surgical_edits=False` collapses to Plan A's two-path tree - surgical-path LLM failure falls back to rewrite cleanly - stability: when LLM honors the contract, unchanged paragraphs are byte-preserved through the merge Plan documented in `docs/plan-b-surgical-aggregation.md`. Stacked on Plan A (#16). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First walk under cache schema v3 — the v2 cache from prior walks invalidates, so every section took the rewrite path (sections_rewritten=8, sections_edited=0). Subsequent walks with small per-file changes will exercise the surgical edit path. Walk stats: - Stage 1 introspection: include=1 exclude=12 langs=Python - Stage 2 extraction: 31 files seen, 29 contributed, 277 findings - Stage 3 aggregation: 8 sections written (all rewrite path) - Stage 4 derivation: personas + user_stories OK; diagrams hit a json_invalid response from the model (long mermaid output truncated by the SDK's parse layer) and fell back to upstream-bodies-preserved output — the deriver's per-section catch behaving as designed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses every distinct concern raised by the AI review on PR #17. Comment IDs in parentheses; many were duplicated across the two review passes (pre-rebase + post-rebase). Identity / cache schema - Align finding_id length contract with the 12-char implementation. The Hard Specifications, the `compute_finding_id` docstring, and the Cache Entities table all said 16 hex chars but the function delegates to `fingerprint.hash_text` which truncates to FINGERPRINT_LENGTH=12. Updated the docs (cache.py, hard_specifications.md) to match what the code actually does — the wider hash isn't needed at typical scale and reusing one constant keeps the system coherent. (3183975867, 3184733330, 3184733407) - `note_finding_ids` now returns "" when a note is missing `file` or `finding`. Hashing empty inputs would otherwise produce a stable non-empty id and let two malformed notes from different walks compare equal as "unchanged", routing the section onto the cache/surgical paths despite having no real identity to anchor on. The classifier picks up the slack: any "" id in either set forces "rewrite" so two malformed notes can't accidentally cancel out via set equality. (3183975908, 3184733368) - Rewrote `CachedSection.finding_ids` docstring. The old text claimed the list aligned with `claim.source_indices`, but cached claims store fully-resolved SourceRefs, not indices. Documented the actual consumer: section-delta classification by the surgical classifier. (3184733384) Aggregator decision tree - Fixed Path 2's cache lie: stopped refreshing `notes_hash` to match live notes when the unchanged-finding-ids fast path fires. The cached claims still carry SourceRefs from the prior walk's notes (file, lines, fingerprint), and updating the cache key without refreshing those refs would let `aggregation_fully_cached` flag the section as fresh and let the orchestrator's stage-3 short-circuit lock the drifting citations in place. Path 2 still skips the LLM call this walk; the cache key only refreshes when a real Path 4 rewrite runs. (3183975957) - Clarified `use_surgical_edits` docs in both `aggregator.aggregate_all` and `Settings`. The flag only gates Path 3 (the LLM-side surgical edit); the no-LLM cache-reuse paths (full cache hit and unchanged- finding-ids re-render) still fire regardless. The old text claimed the flag collapsed the tree to Plan A's two paths, which never matched the actual code. (3183975934, 3184733492, 3184733517) Surgical helpers - Renamed `_resolve_removed_claim_texts` to `_removed_finding_descriptors` and rewrote the docstring to match the implementation: it builds the removed-notes payload as `{finding_id: ...}` dicts and does not use the `cached` parameter (the cache doesn't store original note bodies, only ids). The old name promised richer reconstruction. (3184733469) - `SectionChange.churn_ratio` now returns 1.0 when `total_live == 0` and there are removed ids, mirroring the classifier's `max(total_live, 1)` guard. The old 0.0 return for the remove-everything case would mislead any downstream caller checking `churn_ratio` directly. (3184733532) Wiki artifact - Restored `.wikifi/diagrams.md` to its pre-chore-commit version. The chore commit (refresh wiki via Plan B walk) committed an LLM JSON truncation as the diagrams body — `Derivation failed for **Diagrams** (... Invalid JSON: EOF while parsing ...)` rather than actual Mermaid. Forward-fix approach (revert just the broken file) rather than amending the chore commit; the next walk against this repo will regenerate it cleanly. (3183975985, 3184733450) Regression tests added - `note_finding_ids` returns "" for malformed notes (missing/blank file or finding, including null `finding`). - `classify_section_change` force-rewrites when any finding_id is empty in either set, even when the sets look identical. - `SectionChange.churn_ratio` is 1.0 for the remove-everything edge case and 0.0 for the truly-empty case. - Aggregator's Path 2 leaves the cache entry's `notes_hash` alone and `aggregation_fully_cached` correctly refuses to flag the section as fresh. Gate: 245 passed, 94% coverage, lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(aggregator): surgical edits when only some findings change (Plan B)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
derivation.json) with arevisedflag so--reviewwalks don't silently inherit non-revised bodies.wikifi walkagainst an unchanged repo with a healthy cache now skips stages 3 & 4 entirely — no LLM calls, no full-section rewrites, walk reports "No changes detected. Wiki is current."Plan: docs/plan-a-cache-persistence.md
Why
Re-walking the wikifi repo with no changes used to regenerate every primary section even though the per-file extraction cache was 100% hit, because (a) stage 3's cache updates were only persisted at end-of-walk so any interruption lost them, and (b) stage 4 had no cache at all. CI/iterative-walk use cases were burning LLM calls on a no-op.
What's new
CachedDerivation+CachedIntrospectiondataclasses; newderivation.jsonandintrospection.jsonfiles alongside the existingextraction.jsonandaggregation.json.save_extraction,save_aggregation,save_derivation,save_introspectionso each stage's persist callback only writes its own scope.aggregate_all/derive_allaccept apersist_cachecallback invoked after every successful section update.orchestrator._evaluate_short_circuit) requires five conditions:aggregation_fully_cached(layout, cache)/derivation_fully_cached(layout, cache, *, review)helpers encapsulate condition (5). They guard the partial-prior-walk case where extraction is fully populated but section bodies on disk are stale from a crashed stage 3.fully_cached.Test plan
make test— 212 passed, 93% coveragemake lint— cleanCachedDerivationround-trip;revised-flag asymmetryCachedIntrospectionround-trip; scope-hash stability across list orderaggregate_all/derive_allpersist_cachecallbacks fire per sectionderive_allcache hit / miss / revised-mismatchfully_cached=True)notes_hashdefeats the short-circuitmake walktwice on this repo — second walk should report "No changes detected"🤖 Generated with Claude Code