feat(cache): incremental persistence + full-cache-hit short-circuit (Plan A) by codeninja · Pull Request #16 · codeninja/wikifi

codeninja · 2026-05-04T16:44:05Z

Summary

Stages 3 & 4 now persist their cache entries incrementally, so a Ctrl-C / OOM mid-walk no longer rewinds all aggregation work to the end of stage 2.
Stage 4 gained its own cache layer (derivation.json) with a revised flag so --review walks don't silently inherit non-revised bodies.
A wikifi walk against an unchanged repo with a healthy cache now skips stages 3 & 4 entirely — no LLM calls, no full-section rewrites, walk reports "No changes detected. Wiki is current."
Cache schema bumped v1 → v2; v1 entries load to empty (one-time re-extraction on upgrade).

Why

Re-walking the wikifi repo with no changes used to regenerate every primary section even though the per-file extraction cache was 100% hit, because (a) stage 3's cache updates were only persisted at end-of-walk so any interruption lost them, and (b) stage 4 had no cache at all. CI/iterative-walk use cases were burning LLM calls on a no-op.

What's new

Cache schema v2: CachedDerivation + CachedIntrospection dataclasses; new derivation.json and introspection.json files alongside the existing extraction.json and aggregation.json.
Per-stage save helpers: save_extraction, save_aggregation, save_derivation, save_introspection so each stage's persist callback only writes its own scope.
aggregate_all / derive_all accept a persist_cache callback invoked after every successful section update.
Short-circuit predicate (orchestrator._evaluate_short_circuit) requires five conditions:
1. cache on
2. non-empty repo
3. every stage-2 file was cache-hit or specialized
4. introspection scope hash matches prior walk
5. every primary section's aggregation cache + every derivative section's derivation cache is fresh against current notes / upstream bodies
aggregation_fully_cached(layout, cache) / derivation_fully_cached(layout, cache, *, review) helpers encapsulate condition (5). They guard the partial-prior-walk case where extraction is fully populated but section bodies on disk are stale from a crashed stage 3.
CLI walk report collapses to "skipped — every file cache-hit and introspection scope unchanged" rows when fully_cached.

Test plan

make test — 212 passed, 93% coverage
make lint — clean
New tests cover:
- per-stage save helpers don't touch other scope files
- CachedDerivation round-trip; revised-flag asymmetry
- CachedIntrospection round-trip; scope-hash stability across list order
- aggregate_all / derive_all persist_cache callbacks fire per section
- aggregation cache survives mid-stage-3 simulated crash (sections that completed are still on disk)
- derive_all cache hit / miss / revised-mismatch
- full-cache-hit second walk → 0 LLM calls in stages 3 & 4 (fully_cached=True)
- introspection scope change defeats the short-circuit
- missing aggregation cache entry defeats the short-circuit (the partial-prior-walk bug the predicate strengthening guards)
- stale aggregation notes_hash defeats the short-circuit
- missing derivation cache entry defeats the short-circuit
- first walk on a fresh wiki never short-circuits
Manual: make walk twice on this repo — second walk should report "No changes detected"

🤖 Generated with Claude Code

When `wikifi walk` runs twice with no source changes, stages 3 and 4 now skip entirely — no LLM calls, no full-section rewrites, and the on-disk wiki is treated as already current. Restores cache integrity across interrupted walks. Two distinct bugs fixed together: 1. Aggregation cache updates were only persisted at the end of stage 4. A walk that crashed mid-stage-3 (Ctrl-C, OOM, deriver LLM failure) lost every cache entry the aggregator had computed, so the next walk re-aggregated from scratch even on unchanged content. Fixed by threading per-section `persist_cache` callbacks through `aggregate_all` and `derive_all`, mirroring stage 2's existing `_persist` pattern. 2. Stage 4 had no cache layer at all. Derivation ran unconditionally on every walk regardless of whether upstream sections had changed. Added a `CachedDerivation` scope keyed on a hash of upstream bodies, with a `revised` flag so `--review` walks don't silently inherit a body produced without the critic loop. Cache schema bumped to v2 (adds `derivation.json` + `introspection.json` files; v1 entries load to empty so upgraded wikis re-extract on first walk). Short-circuit predicate requires all five conditions: caching on, non-empty repo, every file cache-hit or specialized-extracted, introspection scope unchanged, *and* every primary/derivative section covered by a fresh cache entry. The last condition is what keeps a prior-walk crash from freezing stale section bodies on disk forever — without it the predicate would short-circuit past sections that never got aggregated. Plan documented in `docs/plan-a-cache-persistence.md`. Tests: 212 passing, 93% coverage. New coverage spans the per-stage save helpers, derivation cache hit/miss/revised-mismatch, mid-stage-3 crash survival, scope-change defeating the early-out, and partial/stale aggregation/derivation caches defeating the early-out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Adds stage-scoped cache persistence and a stage-3/4 short-circuit so unchanged walks can avoid downstream LLM work, extending the existing cache pipeline in wikifi.

Changes:

Split cache persistence into per-stage save helpers and added new derivation/introspection cache scopes with schema version v2.
Added short-circuit evaluation in the orchestrator to skip aggregation/derivation when extraction is fully reusable and the prior walk scope still matches.
Updated CLI reporting and expanded tests around cache round-trips, per-stage persistence, and short-circuit behavior.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
`wikifi/orchestrator.py`	Wires new cache save helpers, introspection scope hashing, and full-cache short-circuit into the walk pipeline.
`wikifi/deriver.py`	Adds derivation cache lookup/recording, cache-aware stats, and derivative freshness checks.
`wikifi/cli.py`	Changes walk output to show skipped downstream stages and a “No changes detected” message.
`wikifi/cache.py`	Introduces derivation/introspection cache models, new cache files, per-scope save helpers, and hash helpers.
`wikifi/aggregator.py`	Adds per-section cache persistence callback support and helper logic for aggregation-cache freshness checks.
`tests/test_orchestrator.py`	Adds short-circuit and cache freshness regression coverage for the orchestrator.
`tests/test_deriver.py`	Adds derivation cache hit/miss, review-mode, and persist-callback tests.
`tests/test_cache.py`	Adds cache round-trip, per-scope save, reset, hashing, and version-bump tests.
`tests/test_aggregator.py`	Adds per-section persistence and crash-resume tests for aggregation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+        if not notes:
+            # Empty-note sections render the same placeholder every walk
+            # — no aggregation work to short-circuit around.
+            continue


+    for section in DERIVATIVE_SECTIONS:
+        upstream_bodies = _collect_upstream(layout, section)
+        if not upstream_bodies:
+            continue


+    no_llm_files = extraction.cache_hits + extraction.specialized_files
+    if no_llm_files != extraction.files_seen:
+        return False


+    no_llm_files = extraction.cache_hits + extraction.specialized_files
+    if no_llm_files != extraction.files_seen:
+        return False


+            cache.record_derivation(
+                section.id,
+                upstream_hash=upstream_hash,
+                body=body,
+                revised=revised,


+    if report.fully_cached:
+        table.add_row(
+            "3. Aggregation",
+            "[dim]skipped — every file cache-hit and introspection scope unchanged[/dim]",


            stats.review_outcomes.append(outcome)
            if outcome.revised:
                stats.sections_revised += 1
+                revised = True


+    records whether the critic + reviser loop ran on this body, so a
+    cached body produced under ``--review`` is not silently reused on a
+    subsequent walk that turns review off (the bodies are similar but
+    the contract differs).


Closes 8 review comments on PR #16 covering correctness gaps in the stage-3/4 short-circuit and the derivation/extraction caches. Cache short-circuit predicate - Aggregator and deriver now record an empty-state cache entry when notes / upstream bodies are empty, and the ``fully_cached`` checks no longer auto-skip those sections. Without this, deleting a contributing file between walks would let the next walk skip stage 3 while leaving the prior walk's prose on disk indefinitely. (#1, #2) - Extractor stops recording an extraction cache entry when any chunk failed — partial findings would otherwise satisfy the short-circuit predicate while masking lost work. (#3) - ``specialized_files`` now increments only after the specialized extractor returns successfully, so files whose extractor raised no longer pad the "no LLM-side work" total used by the predicate. (#4) Derivation cache hygiene - Deriver no longer records a cache entry when ``provider.complete_json`` fell back to the error body — a transient stage-4 outage would otherwise be replayed (or short-circuited around) on later walks. (#5) - ``revised`` semantic is now "review loop ran on this body" rather than "reviser changed text." A ``--review`` walk whose critic accepts unchanged still records ``revised=True`` so the next ``--review`` walk's freshness check accepts the cache entry. ``CachedDerivation`` docstring rewritten to match the actual ``lookup_derivation`` asymmetry. ``sections_revised`` on the cache-replay path no longer inherits the cached flag — it counts reviser work performed in this walk only. (#7, #8) CLI - Skip-reason row mentions both cache-hit and specialized files so the message stays accurate on repos dominated by SQL/OpenAPI/etc. (#6) Regression coverage - 10 new tests pin each fix: deletion-between-walks scenario, partial-chunk-failure cache hygiene, failed-specialized counter ordering, derivation error-body cache exclusion, ``--review`` walk with critic accepting unchanged, empty-state predicate behavior on both stages, and ``sections_revised`` staying zero on cache replay. - 222 tests pass, 93% coverage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Building on Plan A's cache layer, the aggregator gains a third path between full cache hit and full rewrite. When a section's finding set overlaps the cached set within a configurable threshold, the LLM is asked to *edit* the cached body around the added/removed delta rather than re-derive the section from scratch. Preserves established prose, addresses the "potentially omitting or changing key details" concern, and cuts tokens on partially-changed walks. Per-section decision tree (in priority order): 1. **Cache hit** — notes_hash matches cached entry exactly. Re-render cached bundle, no LLM call. (Plan A behavior, unchanged.) 2. **Unchanged** — finding ids match but notes_hash differs (a source line shifted, summary changed, etc.). Refresh the cache key, re- render the cached body, no LLM call. 3. **Surgical** — symmetric difference of finding ids is at or below `surgical_edit_threshold` (default 0.3). Send cached body + delta to LLM, merge edit with cached claims (drop removed, add new). 4. **Rewrite** — too much churn (or no prior body to anchor on). Run the original from-scratch aggregation prompt. (Plan A behavior.) Identity: each note gets a stable `finding_id = sha256(file::section_id::finding_text)[:16]`. The cache stores `finding_ids` aligned with note order so claim source_indices map back to ids. Reword of finding text → new id (semantically a delete-plus-insert, since prose written around the old wording can no longer be assumed grounded). Citation re-anchoring is mechanical: cached claims minus `removed_claim_indices` plus `new_claims` resolved against the added notes only. Contradictions are fully replaced from the model output since they're a small set and cleaner to re-emit than diff. New module `wikifi/surgical.py`: - `compute_finding_id` / `note_finding_ids` (in cache.py — adjacent to the other hash helpers) - `SectionChange` dataclass + `classify_section_change` for the diff - `SurgicalEdit` Pydantic schema + `SURGICAL_SYSTEM_PROMPT` - `surgical_aggregate` runs the LLM call and merges the result Cache schema bumped v2 → v3 to add `finding_ids: list[str]` to `CachedSection`. v2 entries load to empty so an upgraded wiki re-aggregates on the first walk; subsequent walks light up surgical. Configuration: - `Settings.use_surgical_edits: bool = True` - `Settings.surgical_edit_threshold: float = 0.3` - `wikifi walk --no-surgical` opt-out flag Walk report adds `sections_edited` and `sections_rewritten` columns alongside the existing `sections_cached`. Tests: 231 passing, 93% coverage. New `tests/test_surgical.py` covers: - `compute_finding_id` / `note_finding_ids` determinism - classifier truth table (no-cache, legacy v2, identical sets, small delta, large delta, threshold exactly equal, disabled threshold) - `surgical_aggregate` claim merging (drop removed, resolve new against added notes, replace contradictions, ignore bad indices) - aggregator decision tree records `finding_ids` on every cache write - unchanged-finding-ids path takes no LLM call and refreshes notes_hash - surgical path selected for small delta; rewrite for large delta - `use_surgical_edits=False` collapses to Plan A's two-path tree - surgical-path LLM failure falls back to rewrite cleanly - stability: when LLM honors the contract, unchanged paragraphs are byte-preserved through the merge Plan documented in `docs/plan-b-surgical-aggregation.md`. Stacked on Plan A (#16). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

First walk under cache schema v3 — the v2 cache from prior walks invalidates, so every section took the rewrite path (sections_rewritten=8, sections_edited=0). Subsequent walks with small per-file changes will exercise the surgical edit path. Walk stats: - Stage 1 introspection: include=1 exclude=12 langs=Python - Stage 2 extraction: 31 files seen, 29 contributed, 277 findings - Stage 3 aggregation: 8 sections written (all rewrite path) - Stage 4 derivation: personas + user_stories OK; diagrams hit a json_invalid response from the model (long mermaid output truncated by the SDK's parse layer) and fell back to upstream-bodies-preserved output — the deriver's per-section catch behaving as designed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Addresses every distinct concern raised by the AI review on PR #17. Comment IDs in parentheses; many were duplicated across the two review passes (pre-rebase + post-rebase). Identity / cache schema - Align finding_id length contract with the 12-char implementation. The Hard Specifications, the `compute_finding_id` docstring, and the Cache Entities table all said 16 hex chars but the function delegates to `fingerprint.hash_text` which truncates to FINGERPRINT_LENGTH=12. Updated the docs (cache.py, hard_specifications.md) to match what the code actually does — the wider hash isn't needed at typical scale and reusing one constant keeps the system coherent. (3183975867, 3184733330, 3184733407) - `note_finding_ids` now returns "" when a note is missing `file` or `finding`. Hashing empty inputs would otherwise produce a stable non-empty id and let two malformed notes from different walks compare equal as "unchanged", routing the section onto the cache/surgical paths despite having no real identity to anchor on. The classifier picks up the slack: any "" id in either set forces "rewrite" so two malformed notes can't accidentally cancel out via set equality. (3183975908, 3184733368) - Rewrote `CachedSection.finding_ids` docstring. The old text claimed the list aligned with `claim.source_indices`, but cached claims store fully-resolved SourceRefs, not indices. Documented the actual consumer: section-delta classification by the surgical classifier. (3184733384) Aggregator decision tree - Fixed Path 2's cache lie: stopped refreshing `notes_hash` to match live notes when the unchanged-finding-ids fast path fires. The cached claims still carry SourceRefs from the prior walk's notes (file, lines, fingerprint), and updating the cache key without refreshing those refs would let `aggregation_fully_cached` flag the section as fresh and let the orchestrator's stage-3 short-circuit lock the drifting citations in place. Path 2 still skips the LLM call this walk; the cache key only refreshes when a real Path 4 rewrite runs. (3183975957) - Clarified `use_surgical_edits` docs in both `aggregator.aggregate_all` and `Settings`. The flag only gates Path 3 (the LLM-side surgical edit); the no-LLM cache-reuse paths (full cache hit and unchanged- finding-ids re-render) still fire regardless. The old text claimed the flag collapsed the tree to Plan A's two paths, which never matched the actual code. (3183975934, 3184733492, 3184733517) Surgical helpers - Renamed `_resolve_removed_claim_texts` to `_removed_finding_descriptors` and rewrote the docstring to match the implementation: it builds the removed-notes payload as `{finding_id: ...}` dicts and does not use the `cached` parameter (the cache doesn't store original note bodies, only ids). The old name promised richer reconstruction. (3184733469) - `SectionChange.churn_ratio` now returns 1.0 when `total_live == 0` and there are removed ids, mirroring the classifier's `max(total_live, 1)` guard. The old 0.0 return for the remove-everything case would mislead any downstream caller checking `churn_ratio` directly. (3184733532) Wiki artifact - Restored `.wikifi/diagrams.md` to its pre-chore-commit version. The chore commit (refresh wiki via Plan B walk) committed an LLM JSON truncation as the diagrams body — `Derivation failed for **Diagrams** (... Invalid JSON: EOF while parsing ...)` rather than actual Mermaid. Forward-fix approach (revert just the broken file) rather than amending the chore commit; the next walk against this repo will regenerate it cleanly. (3183975985, 3184733450) Regression tests added - `note_finding_ids` returns "" for malformed notes (missing/blank file or finding, including null `finding`). - `classify_section_change` force-rewrites when any finding_id is empty in either set, even when the sets look identical. - `SectionChange.churn_ratio` is 1.0 for the remove-everything edge case and 0.0 for the truly-empty case. - Aggregator's Path 2 leaves the cache entry's `notes_hash` alone and `aggregation_fully_cached` correctly refuses to flag the section as fresh. Gate: 245 passed, 94% coverage, lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(aggregator): surgical edits when only some findings change (Plan B)

codeninja mentioned this pull request May 4, 2026

feat(aggregator): surgical edits when only some findings change (Plan B) #17

Merged

4 tasks

codeninja requested a review from Copilot May 4, 2026 19:24

Copilot started reviewing on behalf of codeninja May 4, 2026 19:25 View session

Copilot AI reviewed May 4, 2026

View reviewed changes

codeninja and others added 5 commits May 4, 2026 14:56

Merge pull request #17 from codeninja/feat/plan-b-surgical-aggregation

58ad367

feat(aggregator): surgical edits when only some findings change (Plan B)

codeninja merged commit faa7bf3 into main May 4, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(cache): incremental persistence + full-cache-hit short-circuit (Plan A)#16

feat(cache): incremental persistence + full-cache-hit short-circuit (Plan A)#16
codeninja merged 6 commits into
mainfrom
feat/plan-a-cache-persistence

codeninja commented May 4, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

codeninja commented May 4, 2026

Summary

Why

What's new

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants