feat: premium-grade extraction & wikification engine #15
Lands nine premium features, organized into five shared-module groups:
Group A — Caching & resumability
- wikifi/fingerprint.py: stable file/text fingerprints
- wikifi/cache.py: content-addressed extraction + aggregation cache
with atomic persistence; resumability falls out of cache reuse
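Group A's content-addressed cache can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual `wikifi/cache.py`: the `ExtractionCache` class, the `fingerprint_bytes` helper, the 16-hex-character fingerprint and the single-JSON-file layout are all hypothetical. It shows why resumability falls out of cache reuse: unchanged bytes hash to the same key, and an atomic temp-file-plus-`os.replace()` save means a crash never leaves a half-written cache behind.

```python
from __future__ import annotations

import hashlib
import json
import os
import tempfile
from pathlib import Path


def fingerprint_bytes(data: bytes, length: int = 16) -> str:
    """Short, stable SHA-256 fingerprint of a file's bytes (length is an assumption)."""
    return hashlib.sha256(data).hexdigest()[:length]


class ExtractionCache:
    """Hypothetical content-addressed cache: findings keyed by (rel_path, fingerprint)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.entries: dict[str, list[dict]] = {}
        if path.exists():
            self.entries = json.loads(path.read_text(encoding="utf-8"))

    @staticmethod
    def _key(rel_path: str, data: bytes) -> str:
        return f"{rel_path}@{fingerprint_bytes(data)}"

    def get(self, rel_path: str, data: bytes) -> list[dict] | None:
        # Edited bytes hash to a new key, so stale findings are never replayed.
        return self.entries.get(self._key(rel_path, data))

    def put(self, rel_path: str, data: bytes, findings: list[dict]) -> None:
        self.entries[self._key(rel_path, data)] = findings

    def save(self) -> None:
        # Atomic write: dump to a temp file in the same directory, then
        # os.replace() it over the target so a crash can't leave a torn file.
        self.path.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                json.dump(self.entries, fh)
            os.replace(tmp, self.path)
        except BaseException:
            os.unlink(tmp)
            raise
```

In a walk loop, calling `put()` and `save()` after each file finishes would let a restarted walk replay every completed file as a cache hit.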
Group B — Evidence & citations
- wikifi/evidence.py: SourceRef / Claim / Contradiction models
plus a section renderer that threads numbered citations and a
"Conflicts in source" block into the final markdown
Group C — Repo intelligence
- wikifi/repograph.py: regex-driven import graph + FileKind classifier
- wikifi/specialized/{sql,openapi,protobuf,graphql}.py: deterministic
extractors that bypass the LLM for schema files and migrations
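A hedged sketch of a regex-driven Python import graph, including the relative-import handling a later commit in this PR adds (`from .b import x`, `from . import helpers`, `from ..sibling import x`). The regexes, the `resolve`/`build_import_graph` names and the rule of dropping edges to files that aren't in the repo are illustrative assumptions, not the actual `repograph.py`.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

# `from <dots><module> import <name>`; <module> is empty for `from . import x`.
_FROM = re.compile(
    r"^\s*from\s+(\.*)([A-Za-z_][\w.]*)?\s+import\s+\(?\s*([A-Za-z_]\w*)",
    re.MULTILINE,
)
# Plain `import a.b.c` (only the first module on the line, for brevity).
_IMPORT = re.compile(r"^\s*import\s+([A-Za-z_][\w.]*)", re.MULTILINE)


def resolve(importer: str, dots: str, module: str, name: str) -> str:
    """Map an import to a repo-relative .py path (a heuristic, not Python's resolver)."""
    if not dots:
        return module.replace(".", "/") + ".py"
    base = PurePosixPath(importer).parent  # one dot = the importer's own package
    for _ in range(len(dots) - 1):  # each extra dot climbs one package level
        base = base.parent
    # In `from . import helpers`, the imported name is itself the sibling module.
    target = module or name
    return (base / (target.replace(".", "/") + ".py")).as_posix()


def build_import_graph(files: dict[str, str]) -> dict[str, set[str]]:
    """Edges from each file to the repo files it imports; unknown targets are dropped."""
    graph: dict[str, set[str]] = {}
    for path, source in files.items():
        targets = {resolve(path, d, m, n) for d, m, n in _FROM.findall(source)}
        targets |= {resolve(path, "", m, "") for m in _IMPORT.findall(source)}
        graph[path] = {t for t in targets if t in files}
    return graph
```

Dropping unresolved targets keeps stdlib and third-party imports (`import os`) out of the graph, so neighbor context stays limited to files the walker actually sees.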
Group D — Quality
- wikifi/critic.py: critic + reviser loop with score-based revision
acceptance; CoverageStats helper
- wikifi/report.py + `wikifi report` CLI command for per-section
coverage and quality scoring
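A minimal sketch of a critic + reviser loop with score-based acceptance. `critique` and `revise` stand in for LLM calls; the `Critique` shape, the 7.0 threshold and the two-round cap are assumptions, not wikifi's actual values. The property worth preserving is that a revision is adopted only when it scores strictly higher, so the loop can't make a section worse.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Critique:
    score: float  # 0-10, the same scale `wikifi report --score` reports
    feedback: str


def review_section(
    draft: str,
    critique: Callable[[str], Critique],
    revise: Callable[[str, str], str],
    threshold: float = 7.0,
    max_rounds: int = 2,
) -> tuple[str, bool]:
    """Critic + reviser loop with score-based acceptance.

    Returns (final_text, was_revised). A rewrite replaces the current best
    only if the critic scores it strictly higher.
    """
    first = critique(draft)
    best, best_score, feedback = draft, first.score, first.feedback
    revised = False
    for _ in range(max_rounds):
        if best_score >= threshold:
            break  # good enough: stop spending LLM calls
        candidate = revise(best, feedback)
        verdict = critique(candidate)
        if verdict.score > best_score:
            best, best_score, feedback = candidate, verdict.score, verdict.feedback
            revised = True
    return best, revised
```

The `was_revised` flag is the kind of signal a `sections_revised` counter could be built from.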
Group E — Premium provider
- wikifi/providers/anthropic_provider.py: hosted Anthropic backend
with prompt-cached system prompt, messages.parse-based structured
output, adaptive-thinking + effort mapping, and APIError →
RuntimeError translation
Pipeline wiring:
- extractor: cache lookup + replay, specialized routing, neighbor
context injection, structured SourceRefs on every finding
- aggregator: structured EvidenceBundle output (claims +
contradictions), notes-hash section cache
- deriver: optional critic loop (--review)
- orchestrator: graph build, cache load/save with per-file persist,
anthropic dispatch
- cli: --no-cache, --review, --provider flags; new `report` command
Tests + coverage:
- 156 tests pass (was 88)
- 93% total coverage; every new module ≥ 86%
- Dedicated test files for fingerprint, cache, evidence, repograph,
specialized, critic, report, anthropic_provider
- Existing extractor / aggregator / orchestrator / cli suites
extended for cache, graph, citations, contradictions, anthropic
dispatch, and CLI flag plumbing
See TESTING-AND-DEMO.md for end-to-end demo recipes.
https://claude.ai/code/session_01K3H5GMhcvfc5HB63NhykcL
Adds OpenAI as a third backend alongside Ollama (default) and Anthropic, selected via
WIKIFI_PROVIDER=openai plus an OPENAI_API_KEY.
Implementation notes:
- Structured output via client.chat.completions.parse — returns a
schema-validated Pydantic instance directly, same protocol contract
as the Anthropic path.
- Prompt caching is automatic for prompts of ≥ 1024 tokens; cached
prefixes typically expire after ~5-10 min of inactivity. No cache_control
marker is needed: the system prompt sits at message[0], so the multi-KB
extraction prompt is the prefix the cache catches.
- Reasoning effort: think={"low","medium","high"} maps to
reasoning_effort on o*/gpt-5 models and is stripped on plain models,
so a future, stricter API can't reject the request with a 400. Reasoning
models also receive max_completion_tokens in place of max_tokens.
- APIError → RuntimeError, mirroring the Anthropic provider so
per-call fallback paths in extractor/aggregator/deriver are
unchanged.
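The reasoning-effort routing can be sketched as a pure function that builds request kwargs. The helper name, the regex used to spot reasoning models, and the choice to ignore `think` values outside low/medium/high are assumptions; the provider's real translation table may differ. The resulting dict would be passed to `client.chat.completions.parse(...)` alongside `messages` and `response_format`.

```python
from __future__ import annotations

import re

# Assumption: "reasoning model" = o-series or gpt-5 family, per the notes above.
_REASONING_MODEL = re.compile(r"^(o\d|gpt-5)")


def build_request_kwargs(model: str, think: str | None, max_tokens: int) -> dict:
    """Translate (model, think) into chat-completions keyword arguments."""
    kwargs: dict = {"model": model}
    if _REASONING_MODEL.match(model):
        # Reasoning models take max_completion_tokens instead of max_tokens.
        kwargs["max_completion_tokens"] = max_tokens
        if think in {"low", "medium", "high"}:
            kwargs["reasoning_effort"] = think
    else:
        # Plain models: drop the effort knob entirely so the API never sees it.
        kwargs["max_tokens"] = max_tokens
    return kwargs
```

Keeping this as a pure function makes the full (model, think) table easy to unit-test without a network client.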
Wiring:
- wikifi/config.py: openai_api_key, openai_base_url, openai_max_tokens
- wikifi/orchestrator.build_provider: dispatches openai with a default
guard (the model id is swapped for gpt-4o if it doesn't look like a
GPT/o-series id, mirroring the anthropic guard)
- wikifi/cli.py: --provider help text mentions all three options
Tests:
- tests/test_openai_provider.py (10 cases): parse path returns
Pydantic, fallback to validate_json, APIError mapping, text + chat,
reasoning-effort + max_completion_tokens routing on reasoning models,
full (model, think) translation table.
- tests/test_orchestrator.py: build_provider dispatch + model-default
preservation cases.
168 tests pass (was 156); 93% total coverage. Lint clean.
https://claude.ai/code/session_01K3H5GMhcvfc5HB63NhykcL
Remove business logic from init files.
The providers should all inherit from a base class and implement the interface.
Pull request overview
This PR expands wikifi’s pipeline with premium extraction and synthesis capabilities: deterministic parsers for structured artifacts, evidence/citation modeling, cache-backed incremental walks, repository graph context, hosted providers, and a new reporting surface. It fits into the core wiki-generation flow by upgrading each stage of the existing walk → aggregate → derive pipeline rather than introducing a parallel path.
Changes:
- Adds shared premium modules for caching, evidence modeling, repo graph analysis, quality scoring, reporting, and hosted providers.
- Extends extraction/aggregation/derivation orchestration to use caches, structured source refs, specialized extractors, optional review, and provider selection.
- Adds broad test coverage plus user-facing documentation for the new pipeline and commands.
Reviewed changes
Copilot reviewed 35 out of 36 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| `wikifi/specialized/sql.py` | Adds deterministic SQL/migration extraction for schema entities, FKs, constraints, and indexes. |
| `wikifi/specialized/protobuf.py` | Adds deterministic protobuf extraction for messages, services, and RPCs. |
| `wikifi/specialized/openapi.py` | Adds deterministic OpenAPI extraction with JSON/YAML parsing fallback. |
| `wikifi/specialized/graphql.py` | Adds deterministic GraphQL SDL extraction for types and root operations. |
| `wikifi/specialized/__init__.py` | Adds specialized extractor result models and file-kind routing. |
| `wikifi/report.py` | Adds wiki coverage/quality report generation. |
| `wikifi/repograph.py` | Adds file classification and lightweight import/reference graph building. |
| `wikifi/providers/openai_provider.py` | Adds hosted OpenAI provider implementation. |
| `wikifi/providers/anthropic_provider.py` | Adds hosted Anthropic provider implementation with prompt caching. |
| `wikifi/orchestrator.py` | Wires cache, repo graph, hosted providers, and review flow into the main pipeline. |
| `wikifi/fingerprint.py` | Adds short SHA-256-based fingerprint helpers for files/text. |
| `wikifi/extractor.py` | Extends extraction with caching, source refs, neighbor context, and specialized routing. |
| `wikifi/evidence.py` | Adds evidence/source/claim models and section rendering helpers. |
| `wikifi/deriver.py` | Adds optional critic/reviser loop for derivative sections. |
| `wikifi/critic.py` | Adds critique schemas, review loop, and coverage stats types. |
| `wikifi/config.py` | Adds premium-pipeline and hosted-provider settings. |
| `wikifi/cli.py` | Adds report, --no-cache, --review, and provider override support. |
| `wikifi/cache.py` | Adds extraction/aggregation cache storage and hashing helpers. |
| `wikifi/aggregator.py` | Extends aggregation to emit evidence bundles, contradictions, and cached section output. |
| `uv.lock` | Locks new hosted-provider dependencies and transitive packages. |
| `tests/test_specialized.py` | Adds tests for specialized SQL/OpenAPI/Protobuf/GraphQL routing and parsing. |
| `tests/test_report.py` | Adds report-generation tests. |
| `tests/test_repograph.py` | Adds file classification and repo graph tests. |
| `tests/test_orchestrator.py` | Adds provider dispatch, cache reuse, and review-flow tests. |
| `tests/test_openai_provider.py` | Adds OpenAI provider behavior tests. |
| `tests/test_fingerprint.py` | Adds fingerprint helper tests. |
| `tests/test_extractor.py` | Adds extractor tests for cache hits, source refs, specialized routing, and graph context. |
| `tests/test_evidence.py` | Adds evidence rendering and deduplication tests. |
| `tests/test_critic.py` | Adds critic/reviser workflow tests. |
| `tests/test_cli.py` | Adds CLI tests for report and cache-reset behavior. |
| `tests/test_cache.py` | Adds cache hit/miss, persistence, and hash tests. |
| `tests/test_anthropic_provider.py` | Adds Anthropic provider behavior tests. |
| `tests/test_aggregator.py` | Adds aggregation tests for citations, contradictions, and section caching. |
| `pyproject.toml` | Declares new hosted-provider dependencies. |
| `TESTING-AND-DEMO.md` | Documents validation and demo flows for the premium pipeline. |
| `README.md` | Updates user-facing docs for new commands and architecture. |
```python
from wikifi.evidence import SourceRef
from wikifi.specialized import SpecializedFinding, SpecializedResult

_TYPE_RE = re.compile(r"^\s*type\s+(\w+)\s*(?:implements\s+[^\{]+)?\{", re.MULTILINE)
```
```python
    FileKind.SQL: sql.extract,
    FileKind.MIGRATION: sql.extract_migration,
    FileKind.OPENAPI: openapi.extract,
    FileKind.PROTOBUF: protobuf.extract,
```
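This routing table sends every `FileKind.MIGRATION` file to `sql.extract_migration`. A later commit in this PR narrows that to SQL-shaped migrations (`.sql`/`.ddl`) via `select(kind, rel_path=…)` in `dispatch.py`. A sketch of that rule follows; the extra `FileKind` members (`GRAPHQL`, `SOURCE`) and the `_stub` extractors are placeholders, not wikifi's real definitions.

```python
from __future__ import annotations

from enum import Enum
from pathlib import PurePosixPath
from typing import Callable, Optional


class FileKind(Enum):
    SQL = "sql"
    MIGRATION = "migration"
    OPENAPI = "openapi"
    PROTOBUF = "protobuf"
    GRAPHQL = "graphql"
    SOURCE = "source"  # hypothetical catch-all for ordinary code


Extractor = Callable[[str, str], dict]


def _stub(parser: str) -> Extractor:
    # Placeholder standing in for sql.extract, protobuf.extract, etc.
    return lambda rel_path, text: {"parser": parser, "file": rel_path}


_ROUTES: dict[FileKind, Extractor] = {
    FileKind.SQL: _stub("sql"),
    FileKind.MIGRATION: _stub("sql-migration"),
    FileKind.OPENAPI: _stub("openapi"),
    FileKind.PROTOBUF: _stub("protobuf"),
    FileKind.GRAPHQL: _stub("graphql"),
}

_SQL_SUFFIXES = {".sql", ".ddl"}


def select(kind: FileKind, rel_path: str) -> Optional[Extractor]:
    """Return a deterministic extractor, or None to send the file to the LLM."""
    if kind is FileKind.MIGRATION:
        suffix = PurePosixPath(rel_path).suffix.lower()
        if suffix not in _SQL_SUFFIXES:
            return None  # Alembic/Django/Knex migrations are code, not DDL
    return _ROUTES.get(kind)
```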
```python
specialized_fn = select_specialized(kind)
if specialized_fn is not None:
    stats.specialized_files += 1
    try:
        result = specialized_fn(rel.as_posix(), data)
```
```python
for service_name, line in services:
    related = [r for r in rpcs if line <= r[5]]
    bullets = "\n".join(
        f" - `{name}({_arrow(in_msg, in_stream)}) -> {_arrow(out_msg, out_stream)}`"
        for name, in_msg, out_msg, in_stream, out_stream, _ in related[:25]
```
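In the snippet above, `related = [r for r in rpcs if line <= r[5]]` appears to keep every RPC declared at or after the service's line, so in a multi-service file the first service also collects the later services' RPCs. A later commit bounds each service's RPCs to its own block. Below is a minimal brace-matching sketch of that idea; the regexes and function names are assumptions, and it ignores braces inside comments or option strings.

```python
from __future__ import annotations

import re

_SERVICE = re.compile(r"^\s*service\s+(\w+)\s*\{", re.MULTILINE)
_RPC = re.compile(r"^\s*rpc\s+(\w+)\s*\(", re.MULTILINE)


def _block_end(text: str, open_brace: int) -> int:
    """Offset of the brace closing the block opened at `open_brace`."""
    depth = 0
    for i in range(open_brace, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    return len(text)  # unterminated block: run to end of file


def rpcs_by_service(proto: str) -> dict[str, list[str]]:
    """Attribute each rpc only to the service whose { ... } block contains it."""
    result: dict[str, list[str]] = {}
    for svc in _SERVICE.finditer(proto):
        start = svc.end() - 1  # offset of the service's opening brace
        end = _block_end(proto, start)
        # Pattern.finditer(string, pos, endpos) restricts the scan to this block.
        result[svc.group(1)] = [m.group(1) for m in _RPC.finditer(proto, start, end)]
    return result
```

Depth counting, rather than stopping at the first `}`, keeps RPCs with `{ option ... }` bodies from ending the service block early.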
```python
for ln in lines[start:]:
    if ln.startswith("}"):
        break
    out.append(ln)
```
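A later commit fixes two GraphQL issues visible in these snippets: the `_TYPE_RE` pattern misses `extend type Query/Mutation`, and the `}` check shown just above only matches a brace in column 0. A hedged sketch of both fixes, plus anchoring the line number on the captured name as that commit describes; `find_types` and `block_after` are hypothetical names, and the sketch ignores interfaces, inputs and enums.

```python
from __future__ import annotations

import re

# Accept an optional `extend` and an optional `implements ...` clause.
_TYPE_RE = re.compile(
    r"^\s*(?:extend\s+)?type\s+(\w+)\s*(?:implements\s+[^{]+)?\{", re.MULTILINE
)


def find_types(sdl: str) -> list[tuple[str, int]]:
    """Return (type_name, 1-based line) for each `type` / `extend type` block."""
    found = []
    for match in _TYPE_RE.finditer(sdl):
        # The leading \s* can swallow preceding blank lines, so count newlines
        # up to the captured name (match.start(1)), not up to match.start().
        line = sdl.count("\n", 0, match.start(1)) + 1
        found.append((match.group(1), line))
    return found


def block_after(lines: list[str], start: int) -> list[str]:
    """Collect body lines until the closing brace, even when it is indented."""
    out = []
    for ln in lines[start:]:
        if ln.lstrip().startswith("}"):
            break
        out.append(ln)
    return out
```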
```python
# path/identifier; resolution to a real file is handled by a separate
# heuristic.
_PY_IMPORT = re.compile(
    r"^\s*(?:from\s+([A-Za-z_][\w.]*)\s+import|import\s+([A-Za-z_][\w.]*))",
```
```python
def hash_section_notes(notes: list[dict[str, Any]]) -> str:
    """Stable digest of a section's note payload for aggregation cache keys.

    The hash spans only the *content* fields the aggregator actually reads
    (file ref, summary, finding) — not timestamps or per-walk debug fields —
    so regenerating identical notes on a fresh walk reuses the cached body.
    """
    from wikifi.fingerprint import hash_text

    payload = [
        {
            "file": n.get("file", ""),
            "summary": n.get("summary", ""),
            "finding": n.get("finding", ""),
        }
        for n in notes
    ]
```
```python
)
)

summary = f"Migration touches {len(tables)} table(s)." if migration else f"Schema for {len(tables)} table(s)."
```
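The summary line above derives its count from `tables`; a later commit counts both CREATE and ALTER targets so an ALTER-only migration stops reporting "0 table(s)". A hedged sketch with deliberately simple regexes (no `RENAME`, no schema-qualified quoting edge cases) and a hypothetical `migration_summary` helper:

```python
from __future__ import annotations

import re

_CREATE = re.compile(
    r'\bCREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?([\w."`]+)', re.IGNORECASE
)
_ALTER = re.compile(
    r'\bALTER\s+TABLE\s+(?:IF\s+EXISTS\s+)?(?:ONLY\s+)?([\w."`]+)', re.IGNORECASE
)


def _norm(name: str) -> str:
    # Strip identifier quotes and fold case so "Orders" and orders count once.
    return name.replace('"', "").replace("`", "").lower()


def migration_summary(sql: str) -> str:
    """Summarize a migration by the distinct tables it CREATEs *or* ALTERs."""
    created = {_norm(n) for n in _CREATE.findall(sql)}
    altered = {_norm(n) for n in _ALTER.findall(sql)}
    return f"Migration touches {len(created | altered)} table(s)."
```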
```python
def render_section_body(bundle: EvidenceBundle) -> str:
    """Render an EvidenceBundle into final markdown.

    The body is appended with a "Sources" footer enumerating every distinct
    source ref across claims and contradictions, plus an explicit
    "Conflicts in source" section if any contradictions were surfaced.
    """
    parts: list[str] = []
    if bundle.body.strip():
        parts.append(bundle.body.strip())

    if bundle.contradictions:
        parts.append("")
        parts.append("## Conflicts in source")
        parts.append(
            "_The walker found disagreements across files. Migration teams "
            "should resolve these before re-implementation._"
        )
        for entry in bundle.contradictions:
            parts.append("")
            parts.append(f"- **{entry.summary.strip()}**")
            for position in entry.positions:
                refs = _format_refs(position.sources)
                parts.append(f" - {position.text.strip()} {refs}".rstrip())

    sources = _enumerate_sources(bundle)
    if sources:
        parts.append("")
        parts.append("## Sources")
        for entry in sources:
            parts.append(f"{entry.index}. `{entry.ref.render()}`")

    return "\n".join(parts).strip()
```
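The renderer above emits a `## Sources` footer but no inline markers; a later commit in this PR threads per-claim `[N]` markers next to matching sentences and lists paraphrased claims under "Supporting claims". A minimal sketch of that numbering, assuming simplified `SourceRef`/`Claim` dataclasses (the real models carry more fields, such as fingerprints), a hypothetical `render_with_citations` function and an assumed `##` heading level; contradictions are omitted for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRef:
    file: str
    lines: tuple[int, int] | None = None

    def render(self) -> str:
        if self.lines is None:
            return self.file
        return f"{self.file}:{self.lines[0]}-{self.lines[1]}"


@dataclass
class Claim:
    text: str
    sources: list[SourceRef] = field(default_factory=list)


def render_with_citations(body: str, claims: list[Claim]) -> str:
    """Number each distinct SourceRef once and thread [N] markers into the body."""
    index: dict[SourceRef, int] = {}  # insertion order == citation number
    unmatched: list[str] = []
    for claim in claims:
        nums = []
        for ref in claim.sources:
            if ref not in index:
                index[ref] = len(index) + 1  # first-seen order
            nums.append(index[ref])
        if not nums:
            continue  # nothing to cite
        marker = "".join(f"[{n}]" for n in nums)
        sentence = claim.text.strip()
        if sentence and sentence in body:
            # Attach markers right after the first verbatim occurrence.
            body = body.replace(sentence, f"{sentence} {marker}", 1)
        else:
            unmatched.append(f"- {sentence} {marker}")  # paraphrased claim
    parts = [body.strip()]
    if unmatched:
        parts += ["", "## Supporting claims", *unmatched]
    if index:
        parts += ["", "## Sources"]
        parts += [f"{n}. `{ref.render()}`" for ref, n in index.items()]
    return "\n".join(parts)
```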
```markdown
- `walk` — main entry point. Walks the target codebase and produces the wiki content.
- `--no-cache` — force a clean re-walk; drops the on-disk extraction + aggregation caches.
- `--review` — run the critic + reviser loop on derivative sections (personas, user stories, diagrams).
- `--provider {ollama|anthropic}` — override the configured provider for this walk.
```
While running the wikifi walk:
2026-05-01 21:07:05,065 INFO httpx HTTP Request: POST https://api.anthropic.com/v1/messages "HTTP/1.1 200 OK"
Done. Wiki at /home/codeninja/wikifi/.wikifi/
Addresses every review comment on PR #15, plus the user-reported walk failure where adaptive thinking exhausted ``max_tokens`` and the Anthropic SDK returned an empty structured response.
Human review comments
- providers: introduce a nominal ``LLMProvider`` ABC; Ollama, Anthropic, OpenAI and the test ``MockProvider`` now inherit from it. Hosted providers share ``format_api_error`` instead of duplicating the helper.
- specialized: split the package into ``models.py`` (dataclasses) and ``dispatch.py`` (``select(kind, rel_path=…)``); ``__init__.py`` is now a docstring-only marker per the project's no-re-exports rule.
Copilot review comments
- specialized.dispatch: only route SQL-shaped migrations (``.sql``/``.ddl``) through the SQL parser. Python/JS/Ruby migration scripts (Alembic, Django, Knex) stay on the LLM path.
- extractor: honor ``settings.use_specialized_extractors``; wire it through ``orchestrator.run_walk``.
- specialized.graphql: handle ``extend type Query/Mutation`` and indented closing braces in ``_block_after``; anchor line numbers on the captured name's offset so the leading-newline regex artifact no longer points one line above the declaration.
- specialized.protobuf: bound each service's RPCs to its own ``{ … }`` block so multi-service files stop attributing later RPCs to the first service.
- specialized.sql: count both CREATE and ALTER targets in the migration summary so an ALTER-only migration no longer reports "0 table(s)".
- repograph: parse Python relative imports (``from .b import x``, ``from . import helpers``, ``from ..sibling import x``) and resolve them within the package, instead of stripping the leading dots and missing every intra-package edge.
- cache.hash_section_notes: include each note's ``sources`` (file, lines, fingerprint) in the digest so cache hits can't replay stale citations after lines or file fingerprints change.
- report: derive coverage from the on-disk notes JSONL first, and fall back to the cache only when no notes exist. ``wikifi report`` after ``walk --no-cache`` (or after a manual cache wipe) now reports accurate coverage instead of 0%.
- evidence.render_section_body: insert per-claim ``[N]`` markers next to matching sentences in the body, with a "Supporting claims" list for paraphrased claims that don't appear verbatim.
- README: document the ``--provider openai`` option.
Anthropic empty-response bug (user-reported)
- anthropic_provider: bump ``DEFAULT_MAX_TOKENS`` from 16K to 32K and raise the ``settings.anthropic_max_tokens`` default to match. The 16K default left no room for the structured-output block when adaptive thinking ran at ``effort=high``, causing the reported ``empty parsed_output and parse fallback failed`` error on hard sections like ``hard_specifications`` and ``diagrams``.
- Emit a diagnostic ``RuntimeError`` that names ``stop_reason``, ``output_tokens``, ``max_tokens``, and the relevant tuning knob ("raise max_tokens", "lower think effort") instead of letting a cryptic ``Invalid JSON: EOF`` pydantic error escape.
Tests
- 180 tests pass (was 168) at 93% coverage. Each fixed bug has a dedicated regression test.
Note: changing the ``hash_section_notes`` payload shape silently invalidates existing aggregation cache entries on disk. The next walk regenerates them; no action is required from users.
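A minimal sketch of the widened aggregation cache key, assuming notes are plain dicts with a `sources` list of `{file, lines, fingerprint}` entries (the field names follow the commit message; the exact note shape is an assumption). It calls `hashlib` directly where the real code imports `wikifi.fingerprint.hash_text`, and the 16-character truncation is also an assumption. Because the payload shape changes, every digest changes too, which is why existing cache entries miss once.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any


def hash_section_notes(notes: list[dict[str, Any]]) -> str:
    """Digest the content fields *and* each note's source refs.

    Including file/lines/fingerprint per source means an edit that shifts
    line numbers (or changes a file's fingerprint) misses the cache, so
    stale citations are never replayed.
    """
    payload = [
        {
            "file": n.get("file", ""),
            "summary": n.get("summary", ""),
            "finding": n.get("finding", ""),
            "sources": [
                {
                    "file": s.get("file", ""),
                    "lines": s.get("lines"),
                    "fingerprint": s.get("fingerprint", ""),
                }
                for s in n.get("sources", [])
            ],
        }
        for n in notes
    ]
    # sort_keys + compact separators give a byte-stable serialization.
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]
```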
Pull request overview
Copilot reviewed 52 out of 55 changed files in this pull request and generated 1 comment.
```
Cache files live under ``.wikifi/.cache/`` so they share the wiki's
git-ignore rules but stay out of the section markdown that *is* committed.
```
A `wikifi walk` against a large target was opaque between the "stage 2: extracting" header and the next stage banner: users had no visibility into which file the walker was on or whether it had hung. Add an INFO line per file so the live walk logs read like:
- extracting: ./src/billing/orders.py
- extracting: ./src/billing/refunds.py
- extracting: ./src/main.py
The CLI already configures ``logging.basicConfig(level=INFO, …)``, so this surfaces by default; no flag flip is required. The line is emitted once per file regardless of route (cache hit, specialized parser, or LLM call), so cache-replay re-walks remain auditable too.
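A minimal sketch of one extraction step, showing why the log line fires on every route: it is emitted before the cache/specialized/LLM branch. All names here (`extract_file`, the dict cache keyed on `rel_path@sha256`, the logger name) are illustrative assumptions; the fallback to the LLM when a parser raises mirrors the `try:` around `specialized_fn` in the extractor snippet earlier on this page.

```python
from __future__ import annotations

import hashlib
import logging
from typing import Callable, Optional

log = logging.getLogger("wikifi.extractor")  # hypothetical logger name

Findings = list[str]
Route = Callable[[str, bytes], Findings]


def extract_file(
    rel_path: str,
    data: bytes,
    cache: dict[str, Findings],
    specialized: Optional[Route],
    llm: Route,
) -> tuple[str, Findings]:
    """Route one file: cache hit, then specialized parser, then LLM."""
    # Logged before routing, so cache replays and parser hits are visible too.
    log.info("extracting: ./%s", rel_path)
    key = f"{rel_path}@{hashlib.sha256(data).hexdigest()}"
    if key in cache:
        return "cache", cache[key]
    findings: Optional[Findings] = None
    route = "llm"
    if specialized is not None:
        try:
            findings, route = specialized(rel_path, data), "specialized"
        except Exception:
            # A parser failure falls back to the LLM instead of aborting the walk.
            log.warning("specialized parser failed for %s; using the LLM", rel_path)
    if findings is None:
        findings, route = llm(rel_path, data), "llm"
    cache[key] = findings
    return route, findings
```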
…init
Addresses Copilot review comment on wikifi/cache.py:21. The premium
cache layer writes to `.wikifi/.cache/`, but the generated
`.wikifi/.gitignore` only ignored `.notes/`, so every walk left
unignored cache files in the target repo — exactly the noise the
wiki contract promises to avoid.
Changes
- Hoist `CACHE_DIRNAME` from `cache.py` to `wiki.py` (next to
`NOTES_DIRNAME` and `WIKI_DIRNAME`) so the layout has one source
of truth and the gitignore template can reference it without
inverting the existing `cache → wiki` import direction. `cache.py`
re-exports the name for backwards compatibility.
- `WikiLayout.cache_dir` property added; `cache.cache_dir(layout)`
delegates to it.
- `DEFAULT_GITIGNORE` now lists both `.notes/` and `.cache/` from a
single `_GITIGNORE_REQUIRED_ENTRIES` tuple so future additions
flow through automatically.
- `initialize()` now calls `_ensure_gitignore()` which:
- writes the full template on a fresh init, AND
- backfills any missing required entries into a pre-existing
`.gitignore` (the legacy ".notes/-only" case from wikis created
before the cache layer landed).
- preserves user-added lines verbatim — only appends what's
missing.
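The `_ensure_gitignore()` behaviour above can be sketched as a small idempotent backfill. The constant names come from the commit message; the template contents (just the two entries, no comments) and the free-function `ensure_gitignore(wiki_dir)` signature are assumptions, since the real helper is called from `initialize()`.

```python
from __future__ import annotations

from pathlib import Path

NOTES_DIRNAME = ".notes"
CACHE_DIRNAME = ".cache"
_GITIGNORE_REQUIRED_ENTRIES = (f"{NOTES_DIRNAME}/", f"{CACHE_DIRNAME}/")
DEFAULT_GITIGNORE = "\n".join(_GITIGNORE_REQUIRED_ENTRIES) + "\n"


def ensure_gitignore(wiki_dir: Path) -> None:
    path = wiki_dir / ".gitignore"
    if not path.exists():
        path.write_text(DEFAULT_GITIGNORE, encoding="utf-8")  # fresh init
        return
    text = path.read_text(encoding="utf-8")
    present = {line.strip() for line in text.splitlines()}
    missing = [e for e in _GITIGNORE_REQUIRED_ENTRIES if e not in present]
    if not missing:
        return  # idempotent: re-running init adds no duplicates
    if text and not text.endswith("\n"):
        text += "\n"
    # Append only what's missing; user-authored lines are kept verbatim.
    path.write_text(text + "\n".join(missing) + "\n", encoding="utf-8")
```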
Tests
- 183 tests pass (was 180); 3 new regression tests cover fresh
init, legacy-gitignore backfill (no duplicates on re-run), and
preservation of user-authored extra entries.
- `wikifi/wiki.py` now at 100% coverage.
Follow-up to f1f51b4. The previous commit added `.cache/` to the generated gitignore template and backfilled it on `wikifi init`, but two pre-existing cache JSON files (`.wikifi/.cache/aggregation.json`, `.wikifi/.cache/extraction.json`) were already tracked from the "e2e run" snapshot in ddd193c. Git only honors gitignore for untracked paths, so future walks would still mark those two files as modified despite the new ignore rule. Untrack them here so the gitignore actually takes effect for this repo, and bring `.wikifi/.gitignore` in line with the updated template (the template change only writes on fresh inits — existing wikis upgrade through `_ensure_gitignore` on the next `wikifi init`, but this repo's file was already on disk so it needs the manual sync).
Pull request overview
Copilot reviewed 55 out of 56 changed files in this pull request and generated 2 comments.
```python
settings = get_settings()
provider = build_provider(settings) if score else None
wiki_report = build_report(layout=layout, provider=provider, score=score)
```
```python
# Same default-swap guard as the Anthropic path: a user opting
# in to OpenAI shouldn't 404 because the Ollama model id is
# still in their config.
model = settings.model if _looks_like_openai_model(settings.model) else "gpt-4o"
```
…ent IDs
Addresses two new Copilot review comments on PR #15.
cli.py:197: `report --score` ignored the target's .wikifi/config.toml
- New `load_target_settings(target)` in `wikifi/config.py` reads `<target>/.wikifi/config.toml` and layers its values onto the env-derived defaults. The wiki's own config wins over per-session env vars, matching the contract printed at the top of every scaffolded `config.toml` ("overrides WIKIFI_* environment variables when present").
- `walk`, `chat`, and `report` CLI commands now use it; only `init` stays on `get_settings()` because it's the command that *creates* config.toml.
- Allow-list of overridable fields (`provider`, `model`, `ollama_host`) so a stale or hand-edited config can't silently start steering fields the user didn't sign up for.
- Malformed TOML logs a warning and falls back to env defaults rather than crashing the command.
orchestrator.py:194: OpenAI model swap clobbered Azure deployment IDs
- `_looks_like_openai_model` was an allow-list (gpt-/o1/o3/o4/ft:) that fell back to "gpt-4o" for everything else, including valid Azure-OpenAI deployment names like `prod-gpt4o`, `eastus-chat`, or `my-team-deployment`. With `openai_base_url` now documented for Azure use, the swap silently routed users to the wrong model.
- Replaced with `_looks_like_ollama_model` (deny-list): it only swaps when the model id obviously looks like an Ollama identifier (`family:tag`), excluding fine-tuned OpenAI models, which also contain a colon (`ft:gpt-4o:...`). Anything else passes through: Azure deployment IDs, plain proxy aliases, and untouched OpenAI defaults all keep their configured value.
Tests
- 189 tests pass (was 183). New regression coverage:
  - `load_target_settings` happy path, toml-wins-over-env, missing config, and malformed-toml warning paths
  - Azure deployment ID and fine-tuned OpenAI model both pass through the OpenAI provider builder unchanged
- Updated `test_build_provider_returns_openai_when_selected` to use a realistic Ollama model id (`qwen3.6:27b`), since the new heuristic only swaps obviously-Ollama identifiers.
Summary
Lands the nine premium features picked from the architecture review (1, 2, 3, 4, 5, 6, 8, 9, 11), grouped into five shared modules so common functionality is factored out and reused.
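| Group | Modules |
|---|---|
| A. Caching & resumability | `wikifi/fingerprint.py`, `wikifi/cache.py` |
| B. Evidence & citations | `wikifi/evidence.py` (SourceRef / Claim / Contradiction / EvidenceBundle + section renderer) |
| C. Repo intelligence | `wikifi/repograph.py` (FileKind + import graph), `wikifi/specialized/{sql,openapi,protobuf,graphql}.py` |
| D. Quality | `wikifi/critic.py`, `wikifi/report.py` |
| E. Premium provider | `wikifi/providers/anthropic_provider.py` |

Why this set is "premium"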
A migration team that runs the new pipeline gets:
- … `EvidenceBundle`; the renderer adds `## Sources` and `## Conflicts in source` sections to each `*.md` file in `.wikifi/`.
- … `(rel_path, sha256(file_bytes))`; an unchanged fingerprint replays cached findings without any LLM call. Resumability after a crash is the same mechanism: the cache is persisted after every file finishes.
- … `--review` flag scores personas / user stories / diagrams against the brief and upstream evidence, and re-synthesizes when the score is below threshold.
- … `wikifi report` command produces a per-section table of files, findings, body size, and (with `--score`) critic-derived 0-10 scores.
- … `cache_control: ephemeral` on the multi-KB system prompt makes hosted Claude economical at 10k-file scale; structured output via `messages.parse` returns validated Pydantic instances directly.

Module map
Test results
- … (`ruff check` / `ruff format`).
- … `MockProvider` fixture.

Test plan
- `make test`: 156 tests pass with ≥ 93% coverage
- `make lint`: clean
- `make walk`: produces a wiki with `## Sources` + `## Conflicts in source` blocks
- `make walk` again immediately: `cache_hits == files_seen` in the walk report
- `uv run wikifi walk --no-cache`: drops the cache and forces a clean re-walk
- `uv run wikifi walk --review`: `sections_revised` increments in the Derivation row
- `uv run wikifi report` / `uv run wikifi report --score`: renders a per-section table
- … `*.sql` / `*.proto` / `*.graphql` / OpenAPI spec into a target → walk report shows `specialized=N`, no LLM call made for those files
- `WIKIFI_PROVIDER=anthropic ANTHROPIC_API_KEY=... uv run wikifi walk`: hits the hosted backend; second call shows `cache_read_input_tokens > 0`

See `TESTING-AND-DEMO.md` for the full demo + verification recipes.
https://claude.ai/code/session_01K3H5GMhcvfc5HB63NhykcL
Generated by Claude Code