feat(core): port the indexer to backend-neutral write helpers #19
Merged
Third step of the Kuzu -> DuckDB migration. The indexer no longer emits raw Cypher: every read and write goes through GraphDB helper methods that each backend implements in its own dialect. End result: `CGH_DB=duckdb cgh index` works end-to-end.

What lands:

- codegraph/core/protocol.py: GraphDB grew seven write/read helpers: upsert_node, ensure_edge, purge_file_data, delete_file_completely, find_node_keys, query_node_field, list_node_fields. Each one matches a specific pattern the indexer used to write in Cypher.
- codegraph/core/graph_model.py: single source of truth for the graph model: NODES (label -> table + key_field) and EDGES (relationship -> table + columns + props). Both adapters consume this to know which tables to write to and how to validate labels.
- codegraph/core/db_kuzu.py + db_duckdb.py: the seven new helpers implemented on both adapters. Kuzu uses Cypher with parameterised queries; DuckDB uses parameterised SQL with INSERT ... ON CONFLICT for the upsert semantics. Edge tables use INSERT ... ON CONFLICT DO NOTHING to match Kuzu's MERGE-without-set semantics. Both backends validate the label / edge_type argument against the graph_model registry so callers can't sneak arbitrary identifiers into the query string.
- codegraph/indexer.py: every MERGE / MATCH / DETACH DELETE call replaced with a helper call. The Cypher dialect is gone from indexer.py entirely (`grep -nE "conn\.execute" indexer.py` is empty), and the module stops importing kuzu.
  - _resolve_calls and _resolve_inherits stay name-based: find_node_keys returns every matching id, and ensure_edge writes one edge per match. Same best-effort semantics as before, on both backends.
  - Endpoint -> handler IMPLEMENTED_BY linking now filters by file id prefix in Python, since neither backend has a "MATCH WHERE name = ? AND file_path = ?" helper yet (good candidate for the next PR).
  - Markdown DEFINES_SECTION + CONTAINS_SECTION + MD_LINKS_TO + MD_REFS_SYMBOL + MD_REFS_CLASS all ported. The ENDS WITH match in MD_LINKS_TO downgrades to an exact path match for now; a suffix-match helper can come later if it turns out to matter.
- tests/test_core/test_graph_helpers.py: 22 parametrized tests running every helper against both backends. upsert_node, ensure_edge, purge_file_data, find_node_keys all covered. The same fixture runs twice via pytest.fixture(params=...), so any divergence between Kuzu and DuckDB shows up as a test failure rather than at runtime.

End-to-end verification on a 2-file Python repo:

- Kuzu: 2 files, 3 functions, 1 CALLS edge, 3 DEFINES_FN edges
- DuckDB: 2 files, 3 functions, 1 CALLS edge, 3 DEFINES_FN edges

Test count: 252 -> 274 (+22 parity tests, all existing still green).

What's NOT in this PR (future migration work):

- MCP tools (server/tools_*.py) still emit Cypher. They use raw conn.execute and need their own port pass.
- federation.py still uses Kuzu APIs directly via .raw; that's the next dedicated PR in the chain.
- cli/commands_monitor.py stats queries are Cypher; a minor port to add to the queries PR.
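For readers who want to see the shape of the SQL side, here is a minimal sketch of registry-validated upsert and edge helpers. It is not the code in db_duckdb.py: it uses the standard-library `sqlite3` module as a stand-in for DuckDB (both accept `INSERT ... ON CONFLICT`), and the registry contents, table names, column names, node id format, and function signatures are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical slice of a graph_model-style registry (names are assumptions).
NODES = {
    "File": {"table": "files", "key_field": "path", "props": ("lang",)},
    "Function": {"table": "functions", "key_field": "id", "props": ("name",)},
}
EDGES = {
    "CALLS": {"table": "calls", "columns": ("src", "dst")},
}


def upsert_node(conn, label, key, props):
    """Insert a node row, or update its properties if the key already exists."""
    spec = NODES.get(label)
    if spec is None:
        raise ValueError(f"unknown node label: {label!r}")
    unknown = set(props) - set(spec["props"])
    if unknown:
        raise ValueError(f"unknown properties for {label}: {sorted(unknown)}")
    # Only identifiers taken from the registry reach the SQL string;
    # values always travel as bound parameters.
    cols = [spec["key_field"], *props]
    marks = ", ".join("?" for _ in cols)
    if props:
        action = "DO UPDATE SET " + ", ".join(f"{c} = excluded.{c}" for c in props)
    else:
        action = "DO NOTHING"
    conn.execute(
        f"INSERT INTO {spec['table']} ({', '.join(cols)}) VALUES ({marks}) "
        f"ON CONFLICT ({spec['key_field']}) {action}",
        [key, *props.values()],
    )


def ensure_edge(conn, edge_type, src, dst):
    """Create the edge if it is missing; leave an existing edge untouched."""
    spec = EDGES.get(edge_type)
    if spec is None:
        raise ValueError(f"unknown edge type: {edge_type!r}")
    src_col, dst_col = spec["columns"]
    conn.execute(
        f"INSERT INTO {spec['table']} ({src_col}, {dst_col}) VALUES (?, ?) "
        "ON CONFLICT DO NOTHING",
        (src, dst),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, lang TEXT)")
conn.execute("CREATE TABLE functions (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE calls (src TEXT, dst TEXT, PRIMARY KEY (src, dst))")

# Writing the same node or edge twice leaves exactly one row.
upsert_node(conn, "Function", "a.py::f", {"name": "f"})
upsert_node(conn, "Function", "a.py::f", {"name": "f_renamed"})
ensure_edge(conn, "CALLS", "a.py::f", "b.py::g")
ensure_edge(conn, "CALLS", "a.py::f", "b.py::g")
```

The point of the lookup in `NODES` / `EDGES` is that table and column names cannot be bound as parameters, so they have to be interpolated; restricting them to registry entries keeps caller-supplied strings out of the query text.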
Summary
Third PR in the Kuzu → DuckDB migration. The indexer no longer emits raw Cypher — every read and write goes through `GraphDB` helper methods. End result: `CGH_DB=duckdb cgh index` works end-to-end.
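As an illustration of what "goes through `GraphDB` helper methods" can look like from the indexer's side, the sketch below shows name-based call resolution written only against helpers. The method signatures, the `InMemoryGraph` toy backend, the `resolve_calls` function, and the `file::name` id format are hypothetical; only the helper names `upsert_node`, `ensure_edge`, and `find_node_keys` come from this PR.

```python
from typing import Protocol


class GraphDB(Protocol):
    """Assumed shape of three of the helpers; real signatures may differ."""

    def upsert_node(self, label: str, key: str, props: dict) -> None: ...
    def ensure_edge(self, edge_type: str, src: str, dst: str) -> None: ...
    def find_node_keys(self, label: str, field: str, value: str) -> list: ...


class InMemoryGraph:
    """Toy backend that exists only to exercise the call pattern."""

    def __init__(self):
        self.nodes = {}     # (label, key) -> props dict
        self.edges = set()  # (edge_type, src, dst)

    def upsert_node(self, label, key, props):
        self.nodes.setdefault((label, key), {}).update(props)

    def ensure_edge(self, edge_type, src, dst):
        self.edges.add((edge_type, src, dst))

    def find_node_keys(self, label, field, value):
        return sorted(
            key
            for (lbl, key), props in self.nodes.items()
            if lbl == label and props.get(field) == value
        )


def resolve_calls(db: GraphDB, caller_id: str, callee_name: str) -> int:
    """Link the caller to every function with a matching name (best effort)."""
    targets = db.find_node_keys("Function", "name", callee_name)
    for target in targets:
        db.ensure_edge("CALLS", caller_id, target)
    return len(targets)


db = InMemoryGraph()
db.upsert_node("Function", "a.py::helper", {"name": "helper"})
db.upsert_node("Function", "b.py::helper", {"name": "helper"})
db.upsert_node("Function", "a.py::main", {"name": "main"})

# Two functions share the name, so the caller is linked to both.
linked = resolve_calls(db, "a.py::main", "helper")
```

Because `resolve_calls` never builds a query string, the same function would work unchanged against any backend that implements the three helpers.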
What lands
End-to-end verification
```bash
# Kuzu (default)
cgh index --root /tmp/sample
# Files: 2, Functions: 3, CALLS: 1, DEFINES_FN: 3

# DuckDB
CGH_DB=duckdb cgh index --root /tmp/sample
# Files: 2, Functions: 3, CALLS: 1, DEFINES_FN: 3 (identical)
```
Test count
252 → 274 (+22 parity tests, all existing still green).
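The parity tests rely on a parametrized pytest fixture, which might be sketched roughly as follows. The two backends here are toy stand-ins (a dict and an in-memory `sqlite3` table), not the Kuzu and DuckDB adapters, and the class names, schema, and test name are assumptions; the sketch needs `pytest` installed.

```python
import sqlite3

import pytest


class DictBackend:
    """Toy stand-in for one adapter."""

    def __init__(self):
        self._rows = {}  # (label, key) -> props dict

    def upsert_node(self, label, key, props):
        self._rows.setdefault((label, key), {}).update(props)

    def find_node_keys(self, label, field, value):
        return sorted(
            key
            for (lbl, key), props in self._rows.items()
            if lbl == label and props.get(field) == value
        )


class SqliteBackend:
    """Toy stand-in for the other adapter; only the 'name' field is stored."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE nodes (label TEXT, node_key TEXT, name TEXT, "
            "PRIMARY KEY (label, node_key))"
        )

    def upsert_node(self, label, key, props):
        self._conn.execute(
            "INSERT INTO nodes (label, node_key, name) VALUES (?, ?, ?) "
            "ON CONFLICT (label, node_key) DO UPDATE SET name = excluded.name",
            (label, key, props.get("name")),
        )

    def find_node_keys(self, label, field, value):
        if field != "name":
            raise ValueError(f"unsupported field: {field!r}")
        rows = self._conn.execute(
            "SELECT node_key FROM nodes WHERE label = ? AND name = ? "
            "ORDER BY node_key",
            (label, value),
        )
        return [row[0] for row in rows]


BACKENDS = {"dict": DictBackend, "sqlite": SqliteBackend}


@pytest.fixture(params=sorted(BACKENDS))
def db(request):
    """Each test that takes `db` runs once per registered backend."""
    return BACKENDS[request.param]()


def test_upsert_is_idempotent(db):
    db.upsert_node("Function", "a.py::f", {"name": "f"})
    db.upsert_node("Function", "a.py::f", {"name": "f"})
    assert db.find_node_keys("Function", "name", "f") == ["a.py::f"]
```

With this layout, adding a third backend means adding one entry to `BACKENDS`; every existing test then runs against it without modification.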
Not in this PR (follow-ups)