
feat(core): port the indexer to backend-neutral write helpers #19

Merged
joy-software merged 1 commit into `develop` from `feature/duckdb-queries` on Jun 1, 2026

@joy-software

Summary

Third PR in the Kuzu → DuckDB migration. The indexer no longer emits raw Cypher — every read and write goes through `GraphDB` helper methods. End result: `CGH_DB=duckdb cgh index` works end-to-end.

What lands

| File | What's new |
|---|---|
| `core/protocol.py` | New helpers on `GraphDB`: `upsert_node`, `ensure_edge`, `purge_file_data`, `delete_file_completely`, `find_node_keys`, `query_node_field`, `list_node_fields` |
| `core/graph_model.py` | Single source of truth for `NODES` + `EDGES`. Both adapters consume it for label/edge validation and table-name lookup. |
| `core/db_kuzu.py` / `db_duckdb.py` | All new helpers implemented on both backends. DuckDB uses `INSERT ... ON CONFLICT` for upserts and `ON CONFLICT DO NOTHING` for edges. |
| `indexer.py` | Every `MERGE` / `MATCH` / `DETACH DELETE` replaced with a helper call. No longer imports `kuzu`; `grep -nE "conn\.execute" indexer.py` returns nothing. |
| `tests/test_core/test_graph_helpers.py` | 22 parametrized tests running every helper against both backends via `pytest.fixture(params=[...])`. |
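
This PR does not show `graph_model.py` itself. The following is a minimal sketch of a `NODES` / `EDGES` registry that both adapters could consult for table lookup and label validation. It assumes frozen dataclasses for the specs and uses hypothetical table names (`file`, `function`, `defines_fn`, `calls`) and hypothetical helpers (`node_spec`, `edge_spec`):

```python
"""Hypothetical sketch of a graph_model-style registry.

The real codegraph/core/graph_model.py is not shown in this PR; the labels,
table names and columns below are illustrative assumptions only.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    table: str      # backing table name (assumed)
    key_field: str  # primary-key column used by upserts (assumed)


@dataclass(frozen=True)
class EdgeSpec:
    table: str
    src_label: str
    dst_label: str
    props: tuple[str, ...] = ()


# Illustrative entries; a real registry lists every label/edge the indexer writes.
NODES: dict[str, NodeSpec] = {
    "File": NodeSpec(table="file", key_field="path"),
    "Function": NodeSpec(table="function", key_field="id"),
}

EDGES: dict[str, EdgeSpec] = {
    "DEFINES_FN": EdgeSpec(table="defines_fn", src_label="File", dst_label="Function"),
    "CALLS": EdgeSpec(table="calls", src_label="Function", dst_label="Function"),
}


def node_spec(label: str) -> NodeSpec:
    """Return the spec for a declared node label, rejecting anything else.

    Table names come from the registry, never from the caller, so an unknown
    or hostile label cannot be spliced into the query text.
    """
    try:
        return NODES[label]
    except KeyError:
        raise ValueError(f"unknown node label: {label!r}") from None


def edge_spec(edge_type: str) -> EdgeSpec:
    """Return the spec for a declared edge type, rejecting anything else."""
    try:
        return EDGES[edge_type]
    except KeyError:
        raise ValueError(f"unknown edge type: {edge_type!r}") from None
```

Identifiers such as table names cannot be bound as query parameters, so an allow-list like this is the usual way to keep them out of user control. Values still travel as parameters.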

End-to-end verification

```bash
# Kuzu (default)
cgh index --root /tmp/sample
# Files: 2, Functions: 3, CALLS: 1, DEFINES_FN: 3

# DuckDB
CGH_DB=duckdb cgh index --root /tmp/sample
# Files: 2, Functions: 3, CALLS: 1, DEFINES_FN: 3 (identical)
```

Test count

252 → 274 (+22 parity tests, all existing still green).

Not in this PR (follow-ups)

  • MCP tools (`server/tools_*.py`) still emit Cypher → next port pass
  • `federation.py` still uses raw Kuzu APIs → dedicated next PR
  • `cli/commands_monitor.py` stats queries → small bundle with the MCP port

Third step of the Kuzu -> DuckDB migration. The indexer no longer
emits raw Cypher — every read and write goes through GraphDB
helper methods that each backend implements in its own dialect.
End result: `CGH_DB=duckdb cgh index` works end-to-end.
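
For illustration, the helper surface can be sketched as a `typing.Protocol`. Only the seven method names come from this PR. Every parameter list, every return type, the `RecordingDB` test double and the `index_function` caller are hypothetical:

```python
"""Hypothetical sketch of the GraphDB helper surface described in this PR.

Only the method names are taken from the PR; all signatures are assumptions.
"""
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class GraphDB(Protocol):
    def upsert_node(self, label: str, key: str, props: dict[str, Any]) -> None: ...
    def ensure_edge(self, edge_type: str, src_key: str, dst_key: str) -> None: ...
    def purge_file_data(self, file_path: str) -> None: ...
    def delete_file_completely(self, file_path: str) -> None: ...
    def find_node_keys(self, label: str, field: str, value: Any) -> list[str]: ...
    def query_node_field(self, label: str, key: str, field: str) -> Any: ...
    def list_node_fields(self, label: str, field: str) -> list[Any]: ...


class RecordingDB:
    """Hypothetical test double: records helper calls instead of writing anywhere."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, tuple[Any, ...]]] = []

    def upsert_node(self, label: str, key: str, props: dict[str, Any]) -> None:
        self.calls.append(("upsert_node", (label, key, props)))

    def ensure_edge(self, edge_type: str, src_key: str, dst_key: str) -> None:
        self.calls.append(("ensure_edge", (edge_type, src_key, dst_key)))

    def purge_file_data(self, file_path: str) -> None:
        self.calls.append(("purge_file_data", (file_path,)))

    def delete_file_completely(self, file_path: str) -> None:
        self.calls.append(("delete_file_completely", (file_path,)))

    def find_node_keys(self, label: str, field: str, value: Any) -> list[str]:
        self.calls.append(("find_node_keys", (label, field, value)))
        return []

    def query_node_field(self, label: str, key: str, field: str) -> Any:
        self.calls.append(("query_node_field", (label, key, field)))
        return None

    def list_node_fields(self, label: str, field: str) -> list[Any]:
        self.calls.append(("list_node_fields", (label, field)))
        return []


def index_function(db: GraphDB, file_path: str, fn_id: str, name: str) -> None:
    # Indexer-side code depends only on the protocol, never on kuzu or duckdb.
    db.upsert_node("Function", fn_id, {"name": name, "file_path": file_path})
    db.ensure_edge("DEFINES_FN", file_path, fn_id)
```

Because callers only see the protocol, a recording double like this can check which helpers the indexer calls without either database being installed.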

What lands:

- codegraph/core/protocol.py: GraphDB gains new read/write helpers:
  upsert_node, ensure_edge, purge_file_data, delete_file_completely,
  find_node_keys, query_node_field and list_node_fields. Each one
  replaces a specific pattern the indexer used to write in Cypher.

- codegraph/core/graph_model.py: single source of truth for the
  graph model — NODES (label -> table + key_field) and EDGES
  (relationship -> table + columns + props). Both adapters consume
  this to know what tables to write to and how to validate labels.

- codegraph/core/db_kuzu.py + db_duckdb.py: the new helpers
  implemented on both adapters. Kuzu uses Cypher with parameterised
  queries; DuckDB uses parameterised SQL with INSERT ... ON CONFLICT
  for the upsert semantics. Edge tables use INSERT ... ON CONFLICT
  DO NOTHING to match Kuzu's MERGE-without-set semantics.

  Both backends validate the label / edge_type argument against the
  graph_model registry so callers can't sneak arbitrary identifiers
  into the query string.

- codegraph/indexer.py: every MERGE / MATCH / DETACH DELETE call
  is replaced with a helper call. The Cypher dialect is gone from
  indexer.py entirely: `grep -nE "conn\.execute" indexer.py` returns
  nothing, and the module no longer imports kuzu.

  _resolve_calls and _resolve_inherits stay name-based: find_node_keys
  returns every matching id and ensure_edge writes one edge per match.
  Same best-effort semantics as before, on both backends.

  Endpoint -> handler IMPLEMENTED_BY linking now filters by file id
  prefix in Python since neither backend has a "MATCH WHERE name = ?
  AND file_path = ?" helper yet (good candidate for the next PR).

  Markdown DEFINES_SECTION, CONTAINS_SECTION, MD_LINKS_TO,
  MD_REFS_SYMBOL and MD_REFS_CLASS are all ported. The ENDS WITH
  match in MD_LINKS_TO is downgraded to an exact path match for now;
  a suffix-match helper can come later if it turns out to matter.

- tests/test_core/test_graph_helpers.py: 22 parametrized tests
  running every helper against both backends; upsert_node, ensure_edge,
  purge_file_data and find_node_keys are all covered. The same fixture runs
  twice via pytest.fixture(params=...) so any divergence between
  Kuzu and DuckDB shows up as a test failure rather than at runtime.
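
The DuckDB adapter's upsert and edge semantics can be illustrated with the standard-library sqlite3 module. It is swapped in here only because it ships with Python and accepts the same INSERT ... ON CONFLICT forms. This is a sketch, not the adapter's code. The `function` / `calls` tables, the `file::name` id format and all five functions are hypothetical. The sketch shows the upsert, the DO NOTHING edge insert, the name-based fan-out described for _resolve_calls, and the Python-side file-prefix filter described for IMPLEMENTED_BY:

```python
"""Stand-in sketch of the DuckDB write semantics, using stdlib sqlite3.

sqlite3 replaces DuckDB here only because it ships with Python. Table layouts,
the "file::name" id format and all function names are hypothetical.
"""
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE function (id TEXT PRIMARY KEY, name TEXT, file_path TEXT);
    CREATE TABLE calls (src TEXT, dst TEXT, PRIMARY KEY (src, dst));
    """
)


def upsert_function(fn_id: str, name: str, file_path: str) -> None:
    # Insert the node, or overwrite its properties when the key already exists.
    conn.execute(
        "INSERT INTO function (id, name, file_path) VALUES (?, ?, ?) "
        "ON CONFLICT (id) DO UPDATE SET name = excluded.name, "
        "file_path = excluded.file_path",
        (fn_id, name, file_path),
    )


def ensure_call_edge(src: str, dst: str) -> None:
    # Create the edge only if (src, dst) is new; duplicates are skipped,
    # mirroring Kuzu's MERGE without SET.
    conn.execute(
        "INSERT INTO calls (src, dst) VALUES (?, ?) ON CONFLICT DO NOTHING",
        (src, dst),
    )


def find_function_keys(name: str) -> list[str]:
    # Name-based lookup: several ids come back when the name is not unique.
    rows = conn.execute("SELECT id FROM function WHERE name = ? ORDER BY id", (name,))
    return [row[0] for row in rows]


def resolve_call(caller_id: str, callee_name: str) -> int:
    # Best effort: one CALLS edge per matching callee id.
    keys = find_function_keys(callee_name)
    for key in keys:
        ensure_call_edge(caller_id, key)
    return len(keys)


def keys_in_file(name: str, file_path: str) -> list[str]:
    # Python-side filter on the (assumed) "file::name" id prefix.
    prefix = f"{file_path}::"
    return [key for key in find_function_keys(name) if key.startswith(prefix)]
```

Running resolve_call twice for the same caller leaves the edge count unchanged, because a repeated edge insert is a no-op. A duplicated callee name fans out to one edge per matching id.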

End-to-end verification on a 2-file Python repo:

  Kuzu:    2 files, 3 functions, 1 CALLS edge, 3 DEFINES_FN edges
  DuckDB:  2 files, 3 functions, 1 CALLS edge, 3 DEFINES_FN edges

Test count: 252 -> 274 (+22 parity tests, all existing still green).

What's NOT in this PR (future migration work):

- MCP tools (server/tools_*.py) still emit Cypher. They use raw
  conn.execute and need their own port pass.
- federation.py still uses Kuzu APIs directly via .raw — that's the
  next dedicated PR in the chain.
- cli/commands_monitor.py stats queries are Cypher; minor port to
  add to the queries PR.
joy-software merged commit 6abdcb8 into develop on Jun 1, 2026
1 check passed
joy-software deleted the feature/duckdb-queries branch on June 1, 2026 at 17:33