feat(core): DuckDB backend behind CGH_DB=duckdb env var #18

Merged
joy-software merged 1 commit into `develop` from `feature/duckdb-schema` on Jun 1, 2026

@joy-software
Contributor

Summary

Second step of the Kuzu → DuckDB migration. The DuckDB schema and a `GraphDB`-conforming adapter are in place; porting the indexer / MCP tool queries from Cypher to SQL is the next PR in the chain.

What lands

| File | Role |
| --- | --- |
| `pyproject.toml` | `duckdb>=1.0` added as a core dep. Both backends ship in the same wheel during the migration window. |
| `core/schema_duckdb.py` | SQL DDL mirroring `core/schema.py` table-for-table. 7 node tables + 16 edge tables + reverse-lookup indexes. No FK constraints (DuckDB rejects `ON DELETE CASCADE`; the indexer already does explicit purges). |
| `core/db_duckdb.py` | `DuckDBGraphDB` + `DuckDBQueryResult` adapters implementing the protocols from #17. `.raw` escape hatch symmetric with `KuzuGraphDB`. |
| `core/db.py` | `get_connection` / `get_readonly_connection` branch on the `CGH_DB` env var. Default unchanged (kuzu). `CGH_DB=duckdb` opens `.codegraph/graph.duckdb` and runs `init_schema()` on it. |
| `tests/test_core/test_db_duckdb.py` | 10 new tests: protocol conformance, schema-init smoke, explicit purge chain, backend selection via monkeypatched env. |
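
For readers who have not seen #17, here is a rough sketch of what a protocol-conforming adapter with a `.raw` escape hatch can look like. The method names (`execute`, `has_next`, `get_next`) and the `FakeGraphDB` / `ListResult` classes are hypothetical and only illustrate the shape; the real protocol definitions live in #17 and may differ.

```python
from typing import Any, Dict, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class QueryResult(Protocol):
    # Hypothetical cursor-style interface; not the actual protocol from #17.
    def has_next(self) -> bool: ...
    def get_next(self) -> List[Any]: ...


@runtime_checkable
class GraphDB(Protocol):
    # Hypothetical: one query entry point plus access to the native handle.
    def execute(self, query: str, params: Optional[Dict[str, Any]] = None) -> QueryResult: ...

    @property
    def raw(self) -> Any: ...


class ListResult:
    """In-memory result cursor over rows that were already fetched."""

    def __init__(self, rows: List[List[Any]]) -> None:
        self._rows = list(rows)
        self._pos = 0

    def has_next(self) -> bool:
        return self._pos < len(self._rows)

    def get_next(self) -> List[Any]:
        row = self._rows[self._pos]
        self._pos += 1
        return row


class FakeGraphDB:
    """Stand-in adapter: wraps a dict of canned results instead of a real connection."""

    def __init__(self, conn: Dict[str, List[List[Any]]]) -> None:
        self._conn = conn

    def execute(self, query: str, params: Optional[Dict[str, Any]] = None) -> ListResult:
        return ListResult(self._conn.get(query, []))

    @property
    def raw(self) -> Dict[str, List[List[Any]]]:
        # Escape hatch: callers that need backend-specific features get the native object.
        return self._conn
```

Structural typing means neither class inherits from the protocols; a conformance test can simply check `isinstance(adapter, GraphDB)`.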

End-to-end verification

```bash
$ CGH_DB=duckdb cgh init --yes --root /tmp/x
$ ls /tmp/x/.codegraph
graph.duckdb # not graph.db
$ duckdb /tmp/x/.codegraph/graph.duckdb \
"select count(*) from information_schema.tables where table_schema='main'"
23 # 7 node + 16 edge tables
```
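
The file name in the listing above follows from the backend branch in `core/db.py`. A minimal sketch of that kind of selection logic, under stated assumptions: `selected_backend` and `db_path` are hypothetical helpers (the real entry points are `get_connection` / `get_readonly_connection`), the `graph.db` name for Kuzu is inferred from the comment in the smoke test, and rejecting unknown values with `ValueError` is my assumption rather than documented behavior.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Union

# File names per backend. "graph.duckdb" is from the PR text; "graph.db" is inferred.
_DB_FILES = {"kuzu": "graph.db", "duckdb": "graph.duckdb"}


def selected_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the backend named by CGH_DB, defaulting to kuzu when unset or empty."""
    env = os.environ if env is None else env
    backend = env.get("CGH_DB", "").strip().lower() or "kuzu"
    if backend not in _DB_FILES:
        # Assumed behavior: fail loudly instead of silently falling back.
        raise ValueError(f"unsupported CGH_DB value: {backend!r}")
    return backend


def db_path(root: Union[str, Path], env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the on-disk database file under <root>/.codegraph for the chosen backend."""
    return Path(root) / ".codegraph" / _DB_FILES[selected_backend(env)]
```

Passing the environment as a mapping keeps the function easy to test without touching `os.environ`; with pytest, `monkeypatch.setenv("CGH_DB", "duckdb")` exercises the default path as well.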

The Cypher emit from `indexer.py` naturally fails against DuckDB — that's expected and handled in the next PR.

Tests

  • `uvx ruff check .` clean
  • 242 (was) + 10 (new) = 252 tests green
  • End-to-end smoke on a scratch repo
  • CI green on this PR

Migration chain progress

| PR | Status |
| --- | --- |
| `feature/db-protocol` (#17) | ✅ merged |
| `feature/duckdb-schema` (this PR) | open |
| `feature/duckdb-queries` | next |
| `feature/duckdb-federation` | after that |
| `release/0.5.0` | flip default to DuckDB |
| `release/0.6.0` | remove Kuzu entirely |

Second step of the Kuzu -> DuckDB migration. The DuckDB schema and a
GraphDB-conforming adapter are in place; the actual port of the
indexer / MCP tool queries from Cypher to SQL is the next PR.

What lands:

- pyproject.toml: duckdb>=1.0 added as a core dep alongside kuzu.
  Both backends ship in the same wheel during the migration window.

- codegraph/core/schema_duckdb.py: SQL DDL mirroring core/schema.py
  table-for-table. 7 node tables (file, function, class, endpoint,
  tf_resource, tf_var, md_section) and 16 edge tables. Edge tables
  use composite PK on (from, to[, prop]) so the indexer's MERGE
  semantics map cleanly to INSERT OR REPLACE. No FK constraints —
  DuckDB rejects ON DELETE CASCADE, and the indexer's _purge_file
  already issues explicit deletes per node type anyway.

  Reverse-lookup indexes on the edge tables' second column so
  "who calls X" and "who imports X" don't scan the whole graph.

- codegraph/core/db_duckdb.py: DuckDBGraphDB + DuckDBQueryResult
  adapters implementing the GraphDB / QueryResult protocols from
  PR #17. .raw escape hatch symmetric with KuzuGraphDB.

- codegraph/core/db.py: get_connection() and get_readonly_connection()
  branch on CGH_DB env var. Default unchanged (kuzu). CGH_DB=duckdb
  opens .codegraph/graph.duckdb and runs init_schema() on it.

- tests/test_core/test_db_duckdb.py: 10 new tests. Protocol
  conformance, schema-init smoke, explicit purge chain (the SQL
  equivalent of Kuzu's DETACH DELETE), and backend selection via
  monkeypatched env var.
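
To make the schema choices in the list above concrete, here is a small sketch of the three ideas together: a composite-PK edge table written with `INSERT OR REPLACE`, a reverse-lookup index on the second column, and an explicit purge chain in place of cascading deletes. It deliberately uses Python's standard-library `sqlite3` as a stand-in so it runs without extra dependencies; it is not the DDL from `schema_duckdb.py`, and the table names (`func`, `calls`), column names, and helper functions are all hypothetical.

```python
import sqlite3


def make_schema() -> sqlite3.Connection:
    """Create one node table and one edge table in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE func (id TEXT PRIMARY KEY, file_id TEXT NOT NULL);
        -- Edge table: composite primary key, no foreign keys.
        CREATE TABLE calls (
            from_id TEXT NOT NULL,
            to_id   TEXT NOT NULL,
            line    INTEGER,
            PRIMARY KEY (from_id, to_id)
        );
        -- Reverse-lookup index on the second column ("who calls X").
        CREATE INDEX idx_calls_to ON calls (to_id);
        """
    )
    return conn


def upsert_call(conn: sqlite3.Connection, from_id: str, to_id: str, line: int) -> None:
    """MERGE-like write: re-inserting the same (from_id, to_id) replaces the row."""
    conn.execute("INSERT OR REPLACE INTO calls VALUES (?, ?, ?)", (from_id, to_id, line))


def callers_of(conn: sqlite3.Connection, to_id: str) -> list:
    """Reverse lookup that can be served by idx_calls_to."""
    return conn.execute(
        "SELECT from_id, line FROM calls WHERE to_id = ? ORDER BY from_id", (to_id,)
    ).fetchall()


def purge_file(conn: sqlite3.Connection, file_id: str) -> None:
    """Explicit delete chain: remove touching edges first, then the file's nodes."""
    ids = [r[0] for r in conn.execute("SELECT id FROM func WHERE file_id = ?", (file_id,))]
    for node_id in ids:
        conn.execute("DELETE FROM calls WHERE from_id = ? OR to_id = ?", (node_id, node_id))
    conn.execute("DELETE FROM func WHERE file_id = ?", (file_id,))
```

Re-indexing a file then amounts to `purge_file(...)` followed by fresh upserts, which is why the missing `ON DELETE CASCADE` does not cost anything here.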

End-to-end smoke on a scratch repo:

  $ CGH_DB=duckdb cgh init --yes --root /tmp/x
  $ ls /tmp/x/.codegraph
  graph.duckdb  (not graph.db)
  $ duckdb /tmp/x/.codegraph/graph.duckdb
  > select count(*) from information_schema.tables where table_schema='main';
  23  (7 nodes + 16 edges)

The Cypher emit from indexer.py naturally fails against DuckDB —
that's expected. PR feature/duckdb-queries (next in the chain)
ports the indexer + MCP tools to SQL one file at a time, both
backends staying green via the env var the whole way.

Test count: 242 -> 252 (+10 new, all existing still green).

joy-software merged commit 2ed9a6a into develop on Jun 1, 2026 (1 check passed).
joy-software deleted the feature/duckdb-schema branch on June 1, 2026 at 17:08.