feat(core): DuckDB backend behind CGH_DB=duckdb env var by joy-software · Pull Request #18 · altikva/cgh

joy-software · 2026-06-01T17:03:06Z

Summary

Second step of the Kuzu → DuckDB migration. The DuckDB schema and a `GraphDB`-conforming adapter are in place; porting the indexer / MCP tool queries from Cypher to SQL is the next PR in the chain.

What lands

| File | Role |
| --- | --- |
| `pyproject.toml` | `duckdb>=1.0` added as a core dep. Both backends ship in the same wheel during the migration window. |
| `core/schema_duckdb.py` | SQL DDL mirroring `core/schema.py` table-for-table: 7 node tables, 16 edge tables, plus reverse-lookup indexes. No FK constraints (DuckDB rejects `ON DELETE CASCADE`; the indexer already does explicit purges). |
| `core/db_duckdb.py` | `DuckDBGraphDB` + `DuckDBQueryResult` adapters implementing the protocols from #17. `.raw` escape hatch symmetric with `KuzuGraphDB`. |
| `core/db.py` | `get_connection` / `get_readonly_connection` branch on the `CGH_DB` env var. Default unchanged (kuzu). `CGH_DB=duckdb` opens `.codegraph/graph.duckdb` and runs `init_schema()` on it. |
| `tests/test_core/test_db_duckdb.py` | 10 new tests: protocol conformance, schema-init smoke, explicit purge chain, backend selection via monkeypatched env. |
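
As a rough illustration of the `CGH_DB` branching described for `core/db.py`, here is a minimal sketch. The helper names `select_backend` and `db_path` are hypothetical and not taken from the PR; only the env var name, the kuzu default, and the two database file names come from the description. Rejecting unknown values is an assumption about how the real code behaves.

```python
import os
from pathlib import Path

# File name per backend. "graph.duckdb" and "graph.db" appear in the PR's
# smoke test; the mapping itself is an illustrative assumption.
DB_FILES = {"kuzu": "graph.db", "duckdb": "graph.duckdb"}


def select_backend(env=None):
    """Return the backend named by CGH_DB, falling back to kuzu when unset."""
    env = os.environ if env is None else env
    backend = env.get("CGH_DB") or "kuzu"
    if backend not in DB_FILES:
        # Assumption: unknown values are rejected rather than silently ignored.
        raise ValueError(f"unsupported CGH_DB value: {backend!r}")
    return backend


def db_path(root, env=None):
    """Resolve the on-disk database path under <root>/.codegraph."""
    return Path(root) / ".codegraph" / DB_FILES[select_backend(env)]
```

Passing the environment as a plain mapping keeps the selection logic testable without touching the real process environment, which is close in spirit to the monkeypatched-env tests listed above.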

End-to-end verification

```bash
$ CGH_DB=duckdb cgh init --yes --root /tmp/x
$ ls /tmp/x/.codegraph
graph.duckdb # not graph.db
$ duckdb /tmp/x/.codegraph/graph.duckdb \
"select count(*) from information_schema.tables where table_schema='main'"
23 # 7 node + 16 edge tables
```

The Cypher emitted by `indexer.py` fails against DuckDB at this stage. That is expected: the next PR in the chain ports those queries to SQL.

Tests

- `uvx ruff check .` clean
- 252 tests green (242 existing + 10 new)
- End-to-end smoke on a scratch repo
- CI green on this PR
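
For readers unfamiliar with protocol-conformance tests, the sketch below shows one way such a check can be written with `typing.Protocol`. The method set (`execute`, `close`) and the `FakeDuckDBGraphDB` class are hypothetical; the real `GraphDB` protocol from #17 is not shown in this PR and may differ.

```python
from typing import Any, Dict, Optional, Protocol, runtime_checkable


@runtime_checkable
class GraphDB(Protocol):
    """Hypothetical protocol; the real one from #17 may declare other methods."""

    def execute(self, query: str, params: Optional[Dict[str, Any]] = None) -> Any: ...

    def close(self) -> None: ...


class FakeDuckDBGraphDB:
    """Stand-in adapter used only to demonstrate the conformance check."""

    def __init__(self) -> None:
        self.raw = object()  # mirrors the idea of a `.raw` escape hatch
        self.closed = False

    def execute(self, query: str, params: Optional[Dict[str, Any]] = None) -> Any:
        return []

    def close(self) -> None:
        self.closed = True


def test_adapter_conforms() -> None:
    # runtime_checkable only verifies that the methods exist, not their signatures.
    assert isinstance(FakeDuckDBGraphDB(), GraphDB)
```

Because `isinstance` against a runtime-checkable protocol checks method presence only, a static type checker is still needed to catch signature drift between the two backends.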

Migration chain progress

| PR | Status |
| --- | --- |
| `feature/db-protocol` (#17) | ✅ merged |
| `feature/duckdb-schema` (this PR) | open |
| `feature/duckdb-queries` | next |
| `feature/duckdb-federation` | after that |
| `release/0.5.0` | flip default to DuckDB |
| `release/0.6.0` | remove Kuzu entirely |

Second step of the Kuzu -> DuckDB migration. The DuckDB schema and a GraphDB-conforming adapter are in place; the actual port of the indexer / MCP tool queries from Cypher to SQL is the next PR.

What lands:

- pyproject.toml: duckdb>=1.0 added as a core dep alongside kuzu. Both backends ship in the same wheel during the migration window.
- codegraph/core/schema_duckdb.py: SQL DDL mirroring core/schema.py table-for-table. 7 node tables (file, function, class, endpoint, tf_resource, tf_var, md_section) and 16 edge tables. Edge tables use composite PK on (from, to[, prop]) so the indexer's MERGE semantics map cleanly to INSERT OR REPLACE. No FK constraints: DuckDB rejects ON DELETE CASCADE, and the indexer's _purge_file already issues explicit deletes per node type anyway. Reverse-lookup indexes on the edge tables' second column so "who calls X" and "who imports X" don't scan the whole graph.
- codegraph/core/db_duckdb.py: DuckDBGraphDB + DuckDBQueryResult adapters implementing the GraphDB / QueryResult protocols from PR #17. .raw escape hatch symmetric with KuzuGraphDB.
- codegraph/core/db.py: get_connection() and get_readonly_connection() branch on CGH_DB env var. Default unchanged (kuzu). CGH_DB=duckdb opens .codegraph/graph.duckdb and runs init_schema() on it.
- tests/test_core/test_db_duckdb.py: 10 new tests. Protocol conformance, schema-init smoke, explicit purge chain (the SQL equivalent of Kuzu's DETACH DELETE), and backend selection via monkeypatched env var.

End-to-end smoke on a scratch repo:

    $ CGH_DB=duckdb cgh init --yes --root /tmp/x
    $ ls /tmp/x/.codegraph
    graph.duckdb (not graph.db)
    $ duckdb /tmp/x/.codegraph/graph.duckdb
    > select count(*) from information_schema.tables where table_schema='main';
    23 (7 nodes + 16 edges)

The Cypher emit from indexer.py naturally fails against DuckDB; that's expected. PR feature/duckdb-queries (next in the chain) ports the indexer + MCP tools to SQL one file at a time, both backends staying green via the env var the whole way.

Test count: 242 -> 252 (+10 new, all existing still green).
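
The edge-table design in the commit message (composite primary key, INSERT OR REPLACE upserts, a reverse-lookup index, and explicit purges instead of cascading deletes) can be sketched as below. To stay runnable with the standard library alone, the sketch uses `sqlite3` as a stand-in for DuckDB; the statements are expected to carry over, but that is an assumption and has not been checked against the PR's actual DDL. All table and column names (`functions`, `calls`, `from_id`, `to_id`, `line`) are hypothetical.

```python
import sqlite3

# sqlite3 stands in for DuckDB so the sketch needs no third-party package.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE functions (id TEXT PRIMARY KEY);
    CREATE TABLE calls (
        from_id TEXT NOT NULL,
        to_id   TEXT NOT NULL,
        line    INTEGER,
        PRIMARY KEY (from_id, to_id)   -- composite key enables MERGE-like upserts
    );
    CREATE INDEX idx_calls_to ON calls (to_id);  -- reverse lookup: "who calls X"
    """
)


def upsert_call(from_id, to_id, line):
    """Insert the edge, or overwrite it if (from_id, to_id) already exists."""
    con.execute(
        "INSERT OR REPLACE INTO calls (from_id, to_id, line) VALUES (?, ?, ?)",
        (from_id, to_id, line),
    )


def purge_function(fn_id):
    """Explicit purge chain: delete touching edges first, then the node."""
    con.execute("DELETE FROM calls WHERE from_id = ? OR to_id = ?", (fn_id, fn_id))
    con.execute("DELETE FROM functions WHERE id = ?", (fn_id,))


def callers_of(fn_id):
    """Answer "who calls X" through the index on the second column."""
    rows = con.execute(
        "SELECT from_id FROM calls WHERE to_id = ? ORDER BY from_id", (fn_id,)
    )
    return [row[0] for row in rows]
```

Calling `upsert_call` twice with the same pair leaves a single row carrying the latest `line` value, which is the behaviour the composite key is meant to provide. `purge_function` plays the role that DETACH DELETE plays on the Kuzu side.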

joy-software merged commit 2ed9a6a into develop Jun 1, 2026
1 check passed

joy-software deleted the feature/duckdb-schema branch June 1, 2026 17:08
