feat(core): DuckDB backend behind CGH_DB=duckdb env var #18
Merged
Second step of the Kuzu -> DuckDB migration. The DuckDB schema and a GraphDB-conforming adapter are in place; the actual port of the indexer / MCP tool queries from Cypher to SQL is the next PR.

What lands:

- pyproject.toml: duckdb>=1.0 added as a core dep alongside kuzu. Both backends ship in the same wheel during the migration window.
- codegraph/core/schema_duckdb.py: SQL DDL mirroring core/schema.py table-for-table. 7 node tables (file, function, class, endpoint, tf_resource, tf_var, md_section) and 16 edge tables. Edge tables use a composite PK on (from, to[, prop]) so the indexer's MERGE semantics map cleanly to INSERT OR REPLACE. No FK constraints: DuckDB rejects ON DELETE CASCADE, and the indexer's _purge_file already issues explicit deletes per node type anyway. Reverse-lookup indexes on the edge tables' second column so "who calls X" and "who imports X" don't scan the whole graph.
- codegraph/core/db_duckdb.py: DuckDBGraphDB + DuckDBQueryResult adapters implementing the GraphDB / QueryResult protocols from PR #17. .raw escape hatch symmetric with KuzuGraphDB.
- codegraph/core/db.py: get_connection() and get_readonly_connection() branch on the CGH_DB env var. Default unchanged (kuzu). CGH_DB=duckdb opens .codegraph/graph.duckdb and runs init_schema() on it.
- tests/test_core/test_db_duckdb.py: 10 new tests. Protocol conformance, schema-init smoke, explicit purge chain (the SQL equivalent of Kuzu's DETACH DELETE), and backend selection via monkeypatched env var.

End-to-end smoke on a scratch repo:

    $ CGH_DB=duckdb cgh init --yes --root /tmp/x
    $ ls /tmp/x/.codegraph
    graph.duckdb    (not graph.db)
    $ duckdb /tmp/x/.codegraph/graph.duckdb
    > select count(*) from information_schema.tables where table_schema='main';
    23    (7 nodes + 16 edges)

The Cypher emit from indexer.py naturally fails against DuckDB; that's expected. PR feature/duckdb-queries (next in the chain) ports the indexer + MCP tools to SQL one file at a time, both backends staying green via the env var the whole way.

Test count: 242 -> 252 (+10 new, all existing still green).
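For reviewers who want to see the edge-table design in miniature, here is a rough sketch of how a composite primary key, INSERT OR REPLACE, a reverse-lookup index, and an explicit purge fit together. It is not the code in schema_duckdb.py: the table and column names (`func_node`, `calls`, `from_id`, `to_id`, `line`) and the helper functions are hypothetical, and the standard-library `sqlite3` module stands in for DuckDB so the snippet runs without extra dependencies. The statements used here are assumed to carry over to DuckDB, but that is not verified in this sketch.

```python
import sqlite3

# Hypothetical miniature schema: one node table, one edge table.
# sqlite3 is a stand-in for DuckDB; names are illustrative only.
DDL = """
CREATE TABLE func_node (id TEXT PRIMARY KEY, file_id TEXT NOT NULL);
CREATE TABLE calls (
    from_id TEXT NOT NULL,
    to_id   TEXT NOT NULL,
    line    INTEGER,
    PRIMARY KEY (from_id, to_id)          -- composite key gives MERGE-like upserts
);
CREATE INDEX calls_to_idx ON calls (to_id);  -- reverse lookup: "who calls X"
"""


def upsert_call(conn, from_id, to_id, line):
    """Insert the edge, or overwrite it if (from_id, to_id) already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO calls (from_id, to_id, line) VALUES (?, ?, ?)",
        (from_id, to_id, line),
    )


def callers_of(conn, to_id):
    """Answer 'who calls X' via the second-column index."""
    cur = conn.execute(
        "SELECT from_id FROM calls WHERE to_id = ? ORDER BY from_id", (to_id,)
    )
    return [row[0] for row in cur]


def purge_file(conn, file_id):
    """Explicit delete chain: edges first, then nodes (no FK cascade)."""
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM func_node WHERE file_id = ?", (file_id,)
        )
    ]
    for node_id in ids:
        conn.execute(
            "DELETE FROM calls WHERE from_id = ? OR to_id = ?", (node_id, node_id)
        )
    conn.execute("DELETE FROM func_node WHERE file_id = ?", (file_id,))


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany(
        "INSERT INTO func_node VALUES (?, ?)", [("a.f", "a.py"), ("b.g", "b.py")]
    )
    upsert_call(conn, "a.f", "b.g", 10)
    upsert_call(conn, "a.f", "b.g", 12)  # same key: replaces, no duplicate row
    rows = conn.execute("SELECT from_id, to_id, line FROM calls").fetchall()
    callers = callers_of(conn, "b.g")
    purge_file(conn, "a.py")
    remaining = conn.execute("SELECT count(*) FROM calls").fetchone()[0]
    return rows, callers, remaining
```

Calling `demo()` re-indexes the same edge twice and then purges the calling file; the second upsert overwrites the first rather than adding a row, and the purge leaves no dangling edges behind.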
Summary
Second step of the Kuzu → DuckDB migration. The DuckDB schema and a `GraphDB`-conforming adapter are in place; porting the indexer / MCP tool queries from Cypher to SQL is the next PR in the chain.
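To illustrate what "`GraphDB`-conforming" could mean in practice, here is a minimal sketch of structural typing with `typing.Protocol`. The method names (`execute`, `has_next`, `get_next`) and the `raw` property are assumptions for illustration; the actual protocol surface is defined in PR #17 and may differ. `InMemoryGraphDB` and `ListResult` are hypothetical stand-ins, not the adapters from this PR.

```python
from __future__ import annotations

from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class QueryResult(Protocol):
    # Assumed cursor-style surface; the real protocol may differ.
    def has_next(self) -> bool: ...
    def get_next(self) -> list[Any]: ...


@runtime_checkable
class GraphDB(Protocol):
    def execute(self, query: str, params: dict[str, Any] | None = None) -> QueryResult: ...

    @property
    def raw(self) -> Any:
        """Escape hatch to the underlying backend handle."""
        ...


class ListResult:
    """Hypothetical result wrapper over an in-memory list of rows."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    def has_next(self):
        return self._pos < len(self._rows)

    def get_next(self):
        row = self._rows[self._pos]
        self._pos += 1
        return row


class InMemoryGraphDB:
    """Hypothetical backend that returns canned rows keyed by query text."""

    def __init__(self, canned):
        self._canned = canned

    def execute(self, query, params=None):
        return ListResult(self._canned.get(query, []))

    @property
    def raw(self):
        return self._canned


def drain(result: QueryResult) -> list[list[Any]]:
    """Backend-agnostic helper: collect every remaining row."""
    rows = []
    while result.has_next():
        rows.append(result.get_next())
    return rows
```

Because the protocols are structural, neither backend class needs to inherit from them; callers such as `drain` depend only on the shared method names, which is what would let the Kuzu and DuckDB adapters be swapped behind one interface.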
What lands
End-to-end verification
```bash
$ CGH_DB=duckdb cgh init --yes --root /tmp/x
$ ls /tmp/x/.codegraph
graph.duckdb # not graph.db
$ duckdb /tmp/x/.codegraph/graph.duckdb \
"select count(*) from information_schema.tables where table_schema='main'"
23 # 7 node + 16 edge tables
```
The Cypher emit from `indexer.py` naturally fails against DuckDB — that's expected and handled in the next PR.
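The file name seen in the listing above follows from the backend switch. As a rough sketch of that selection logic, assuming a hypothetical helper `resolve_backend` (the real branching lives in `get_connection()` / `get_readonly_connection()` and may look different), and assuming unknown values should raise rather than fall back silently:

```python
import os
from pathlib import Path

# Backend name -> database file under .codegraph/ (file names taken from the
# listing above; the mapping structure itself is an assumption).
BACKEND_FILES = {"kuzu": "graph.db", "duckdb": "graph.duckdb"}


def resolve_backend(root, env=None):
    """Return (backend, db_path) for a repo root based on CGH_DB.

    `env` defaults to os.environ; passing a dict makes the function easy
    to test without monkeypatching the process environment.
    """
    env = os.environ if env is None else env
    backend = env.get("CGH_DB", "kuzu")  # default stays kuzu
    if backend not in BACKEND_FILES:
        # Assumption: fail loudly on typos instead of silently using kuzu.
        raise ValueError(f"unsupported CGH_DB value: {backend!r}")
    return backend, Path(root) / ".codegraph" / BACKEND_FILES[backend]
```

With `CGH_DB` unset this resolves to `.codegraph/graph.db`; with `CGH_DB=duckdb` it resolves to `.codegraph/graph.duckdb`, matching the `ls` output in the verification run.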
Tests
Migration chain progress