Skip to content

Update duckdb tool and support .duckdb files#16

Merged
ScriptSmith merged 4 commits intomainfrom
duckdb-files
Mar 20, 2026
Merged

Update duckdb tool and support .duckdb files#16
ScriptSmith merged 4 commits intomainfrom
duckdb-files

Conversation

@ScriptSmith
Copy link
Owner

No description provided.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 20, 2026

Greptile Summary

This PR extends the DuckDB tool to support .duckdb database files in addition to the existing flat-file formats (CSV, Parquet, JSON). Files are registered lazily via the BROWSER_FILEREADER protocol (no memory overhead), attached as named catalogs with ATTACH … AS "alias" (READ_ONLY), and their table schemas are enumerated and surfaced in a new schema-viewer modal. The ToolsBar gains an ensureEnabled helper that auto-enables the relevant tool when a resource is actually configured.

Key changes:

  • Worker (duckdbWorker.ts): new registerDatabaseHandle function for the BROWSER_FILEREADER path; duckdb-type branch inside registerFile for buffer-path; both include proper cleanup on ATTACH failure and single-quote escaping for filenames.
  • Service (duckdbService.ts): new registerDatabaseFile public method; dbAlias exposed in registerFile return type.
  • Upload UI (DataFileUpload.tsx): DuckDB file type wired end-to-end; schema-viewer modal with per-table column listing; TableSchema sub-component has a React key collision bug (uses only table.tableName — not unique across schemas) and the SQL query hint displays identifiers without quoting.
  • ToolsBar: ensureEnabled wired to file_search, sql_query, sub_agent, and mcp; the sub_agent trigger fires on any model-selector change rather than on actual sub-agent use.
  • Store / chat plumbing: DataFileTable gains schemaName; DataFile gains tables and dbName; AI context builder emits per-table schema lines for attached databases.

Confidence Score: 3/5

  • Safe to merge after fixing the React key collision bug; the remaining issues are cosmetic.
  • The core worker/service logic (ATTACH, DETACH, error-path cleanup, filename escaping) is solid and addresses previously raised concerns. The main issue is a P1 logic bug in DataFileUpload.tsx: using only table.tableName as a React key means multi-schema databases with colliding table names will silently drop entries from the rendered list. The unquoted SQL hint in TableSchema is misleading but display-only. The sub_agent auto-enable-on-model-change is a minor UX inconsistency.
  • ui/src/components/DataFileUpload/DataFileUpload.tsx — React key collision and unquoted SQL hint in TableSchema

Important Files Changed

Filename Overview
ui/src/services/duckdb/duckdbWorker.ts Adds registerDatabaseHandle (BROWSER_FILEREADER path) and duckdb-type branch inside registerFile, both with proper inner try/catch to dropFile on ATTACH failure and single-quote escaping for filenames. Unregister correctly DETACHes before dropping. Previously flagged issues (buffer orphan, SQL injection via filename) are addressed in this PR.
ui/src/components/DataFileUpload/DataFileUpload.tsx Adds DuckDB file support with schema-enumeration flow, a schema-viewer modal, and a TableSchema sub-component. Two issues: the TableSchema map uses only table.tableName as the React key (collision risk for same-named tables in different schemas), and the SQL query hint displayed to users interpolates schema/table names without quoting.
ui/src/components/ToolsBar/ToolsBar.tsx Introduces ensureEnabled helper and wires it to file_search (on store selection), sql_query (on file upload), sub_agent (on model change), and mcp (on config open). The sub_agent trigger fires on any model change — not just on actual sub-agent usage — which may unexpectedly auto-enable the tool.
ui/src/services/duckdb/duckdbService.ts Adds registerDatabaseFile method for the BROWSER_FILEREADER path and exposes dbAlias in the return type of registerFile. Service-layer tracking of registered files mirrors the worker correctly.
ui/src/stores/chatUIStore.ts Adds schemaName to DataFileTable, adds tables and dbName fields to DataFile, and extends updateDataFileStatus to accept the new schema shape. Clean, minimal change.
ui/src/pages/chat/useChat.ts Updates DataFileInfo to include schemaName per table and extends the SQL tool description builder to emit per-table schema lines for attached databases. Straightforward and correct.
ui/src/pages/chat/ChatPage.tsx Trivial change — passes tables and dbName through the dataFileInfos memo alongside the existing columns field.

Sequence Diagram

sequenceDiagram
    participant UI as DataFileUpload
    participant Svc as duckdbService
    participant W as duckdbWorker
    participant DB as DuckDB WASM

    UI->>Svc: registerDatabaseFile(name, File)
    Svc->>W: postMessage {type: registerDatabaseHandle, handle: File}
    W->>DB: registerFileHandle(name, handle, BROWSER_FILEREADER)
    W->>DB: ATTACH 'name' AS "alias" (READ_ONLY)
    alt ATTACH fails
        DB-->>W: error
        W->>DB: dropFile(name)
        W-->>Svc: {success: false, error}
    else ATTACH succeeds
        DB-->>W: ok
        W-->>Svc: {success: true, dbAlias}
    end
    Svc-->>UI: {success, dbAlias}

    UI->>Svc: execute(SELECT table_schema, table_name FROM information_schema.tables ...)
    Svc-->>UI: rows[]

    loop for each table
        UI->>Svc: describeTable("alias"."schema"."table")
        Svc-->>UI: columns[]
    end

    UI->>Store: updateDataFileStatus(fileId, true, {tables, dbName})
Loading

Comments Outside Diff (1)

  1. ui/src/components/ToolsBar/ToolsBar.tsx, line 486-492 (link)

    P2 sub_agent auto-enables on any model selection change

    ensureEnabled("sub_agent") fires whenever the user opens the sub-agent panel and changes the model — not only when they actually run a sub-agent task. A user who is simply previewing model options will find sub_agent silently toggled on, which differs from the file_search and sql_query patterns where enabling is tied to actually selecting a resource.

    Consider tying ensureEnabled("sub_agent") to a signal that indicates the user actually intends to use a sub-agent (e.g., a dedicated "Enable" toggle within the selector, or on first successful sub-agent invocation) rather than a passive model-change event.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: ui/src/components/ToolsBar/ToolsBar.tsx
    Line: 486-492
    
    Comment:
    **`sub_agent` auto-enables on any model selection change**
    
    `ensureEnabled("sub_agent")` fires whenever the user opens the sub-agent panel and changes the model — not only when they actually run a sub-agent task. A user who is simply previewing model options will find `sub_agent` silently toggled on, which differs from the `file_search` and `sql_query` patterns where enabling is tied to actually selecting a resource.
    
    Consider tying `ensureEnabled("sub_agent")` to a signal that indicates the user actually intends to use a sub-agent (e.g., a dedicated "Enable" toggle within the selector, or on first successful sub-agent invocation) rather than a passive model-change event.
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: ui/src/components/DataFileUpload/DataFileUpload.tsx
Line: 405-407

Comment:
**Unquoted identifiers in query hint produce misleading SQL**

`schemaName` and `tableName` are interpolated directly into the hint string without quoting. A table named `my events` or a reserved-word like `order` will produce a hint like `SELECT * FROM db.main.my events` or `SELECT * FROM db.main.order` — both invalid SQL that will confuse users who copy it directly.

`dbName` is always safe (sanitised by `deriveDbAlias`), but the other two come directly from `information_schema`:

```suggestion
        <p className="text-xs text-muted-foreground mb-2 font-mono">
          SELECT * FROM &quot;{dbName}&quot;.&quot;{table.schemaName}&quot;.&quot;{table.tableName}&quot;
        </p>
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: ui/src/components/ToolsBar/ToolsBar.tsx
Line: 486-492

Comment:
**`sub_agent` auto-enables on any model selection change**

`ensureEnabled("sub_agent")` fires whenever the user opens the sub-agent panel and changes the model — not only when they actually run a sub-agent task. A user who is simply previewing model options will find `sub_agent` silently toggled on, which differs from the `file_search` and `sql_query` patterns where enabling is tied to actually selecting a resource.

Consider tying `ensureEnabled("sub_agent")` to a signal that indicates the user actually intends to use a sub-agent (e.g., a dedicated "Enable" toggle within the selector, or on first successful sub-agent invocation) rather than a passive model-change event.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: ui/src/components/DataFileUpload/DataFileUpload.tsx
Line: 377

Comment:
**Non-unique React key across schemas**

The `key` prop here only uses the table name. If two different schemas in the same database both contain a table called `orders` (or any other matching name), React will encounter duplicate keys and silently skip rendering one of the entries. The key should be a composite of both the schema and the table name to stay unique across the full set of tables.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "Review fixes"

Comment on lines 490 to 500
@@ -485,7 +498,11 @@ export function ToolsBar({
if (toolId === "sql_query") {
return (
<div className="pt-2 border-t mt-2">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 MCP tool auto-enables on config open, not on actual server connection

Opening the MCP config dialog now immediately enables the mcp tool — even if the user is just browsing the dialog and doesn't add any servers. This is different from the file_search and sql_query cases, where the tool is only enabled once the user actually selects a resource (a vector store or uploads a file).

The previous code passed onOpenMCPConfig directly, which could also be undefined; this change makes onOpenConfig always a defined function. If MCPServerStatus conditionally renders its config button based on whether onOpenConfig is defined, this change would always show that button even when there is no config handler — worth verifying.

Consider tying ensureEnabled("mcp") to an event that signals a server was actually connected/configured rather than the dialog being opened.

Prompt To Fix With AI
This is a comment left during a code review.
Path: ui/src/components/ToolsBar/ToolsBar.tsx
Line: 490-500

Comment:
**MCP tool auto-enables on config *open*, not on actual server connection**

Opening the MCP config dialog now immediately enables the `mcp` tool — even if the user is just browsing the dialog and doesn't add any servers. This is different from the `file_search` and `sql_query` cases, where the tool is only enabled once the user actually selects a resource (a vector store or uploads a file).

The previous code passed `onOpenMCPConfig` directly, which could also be `undefined`; this change makes `onOpenConfig` always a defined function. If `MCPServerStatus` conditionally renders its config button based on whether `onOpenConfig` is defined, this change would always show that button even when there is no config handler — worth verifying.

Consider tying `ensureEnabled("mcp")` to an event that signals a server was actually connected/configured rather than the dialog being opened.

How can I resolve this? If you propose a fix, please make it concise.

@ScriptSmith
Copy link
Owner Author

@greptile-apps

@ScriptSmith
Copy link
Owner Author

@greptile-apps

Comment on lines +405 to +407
<span className="text-muted-foreground shrink-0">{FILE_TYPE_LABELS[file.type]}</span>
{chipDetail && <span className="text-muted-foreground shrink-0">{chipDetail}</span>}
{hasSchema && !isLoading && (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unquoted identifiers in query hint produce misleading SQL

schemaName and tableName are interpolated directly into the hint string without quoting. A table named my events or a reserved-word like order will produce a hint like SELECT * FROM db.main.my events or SELECT * FROM db.main.order — both invalid SQL that will confuse users who copy it directly.

dbName is always safe (sanitised by deriveDbAlias), but the other two come directly from information_schema:

Suggested change
<span className="text-muted-foreground shrink-0">{FILE_TYPE_LABELS[file.type]}</span>
{chipDetail && <span className="text-muted-foreground shrink-0">{chipDetail}</span>}
{hasSchema && !isLoading && (
<p className="text-xs text-muted-foreground mb-2 font-mono">
SELECT * FROM &quot;{dbName}&quot;.&quot;{table.schemaName}&quot;.&quot;{table.tableName}&quot;
</p>
Prompt To Fix With AI
This is a comment left during a code review.
Path: ui/src/components/DataFileUpload/DataFileUpload.tsx
Line: 405-407

Comment:
**Unquoted identifiers in query hint produce misleading SQL**

`schemaName` and `tableName` are interpolated directly into the hint string without quoting. A table named `my events` or a reserved-word like `order` will produce a hint like `SELECT * FROM db.main.my events` or `SELECT * FROM db.main.order` — both invalid SQL that will confuse users who copy it directly.

`dbName` is always safe (sanitised by `deriveDbAlias`), but the other two come directly from `information_schema`:

```suggestion
        <p className="text-xs text-muted-foreground mb-2 font-mono">
          SELECT * FROM &quot;{dbName}&quot;.&quot;{table.schemaName}&quot;.&quot;{table.tableName}&quot;
        </p>
```

How can I resolve this? If you propose a fix, please make it concise.

const hasColumns = file.columns && file.columns.length > 0;
const hasTables = file.tables && file.tables.length > 0;
const isDatabase = file.type === "duckdb";
const hasSchema = hasColumns || hasTables;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Non-unique React key across schemas

The key prop here only uses the table name. If two different schemas in the same database both contain a table called orders (or any other matching name), React will encounter duplicate keys and silently skip rendering one of the entries. The key should be a composite of both the schema and the table name to stay unique across the full set of tables.

Prompt To Fix With AI
This is a comment left during a code review.
Path: ui/src/components/DataFileUpload/DataFileUpload.tsx
Line: 377

Comment:
**Non-unique React key across schemas**

The `key` prop here only uses the table name. If two different schemas in the same database both contain a table called `orders` (or any other matching name), React will encounter duplicate keys and silently skip rendering one of the entries. The key should be a composite of both the schema and the table name to stay unique across the full set of tables.

How can I resolve this? If you propose a fix, please make it concise.

@ScriptSmith ScriptSmith merged commit d6c2931 into main Mar 20, 2026
18 of 19 checks passed
@ScriptSmith ScriptSmith deleted the duckdb-files branch March 20, 2026 12:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant