Skip to content

feat(arcadedb): add source and tools#2961

Open
Deeven-Seru wants to merge 1 commit intogoogleapis:mainfrom
Deeven-Seru:fix-2689-arcadedb
Open

feat(arcadedb): add source and tools#2961
Deeven-Seru wants to merge 1 commit intogoogleapis:mainfrom
Deeven-Seru:fix-2689-arcadedb

Conversation

@Deeven-Seru
Copy link
Copy Markdown
Contributor

Summary

This PR implements native ArcadeDB support in the GenAI Toolbox by introducing a new arcadedb source and two associated tools for executing Cypher and SQL queries.

ArcadeDB is a multi-model database supporting graph, document, key-value, and time-series workloads through the Bolt protocol. Since it is compatible with the Neo4j driver, this implementation reuses the existing neo4j-go-driver, enabling seamless integration with minimal additional complexity.

This change resolves the feature request to add ArcadeDB as a supported source in the Toolbox.

Fixes: #2689


What This PR Adds

1) New Source: arcadedb

A fully functional source implementation that:

  • Registers a new source type: arcadedb
  • Uses the Neo4j Bolt protocol driver
  • Verifies connectivity during initialization
  • Supports configurable database selection
  • Integrates with OpenTelemetry tracing
  • Follows existing Toolbox source architecture patterns

Key capabilities:

  • Bolt protocol connectivity
  • Standard authentication support
  • Config validation
  • Structured query execution
  • Telemetry-compatible initialization

2) New Tool: arcadedb-execute-cypher

Executes OpenCypher queries against ArcadeDB.

Features:

  • Read-only enforcement mode
  • Dry-run validation mode
  • Query classification (read vs write)
  • Execution plan inspection support

Example:

kind: tool
name: query_arcadedb
type: arcadedb-execute-cypher
source: my-arcadedb-source
readOnly: true
description: Execute Cypher queries against ArcadeDB

@Deeven-Seru Deeven-Seru requested a review from a team as a code owner April 6, 2026 11:02
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for ArcadeDB as a new data source, including implementations for executing Cypher and SQL queries. The code changes include the necessary source registration, tool implementations, and documentation. I have identified a potential panic in the Cypher execution plan logic that requires a nil check, as well as several documentation files that violate the repository's style guide regarding frontmatter-only content and title naming conventions.

Comment thread internal/sources/arcadedb/arcadedb.go Outdated
Comment thread docs/en/integrations/arcadedb/_index.md
Comment thread docs/en/integrations/arcadedb/tools/_index.md
Comment thread docs/en/integrations/arcadedb/tools/arcadedb-execute-cypher.md
Comment thread docs/en/integrations/arcadedb/tools/arcadedb-execute-sql.md
Comment thread internal/tools/arcadedb/arcadedbexecutesql/arcadedbexecutesql.go
@Deeven-Seru
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds ArcadeDB as a new data source, enabling Cypher and SQL query execution via the Bolt protocol. It includes the source implementation, tool definitions, unit tests, and documentation. Review feedback identifies a naming inconsistency in the execution plan's JSON keys and recommends implementing read-only and dry-run modes for the SQL tool to ensure consistency and safety across query tools.

Comment thread internal/sources/arcadedb/arcadedb.go Outdated
Comment thread internal/sources/arcadedb/arcadedb.go
@stenalpjolly
Copy link
Copy Markdown

/gcbrun

3 similar comments
@stenalpjolly
Copy link
Copy Markdown

/gcbrun

@stenalpjolly
Copy link
Copy Markdown

/gcbrun

@stenalpjolly
Copy link
Copy Markdown

/gcbrun

@Deeven-Seru
Copy link
Copy Markdown
Contributor Author

@stenalpjolly /gcburn is what maintainers do :)

@stenalpjolly
Copy link
Copy Markdown

@Deeven-Seru I'm from Google only. I was testing my maintainer access for the org

Deeven-Seru added a commit to Deeven-Seru/mcp-toolbox that referenced this pull request Apr 7, 2026
@Deeven-Seru Deeven-Seru requested a review from a team as a code owner April 7, 2026 05:50
Deeven-Seru added a commit to Deeven-Seru/mcp-toolbox that referenced this pull request Apr 7, 2026
@Deeven-Seru
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for ArcadeDB and Conversational Analytics, adding new sources and tools along with their respective documentation, prebuilt configurations, and integration tests. It also includes dependency updates and improvements to existing documentation samples. Feedback is provided to enhance the robustness of the new HTTP-based tools by ensuring that context is correctly propagated to all network requests and that HTTP client instances are configured with appropriate timeouts to prevent hanging operations.

Comment thread internal/sources/arcadedb/arcadedb.go Outdated
@Deeven-Seru
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces ArcadeDB as a new data source, including a source driver and tools for executing Cypher and SQL queries. It also refactors conversational analytics tools to improve context handling and HTTP client timeouts. Feedback was provided to ensure documentation adheres to style guides regarding frontmatter and title naming, to remove redundant configuration logic, to maintain consistent JSON keys in execution plans, and to improve security and performance through query parameterization and HTTP client reuse.

Comment thread docs/en/integrations/arcadedb/_index.md
Comment thread docs/en/integrations/arcadedb/tools/_index.md
Comment thread docs/en/integrations/arcadedb/tools/arcadedb-execute-cypher.md
Comment thread docs/en/integrations/arcadedb/tools/arcadedb-execute-sql.md
Comment thread internal/sources/arcadedb/arcadedb.go
Comment thread internal/sources/arcadedb/arcadedb.go Outdated
Comment thread internal/tools/arcadedb/arcadedbexecutecypher/arcadedbexecutecypher.go Outdated
@lvca
Copy link
Copy Markdown

lvca commented Apr 20, 2026

Thanks @Deeven-Seru for picking this up, and @duwenxin99 for looping me in. The overall shape matches what we agreed in #2689: a dedicated arcadedb source reusing neo4j-go-driver, with separate Cypher and SQL tools. Happy to commit to long-term maintenance of this code on the ArcadeDB side.

A few items I'd like to see addressed before merge, and a couple of things worth verifying.

Blocking

1. arcadedb-execute-sql has no readOnly mode.
The Cypher tool supports readOnly and dry_run, but the SQL tool passes statements straight through. ArcadeDB SQL can DELETE, DROP TYPE, CREATE VERTEX, etc., so for parity with the Cypher tool and to match the "developer assistant with human-in-the-loop" framing in the docs, the SQL tool should at minimum support readOnly. A simple first-token check (allow SELECT, MATCH, TRAVERSE) covers the common case; we can iterate on a proper classifier later.

2. Query parameters are hardcoded to nil.
Both Invoke methods call source.RunCypher(ctx, cypherStr, nil, ...) and source.RunSQL(ctx, sqlStr, nil). The params map[string]any argument is plumbed through the source layer but dead-ended at the tool layer, so callers have no way to pass parameterized queries. That forces string concatenation for any dynamic value and defeats Bolt's built-in parameterization. Either expose a params tool input or remove the unused argument from the interface.

3. Driver is never closed.
Source.Driver has no teardown path. Worth confirming the toolbox runtime calls a close hook on source shutdown/reload; if it does, we should implement it here to avoid leaking Bolt connections. The existing Neo4j source should be the reference.

4. Redundant Database defaulting.
newConfig sets Database: "arcadedb", the struct has validate:"required" on Database, and Initialize re-checks if r.Database == "". Two of those three paths are unreachable. The docs also list database as required while the code defaults it, which is contradictory. Pick one behavior (I'd suggest: make it optional with a documented default) and align docs, validator, and code.

Nits

5. Inconsistent JSON key casing in execution plan output.
Top-level plan uses childrenCount, recursive children use children_count. Should be consistent (camelCase to match the rest of the payload).

6. Unrelated conversationalanalytics changes.
The diff includes ~11 lines of context-propagation and HTTP-timeout changes in internal/tools/conversationalanalytics/** that aren't related to ArcadeDB. Probably a rebase artifact. Worth splitting or rebasing cleanly so the ArcadeDB PR stands alone.

Worth verifying

7. EXPLAIN behavior for Cypher dry-run.
The code prefixes queries with EXPLAIN and then reads summary.Plan() via the Neo4j driver. ArcadeDB supports EXPLAIN in its Cypher implementation, but I want to confirm the plan summary returned over Bolt is shaped compatibly with plan.Operator(), plan.Children(), etc. Otherwise dry-run will silently return empty data. I can run this against a local ArcadeDB once the other items are addressed, and report back with concrete output.

8. Classifier coverage for ArcadeDB-specific Cypher.
The classifier.QueryClassifier was built for Neo4j Cypher. ArcadeDB has some dialect differences. I'll sanity-check that writes are classified correctly across common ArcadeDB Cypher forms and open a follow-up issue if we find gaps, so it doesn't block this PR.

Offer

Once these are addressed, I'll test the branch end-to-end against a live ArcadeDB (Cypher + SQL, read-only enforcement, dry-run plan output) and post results here. Happy to also contribute a basic integration-test harness in a follow-up if the maintainers want one.

Thanks again for the work on this.

@Deeven-Seru Deeven-Seru force-pushed the fix-2689-arcadedb branch 2 times, most recently from a8be80f to 26349e7 Compare April 21, 2026 16:24
@lvca
Copy link
Copy Markdown

lvca commented Apr 22, 2026

Thanks @Deeven-Seru, nice turnaround on the blocking items. Quick takes on both open questions:

1. SQL readOnly allow-list

I'd keep it narrow for v1 and not add RETURN or LET to the first-token allow-list.

The reason is that in ArcadeDB SQL both keywords are almost always used inside script-mode bodies, not as standalone read statements, and a first-token classifier can't see past them into what they actually do:

  • LET $x = SELECT ... is read-only, but LET $x = INSERT ... / LET $x = UPDATE ... / LET $x = DELETE ... are all valid and write.
  • RETURN $x is safe only if the preceding script that produced $x was read-only, which the classifier has no way to verify.

Adding these to a first-token allow-list would silently let writes slip through readOnly mode. Safer to stay strict, let real user query samples tell us which read-only patterns are being blocked, and then either extend the list case-by-case or swap in a proper parser-based classifier. A strict default is easy to loosen; a permissive one that let a DELETE through under readOnly is much harder to walk back.

Current list (SELECT, MATCH, TRAVERSE, WITH, EXPLAIN) looks right to ship.

2. Cypher dry-run with partial plan output

My preference: return the partial payload with an explicit plan_incomplete: true flag (plus a short human-readable warning string), rather than fail fast. Reasoning:

  • Even a partial plan confirms the query parsed and the optimizer ran, which is still useful signal in a developer-assistant / human-in-the-loop flow.
  • Failing the whole dry-run over a server-version quirk is a worse UX than surfacing the degradation and letting the calling client decide what to render.
  • The check is cheap to emit at driver level: if plan.Operator() is empty, or plan.Children() is nil on a node whose OperatorType implies children, set the flag.

One carve-out: if the plan payload is completely empty (zero operators at all), I'd treat that as an error rather than partial. An empty plan almost certainly means the server-side EXPLAIN didn't produce anything usable on that version, and returning a silent empty object is more misleading than helpful. So: empty → error, partial → flagged payload.

Next step from my side

Once you push the updated PR, I'll run the end-to-end test against a live ArcadeDB (Cypher + SQL, readOnly enforcement across the cases above, dry-run plan output on a current server build) and post results here. If the partial-plan case shows up in practice I'll include concrete output so we can calibrate the warning threshold.

@Deeven-Seru
Copy link
Copy Markdown
Contributor Author

@lvca thanks for answering , PTAL at the pr

Copy link
Copy Markdown
Contributor Author

@Deeven-Seru Deeven-Seru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/gemini review

@Deeven-Seru
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for ArcadeDB as a new data source, including tools for executing Cypher and SQL queries. The implementation includes the necessary source configuration, tool logic, and comprehensive documentation. Feedback is provided regarding documentation structure violations (specifically the inclusion of body text in index files), tool naming conventions, and the use of a more robust HTTP client for SQL execution.

Comment on lines +9 to +10
ArcadeDB is a multi-model database that supports graph (Cypher), document (SQL),
key-value, and time-series workloads in a single engine.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The root _index.md file for an integration must contain strictly only YAML frontmatter and no markdown body text.

References
  1. Ensure the root _index.md file contains strictly only frontmatter and no markdown body text. (link)

no_list: true
---

{{< list-tools >}}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The tools/_index.md file must contain strictly only YAML frontmatter. The {{< list-tools >}} shortcode should be removed from the body.

References
  1. The tools/ directory must contain an _index.md file consisting strictly of frontmatter. (link)

@@ -0,0 +1,46 @@
---
title: "arcadedb-execute-cypher"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Tool names must be in snake_case (e.g., arcadedb_execute_cypher). Additionally, the title in the YAML frontmatter for tool documentation should follow the format: kebab-case tool name followed by ' Tool'.

Suggested change
title: "arcadedb-execute-cypher"
title: "arcadedb-execute-cypher Tool"
References
  1. Tool names must be in snake_case.
  2. For tool documentation files, the title in the YAML frontmatter should use kebab-case for the tool name, followed by ' Tool'.

@@ -0,0 +1,48 @@
---
title: "arcadedb-execute-sql"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Tool names must be in snake_case (e.g., arcadedb_execute_sql). Additionally, the title in the YAML frontmatter for tool documentation should follow the format: kebab-case tool name followed by ' Tool'.

Suggested change
title: "arcadedb-execute-sql"
title: "arcadedb-execute-sql Tool"
References
  1. Tool names must be in snake_case.
  2. For tool documentation files, the title in the YAML frontmatter should use kebab-case for the tool name, followed by ' Tool'.

req.SetBasicAuth(s.User, s.Password)
req.Header.Set("Content-Type", "application/json")

resp, err := http.DefaultClient.Do(req)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using http.DefaultClient is discouraged in production code as it lacks a default timeout, which can lead to resource exhaustion if the server hangs. It is better to define a custom client with a reasonable timeout.

Suggested change
resp, err := http.DefaultClient.Do(req)
var arcadeHTTPClient = &http.Client{
Timeout: 30 * time.Second,
}
// ... inside RunSQL ...
resp, err := arcadeHTTPClient.Do(req)

return "", fmt.Errorf("unable to parse host from arcade uri %q", s.Uri)
}
// ArcadeDB HTTP API defaults to 2480.
return fmt.Sprintf("http://%s:2480/api/v1/command/%s", host, s.ArcadeDBDatabase()), nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Deriving the HTTP API URL by hardcoding http:// and port 2480 from the Bolt URI is fragile. ArcadeDB might be configured with HTTPS or a different HTTP port. Consider making the HTTP endpoint configurable or more robustly derived.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add ArcadeDB source (multi-model database with Bolt protocol support)

5 participants