Summary
We propose adding a Tavily plugin to AppKit, giving Databricks Apps first-class access to real-time web data: search, content extraction, and (later) crawling and deep research. Tavily is a web access API built specifically for LLMs and agents — results come back as clean, relevance-scored, LLM-ready content rather than raw HTML.
We're the Tavily team, and we'd like to contribute this plugin ourselves — this issue is to align on scope and design before we open the PR.
Motivation
AppKit apps are strong on Lakehouse-native data (analytics, Genie, vector search) but have no built-in way to reach outside the workspace. Common patterns this unlocks:
- Grounded AI agents: the agents plugin resolves plugin:NAME tool providers; a Tavily plugin exposing toolkit() would let any AppKit agent do web search/extraction with one line of frontmatter (tools: [plugin:tavily]), complementing Genie and vector-search for questions that need current, external information.
- RAG enrichment — combine vector-search results over internal docs with fresh web context in serving/agents flows.
- Data enrichment apps — enrich entities from Lakehouse tables (companies, products, tickers) with live web data.
Proposed design
A core plugin following the existing conventions (packages/appkit/src/plugins/tavily/):
Manifest — one required secret resource (permission: READ) holding the Tavily API key, surfaced via a TAVILY_API_KEY env field, so databricks apps init and appkit plugin sync wire it up like any other resource. No OBO semantics — the key is app-level, like a service principal credential.
Server API — typed methods plus injected routes:
const app = await createApp({ plugins: [new TavilyPlugin({ /* defaults */ })] });
await app.tavily.search("latest EU AI Act enforcement actions", {
  maxResults: 5,
  timeRange: "month",
});
await app.tavily.extract(["https://example.com/report"], { format: "markdown" });
- Routes: POST /api/tavily/search and POST /api/tavily/extract, validated with Zod like other plugins.
- All outbound calls go through this.execute(), so caching (search results are very cacheable), retry, timeout, and OpenTelemetry tracing come for free from the interceptor chain.
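To make the caching point concrete, here is a minimal sketch of how search calls could be keyed and cached with a TTL. It does not use AppKit's interceptor chain or this.execute(); cacheKey, TtlCache, cachedCall, and the SearchOptions shape are hypothetical names used only for illustration.

```typescript
// Hypothetical option shape for illustration; not the plugin's final API.
interface SearchOptions {
  maxResults?: number;
  timeRange?: string;
}

// Build a stable key: trim the query, drop undefined options, and sort option
// names so that { a, b } and { b, a } map to the same cache entry.
function cacheKey(query: string, options: SearchOptions = {}): string {
  const sorted = Object.entries(options)
    .filter(([, value]) => value !== undefined)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return JSON.stringify([query.trim(), sorted]);
}

// Small in-memory TTL cache. The clock is injectable so expiry can be tested.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (this.now() >= hit.expiresAt) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Return the cached value if present; otherwise call the fetcher and store its result.
async function cachedCall<V>(
  cache: TtlCache<V>,
  key: string,
  fetcher: () => Promise<V>,
): Promise<V> {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = await fetcher();
  cache.set(key, value);
  return value;
}
```

Sorting the option names matters because two callers passing the same options in a different order should share one cached response.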
Agents integration — implement the ToolProvider contract (toolkit()), so agents get tavily_search / tavily_extract tools, subject to the existing tool-approval gate.
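As a rough illustration of the agents integration, the sketch below shows what a toolkit() could return. The ToolDefinition and TavilyLike shapes are assumptions made for this example; AppKit's actual ToolProvider contract may look different, and only the tool names tavily_search and tavily_extract come from this proposal.

```typescript
// Hypothetical tool shape; AppKit's real ToolProvider contract may differ.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, unknown>;
    required: string[];
  };
  execute(args: Record<string, unknown>): Promise<unknown>;
}

// Minimal stand-in for the plugin's typed methods (assumed signatures).
interface TavilyLike {
  search(query: string, options?: { maxResults?: number }): Promise<unknown>;
  extract(urls: string[]): Promise<unknown>;
}

// Expose the two v1 capabilities as agent tools with JSON-Schema-style parameters.
function toolkit(client: TavilyLike): ToolDefinition[] {
  return [
    {
      name: "tavily_search",
      description: "Search the web and return relevance-scored results.",
      parameters: {
        type: "object",
        properties: {
          query: { type: "string" },
          maxResults: { type: "integer", minimum: 1 },
        },
        required: ["query"],
      },
      execute: (args) =>
        client.search(String(args.query), {
          // Forward maxResults only when the agent supplied a number.
          maxResults: typeof args.maxResults === "number" ? args.maxResults : undefined,
        }),
    },
    {
      name: "tavily_extract",
      description: "Extract clean content from one or more URLs.",
      parameters: {
        type: "object",
        properties: {
          urls: { type: "array", items: { type: "string" } },
        },
        required: ["urls"],
      },
      execute: (args) =>
        client.extract(Array.isArray(args.urls) ? args.urls.map(String) : []),
    },
  ];
}
```

Tool arguments arrive from the model untyped, so each execute function coerces them before calling the typed methods.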
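Config schema (sketch):
{
  "apiKey": { /* from secret resource / TAVILY_API_KEY */ },
  "search": {
    "maxResults": 5,
    "searchDepth": "basic",
    "includeDomains": [],
    "excludeDomains": []
  },
  "cache": { "ttl": 300000 },
  "timeout": 30000
}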
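One way the plugin could resolve user-supplied config against defaults is sketched below, assuming defaults of 5 results, "basic" search depth, a 300000 cache TTL, and a 30000 timeout. Treating ttl and timeout as milliseconds is our assumption, and the type and function names are hypothetical. The API key is left out because it would come from the secret resource rather than from user config.

```typescript
// "basic" is the proposed default; "advanced" is Tavily's other search depth.
type SearchDepth = "basic" | "advanced";

interface TavilySearchDefaults {
  maxResults: number;
  searchDepth: SearchDepth;
  includeDomains: string[];
  excludeDomains: string[];
}

interface TavilyConfig {
  search: TavilySearchDefaults;
  cache: { ttl: number }; // assumed to be milliseconds
  timeout: number; // assumed to be milliseconds
}

// Every field is optional for the caller; anything omitted falls back to the defaults.
interface TavilyConfigInput {
  search?: Partial<TavilySearchDefaults>;
  cache?: { ttl?: number };
  timeout?: number;
}

const DEFAULT_CONFIG: TavilyConfig = {
  search: {
    maxResults: 5,
    searchDepth: "basic",
    includeDomains: [],
    excludeDomains: [],
  },
  cache: { ttl: 300000 },
  timeout: 30000,
};

// Merge field by field so a partial "search" block does not wipe the other defaults.
// Arrays are copied so callers cannot mutate DEFAULT_CONFIG through the result.
function resolveConfig(input: TavilyConfigInput = {}): TavilyConfig {
  const search = input.search ?? {};
  return {
    search: {
      maxResults: search.maxResults ?? DEFAULT_CONFIG.search.maxResults,
      searchDepth: search.searchDepth ?? DEFAULT_CONFIG.search.searchDepth,
      includeDomains: [...(search.includeDomains ?? DEFAULT_CONFIG.search.includeDomains)],
      excludeDomains: [...(search.excludeDomains ?? DEFAULT_CONFIG.search.excludeDomains)],
    },
    cache: { ttl: input.cache?.ttl ?? DEFAULT_CONFIG.cache.ttl },
    timeout: input.timeout ?? DEFAULT_CONFIG.timeout,
  };
}
```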
Scope
- v1: search + extract, manifest, config schema, agents toolkit(), tests, docs page.
- Later (separate PRs): crawl / map, deep research (long-running, so it fits the SSE streaming machinery), and optionally an appkit-ui results component.
Open questions for maintainers
- Packaging: core plugin in packages/appkit (like genie/serving), or would you prefer third-party integrations in a separate package? This would be the first non-Databricks-service plugin, so we're happy to follow whatever precedent you want to set.
- Beta gating: should it ship via beta-exports initially?
- Dependency policy: we'd use the official @tavily/core SDK (MIT). Is that fine, or do you prefer plain fetch for supply-chain reasons?
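For reference on the dependency question, a plain-fetch variant of search could look roughly like the sketch below. It assumes Tavily's REST search endpoint at https://api.tavily.com/search with Bearer authentication and snake_case body fields; buildSearchRequest and plainSearch are hypothetical helper names, and the 30000 ms default timeout is carried over from the config sketch.

```typescript
interface PlainSearchOptions {
  maxResults?: number;
  timeRange?: "day" | "week" | "month" | "year";
}

interface SearchRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the HTTP request separately from sending it, so it can be unit-tested offline.
// Assumption: the REST API takes snake_case fields (max_results, time_range).
function buildSearchRequest(
  apiKey: string,
  query: string,
  options: PlainSearchOptions = {},
): SearchRequest {
  const body: Record<string, unknown> = { query };
  if (options.maxResults !== undefined) body.max_results = options.maxResults;
  if (options.timeRange !== undefined) body.time_range = options.timeRange;
  return {
    url: "https://api.tavily.com/search",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(body),
    },
  };
}

// Send the request with a timeout enforced through AbortController.
async function plainSearch(
  apiKey: string,
  query: string,
  options: PlainSearchOptions = {},
  timeoutMs = 30000,
): Promise<unknown> {
  const { url, init } = buildSearchRequest(apiKey, query, options);
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch(url, { ...init, signal: controller.signal });
    if (!response.ok) {
      throw new Error(`Tavily search failed: HTTP ${response.status}`);
    }
    return await response.json();
  } finally {
    clearTimeout(timer); // always clear the timer, on success or failure
  }
}
```

This would keep the plugin dependency-free, at the cost of maintaining request and response types by hand instead of getting them from the SDK.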
If this direction sounds good, we'll follow up with the PR.