Tavily web search and extract for AppKit apps #468

@lakshyaag-tavily

Summary

We propose adding a Tavily plugin to AppKit, giving Databricks Apps first-class access to real-time web data: search, content extraction, and (later) crawling and deep research. Tavily is a web access API built specifically for LLMs and agents — results come back as clean, relevance-scored, LLM-ready content rather than raw HTML.

We're the Tavily team, and we'd like to contribute this plugin ourselves — this issue is to align on scope and design before we open the PR.

Motivation

AppKit apps are strong on Lakehouse-native data (analytics, Genie, vector search) but have no built-in way to reach outside the workspace. Common patterns this unlocks:

  • Grounded AI agents — the agents plugin resolves plugin:NAME tool providers; a Tavily plugin exposing a toolkit() would let any AppKit agent do web search/extraction with one line of frontmatter (tools: [plugin:tavily]), complementing Genie and vector-search for questions that need current, external information.
  • RAG enrichment — combine vector-search results over internal docs with fresh web context in serving/agents flows.
  • Data enrichment apps — enrich entities from Lakehouse tables (companies, products, tickers) with live web data.

Proposed design

A core plugin following the existing conventions (packages/appkit/src/plugins/tavily/):

Manifest — one required secret resource (permission: READ) holding the Tavily API key, surfaced via a TAVILY_API_KEY env field, so databricks apps init and appkit plugin sync wire it up like any other resource. No OBO semantics — the key is app-level, like a service principal credential.
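As a rough illustration of the app-level key handling described above, the plugin could fail fast at startup when the secret is not bound. This is only a sketch: `resolveTavilyApiKey` and the `Env` type are hypothetical names, and AppKit's actual resource wiring may surface the value differently.

```typescript
// Hypothetical helper, not part of AppKit: reads the key from an env-like map.
type Env = Record<string, string | undefined>;

function resolveTavilyApiKey(env: Env): string {
  // TAVILY_API_KEY is the env field named in the proposed manifest.
  const key = env["TAVILY_API_KEY"]?.trim();
  if (!key) {
    throw new Error(
      "TAVILY_API_KEY is not set; bind the Tavily secret resource to the app."
    );
  }
  return key;
}
```

In a Node runtime this would presumably be called once during plugin setup, for example `resolveTavilyApiKey(process.env)`, so a missing secret surfaces at boot rather than on the first search.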

Server API — typed methods plus injected routes:

const app = await createApp({ plugins: [new TavilyPlugin({ /* defaults */ })] });

await app.tavily.search("latest EU AI Act enforcement actions", {
  maxResults: 5,
  timeRange: "month",
});
await app.tavily.extract(["https://example.com/report"], { format: "markdown" });

  • POST /api/tavily/search and POST /api/tavily/extract, validated with Zod like other plugins.
  • All outbound calls go through this.execute(), so caching (search results are very cacheable), retry, timeout, and OpenTelemetry tracing come for free from the interceptor chain.
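To make the caching point concrete, here is a minimal standalone sketch of TTL caching keyed on the query and its options. It is not AppKit's interceptor chain: `TtlCache` and `searchCacheKey` are hypothetical names used for illustration, and the real `this.execute()` cache may key and expire entries differently.

```typescript
// Hypothetical sketch of TTL caching for outbound search calls.
type Fetcher<T> = () => Promise<T>;

class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();

  // `now` is injectable so expiry can be exercised without real timers.
  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  async getOrFetch(key: string, fetcher: Fetcher<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;
    const value = await fetcher();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}

function searchCacheKey(query: string, options: Record<string, unknown>): string {
  // Sort option keys so equivalent option objects produce the same key.
  const sorted = Object.keys(options)
    .sort()
    .map((k) => [k, options[k]]);
  return JSON.stringify([query, sorted]);
}
```

Normalizing the option order matters here because two callers passing the same options in a different order should share one cache entry.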

Agents integration — implement the ToolProvider contract (toolkit()), so agents get tavily_search / tavily_extract tools, subject to the existing tool-approval gate.
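A sketch of what the toolkit could look like follows. The `ToolDefinition`, `ToolProvider`, and `TavilyClient` interfaces below are assumptions made for illustration; the actual ToolProvider contract in AppKit may use different field names and argument schemas. Only the tool names `tavily_search` and `tavily_extract` come from this proposal.

```typescript
// Hypothetical shapes: the real ToolProvider contract in AppKit may differ.
interface ToolDefinition {
  name: string;
  description: string;
  // JSON-Schema-like description of the tool's arguments.
  parameters: Record<string, unknown>;
  execute(args: Record<string, unknown>): Promise<unknown>;
}

interface ToolProvider {
  toolkit(): ToolDefinition[];
}

// Minimal stand-in for the plugin's typed search/extract methods.
interface TavilyClient {
  search(query: string, options?: { maxResults?: number }): Promise<unknown>;
  extract(urls: string[]): Promise<unknown>;
}

class TavilyToolProvider implements ToolProvider {
  constructor(private client: TavilyClient) {}

  toolkit(): ToolDefinition[] {
    return [
      {
        name: "tavily_search",
        description: "Search the web and return relevance-scored results.",
        parameters: {
          type: "object",
          properties: { query: { type: "string" }, maxResults: { type: "number" } },
          required: ["query"],
        },
        execute: (args) => {
          const max = args["maxResults"];
          return this.client.search(String(args["query"]), {
            maxResults: typeof max === "number" ? max : undefined,
          });
        },
      },
      {
        name: "tavily_extract",
        description: "Extract clean content from one or more URLs.",
        parameters: {
          type: "object",
          properties: { urls: { type: "array", items: { type: "string" } } },
          required: ["urls"],
        },
        execute: (args) => {
          const urls = args["urls"];
          return this.client.extract(Array.isArray(urls) ? urls.map(String) : []);
        },
      },
    ];
  }
}
```

Each tool delegates to the same typed methods the server API exposes, so caching, retry, and approval behaviour would stay in one place.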

Config schema (sketch)

{
  "apiKey": { /* from secret resource / TAVILY_API_KEY */ },
  "search": { "maxResults": 5, "searchDepth": "basic", "includeDomains": [], "excludeDomains": [] },
  "cache": { "ttl": 300000 },
  "timeout": 30000
}
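The same sketch could be expressed as a TypeScript type with defaults and a small merge helper. The type names and `resolveConfig` are hypothetical, the `"advanced"` search depth is taken from Tavily's public API rather than from this proposal, and the millisecond units for `ttl` and `timeout` are an assumption based on the magnitudes above.

```typescript
// Hypothetical config types mirroring the JSON sketch (apiKey omitted: it comes
// from the secret resource rather than user config).
interface TavilySearchDefaults {
  maxResults: number;
  searchDepth: "basic" | "advanced";
  includeDomains: string[];
  excludeDomains: string[];
}

interface TavilyPluginConfig {
  search: TavilySearchDefaults;
  cache: { ttl: number }; // assumed milliseconds
  timeout: number; // assumed milliseconds
}

type TavilyConfigOverrides = {
  search?: Partial<TavilySearchDefaults>;
  cache?: Partial<TavilyPluginConfig["cache"]>;
  timeout?: number;
};

const DEFAULT_CONFIG: TavilyPluginConfig = {
  search: { maxResults: 5, searchDepth: "basic", includeDomains: [], excludeDomains: [] },
  cache: { ttl: 300000 },
  timeout: 30000,
};

// Merge user overrides over the defaults, one nested section at a time.
function resolveConfig(overrides: TavilyConfigOverrides = {}): TavilyPluginConfig {
  return {
    search: { ...DEFAULT_CONFIG.search, ...overrides.search },
    cache: { ...DEFAULT_CONFIG.cache, ...overrides.cache },
    timeout: overrides.timeout ?? DEFAULT_CONFIG.timeout,
  };
}
```

Merging per section means an app can override a single field such as `search.maxResults` without restating the rest of the defaults.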

Scope

  • v1: search + extract, manifest, config schema, agents toolkit(), tests, docs page.
  • Later (separate PRs): crawl / map, deep research (long-running — fits the SSE streaming machinery), and optionally an appkit-ui results component.

Open questions for maintainers

  1. Packaging — core plugin in packages/appkit (like genie/serving), or would you prefer third-party integrations in a separate package? This would be the first non-Databricks-service plugin, so happy to follow whatever precedent you want to set.
  2. Beta gating — should it ship via beta-exports initially?
  3. Dependency policy — we'd use the official @tavily/core SDK (MIT); fine, or do you prefer plain fetch for supply-chain reasons?

If this direction sounds good, we'll follow up with the PR.
