Merged
18 changes: 9 additions & 9 deletions manifest.json
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
{
"version": "2",
"updated_at": "2026-04-30T11:02:41Z",
Contributor

Can these be autogenerated on each push? Or removed entirely, since they are unnecessary?

Member Author

Yeah, let's remove that but on a separate PR 👍

Member Author

#75

"updated_at": "2026-05-11T13:22:07Z",
"skills": {
"databricks-apps": {
"version": "0.1.1",
"description": "Databricks Apps development and deployment (evaluates analytics vs synced tables data access)",
"experimental": false,
"updated_at": "2026-04-30T11:00:26Z",
"updated_at": "2026-05-11T13:22:01Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -33,7 +33,7 @@
"version": "0.1.0",
"description": "Core Databricks skill for CLI, auth, and data exploration",
"experimental": false,
"updated_at": "2026-04-23T13:47:44Z",
"updated_at": "2026-05-11T10:22:59Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -48,7 +48,7 @@
"version": "0.0.0",
"description": "Declarative Automation Bundles (DABs) for deploying and managing Databricks resources",
"experimental": false,
"updated_at": "2026-04-23T13:47:44Z",
"updated_at": "2026-05-05T15:31:42Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -66,7 +66,7 @@
"version": "0.1.0",
"description": "Databricks Jobs orchestration and scheduling",
"experimental": false,
"updated_at": "2026-04-23T13:47:44Z",
"updated_at": "2026-05-07T15:19:50Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -78,7 +78,7 @@
"version": "0.1.0",
"description": "Databricks Lakebase Postgres: projects, scaling, connectivity, synced tables, and Data API",
"experimental": false,
"updated_at": "2026-04-30T11:02:37Z",
"updated_at": "2026-05-11T10:23:05Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -93,7 +93,7 @@
"version": "0.1.0",
"description": "Databricks Model Serving endpoint management",
"experimental": false,
"updated_at": "2026-04-23T13:47:44Z",
"updated_at": "2026-05-07T15:19:45Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -105,7 +105,7 @@
"version": "0.1.0",
"description": "Databricks Pipelines (DLT) for ETL and streaming",
"experimental": false,
"updated_at": "2026-04-23T13:47:44Z",
"updated_at": "2026-05-07T15:19:55Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -152,7 +152,7 @@
"version": "0.1.0",
"description": "Migrate Databricks workloads from classic compute to serverless compute, including compatibility checks and concrete fixes",
"experimental": false,
"updated_at": "2026-04-24T15:10:23Z",
"updated_at": "2026-05-07T15:19:59Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
2 changes: 2 additions & 0 deletions skills/databricks-apps/references/appkit/genie.md
@@ -145,6 +145,8 @@ function GeniePage() {

Update smoke tests if headings or routes changed, then `databricks apps validate`.

For advanced Genie plugin usage, see `npx @databricks/appkit docs ./docs/plugins/genie.md`.

## Frontend

**For full component API**: run `npx @databricks/appkit docs "GenieChat"`.
146 changes: 80 additions & 66 deletions skills/databricks-apps/references/appkit/lakebase.md
@@ -37,6 +37,8 @@ Where `<BRANCH_NAME>` and `<DATABASE_NAME>` are full resource names (e.g. `proje

Use the `databricks-lakebase` skill to create a Lakebase project and discover branch/database resource names before running this command.

> For multi-environment deployments (dev/prod), use `variables:` and `targets:` blocks in `databricks.yml` — see the **`databricks-dabs`** skill for patterns.

**Get resource names** (if you have an existing project):
```bash
# List branches → use the name field of a READY branch
@@ -50,35 +52,32 @@ databricks postgres list-databases projects/<PROJECT_ID>/branches/<BRANCH_ID> --
```
my-app/
├── server/
│ └── server.ts # Backend with Lakebase pool + tRPC routes
│ └── server.ts # Backend with Lakebase plugin + Express routes
├── client/
│ └── src/
│ └── App.tsx # React frontend
├── app.yaml # Manifest with database resource declaration
└── package.json # Includes @databricks/lakebase dependency
```

Note: **No `config/queries/` directory** — Lakebase apps use server-side `pool.query()` calls, not SQL files.
Note: **No `config/queries/` directory** — Lakebase apps use server-side `appkit.lakebase.query()` calls, not SQL files.

## Lakebase Plugin API

## `createLakebasePool` API
Scaffolding with `--features lakebase` (see above) generates this pattern. Access Lakebase through the plugin handle returned by `createApp()`:

```typescript
import { createLakebasePool } from "@databricks/lakebase";
// or: import { createLakebasePool } from "@databricks/appkit";

const pool = createLakebasePool({
// All fields optional — auto-populated from env vars when deployed
host: process.env.PGHOST, // Lakebase hostname
database: process.env.PGDATABASE, // Database name
endpoint: process.env.LAKEBASE_ENDPOINT, // Endpoint resource path
user: process.env.PGUSER, // Service principal client ID
max: 10, // Connection pool size
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 10000,
import { createApp, lakebase } from "@databricks/appkit";

const appkit = await createApp({
plugins: [lakebase()],
});

// Query via the plugin handle — handles pooling and token refresh automatically
const result = await appkit.lakebase.query("SELECT * FROM users WHERE id = $1", [userId]);
```

Call `createLakebasePool()` **once at module level** (server startup), not inside request handlers.
The `lakebase()` plugin auto-configures from platform-injected env vars at deploy time. No manual pool setup needed.

## Environment Variables (auto-set when deployed with database resource)

@@ -91,58 +90,70 @@ Call `createLakebasePool()` **once at module level** (server startup), not insid
| `PGSSLMODE` | SSL mode (`require`) |
| `LAKEBASE_ENDPOINT` | Endpoint resource path |
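
Because the plugin relies on these platform-injected variables, a startup check can make a misconfigured local environment fail with a clear message. The following is a minimal sketch under stated assumptions: `missingLakebaseEnv` and `LAKEBASE_ENV_VARS` are hypothetical names, not part of `@databricks/appkit`, and the list only covers variable names that appear in this reference.

```typescript
// Hypothetical helper, not an AppKit API: reports which of the variables
// named in this reference are absent from a given environment map.
const LAKEBASE_ENV_VARS = [
  "PGHOST",
  "PGDATABASE",
  "PGUSER",
  "PGSSLMODE",
  "LAKEBASE_ENDPOINT",
] as const;

type EnvMap = Record<string, string | undefined>;

function missingLakebaseEnv(env: EnvMap): string[] {
  // An unset or empty value counts as missing.
  return LAKEBASE_ENV_VARS.filter((name) => !env[name]);
}

// Possible usage at server startup (assumes a Node.js runtime):
//   const missing = missingLakebaseEnv(process.env);
//   if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(", ")}`);
```

Passing the environment in as an argument, rather than reading `process.env` inside the function, keeps the check easy to exercise in tests.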

## tRPC CRUD Pattern
## CRUD Routes Pattern

Always use tRPC for Lakebase operations — do NOT call `pool.query()` from the client.
Always use server-side routes for Lakebase operations — do NOT call `appkit.lakebase.query()` from the client. Use `server.extend()` to register Express routes:

```typescript
// server/server.ts
import { initTRPC } from '@trpc/server';
import { createLakebasePool } from "@databricks/lakebase";
import { createApp, server, lakebase } from "@databricks/appkit";
import { z } from 'zod';
import superjson from 'superjson'; // requires: npm install superjson

const pool = createLakebasePool(); // reads env vars automatically

const t = initTRPC.create({ transformer: superjson });
const publicProcedure = t.procedure;

export const appRouter = t.router({
listItems: publicProcedure.query(async () => {
const { rows } = await pool.query(
"SELECT * FROM app_data.items ORDER BY created_at DESC LIMIT 100"
);
return rows;
}),

createItem: publicProcedure
.input(z.object({ name: z.string().min(1) }))
.mutation(async ({ input }) => {
const { rows } = await pool.query(
"INSERT INTO app_data.items (name) VALUES ($1) RETURNING *",
[input.name]

createApp({
plugins: [server({ autoStart: false }), lakebase()],
})
.then(async (appkit) => {
// Schema init (runs once at startup)
await appkit.lakebase.query(`
CREATE SCHEMA IF NOT EXISTS app_data;
CREATE TABLE IF NOT EXISTS app_data.items (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
return rows[0];
}),

deleteItem: publicProcedure
.input(z.object({ id: z.number() }))
.mutation(async ({ input }) => {
await pool.query("DELETE FROM app_data.items WHERE id = $1", [input.id]);
return { success: true };
}),
});
`);

// CRUD routes via Express
appkit.server.extend((app) => {
app.get('/api/items', async (_req, res) => {
const { rows } = await appkit.lakebase.query(
"SELECT * FROM app_data.items ORDER BY created_at DESC LIMIT 100"
);
res.json(rows);
});

app.post('/api/items', async (req, res) => {
const parsed = z.object({ name: z.string().min(1) }).safeParse(req.body);
if (!parsed.success) { res.status(400).json({ error: 'Invalid input' }); return; }
const { rows } = await appkit.lakebase.query(
"INSERT INTO app_data.items (name) VALUES ($1) RETURNING *",
[parsed.data.name]
);
res.status(201).json(rows[0]);
});

app.delete('/api/items/:id', async (req, res) => {
const id = parseInt(req.params.id, 10);
if (isNaN(id)) { res.status(400).json({ error: 'Invalid id' }); return; }
await appkit.lakebase.query("DELETE FROM app_data.items WHERE id = $1", [id]);
res.status(204).send();
});
});

await appkit.server.start();
})
.catch(console.error);
```
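
One detail of the delete route above: `parseInt("12abc", 10)` returns `12`, so the `isNaN` check alone accepts an id with trailing garbage. A stricter parser could look like the sketch below. `parseItemId` and `MAX_SERIAL_ID` are hypothetical names introduced for illustration, and the upper bound assumes the `id SERIAL` column from the schema in this file.

```typescript
// Hypothetical helper (not part of AppKit or Express): strict parsing for the
// ':id' route parameter.
const MAX_SERIAL_ID = 2147483647; // upper bound of PostgreSQL's 4-byte SERIAL

function parseItemId(raw: string): number | null {
  // Accept only canonical positive decimal integers: no sign, spaces, or suffix.
  if (!/^[1-9][0-9]*$/.test(raw)) return null;
  const id = Number(raw);
  // Reject values that cannot exist in a SERIAL column.
  return id <= MAX_SERIAL_ID ? id : null;
}
```

In the route, a `null` result would map to the same `400` response that the `isNaN` branch already returns.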

> **Deploy first (App + Lakebase only)!** When your Databricks App uses Lakebase, the Service Principal must create and own the schema. Run `databricks apps deploy` before any local development. See **`databricks-lakebase`** skill's **Schema Permissions for Deployed Apps** for details.

## Schema Initialization

**Always create a custom schema** — the Service Principal cannot access any existing schemas (including `public`). It must create the schema itself to become its owner. See **`databricks-lakebase`** skill's **Schema Permissions for Deployed Apps** for the full permission model and deploy-first workflow. Initialize tables on server startup:
**Always create a custom schema** — the Service Principal cannot access any existing schemas (including `public`). It must create the schema itself to become its owner. See **`databricks-lakebase`** skill's **Schema Permissions for Deployed Apps** for the full permission model and deploy-first workflow. Initialize tables inside the `.then()` callback before registering routes (see CRUD pattern above):

```typescript
// server/server.ts — run once at startup before handling requests
await pool.query(`
// Inside onPluginsReady — runs once at startup before handling requests
await appkit.lakebase.query(`
CREATE SCHEMA IF NOT EXISTS app_data;
CREATE TABLE IF NOT EXISTS app_data.items (
id SERIAL PRIMARY KEY,
@@ -154,26 +165,28 @@ await pool.query(`

## ORM Integration (Optional)

The pool returned by `createLakebasePool()` is a standard `pg.Pool` — works with any PostgreSQL library:
The plugin exposes the raw `pg.Pool` via `appkit.lakebase.pool` — works with any PostgreSQL library:

```typescript
// Drizzle ORM
import { drizzle } from "drizzle-orm/node-postgres";
const db = drizzle(pool);
const db = drizzle(appkit.lakebase.pool);

// Prisma (with @prisma/adapter-pg)
import { PrismaPg } from "@prisma/adapter-pg";
const adapter = new PrismaPg(pool);
const adapter = new PrismaPg(appkit.lakebase.pool);
const prisma = new PrismaClient({ adapter });
```

For ORM-compatible config: `appkit.lakebase.getOrmConfig()`.

## Reading from Lakebase synced tables

Lakebase synced tables materialize Delta/UC tables into Lakebase Postgres for low-latency app reads. The lakehouse remains the source of truth; Lakebase serves as a read-optimized index.

**Architecture:**
```
Delta gold tables → Synced tables (read-only) → App reads via pool.query()
Delta gold tables → Synced tables (read-only) → App reads via appkit.lakebase.query()
App writes → Lakebase OLTP tables → optional Lakehouse Sync → Delta
```

@@ -183,7 +196,7 @@ App writes → Lakebase OLTP tables → optional Lakehouse Sync

### How It Works

Synced tables (created via `databricks postgres create-synced-table`) appear as regular Postgres tables. From the app's perspective, use the same `pool.query()` pattern but **read-only**.
Synced tables (created via `databricks postgres create-synced-table`) appear as regular Postgres tables. From the app's perspective, use the same `appkit.lakebase.query()` pattern but **read-only**.

**Key differences from CRUD tables:**

@@ -197,19 +210,20 @@ Synced tables (created via `databricks postgres create-synced-table`) appear as

**Permission grant required:** The app's SP has `CAN_CONNECT_AND_CREATE` but does **not** have `pg_read_all_data`. To read synced tables, the project owner must grant access — see the **`databricks-lakebase`** skill's SKILL.md "Grant app SP access to synced tables" section for the SQL commands and psql connection steps.

**Example tRPC route reading synced taxi data:**
**Example Express route reading synced taxi data:**

```typescript
topPickups: publicProcedure.query(async () => {
const { rows } = await pool.query(`
// Inside onPluginsReady → appkit.server.extend((app) => { ... })
app.get('/api/top-pickups', async (_req, res) => {
const { rows } = await appkit.lakebase.query(`
SELECT pickup_zip, COUNT(*) AS trip_count, AVG(fare_amount) AS avg_fare
FROM public.nyc_trips
GROUP BY pickup_zip
ORDER BY trip_count DESC
LIMIT 10
`);
return rows;
}),
res.json(rows);
});
```

> **Do not write to synced tables.** The sync pipeline manages the data — direct writes corrupt the sync state. For mixed read/write patterns, read from synced tables and write to separate app-owned tables. To create synced tables and grant the app's SP read access, see the **`databricks-lakebase`** skill's [synced-tables.md](../../../databricks-lakebase/references/synced-tables.md) and the "Grant app SP access to synced tables" section in its SKILL.md.
@@ -219,8 +233,8 @@ topPickups: publicProcedure.query(async () => {
| | Analytics | Lakebase |
|--|-----------|---------|
| SQL dialect | Databricks SQL (Spark SQL) | Standard PostgreSQL |
| Query location | `config/queries/*.sql` files | `pool.query()` in tRPC routes |
| Data retrieval | `useAnalyticsQuery` hook | tRPC query procedure |
| Query location | `config/queries/*.sql` files | `appkit.lakebase.query()` in Express routes |
| Data retrieval | `useAnalyticsQuery` hook | Express route via `server.extend()` |
| Date functions | `CURRENT_TIMESTAMP()`, `DATEDIFF(DAY, ...)` | `NOW()`, `AGE(...)` |
| Auto-increment | N/A | `SERIAL` or `GENERATED ALWAYS AS IDENTITY` |
| Insert pattern | N/A | `INSERT ... VALUES ($1) RETURNING *` |
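
The `INSERT ... VALUES ($1) RETURNING *` pattern in the table generalizes to rows with several columns by numbering the placeholders. Here is a rough sketch, assuming table and column names come from trusted application code and only the values come from user input. `buildInsert` and `ParameterizedQuery` are hypothetical names, not part of AppKit or `pg`.

```typescript
interface ParameterizedQuery {
  text: string;
  values: unknown[];
}

// Hypothetical helper: builds a parameterized INSERT. Identifiers cannot be
// bound as parameters, so they are checked against a conservative pattern.
function buildInsert(table: string, row: Record<string, unknown>): ParameterizedQuery {
  const columns = Object.keys(row);
  if (columns.length === 0) throw new Error("row must have at least one column");

  const columnPattern = /^[a-z_][a-z0-9_]*$/;
  const tablePattern = /^[a-z_][a-z0-9_]*(\.[a-z_][a-z0-9_]*)?$/; // optional schema prefix
  if (!tablePattern.test(table)) throw new Error(`unsafe table name: ${table}`);
  for (const column of columns) {
    if (!columnPattern.test(column)) throw new Error(`unsafe column name: ${column}`);
  }

  // $1, $2, ... in the same order as the values array.
  const placeholders = columns.map((_, i) => `$${i + 1}`);
  return {
    text: `INSERT INTO ${table} (${columns.join(", ")}) VALUES (${placeholders.join(", ")}) RETURNING *`,
    values: columns.map((column) => row[column]),
  };
}
```

The resulting `text` and `values` would then be passed as the two arguments of the query call shown in the routes above.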
4 changes: 4 additions & 0 deletions skills/databricks-apps/references/appkit/model-serving.md
@@ -94,6 +94,10 @@ const result = await trpc.queryModel.query({ prompt: userInput });
const answer = result.choices?.[0]?.message?.content;
```
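
The optional chaining above implies a chat-completion-style response in which any level may be absent. A small typed helper can make the fallback explicit. This is only a sketch: `ChatResult` and `extractAnswer` are hypothetical names, and the interface models just the fields the snippet reads, not the full endpoint response.

```typescript
// Hypothetical minimal shape: only the fields accessed by the snippet above.
interface ChatResult {
  choices?: Array<{ message?: { content?: string } }>;
}

// Returns the first choice's text, or a fallback when any level is missing.
function extractAnswer(result: ChatResult, fallback: string = ""): string {
  return result.choices?.[0]?.message?.content ?? fallback;
}
```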

For streaming and advanced patterns, see `npx @databricks/appkit docs ./docs/plugins/model-serving.md`.

AppKit integrates with **Model Serving endpoints**. AI Gateway (beta) endpoints are not directly supported — use the underlying Model Serving endpoint name instead. AI Gateway features (rate limits, usage tracking) can be configured on Model Serving endpoints via the `databricks-model-serving` skill.

## Troubleshooting

| Error | Cause | Solution |
2 changes: 1 addition & 1 deletion skills/databricks-apps/references/appkit/overview.md
@@ -126,7 +126,7 @@ Do not guess paths — run without args first, then pick from the index.
| Use `useAnalyticsQuery` | [AppKit SDK](appkit-sdk.md) — memoization, conditional queries |
| Add chart/table components | [Frontend](frontend.md) — component quick reference, anti-patterns |
| Add API mutation endpoints | [tRPC](trpc.md) — only if you need server-side logic |
| Use Lakebase for CRUD / persistent state | [Lakebase](lakebase.md) — createLakebasePool, tRPC patterns, schema init |
| Use Lakebase for CRUD / persistent state | [Lakebase](lakebase.md) — Lakebase plugin API, tRPC patterns, schema init |
| Add Genie chat | [Genie](genie.md) — space creation, plugin setup, frontend components |
| Call ML model serving endpoints | [Model Serving](model-serving.md) — resource declaration, tRPC query pattern |
| Trigger / monitor Lakeflow Jobs from the app | [Jobs](jobs.md) — env discovery, JobHandle API, SSE streaming |
2 changes: 1 addition & 1 deletion skills/databricks-apps/references/appkit/trpc.md
@@ -43,7 +43,7 @@ databricks apps manifest --profile <PROFILE>
**Key plugins to check for:**

- **analytics** — provides SQL warehouse query execution (do NOT reimplement with tRPC)
- **lakebase** — provides `createLakebasePool` for PostgreSQL CRUD (use pool in tRPC routes, don't create raw connections)
- **lakebase** — provides Lakebase plugin for PostgreSQL CRUD (use plugin in tRPC routes, don't create raw connections)
- **genie** — provides Genie AI-powered data exploration (check before building custom natural-language-to-SQL routes)
- **files** — provides file storage and retrieval helpers (check before writing custom file upload/download routes)

1 change: 1 addition & 0 deletions skills/databricks-apps/references/platform-guide.md
@@ -170,3 +170,4 @@ For long-running agent interactions, use **WebSockets** instead of SSE.
| OBO scopes missing after deploy | Destructive update wiped them | Re-apply scopes after each deploy |
| `${var.xxx}` appears literally in env | Variables not resolved in config | Use literal values, not bundle variables |
| 504 Gateway Timeout | Request exceeded 120s | Use WebSockets for long operations |
| `user token passthrough not enabled` | `user_api_scopes` in `databricks.yml` requires user authorization, which is not enabled in the workspace | Ask workspace admin to enable user authorization (Public Preview). See [Databricks Apps auth docs](https://docs.databricks.com/aws/en/dev-tools/databricks-apps/auth#user-authorization) |
1 change: 1 addition & 0 deletions skills/databricks-core/SKILL.md
Expand Up @@ -25,6 +25,7 @@ For specific products, use dedicated skills:
- **If the CLI is missing or outdated (< v0.292.0): STOP. Do not proceed or work around a missing CLI.**
- **Read the [CLI Installation](databricks-cli-install.md) reference file and follow the instructions to guide the user through installation.**
- Note: In sandboxed environments (Cursor IDE, containers), install commands write outside the workspace and may be blocked. Present the install command to the user and ask them to run it in their own terminal.
- **Exception:** If CLI installation is blocked (sandboxed containers, restricted environments), ask the user whether to fall back to direct REST API calls using `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables if present in the shell. See the [Databricks REST API docs](https://docs.databricks.com/api/workspace/introduction).

2. **Authenticated**: `databricks auth profiles`
- If not: see [CLI Authentication](databricks-cli-auth.md)
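
For the REST fallback described in step 1, a request could be assembled from the two environment variables roughly as follows. This is a sketch under stated assumptions: `buildRestRequest` and `RestRequest` are hypothetical names, the token is assumed to be a bearer token (such as a personal access token), and the Jobs list path is used only as an example endpoint.

```typescript
interface RestRequest {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical helper: turns DATABRICKS_HOST / DATABRICKS_TOKEN into a URL
// and bearer-auth header, without performing any network call itself.
function buildRestRequest(env: Record<string, string | undefined>, path: string): RestRequest {
  const host = env["DATABRICKS_HOST"];
  const token = env["DATABRICKS_TOKEN"];
  if (!host || !token) {
    throw new Error("DATABRICKS_HOST and DATABRICKS_TOKEN must both be set");
  }
  // Tolerate a host given without a scheme, and stray slashes on either side.
  const base = /^https?:\/\//.test(host) ? host : `https://${host}`;
  const url = `${base.replace(/\/+$/, "")}/${path.replace(/^\/+/, "")}`;
  return { url, headers: { Authorization: `Bearer ${token}` } };
}

// Possible usage (assumes a runtime with fetch, e.g. Node.js 18+):
//   const req = buildRestRequest(process.env, "/api/2.1/jobs/list");
//   const res = await fetch(req.url, { headers: req.headers });
```

Throwing when either variable is missing matches the "if present in the shell" condition: the fallback applies only when both are already available.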
12 changes: 12 additions & 0 deletions skills/databricks-lakebase/SKILL.md
@@ -250,6 +250,18 @@ Get SP client ID: `databricks apps get <APP_NAME> --profile <PROFILE>` → `serv
**Data API:** PostgREST-compatible HTTP CRUD on Postgres tables. See [connectivity.md](references/connectivity.md).
**Synced Tables:** Sync Delta tables into Lakebase. See [synced-tables.md](references/synced-tables.md).

## PostgreSQL Extensions

Lakebase supports PostgreSQL extensions (e.g., `pgvector` for vector embeddings, `pg_stat_statements` for query statistics). See the [full list of supported extensions](https://docs.databricks.com/aws/en/oltp/projects/extensions).

```sql
-- List available extensions
SELECT * FROM pg_available_extensions ORDER BY name;

-- Install an extension
CREATE EXTENSION IF NOT EXISTS <extension_name>;
```
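
When an app installs an extension at startup, note that the extension name in `CREATE EXTENSION` is an identifier and cannot be bound as a `$1` parameter. The sketch below validates the name and quotes it instead. `createExtensionSql` is a hypothetical helper, not part of AppKit. The pgvector extension is installed under the name `vector`, and whether the app's role is allowed to create extensions is not covered here.

```typescript
// Hypothetical helper: builds a CREATE EXTENSION statement for a validated name.
// Hyphens are allowed because some extensions (e.g. uuid-ossp) use them, which
// is also why the name is always double-quoted.
function createExtensionSql(name: string): string {
  if (!/^[a-z_][a-z0-9_-]*$/.test(name)) {
    throw new Error(`unsafe extension name: ${name}`);
  }
  return `CREATE EXTENSION IF NOT EXISTS "${name}";`;
}
```

The returned string can be run through whichever query function the app already uses for schema initialization.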

## Troubleshooting

| Error | Solution |