2 changes: 2 additions & 0 deletions skills/databricks-apps/SKILL.md
@@ -156,6 +156,8 @@ npx @databricks/appkit docs ./docs/plugins/analytics.md # example: specific doc

**DO NOT guess** plugin names, resource keys, or property names — always derive them from `databricks apps manifest` output. Example: if the manifest shows plugin `analytics` with a required resource `resourceKey: "sql-warehouse"` and `fields: { "id": ... }`, include `--set analytics.sql-warehouse.id=<ID>`.

3. **Verify resources after init.** Open `databricks.yml` and confirm `resources.apps.<app>.resources` contains a block for **every** required resource from every plugin you included (manifest's `resources.required`). For example, `--features analytics,genie,lakebase` must produce three blocks: `sql_warehouse`, `genie_space`, **and** `database`. A missing resource means the Apps platform won't grant the SP access to that resource at deploy time, and the app will fail at runtime — typically with `password authentication failed for user '<SP_UUID>'` (Lakebase), `403` from the SQL warehouse, or `CAN_RUN denied` (Genie). Fix by re-running `init` with the missing `--set` flag, not by hand-editing the YAML — the YAML is a generated artifact and your edit will be lost the next time someone re-scaffolds.
Member

Same comment as above:

  • `database` isn't the right resource name
  • it would be great to point to the Scaffolding section again
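
A minimal sketch of the resource check from step 3, assuming `databricks.yml` has already been parsed into a Python dict (for example with a YAML library). The function name `missing_app_resources` and the sample bundle are hypothetical; the expected resource keys should come from the manifest output, not from this example.

```python
def missing_app_resources(bundle, app_key, required):
    """Return the required resource types that the app does not declare."""
    app = bundle.get("resources", {}).get("apps", {}).get(app_key, {})
    declared = set()
    for entry in app.get("resources") or []:
        # Each entry is shaped like {"name": ..., "<resource_type>": {...}}.
        declared.update(key for key in entry if key not in ("name", "description"))
    return set(required) - declared


# Hypothetical parsed bundle with only a SQL warehouse attached.
bundle = {
    "resources": {
        "apps": {
            "app": {
                "resources": [
                    {"name": "sql-warehouse", "sql_warehouse": {"id": "abc"}},
                ]
            }
        }
    }
}
missing = missing_app_resources(bundle, "app", {"sql_warehouse", "genie_space"})
print(sorted(missing))  # → ['genie_space']
```

A non-empty result would mean going back to the Scaffolding step rather than hand-editing the YAML.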


**READ [AppKit Overview](references/appkit/overview.md)** for project structure, workflow, and pre-implementation checklist.

**Genie Agent Workflow** — when the user wants a Genie-powered app, do **not** start by asking for a Genie Space ID. Instead:
12 changes: 11 additions & 1 deletion skills/databricks-apps/references/appkit/lakebase.md
@@ -246,6 +246,8 @@ Check the response for the `active_deployment` field. If it exists with `status.

If you skip this step, the Service Principal won't own the database schema. You'll create schemas under your credentials that the SP **cannot access** after deployment. See **`databricks-lakebase`** skill's **Schema Permissions for Deployed Apps** for the full workflow and recovery steps.

> **First deploy with `lakebase`:** confirm `databricks.yml` declares a `database` resource on the app (alongside `sql_warehouse`, `genie_space`, etc.). Apps platform auto-creates the SP's Postgres role only when the database is attached as an app resource — without it, the deployed app fails with `password authentication failed for user '<UUID>'`. If the resource is missing, re-run `databricks apps init` with `--set lakebase.postgres.branch=...` and `--set lakebase.postgres.database=...`; if you can't (shared Lakebase, custom permissions), use the manual SQL fallback in the **`databricks-lakebase`** skill's **Grant app SP for AppKit / CRUD apps** section.
Member

`database` is the old Lakebase resource, so IMO the agent might be confused by that resource name.

The new (Lakebase Autoscaling) resource is postgres:

Example:

bundle:
  name: lakebs

variables:
  postgres_branch:
    description: Full Lakebase Postgres branch resource name. Obtain by running `databricks postgres list-branches projects/{project-id}`, select the desired item from the output array and use its .name value.
  postgres_database:
    description: Full Lakebase Postgres database resource name. Obtain by running `databricks postgres list-databases {branch-name}`, select the desired item from the output array and use its .name value. Requires the branch resource name.

resources:
  apps:
    app:
      name: "lakebs"
      description: "A Databricks App powered by AppKit"
      source_code_path: ./
      # Uncomment to enable on behalf of user API scopes. Available scopes: sql, dashboards.genie, files.files, serving.serving-endpoints
      # user_api_scopes:
      #   - sql

      # The resources which this app has access to.
      resources:
        - name: postgres
          postgres:
            branch: ${var.postgres_branch}
            database: ${var.postgres_database}
            permission: CAN_CONNECT_AND_CREATE

targets:
  default:
    default: true
    workspace:
      host: https://e2-dogfood.staging.cloud.databricks.com

    variables:
      postgres_branch: projects/pkosiec/branches/production
      postgres_database: projects/pkosiec/branches/production/databases/db-dmfv-24qipl4z1k

Member

Can we avoid duplicating the `apps init` command and instead point the agent to the Scaffolding section? It'll be hard to maintain so many occurrences of the init command. Thanks!


The Lakebase env vars (`PGHOST`, `PGDATABASE`, etc.) are auto-set only when deployed. For local development, get the connection details from your endpoint and set them manually:

```bash
@@ -261,11 +263,18 @@ Then create `server/.env` with the values from the endpoint response:
PGHOST=<host from endpoint>
PGPORT=5432
PGDATABASE=<your database name>
PGUSER=<your service principal client ID>
PGUSER=<see note below>
PGSSLMODE=require
LAKEBASE_ENDPOINT=projects/<PROJECT_ID>/branches/<BRANCH_ID>/endpoints/<ENDPOINT_ID>
```

> **`PGUSER` must match the credentials the AppKit dev server uses.** The Postgres role in `PGUSER` has to correspond to the principal that produced `PGPASSWORD` (the OAuth token).
>
> - **Default (personal Databricks profile):** AppKit's local server authenticates as your Databricks user, so `PGUSER` is your Databricks username/email. Tables created locally will be owned by your user, not the SP — that's why the deploy-first workflow exists.
> - **Testing the deployed flow locally:** export `DATABRICKS_CLIENT_ID=<SP_CLIENT_ID>` and `DATABRICKS_CLIENT_SECRET=...` so the dev server authenticates as the SP. Then `PGUSER=<SP_CLIENT_ID>` matches.
Member

This isn't possible locally: we cannot get the Service Principal's client secret.

>
> If `PGUSER` and the OAuth token disagree, Postgres rejects the connection with `password authentication failed for user '<UUID>'`.
Member

Honestly, I'd rather revert the changes from line 264 to 277: they don't seem to bring any benefit, unless I'm mistaken.
Maybe we can just change line 266 and say that:

  • `PGUSER` doesn't need to be provided when running the app locally (why? we use the user currently logged in to the Databricks CLI -> don't say that in the skill, no need)
  • `PGUSER` is injected automatically when the app is deployed on Databricks Apps


Load `server/.env` in your dev server (e.g. via `dotenv` or `node --env-file=server/.env`). Never commit `.env` files — add `server/.env` to `.gitignore`.
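
As a rough illustration of catching unset variables before the pool fails at runtime, here is a small Python sketch (the dev server itself is Node, so this is only a standalone check). The helper name `missing_env_vars` is hypothetical, and the variable list is taken from the `.env` example above minus `PGUSER`, whose local handling is still under review; adjust the list to whatever the project actually requires.

```python
import os

# Names copied from the .env example; treat the exact list as an assumption.
REQUIRED_VARS = ("PGHOST", "PGPORT", "PGDATABASE", "PGSSLMODE", "LAKEBASE_ENDPOINT")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing Lakebase variables:", ", ".join(missing))
```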

## Troubleshooting
@@ -276,5 +285,6 @@ Load `server/.env` in your dev server (e.g. via `dotenv` or `node --env-file=ser
| `permission denied for schema <name>` | Schema was created by another role (e.g. you ran locally before deploying) | **Ask the user before dropping** — `DROP SCHEMA` deletes all data. See **`databricks-lakebase`** skill's **Schema Permissions for Deployed Apps** for options |
| Works locally but `permission denied` after deploy | Local credentials created the schema; the SP can't access schemas it doesn't own | **Ask the user before dropping** — warn about data loss, then deploy first. See **`databricks-lakebase`** skill's **Schema Permissions for Deployed Apps** for options |
| `connection refused` | Pool not connected or wrong env vars | Check `PGHOST`, `PGPORT`, `LAKEBASE_ENDPOINT` are set |
| `password authentication failed for user '<UUID>'` | App's `databricks.yml` is missing a `database` resource — Apps platform never auto-created the SP's Postgres role on attach | Add the missing `database` resource (re-run `databricks apps init` with `--set lakebase.postgres.branch=...` and `--set lakebase.postgres.database=...`), redeploy. Manual SQL fallback: see **`databricks-lakebase`**'s **Grant app SP for AppKit / CRUD apps** |
Member

As before, let's point to the Scaffolding section.

| `relation "X" does not exist` | Tables not initialized | Run `CREATE TABLE IF NOT EXISTS` at startup |
| App builds but pool fails at runtime | Env vars not set locally | Set vars in `server/.env` — see Local Development above |
67 changes: 54 additions & 13 deletions skills/databricks-lakebase/SKILL.md
@@ -211,22 +211,21 @@ databricks postgres create-endpoint projects/<PROJECT_ID>/branches/<BRANCH_ID> <
```

**Run SQL against Lakebase** (GRANT, CREATE INDEX, etc.):
```bash
# 1. Get endpoint host
databricks postgres get-endpoint projects/<PROJECT_ID>/branches/<BRANCH_ID>/endpoints/<ENDPOINT_ID> --profile <PROFILE>

# 2. Generate OAuth token
databricks postgres generate-database-credential \
projects/<PROJECT_ID>/branches/<BRANCH_ID>/endpoints/<ENDPOINT_ID> \
--profile <PROFILE>
Preferred — `databricks psql` wrapper handles auth, host discovery, and TLS in one call:
```bash
databricks psql --profile <PROFILE> --project <PROJECT_ID> --branch <BRANCH_ID> --endpoint <ENDPOINT_ID> \
-- -d databricks_postgres -f path/to/script.sql

# 3. Connect (use token from step 2 as password, host from step 1)
PGPASSWORD='<TOKEN>' psql "host=<HOST> user=<USERNAME> dbname=databricks_postgres sslmode=require"
# One-off statement
databricks psql --profile <PROFILE> --project <PROJECT_ID> -- -d databricks_postgres -c "SELECT 1"
```

> **Note:** `generate-database-credential` requires the **endpoint** resource path (`.../endpoints/<ENDPOINT_ID>`), not a database or branch path.
> **`--profile` placement.** All `databricks` flags (including `--profile`) MUST come before the `--` separator. Anything after `--` is forwarded verbatim to `psql`, which doesn't understand `--profile` and will exit with `psql: error: unrecognized option`.
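
For agents that assemble the command programmatically, a small sketch of the flag-ordering rule: everything meant for `databricks` goes before `--`, everything meant for `psql` goes after. The helper name `build_psql_argv` is hypothetical, and the flag names are copied from the example above rather than verified against the CLI.

```python
def build_psql_argv(profile, project, psql_args, branch=None, endpoint=None):
    """Build an argv list with all databricks flags placed before the `--` separator."""
    argv = ["databricks", "psql", "--profile", profile, "--project", project]
    if branch:
        argv += ["--branch", branch]
    if endpoint:
        argv += ["--endpoint", endpoint]
    # Anything after "--" is forwarded to psql untouched.
    return argv + ["--"] + list(psql_args)


argv = build_psql_argv("my-profile", "my-project", ["-d", "databricks_postgres", "-c", "SELECT 1"])
print(argv)
```

Passing the list to `subprocess.run(argv)` avoids shell quoting issues with the SQL string.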

**Scriptable version** (single copy-paste, useful for agents):
Requires `psql` on `PATH` (the wrapper shells out to it). Branch/endpoint default to the only one when there is just one.

Manual form (use when the wrapper isn't available):
```bash
EP=projects/<PROJECT_ID>/branches/<BRANCH_ID>/endpoints/<ENDPOINT_ID>
# get-endpoint JSON shape: {"status": {"hosts": {"host": "<HOSTNAME>"}, ...}, ...}
@@ -237,13 +236,53 @@ TOKEN=$(databricks postgres generate-database-credential $EP --profile <PROFILE>
PGPASSWORD="$TOKEN" psql "host=$HOST user=<USERNAME> dbname=databricks_postgres sslmode=require"
```

**Grant app SP access to synced tables** (run as project owner after sync is ONLINE and app is deployed):
> **Note:** `generate-database-credential` requires the **endpoint** resource path (`.../endpoints/<ENDPOINT_ID>`), not a database or branch path.

**Grant app SP access to synced tables** (read-only) — run as project owner after sync is ONLINE and app is deployed:
```sql
GRANT USAGE ON SCHEMA public TO "<SP_CLIENT_ID>";
GRANT SELECT ON ALL TABLES IN SCHEMA public TO "<SP_CLIENT_ID>";
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO "<SP_CLIENT_ID>";
```
For least-privilege, consider syncing into a dedicated schema instead of `public` so the grant is scoped to synced data only.

> **Default privileges caveat.** `ALTER DEFAULT PRIVILEGES` without `FOR ROLE` only applies to tables created by the role running this statement. If sync pipelines create new tables under a different role, re-run `GRANT SELECT ON ALL TABLES IN SCHEMA public TO "<SP_CLIENT_ID>"` after each new table appears, or add `FOR ROLE <pipeline_role>` once you know which role the sync runs as.
Member

This is what the snippet above (242-246) already does; does it make sense to repeat it?


**Grant app SP for AppKit / CRUD apps** (full DML).

> **First check: is the Lakebase declared as an app resource?** When the Apps platform attaches a `database` resource (declared in the app's `databricks.yml` under `resources.apps.<app>.resources`) to an app on deploy, it auto-creates the SP's Postgres role with `CAN_CONNECT_AND_CREATE`. If the SP is failing to connect with `password authentication failed for user '<SP_CLIENT_ID>'`, the most likely cause is a missing `database` resource — fix that first, redeploy, and the auto-grant fires. See the `databricks-apps` skill (Scaffolding) for verifying every required plugin resource is declared.
>
> The SQL block below is the **fallback** for cases the resource form doesn't cover: granting access to an existing Lakebase the app spec doesn't own (shared across apps, pre-existing schema with custom permissions, post-hoc grants for additional tables/sequences).

Manual fallback — create the role and grant DML, in one psql round-trip:
```sql
CREATE EXTENSION IF NOT EXISTS databricks_auth;

DO $$
DECLARE
  sp TEXT := '<SP_CLIENT_ID>';  -- from `databricks apps get <APP> -o json | jq -r .service_principal_client_id`
BEGIN
  PERFORM databricks_create_role(sp, 'SERVICE_PRINCIPAL');
  EXECUTE format('GRANT CONNECT ON DATABASE "databricks_postgres" TO %I', sp);
  EXECUTE format('GRANT ALL ON SCHEMA public TO %I', sp);
  EXECUTE format('GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO %I', sp);
  EXECUTE format('GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO %I', sp);
  EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO %I', sp);
  EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO %I', sp);
END $$;
```
Pipe through `databricks psql` (above). The block is idempotent; re-running is safe.

The role-creation step alone has a CLI form too (useful when granting privileges separately):
```bash
databricks postgres create-role projects/<PROJECT_ID>/branches/<BRANCH_ID> \
  --role-id <SP_CLIENT_ID> \
  --json '{"spec":{"identity_type":"SERVICE_PRINCIPAL","postgres_role":"<SP_CLIENT_ID>","auth_method":"LAKEBASE_OAUTH_V1"}}' \
  --profile <PROFILE>
```

> **Least privilege.** The example creates the role with default privileges only — grant database/schema/table access via the explicit `GRANT` statements above. Don't add `membership_roles: ["DATABRICKS_SUPERUSER"]` for an app SP unless broad administrative access is intentional; superuser membership lets the app role read every Lakebase database, not just its own.

> **CLI body shape.** `databricks postgres create-role`'s `--json` flag binds to the inner `Role` object — fields go directly under `spec`, **not** wrapped in `{"role": ...}`. The error `Field 'role' is required and must contain at least one subfield with a non-default value` means the inner Role had no recognized fields (often because someone wrapped the body, which the CLI strips with `Warning: unknown field: role` and ships an empty body). The CLI also doesn't yet expose convenience flags like `--spec.identity-type` ([cmd/workspace/postgres/postgres.go](https://github.com/databricks/cli/blob/main/cmd/workspace/postgres/postgres.go) marks `spec` as TODO), so you must hand-craft the JSON.
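
To make the body shape concrete, a short sketch that builds the `--json` payload with fields directly under `spec` and no outer `role` wrapper. The helper name `create_role_body` is hypothetical; the field names and values are copied from the CLI example above and are not independently verified.

```python
import json


def create_role_body(sp_client_id):
    """Return the JSON string for --json: fields sit under "spec", with no "role" wrapper."""
    body = {
        "spec": {
            "identity_type": "SERVICE_PRINCIPAL",
            "postgres_role": sp_client_id,
            "auth_method": "LAKEBASE_OAUTH_V1",
        }
    }
    return json.dumps(body)


payload = create_role_body("<SP_CLIENT_ID>")
print(payload)
```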
Comment on lines +250 to +285
Member

I don't think we should do that: once the Lakebase project (branch) is added as a resource to an App, the App runtime creates the proper roles with the proper permissions. In this PR you already corrected the agent to start again from the Scaffolding section if the resource is not in `databricks.yml` (that's a very good addition).

But let's not try to work around the whole App resource mechanism. An App that uses Lakebase must define the project as an app resource; this is a strict prerequisite here.


Get SP client ID: `databricks apps get <APP_NAME> --profile <PROFILE>` → `service_principal_client_id` field.

@@ -257,6 +296,8 @@ Get SP client ID: `databricks apps get <APP_NAME> --profile <PROFILE>` → `serv
| `cannot configure default credentials` | Use `--profile` flag or authenticate first |
| `PERMISSION_DENIED` | Check workspace permissions |
| `permission denied for schema` | Schema owned by another role. Deploy app first so SP creates/owns it |
| `password authentication failed for user '<UUID>'` (deployed app) | SP has no Postgres role on the branch yet. Run the **Grant app SP for AppKit / CRUD apps** SQL block above, then restart the app |
| `Field 'role' is required` from `databricks postgres create-role` | `--json` binds to the inner `Role`. Pass fields directly under `spec` (no `{"role": ...}` wrapper). See the CLI body-shape note in **Grant app SP for AppKit / CRUD apps** |
| Protected branch won't delete | `update-branch` to set `spec.is_protected` to `false` first |
| Long-running operation timeout | Use `--no-wait` and poll with `get-operation` |
| Token expired during long query | Tokens expire after 1 hour; implement refresh (see [connectivity.md](references/connectivity.md)) |
36 changes: 36 additions & 0 deletions skills/databricks-lakebase/references/connectivity.md
@@ -140,6 +140,42 @@ For production apps, combine with Pattern 2's token refresh loop and SQLAlchemy
- **Handle scale-to-zero reconnection** — first connection after idle may take ~100ms; implement retry
- **psycopg2 or psycopg3** — both work; psycopg3 recommended for new development (better async, pooling)
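
A minimal sketch of the retry suggested for scale-to-zero reconnection, assuming the caller supplies a zero-argument `connect` callable (for example a lambda wrapping `psycopg.connect(...)`). The helper name `connect_with_retry` and its defaults are hypothetical; in real code it would be better to catch only the driver's connection error instead of `Exception`.

```python
import time


def connect_with_retry(connect, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call `connect()` up to `attempts` times with exponential backoff between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except Exception as exc:  # narrow this to the driver's error type in real code
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
    raise last_error
```

The `sleep` parameter exists only so the backoff can be stubbed out in tests.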

## DNS Resolution (macOS)

Python's `socket.getaddrinfo()` can fail with long Lakebase hostnames on macOS. Workaround: resolve via `dig`, then pass the IP through `hostaddr` while keeping `host` for TLS SNI.

```bash
# Resolve the Lakebase hostname to an IP
dig +short <ENDPOINT_HOST>
```

```python
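import ipaddress
import subprocess

import psycopg  # psycopg 3


def resolve_host(hostname: str) -> str:
    """Resolve `hostname` to an IP address via `dig`, bypassing getaddrinfo()."""
    try:
        result = subprocess.run(
            ["dig", "+short", hostname], capture_output=True, text=True, check=False
        )
    except FileNotFoundError as e:
        raise RuntimeError(
            "'dig' is not installed; install it (e.g. `apt-get install dnsutils`) "
            "or use socket.getaddrinfo() instead"
        ) from e
    for line in result.stdout.splitlines():
        candidate = line.strip()
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            continue  # skip CNAME records that dig may print before the address
        return candidate
    raise RuntimeError(f"DNS resolution failed for {hostname}")


# endpoint_host, username, and token are assumed to be defined earlier
# (endpoint host, Postgres role, and OAuth token respectively).
ip = resolve_host(endpoint_host)

conn = psycopg.connect(
    host=endpoint_host,  # kept for TLS SNI verification
    hostaddr=ip,  # bypasses getaddrinfo()
    dbname="databricks_postgres",
    user=username,
    password=token,
    sslmode="require",
)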
```

Comment on lines +143 to +178
Member

This was part of my previous PR, but it looks like it isn't needed after all. Could you please cherry-pick your commits on top of main so that only your changes are included here? Thanks!

## Data API

PostgREST-compatible HTTP API for CRUD operations on Postgres tables. **Autoscaling only.**