databricks · jamesbroadhead · Apr 28, 2026
@@ -156,6 +156,8 @@ npx @databricks/appkit docs ./docs/plugins/analytics.md  # example: specific doc
 
 **DO NOT guess** plugin names, resource keys, or property names — always derive them from `databricks apps manifest` output. Example: if the manifest shows plugin `analytics` with a required resource `resourceKey: "sql-warehouse"` and `fields: { "id": ... }`, include `--set analytics.sql-warehouse.id=<ID>`.
 
+3. **Verify resources after init.** Open `databricks.yml` and confirm `resources.apps.<app>.resources` contains a block for **every** required resource from every plugin you included (manifest's `resources.required`). For example, `--features analytics,genie,lakebase` must produce three blocks: `sql_warehouse`, `genie_space`, **and** `database`. A missing resource means the Apps platform won't grant the SP access to that resource at deploy time, and the app will fail at runtime — typically with `password authentication failed for user '<SP_UUID>'` (Lakebase), `403` from the SQL warehouse, or `CAN_RUN denied` (Genie). Fix by re-running `init` with the missing `--set` flag, not by hand-editing the YAML — the YAML is a generated artifact and your edit will be lost the next time someone re-scaffolds.
+
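A minimal sketch of the step 3 check, for anyone who wants to script it. It assumes `databricks.yml` has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and that each entry under `resources.apps.<app>.resources` is a mapping carrying one resource-type key such as `sql_warehouse`. That shape, the app name `my_app`, and the helper name `missing_resource_blocks` are assumptions for illustration, not taken from the manifest.

```python
def missing_resource_blocks(bundle: dict, app: str, required: set[str]) -> set[str]:
    """Return required resource types with no block under resources.apps.<app>.resources."""
    app_cfg = bundle.get("resources", {}).get("apps", {}).get(app, {})
    declared: set[str] = set()
    # ASSUMPTION: each entry looks like {"name": ..., "sql_warehouse": {...}}.
    for block in app_cfg.get("resources") or []:
        declared.update(key for key in block if key in required)
    return required - declared


# Hypothetical parsed databricks.yml after `--features analytics,genie,lakebase`
# where the Lakebase `--set` flags were forgotten.
bundle = {
    "resources": {
        "apps": {
            "my_app": {
                "resources": [
                    {"name": "sql-warehouse", "sql_warehouse": {"id": "abc123"}},
                    {"name": "genie-space", "genie_space": {"id": "def456"}},
                ]
            }
        }
    }
}

required = {"sql_warehouse", "genie_space", "database"}
missing = missing_resource_blocks(bundle, "my_app", required)
print(sorted(missing))  # ['database']
```

A non-empty result means `init` should be re-run with the missing `--set` flags before deploying.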
 **READ [AppKit Overview](references/appkit/overview.md)** for project structure, workflow, and pre-implementation checklist.
 
 **Genie Agent Workflow** — when the user wants a Genie-powered app, do **not** start by asking for a Genie Space ID. Instead:

@@ -246,6 +246,8 @@ Check the response for the `active_deployment` field. If it exists with `status.
 
 If you skip this step, the Service Principal won't own the database schema. You'll create schemas under your credentials that the SP **cannot access** after deployment. See **`databricks-lakebase`** skill's **Schema Permissions for Deployed Apps** for the full workflow and recovery steps.
 
+> **First deploy with `lakebase`:** confirm `databricks.yml` declares a `database` resource on the app (alongside `sql_warehouse`, `genie_space`, etc.). The Apps platform auto-creates the SP's Postgres role only when the database is attached as an app resource; without it, the deployed app fails with `password authentication failed for user '<UUID>'`. If the resource is missing, re-run `databricks apps init` with `--set lakebase.postgres.branch=...` and `--set lakebase.postgres.database=...`; if you can't (shared Lakebase, custom permissions), use the manual SQL fallback in the **`databricks-lakebase`** skill's **Grant app SP for AppKit / CRUD apps** section.
+
 The Lakebase env vars (`PGHOST`, `PGDATABASE`, etc.) are auto-set only when deployed. For local development, get the connection details from your endpoint and set them manually:
 
 ```bash
@@ -261,11 +263,18 @@ Then create `server/.env` with the values from the endpoint response:
 PGHOST=<host from endpoint>
 PGPORT=5432
 PGDATABASE=<your database name>
-PGUSER=<your service principal client ID>
+PGUSER=<see note below>
 PGSSLMODE=require
 LAKEBASE_ENDPOINT=projects/<PROJECT_ID>/branches/<BRANCH_ID>/endpoints/<ENDPOINT_ID>
 ```
 
+> **`PGUSER` must match the credentials the AppKit dev server uses.** The Postgres role in `PGUSER` has to correspond to the principal that produced `PGPASSWORD` (the OAuth token).
+>
+> - **Default (personal Databricks profile):** AppKit's local server authenticates as your Databricks user, so `PGUSER` is your Databricks username/email. Tables created locally will be owned by your user, not the SP — that's why the deploy-first workflow exists.
+> - **Testing the deployed flow locally:** export `DATABRICKS_CLIENT_ID=<SP_CLIENT_ID>` and `DATABRICKS_CLIENT_SECRET=...` so the dev server authenticates as the SP. Then `PGUSER=<SP_CLIENT_ID>` matches.
+>
+> If `PGUSER` and the OAuth token disagree, Postgres rejects the connection with `password authentication failed for user '<UUID>'`.
+
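The `PGUSER` rule in the note above can be sketched as a small pre-flight check. This is a simplification under stated assumptions: it only looks at the two environment variables the note names and treats everything else as the personal-profile case. The real Databricks auth resolution has more inputs, and the helper names here are hypothetical.

```python
def expected_pguser(env: dict[str, str], databricks_user: str) -> str:
    """Return the Postgres role matching the principal that mints the OAuth token."""
    client_id = env.get("DATABRICKS_CLIENT_ID", "")
    client_secret = env.get("DATABRICKS_CLIENT_SECRET", "")
    # Both SP variables set: the dev server authenticates as the service principal.
    if client_id and client_secret:
        return client_id
    # Otherwise assume the personal profile, so the role is the user's own identity.
    return databricks_user


def check_pguser(env: dict[str, str], databricks_user: str) -> None:
    """Raise early instead of waiting for Postgres to reject the connection."""
    expected = expected_pguser(env, databricks_user)
    actual = env.get("PGUSER", "")
    if actual != expected:
        raise ValueError(
            f"PGUSER={actual!r} but the dev server authenticates as {expected!r}"
        )
```

Calling `check_pguser(dict(os.environ), "<your Databricks username>")` before opening the pool surfaces a mismatch with a clearer message than `password authentication failed`.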
 Load `server/.env` in your dev server (e.g. via `dotenv` or `node --env-file=server/.env`). Never commit `.env` files — add `server/.env` to `.gitignore`.
 
 ## Troubleshooting
@@ -276,5 +285,6 @@ Load `server/.env` in your dev server (e.g. via `dotenv` or `node --env-file=ser
 | `permission denied for schema <name>` | Schema was created by another role (e.g. you ran locally before deploying) | **Ask the user before dropping** — `DROP SCHEMA` deletes all data. See **`databricks-lakebase`** skill's **Schema Permissions for Deployed Apps** for options |
 | Works locally but `permission denied` after deploy | Local credentials created the schema; the SP can't access schemas it doesn't own | **Ask the user before dropping** — warn about data loss, then deploy first. See **`databricks-lakebase`** skill's **Schema Permissions for Deployed Apps** for options |
 | `connection refused` | Pool not connected or wrong env vars | Check `PGHOST`, `PGPORT`, `LAKEBASE_ENDPOINT` are set |
+| `password authentication failed for user '<UUID>'` | App's `databricks.yml` is missing a `database` resource, so the Apps platform never auto-created the SP's Postgres role | Add the missing `database` resource (re-run `databricks apps init` with `--set lakebase.postgres.branch=...` and `--set lakebase.postgres.database=...`), then redeploy. Manual SQL fallback: see **`databricks-lakebase`**'s **Grant app SP for AppKit / CRUD apps** |
 | `relation "X" does not exist` | Tables not initialized | Run `CREATE TABLE IF NOT EXISTS` at startup |
 | App builds but pool fails at runtime | Env vars not set locally | Set vars in `server/.env` — see Local Development above |
@@ -211,22 +211,21 @@ databricks postgres create-endpoint projects/<PROJECT_ID>/branches/<BRANCH_ID> <
 ```
 
 **Run SQL against Lakebase** (GRANT, CREATE INDEX, etc.):
-```bash
-# 1. Get endpoint host
-databricks postgres get-endpoint projects/<PROJECT_ID>/branches/<BRANCH_ID>/endpoints/<ENDPOINT_ID> --profile <PROFILE>
 
-# 2. Generate OAuth token
-databricks postgres generate-database-credential \
-  projects/<PROJECT_ID>/branches/<BRANCH_ID>/endpoints/<ENDPOINT_ID> \
-  --profile <PROFILE>
+Preferred — `databricks psql` wrapper handles auth, host discovery, and TLS in one call:
+```bash
+databricks psql --profile <PROFILE> --project <PROJECT_ID> --branch <BRANCH_ID> --endpoint <ENDPOINT_ID> \
+  -- -d databricks_postgres -f path/to/script.sql
 
-# 3. Connect (use token from step 2 as password, host from step 1)
-PGPASSWORD='<TOKEN>' psql "host=<HOST> user=<USERNAME> dbname=databricks_postgres sslmode=require"
+# One-off statement
+databricks psql --profile <PROFILE> --project <PROJECT_ID> -- -d databricks_postgres -c "SELECT 1"
 ```
 
-> **Note:** `generate-database-credential` requires the **endpoint** resource path (`.../endpoints/<ENDPOINT_ID>`), not a database or branch path.
+> **`--profile` placement.** All `databricks` flags (including `--profile`) MUST come before the `--` separator. Anything after `--` is forwarded verbatim to `psql`, which doesn't understand `--profile` and will exit with `psql: error: unrecognized option`.
 
-**Scriptable version** (single copy-paste, useful for agents):
+Requires `psql` on `PATH` (the wrapper shells out to it). `--branch` and `--endpoint` can be omitted when the project has exactly one branch or endpoint; the wrapper then uses that one.
+
+Manual form (use when the wrapper isn't available):
 ```bash
 EP=projects/<PROJECT_ID>/branches/<BRANCH_ID>/endpoints/<ENDPOINT_ID>
 # get-endpoint JSON shape: {"status": {"hosts": {"host": "<HOSTNAME>"}, ...}, ...}
@@ -237,13 +236,53 @@ TOKEN=$(databricks postgres generate-database-credential $EP --profile <PROFILE>
 PGPASSWORD="$TOKEN" psql "host=$HOST user=<USERNAME> dbname=databricks_postgres sslmode=require"
 ```
 
-**Grant app SP access to synced tables** (run as project owner after sync is ONLINE and app is deployed):
+> **Note:** `generate-database-credential` requires the **endpoint** resource path (`.../endpoints/<ENDPOINT_ID>`), not a database or branch path.
+
+**Grant app SP access to synced tables** (read-only) — run as project owner after sync is ONLINE and app is deployed:
 ```sql
 GRANT USAGE ON SCHEMA public TO "<SP_CLIENT_ID>";
 GRANT SELECT ON ALL TABLES IN SCHEMA public TO "<SP_CLIENT_ID>";
 ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO "<SP_CLIENT_ID>";
 ```
-For least-privilege, consider syncing into a dedicated schema instead of `public` so the grant is scoped to synced data only.
+
+> **Default privileges caveat.** `ALTER DEFAULT PRIVILEGES` without `FOR ROLE` only applies to tables created by the role running this statement. If sync pipelines create new tables under a different role, re-run `GRANT SELECT ON ALL TABLES IN SCHEMA public TO "<SP_CLIENT_ID>"` after each new table appears, or add `FOR ROLE <pipeline_role>` once you know which role the sync runs as.
+
+**Grant app SP for AppKit / CRUD apps** (full DML).
+
+> **First check: is the Lakebase declared as an app resource?** When the Apps platform attaches a `database` resource (declared in the app's `databricks.yml` under `resources.apps.<app>.resources`) to an app on deploy, it auto-creates the SP's Postgres role with `CAN_CONNECT_AND_CREATE`. If the SP is failing to connect with `password authentication failed for user '<SP_CLIENT_ID>'`, the most likely cause is a missing `database` resource — fix that first, redeploy, and the auto-grant fires. See the `databricks-apps` skill (Scaffolding) for verifying every required plugin resource is declared.
+>
+> The SQL block below is the **fallback** for cases the resource form doesn't cover: granting access to an existing Lakebase the app spec doesn't own (shared across apps, pre-existing schema with custom permissions, post-hoc grants for additional tables/sequences).
+
+Manual fallback — create the role and grant DML, in one psql round-trip:
+```sql
+CREATE EXTENSION IF NOT EXISTS databricks_auth;
+
+DO $$
+DECLARE
+  sp TEXT := '<SP_CLIENT_ID>';   -- from `databricks apps get <APP> -o json | jq -r .service_principal_client_id`
+BEGIN
+  PERFORM databricks_create_role(sp, 'SERVICE_PRINCIPAL');
+  EXECUTE format('GRANT CONNECT ON DATABASE "databricks_postgres" TO %I', sp);
+  EXECUTE format('GRANT ALL ON SCHEMA public TO %I', sp);
+  EXECUTE format('GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO %I', sp);
+  EXECUTE format('GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO %I', sp);
+  EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO %I', sp);
+  EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO %I', sp);
+END $$;
+```
+Run it through `databricks psql` (above), for example by saving the block to a file and passing `-f`. The block is idempotent; re-running is safe.
+
+The role-creation step alone has a CLI form too (useful when granting privileges separately):
+```bash
+databricks postgres create-role projects/<PROJECT_ID>/branches/<BRANCH_ID> \
+  --role-id <SP_CLIENT_ID> \
+  --json '{"spec":{"identity_type":"SERVICE_PRINCIPAL","postgres_role":"<SP_CLIENT_ID>","auth_method":"LAKEBASE_OAUTH_V1"}}' \
+  --profile <PROFILE>
+```
+
+> **Least privilege.** The example creates the role with default privileges only — grant database/schema/table access via the explicit `GRANT` statements above. Don't add `membership_roles: ["DATABRICKS_SUPERUSER"]` for an app SP unless broad administrative access is intentional; superuser membership lets the app role read every Lakebase database, not just its own.
+
+> **CLI body shape.** `databricks postgres create-role`'s `--json` flag binds to the inner `Role` object — fields go directly under `spec`, **not** wrapped in `{"role": ...}`. The error `Field 'role' is required and must contain at least one subfield with a non-default value` means the inner Role had no recognized fields (often because someone wrapped the body, which the CLI strips with `Warning: unknown field: role` and ships an empty body). The CLI also doesn't yet expose convenience flags like `--spec.identity-type` ([cmd/workspace/postgres/postgres.go](https://github.com/databricks/cli/blob/main/cmd/workspace/postgres/postgres.go) marks `spec` as TODO), so you must hand-craft the JSON.
 
 Get SP client ID: `databricks apps get <APP_NAME> --profile <PROFILE>` → `service_principal_client_id` field.
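
With the SP client ID in hand, the `--json` body for `databricks postgres create-role` can be generated rather than hand-typed. A minimal sketch that mirrors the body shown in the CLI example and rejects the `{"role": ...}` wrapper described in the body-shape note; the helper names are hypothetical.

```python
import json


def create_role_body(sp_client_id: str) -> str:
    """Return the --json payload with fields directly under top-level `spec`."""
    body = {
        "spec": {
            "identity_type": "SERVICE_PRINCIPAL",
            "postgres_role": sp_client_id,
            "auth_method": "LAKEBASE_OAUTH_V1",
        }
    }
    return json.dumps(body)


def check_role_body(raw: str) -> None:
    """Catch the wrapped form before the CLI strips it and sends an empty body."""
    body = json.loads(raw)
    if "role" in body:
        raise ValueError("do not wrap the body in a 'role' key; put fields under 'spec'")
    if not body.get("spec"):
        raise ValueError("body needs a non-empty top-level 'spec' object")


payload = create_role_body("<SP_CLIENT_ID>")
check_role_body(payload)
print(payload)
```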
 
@@ -257,6 +296,8 @@ Get SP client ID: `databricks apps get <APP_NAME> --profile <PROFILE>` → `serv
 | `cannot configure default credentials` | Use `--profile` flag or authenticate first |
 | `PERMISSION_DENIED` | Check workspace permissions |
 | `permission denied for schema` | Schema owned by another role. Deploy app first so SP creates/owns it |
+| `password authentication failed for user '<UUID>'` (deployed app) | SP has no Postgres role on the branch yet. Run the **Grant app SP for AppKit / CRUD apps** SQL block above, then restart the app |
+| `Field 'role' is required` from `databricks postgres create-role` | `--json` binds to the inner `Role`. Pass fields directly under `spec` (no `{"role": ...}` wrapper). See the CLI body-shape note in **Grant app SP for AppKit / CRUD apps** |
 | Protected branch won't delete | `update-branch` to set `spec.is_protected` to `false` first |
 | Long-running operation timeout | Use `--no-wait` and poll with `get-operation` |
 | Token expired during long query | Tokens expire after 1 hour; implement refresh (see [connectivity.md](references/connectivity.md)) |
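
For the token-expiry row above, a refresh wrapper might look like the sketch below. It assumes the one-hour lifetime stated in the table and takes a caller-supplied `fetch_token` callable (hypothetical; it could wrap `generate-database-credential`). The class name, the five-minute refresh margin, and the injectable clock are illustrative choices, not part of the skill.

```python
import time
from typing import Callable


class TokenCache:
    """Cache an OAuth token and refresh it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], str],
        ttl_seconds: float = 3600.0,      # tokens expire after 1 hour
        refresh_margin: float = 300.0,    # ASSUMPTION: refresh 5 minutes early
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_token
        self._ttl = ttl_seconds
        self._margin = refresh_margin
        self._clock = clock
        self._token: str | None = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a token that is valid for at least `refresh_margin` more seconds."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch()
            self._expires_at = now + self._ttl
        return self._token
```

Call `cache.get()` each time a new connection is opened so long-lived pools pick up fresh credentials; a query that is already running when the token expires still needs its own handling.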

@@ -140,6 +140,42 @@ For production apps, combine with Pattern 2's token refresh loop and SQLAlchemy
 - **Handle scale-to-zero reconnection** — first connection after idle may take ~100ms; implement retry
 - **psycopg2 or psycopg3** — both work; psycopg3 recommended for new development (better async, pooling)
 
+## DNS Resolution (macOS)
+
+Python's `socket.getaddrinfo()` can fail with long Lakebase hostnames on macOS. Workaround: resolve via `dig`, then pass the IP through `hostaddr` while keeping `host` for TLS SNI.
+
+```bash
+# Resolve the Lakebase hostname to an IP
+dig +short <ENDPOINT_HOST>
+```
+
+```python
+import ipaddress
+import subprocess
+
+import psycopg
+
+def resolve_host(hostname: str) -> str:
+    try:
+        result = subprocess.run(
+            ["dig", "+short", hostname], capture_output=True, text=True, check=False
+        )
+    except FileNotFoundError as e:
+        raise RuntimeError("'dig' is not installed; install it (e.g. `apt-get install dnsutils` on Debian/Ubuntu)") from e
+    # `dig +short` prints CNAME targets before addresses, so skip non-IP lines.
+    for line in result.stdout.split():
+        try:
+            return str(ipaddress.ip_address(line))
+        except ValueError:
+            continue
+    raise RuntimeError(f"DNS resolution failed for {hostname}")
+
+ip = resolve_host(endpoint_host)  # endpoint_host, username, token: set earlier
+
+conn = psycopg.connect(
+    host=endpoint_host,      # kept so TLS SNI still carries the hostname
+    hostaddr=ip,             # bypasses getaddrinfo()
+    dbname="databricks_postgres",
+    user=username,
+    password=token,
+    sslmode="require",
+)
+```
+
 ## Data API
 
 PostgREST-compatible HTTP API for CRUD operations on Postgres tables. **Autoscaling only.**