apps+lakebase: declare-resource as primary path, SQL grant as fallback, psql wrapper #59
base: main
Commits: 3f2c071, 97a1949, d3f3fda, 1153233, eb8ee35
@@ -246,6 +246,8 @@ Check the response for the `active_deployment` field. If it exists with `status.

If you skip this step, the Service Principal won't own the database schema. You'll create schemas under your credentials that the SP **cannot access** after deployment. See **`databricks-lakebase`** skill's **Schema Permissions for Deployed Apps** for the full workflow and recovery steps.

> **First deploy with `lakebase`:** confirm `databricks.yml` declares a `database` resource on the app (alongside `sql_warehouse`, `genie_space`, etc.). The Apps platform auto-creates the SP's Postgres role only when the database is attached as an app resource; without it, the deployed app fails with `password authentication failed for user '<UUID>'`. If the resource is missing, re-run `databricks apps init` with `--set lakebase.postgres.branch=...` and `--set lakebase.postgres.database=...`; if you can't (shared Lakebase, custom permissions), use the manual SQL fallback in the **`databricks-lakebase`** skill's **Grant app SP for AppKit / CRUD apps** section.

Member
The new (Lakebase Autoscaling) resource is Example:

Member
Can we avoid duplicating the apps init command and instead, point Agent to the Scaffolding section? It'll be hard to maintain so many occurrences of the init command. Thanks!

The Lakebase env vars (`PGHOST`, `PGDATABASE`, etc.) are auto-set only when deployed. For local development, get the connection details from your endpoint and set them manually:

```bash
@@ -261,11 +263,18 @@ Then create `server/.env` with the values from the endpoint response:
PGHOST=<host from endpoint>
PGPORT=5432
PGDATABASE=<your database name>
PGUSER=<your service principal client ID>
PGUSER=<see note below>
PGSSLMODE=require
LAKEBASE_ENDPOINT=projects/<PROJECT_ID>/branches/<BRANCH_ID>/endpoints/<ENDPOINT_ID>
```

> **`PGUSER` must match the credentials the AppKit dev server uses.** The Postgres role in `PGUSER` has to correspond to the principal that produced `PGPASSWORD` (the OAuth token).
>
> - **Default (personal Databricks profile):** AppKit's local server authenticates as your Databricks user, so `PGUSER` is your Databricks username/email. Tables created locally will be owned by your user, not the SP; that's why the deploy-first workflow exists.
> - **Testing the deployed flow locally:** export `DATABRICKS_CLIENT_ID=<SP_CLIENT_ID>` and `DATABRICKS_CLIENT_SECRET=...` so the dev server authenticates as the SP. Then `PGUSER=<SP_CLIENT_ID>` matches.

Member
This isn't possible locally: we cannot get the Service Principal's client secret.

>
> If `PGUSER` and the OAuth token disagree, Postgres rejects the connection with `password authentication failed for user '<UUID>'`.

Member
Honestly, I'd rather revert changes from line 264 to 277: they don't seem to bring any benefit, unless I'm mistaken?

Load `server/.env` in your dev server (e.g. via `dotenv` or `node --env-file=server/.env`). Never commit `.env` files; add `server/.env` to `.gitignore`.
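For a quick local sanity check before starting the dev server, here is a minimal stdlib-only sketch that parses `server/.env`-style `KEY=VALUE` lines and reports which of the variables in the example above are missing or still hold `<placeholder>` values. It is an illustration, not AppKit's loader: `parse_env` and `missing_lakebase_vars` are hypothetical names, and in practice `dotenv` or `node --env-file` does the actual loading.

```python
from pathlib import Path

# Variables from the server/.env example above (hypothetical check list).
REQUIRED_VARS = ("PGHOST", "PGPORT", "PGDATABASE", "PGUSER", "PGSSLMODE", "LAKEBASE_ENDPOINT")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, as most .env loaders do.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def missing_lakebase_vars(env: dict[str, str]) -> list[str]:
    """Return required vars that are absent, empty, or still '<placeholder>' values."""
    return [
        name
        for name in REQUIRED_VARS
        if not env.get(name) or (env[name].startswith("<") and env[name].endswith(">"))
    ]


if __name__ == "__main__":
    path = Path("server/.env")
    env = parse_env(path.read_text()) if path.exists() else {}
    print("missing:", missing_lakebase_vars(env) or "none")
```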

## Troubleshooting

@@ -276,5 +285,6 @@ Load `server/.env` in your dev server (e.g. via `dotenv` or `node --env-file=ser
| `permission denied for schema <name>` | Schema was created by another role (e.g. you ran locally before deploying) | **Ask the user before dropping** — `DROP SCHEMA` deletes all data. See **`databricks-lakebase`** skill's **Schema Permissions for Deployed Apps** for options |
| Works locally but `permission denied` after deploy | Local credentials created the schema; the SP can't access schemas it doesn't own | **Ask the user before dropping** — warn about data loss, then deploy first. See **`databricks-lakebase`** skill's **Schema Permissions for Deployed Apps** for options |
| `connection refused` | Pool not connected or wrong env vars | Check `PGHOST`, `PGPORT`, `LAKEBASE_ENDPOINT` are set |
| `password authentication failed for user '<UUID>'` | App's `databricks.yml` is missing a `database` resource — Apps platform never auto-created the SP's Postgres role on attach | Add the missing `database` resource (re-run `databricks apps init` with `--set lakebase.postgres.branch=...` and `--set lakebase.postgres.database=...`), redeploy. Manual SQL fallback: see **`databricks-lakebase`**'s **Grant app SP for AppKit / CRUD apps** |

Member
As before, let's point to the Scaffolding section.

| `relation "X" does not exist` | Tables not initialized | Run `CREATE TABLE IF NOT EXISTS` at startup |
| App builds but pool fails at runtime | Env vars not set locally | Set vars in `server/.env` — see Local Development above |
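The `relation "X" does not exist` fix (run `CREATE TABLE IF NOT EXISTS` at startup) might look like the following sketch, assuming a DB-API-style connection such as a psycopg 3 `Connection`. The `todos` table, its index, and the `init_schema` name are hypothetical placeholders for your app's own schema.

```python
# Hypothetical schema; replace with your app's own tables.
SCHEMA_STATEMENTS = (
    """
    CREATE TABLE IF NOT EXISTS todos (
        id BIGSERIAL PRIMARY KEY,
        title TEXT NOT NULL,
        done BOOLEAN NOT NULL DEFAULT FALSE
    )
    """,
    "CREATE INDEX IF NOT EXISTS todos_done_idx ON todos (done)",
)


def init_schema(conn) -> None:
    """Create tables idempotently at app startup, then commit once."""
    with conn.cursor() as cur:
        for stmt in SCHEMA_STATEMENTS:
            cur.execute(stmt)
    conn.commit()
```

Because every statement uses `IF NOT EXISTS`, calling `init_schema(conn)` on each startup is harmless once the tables exist. Run it from the deployed app so the SP, not your user, owns the tables.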

@@ -211,22 +211,21 @@ databricks postgres create-endpoint projects/<PROJECT_ID>/branches/<BRANCH_ID> <
```

**Run SQL against Lakebase** (GRANT, CREATE INDEX, etc.):
```bash
# 1. Get endpoint host
databricks postgres get-endpoint projects/<PROJECT_ID>/branches/<BRANCH_ID>/endpoints/<ENDPOINT_ID> --profile <PROFILE>

# 2. Generate OAuth token
databricks postgres generate-database-credential \
  projects/<PROJECT_ID>/branches/<BRANCH_ID>/endpoints/<ENDPOINT_ID> \
  --profile <PROFILE>
Preferred — `databricks psql` wrapper handles auth, host discovery, and TLS in one call:
```bash
databricks psql --profile <PROFILE> --project <PROJECT_ID> --branch <BRANCH_ID> --endpoint <ENDPOINT_ID> \
  -- -d databricks_postgres -f path/to/script.sql

# 3. Connect (use token from step 2 as password, host from step 1)
PGPASSWORD='<TOKEN>' psql "host=<HOST> user=<USERNAME> dbname=databricks_postgres sslmode=require"
# One-off statement
databricks psql --profile <PROFILE> --project <PROJECT_ID> -- -d databricks_postgres -c "SELECT 1"
```

> **Note:** `generate-database-credential` requires the **endpoint** resource path (`.../endpoints/<ENDPOINT_ID>`), not a database or branch path.
> **`--profile` placement.** All `databricks` flags (including `--profile`) MUST come before the `--` separator. Anything after `--` is forwarded verbatim to `psql`, which doesn't understand `--profile` and will exit with `psql: error: unrecognized option`.

**Scriptable version** (single copy-paste, useful for agents):
Requires `psql` on `PATH` (the wrapper shells out to it). If the project has only one branch or endpoint, `--branch` and `--endpoint` can be omitted and the wrapper uses that one.
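To make the `--` rule concrete, here is a small sketch that assembles the `databricks psql` argument list from the flags used in the examples above, keeping every CLI flag before `--` and passing only psql's own arguments after it. `build_psql_argv` is a hypothetical helper, not part of the CLI; the returned list is meant for `subprocess.run`.

```python
from typing import Optional, Sequence


def build_psql_argv(
    profile: str,
    project: str,
    psql_args: Sequence[str],
    branch: Optional[str] = None,
    endpoint: Optional[str] = None,
) -> list[str]:
    """Build argv for `databricks psql`: CLI flags first, psql flags after `--`."""
    argv = ["databricks", "psql", "--profile", profile, "--project", project]
    # Leave branch/endpoint as None to let the wrapper pick the only one.
    if branch is not None:
        argv += ["--branch", branch]
    if endpoint is not None:
        argv += ["--endpoint", endpoint]
    # Everything after "--" is forwarded verbatim to psql.
    return argv + ["--", *psql_args]


argv = build_psql_argv("<PROFILE>", "<PROJECT_ID>", ["-d", "databricks_postgres", "-c", "SELECT 1"])
# subprocess.run(argv, check=True) would run the one-off statement.
```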

Manual form (use when the wrapper isn't available):
```bash
EP=projects/<PROJECT_ID>/branches/<BRANCH_ID>/endpoints/<ENDPOINT_ID>
# get-endpoint JSON shape: {"status": {"hosts": {"host": "<HOSTNAME>"}, ...}, ...}

@@ -237,13 +236,53 @@ TOKEN=$(databricks postgres generate-database-credential $EP --profile <PROFILE>
PGPASSWORD="$TOKEN" psql "host=$HOST user=<USERNAME> dbname=databricks_postgres sslmode=require"
```

**Grant app SP access to synced tables** (run as project owner after sync is ONLINE and app is deployed):
> **Note:** `generate-database-credential` requires the **endpoint** resource path (`.../endpoints/<ENDPOINT_ID>`), not a database or branch path.
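Because passing a branch or database path is an easy mistake, a caller can check the path shape before invoking the CLI. This is a hypothetical client-side guard built from the path format used throughout this section (`projects/<PROJECT_ID>/branches/<BRANCH_ID>/endpoints/<ENDPOINT_ID>`); the allowed ID characters are an assumption, so the pattern only rejects paths with the wrong number or order of segments.

```python
import re

# Segment order from the documented endpoint path; IDs = any non-slash text (assumption).
_ENDPOINT_PATH = re.compile(r"projects/([^/]+)/branches/([^/]+)/endpoints/([^/]+)")


def split_endpoint_path(path: str) -> tuple[str, str, str]:
    """Return (project, branch, endpoint), or raise ValueError for non-endpoint paths."""
    m = _ENDPOINT_PATH.fullmatch(path)
    if m is None:
        raise ValueError(f"expected projects/<P>/branches/<B>/endpoints/<E>, got {path!r}")
    return m.group(1), m.group(2), m.group(3)
```

The same check fits the `LAKEBASE_ENDPOINT` value in `server/.env`, which uses this path format.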

**Grant app SP access to synced tables** (read-only) — run as project owner after sync is ONLINE and app is deployed:
```sql
GRANT USAGE ON SCHEMA public TO "<SP_CLIENT_ID>";
GRANT SELECT ON ALL TABLES IN SCHEMA public TO "<SP_CLIENT_ID>";
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO "<SP_CLIENT_ID>";
```
For least privilege, consider syncing into a dedicated schema instead of `public` so the grant is scoped to synced data only.

> **Default privileges caveat.** `ALTER DEFAULT PRIVILEGES` without `FOR ROLE` only applies to tables created by the role running this statement. If sync pipelines create new tables under a different role, re-run `GRANT SELECT ON ALL TABLES IN SCHEMA public TO "<SP_CLIENT_ID>"` after each new table appears, or add `FOR ROLE <pipeline_role>` once you know which role the sync runs as.
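Following the dedicated-schema and `FOR ROLE` suggestions, a small generator can emit the same three read-only statements for any schema, optionally scoped to the role that creates the synced tables. `readonly_grants`, the `synced` schema, and `<pipeline_role>` are hypothetical; identifiers are always double-quoted with embedded quotes doubled, so UUID-style role names like the SP client ID stay intact.

```python
from typing import Optional


def quote_ident(name: str) -> str:
    """Always double-quote a Postgres identifier, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def readonly_grants(sp_client_id: str, schema: str = "public", for_role: Optional[str] = None) -> list[str]:
    """SQL statements granting read-only access on `schema` to the app SP."""
    sp, sch = quote_ident(sp_client_id), quote_ident(schema)
    # Without FOR ROLE, default privileges only cover tables created by the current role.
    owner = f"FOR ROLE {quote_ident(for_role)} " if for_role else ""
    return [
        f"GRANT USAGE ON SCHEMA {sch} TO {sp};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {sch} TO {sp};",
        f"ALTER DEFAULT PRIVILEGES {owner}IN SCHEMA {sch} GRANT SELECT ON TABLES TO {sp};",
    ]


print("\n".join(readonly_grants("<SP_CLIENT_ID>", schema="synced", for_role="<pipeline_role>")))
```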

Member
This is what the snippet above (242-246) already does; does it make sense to repeat it?

**Grant app SP for AppKit / CRUD apps** (full DML).

> **First check: is the Lakebase declared as an app resource?** When the Apps platform attaches a `database` resource (declared in the app's `databricks.yml` under `resources.apps.<app>.resources`) to an app on deploy, it auto-creates the SP's Postgres role with `CAN_CONNECT_AND_CREATE`. If the SP is failing to connect with `password authentication failed for user '<SP_CLIENT_ID>'`, the most likely cause is a missing `database` resource; fix that first, redeploy, and the auto-grant fires. See the `databricks-apps` skill (Scaffolding) for verifying every required plugin resource is declared.
>
> The SQL block below is the **fallback** for cases the resource form doesn't cover: granting access to an existing Lakebase the app spec doesn't own (shared across apps, pre-existing schema with custom permissions, post-hoc grants for additional tables/sequences).

Manual fallback — create the role and grant DML, in one psql round-trip:
```sql
CREATE EXTENSION IF NOT EXISTS databricks_auth;

DO $$
DECLARE
  sp TEXT := '<SP_CLIENT_ID>';  -- from `databricks apps get <APP> -o json | jq -r .service_principal_client_id`
BEGIN
  PERFORM databricks_create_role(sp, 'SERVICE_PRINCIPAL');
  EXECUTE format('GRANT CONNECT ON DATABASE "databricks_postgres" TO %I', sp);
  EXECUTE format('GRANT ALL ON SCHEMA public TO %I', sp);
  EXECUTE format('GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO %I', sp);
  EXECUTE format('GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO %I', sp);
  EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO %I', sp);
  EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO %I', sp);
END $$;
```
Pipe through `databricks psql` (above). The block is idempotent; re-running is safe.
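Before substituting `<SP_CLIENT_ID>`, it can help to check that the value looks like a client ID and to escape it as a SQL string literal. The sketch below assumes SP client IDs are UUID-shaped (the error messages above show `'<UUID>'`) and repeats the DO block with a `__SP_LITERAL__` marker in place of the quoted ID; `render_sp_grant_sql` and the marker are hypothetical.

```python
import re

# Same statements as the DO block above; __SP_LITERAL__ becomes a quoted SQL literal.
TEMPLATE = """CREATE EXTENSION IF NOT EXISTS databricks_auth;

DO $$
DECLARE
  sp TEXT := __SP_LITERAL__;
BEGIN
  PERFORM databricks_create_role(sp, 'SERVICE_PRINCIPAL');
  EXECUTE format('GRANT CONNECT ON DATABASE "databricks_postgres" TO %I', sp);
  EXECUTE format('GRANT ALL ON SCHEMA public TO %I', sp);
  EXECUTE format('GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO %I', sp);
  EXECUTE format('GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO %I', sp);
  EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO %I', sp);
  EXECUTE format('ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON SEQUENCES TO %I', sp);
END $$;
"""

# Assumption: client IDs are 8-4-4-4-12 hex UUIDs.
_UUID = re.compile(r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}")


def render_sp_grant_sql(sp_client_id: str) -> str:
    """Validate the SP client ID and substitute it into the grant block."""
    if not _UUID.fullmatch(sp_client_id):
        raise ValueError(f"not a UUID-shaped client ID: {sp_client_id!r}")
    # A UUID contains no quotes, but doubling them keeps the literal safe if the check is relaxed.
    literal = "'" + sp_client_id.replace("'", "''") + "'"
    return TEMPLATE.replace("__SP_LITERAL__", literal)
```

The rendered string can be passed on stdin to `databricks psql --profile <PROFILE> --project <PROJECT_ID> -- -d databricks_postgres`, since psql reads commands from stdin when neither `-c` nor `-f` is given.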

The role-creation step alone has a CLI form too (useful when granting privileges separately):
```bash
databricks postgres create-role projects/<PROJECT_ID>/branches/<BRANCH_ID> \
  --role-id <SP_CLIENT_ID> \
  --json '{"spec":{"identity_type":"SERVICE_PRINCIPAL","postgres_role":"<SP_CLIENT_ID>","auth_method":"LAKEBASE_OAUTH_V1"}}' \
  --profile <PROFILE>
```

> **Least privilege.** The example creates the role with default privileges only — grant database/schema/table access via the explicit `GRANT` statements above. Don't add `membership_roles: ["DATABRICKS_SUPERUSER"]` for an app SP unless broad administrative access is intentional; superuser membership lets the app role read every Lakebase database, not just its own.

> **CLI body shape.** `databricks postgres create-role`'s `--json` flag binds to the inner `Role` object — fields go directly under `spec`, **not** wrapped in `{"role": ...}`. The error `Field 'role' is required and must contain at least one subfield with a non-default value` means the inner Role had no recognized fields (often because someone wrapped the body, which the CLI strips with `Warning: unknown field: role` and ships an empty body). The CLI also doesn't yet expose convenience flags like `--spec.identity-type` ([cmd/workspace/postgres/postgres.go](https://github.com/databricks/cli/blob/main/cmd/workspace/postgres/postgres.go) marks `spec` as TODO), so you must hand-craft the JSON.
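Building the body with `json.dumps` instead of hand-writing it avoids shell-quoting slips and keeps the body-shape rule visible: fields sit directly under `spec`, with no `{"role": ...}` wrapper. The field names and values are copied from the `create-role` example above; `create_role_argv` is a hypothetical helper whose result is meant for `subprocess.run`.

```python
import json


def create_role_argv(project: str, branch: str, sp_client_id: str, profile: str) -> list[str]:
    """argv for `databricks postgres create-role` using the body shape shown above."""
    body = {
        "spec": {  # fields go directly under spec; no {"role": ...} wrapper
            "identity_type": "SERVICE_PRINCIPAL",
            "postgres_role": sp_client_id,
            "auth_method": "LAKEBASE_OAUTH_V1",
        }
    }
    return [
        "databricks", "postgres", "create-role",
        f"projects/{project}/branches/{branch}",
        "--role-id", sp_client_id,
        "--json", json.dumps(body),
        "--profile", profile,
    ]
```

Passing the list to `subprocess.run` without `shell=True` sends the JSON as a single argument, so no extra quoting is needed. The body deliberately omits `membership_roles`, in line with the least-privilege note.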

Comment on lines +250 to +285

Member
I don't think we should do that: But let's not try to work around the whole App resource mechanism. An app that uses Lakebase must define the project as an app resource - this is a strict prerequisite here.

Get SP client ID: `databricks apps get <APP_NAME> --profile <PROFILE>` → `service_principal_client_id` field.
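In a script, the same lookup can be done by parsing the JSON output. This sketch assumes `databricks apps get <APP_NAME> -o json` prints an object with a top-level `service_principal_client_id` field, as the `jq` command in the SQL block above implies; `sp_client_id_from_app_json` is hypothetical.

```python
import json


def sp_client_id_from_app_json(app_json: str) -> str:
    """Extract `service_principal_client_id` from `databricks apps get -o json` output."""
    app = json.loads(app_json)
    sp_id = app.get("service_principal_client_id")
    if not sp_id:
        raise KeyError("service_principal_client_id missing from app JSON")
    return sp_id


# Usage (requires the CLI and a configured profile):
# out = subprocess.run(
#     ["databricks", "apps", "get", "<APP_NAME>", "--profile", "<PROFILE>", "-o", "json"],
#     capture_output=True, text=True, check=True,
# ).stdout
# print(sp_client_id_from_app_json(out))
```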

@@ -257,6 +296,8 @@ Get SP client ID: `databricks apps get <APP_NAME> --profile <PROFILE>` → `serv
| `cannot configure default credentials` | Use `--profile` flag or authenticate first |
| `PERMISSION_DENIED` | Check workspace permissions |
| `permission denied for schema` | Schema owned by another role. Deploy app first so SP creates/owns it |
| `password authentication failed for user '<UUID>'` (deployed app) | SP has no Postgres role on the branch yet. Run the **Grant app SP for AppKit / CRUD apps** SQL block above, then restart the app |
| `Field 'role' is required` from `databricks postgres create-role` | `--json` binds to the inner `Role`. Pass fields directly under `spec` (no `{"role": ...}` wrapper). See the CLI body-shape note in **Grant app SP for AppKit / CRUD apps** |
| Protected branch won't delete | `update-branch` to set `spec.is_protected` to `false` first |
| Long-running operation timeout | Use `--no-wait` and poll with `get-operation` |
| Token expired during long query | Tokens expire after 1 hour; implement refresh (see [connectivity.md](references/connectivity.md)) |
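For the token-expiry row, one common pattern is to cache the token and fetch a new one shortly before the 1-hour expiry. This is a generic stdlib sketch, not necessarily the pattern described in `connectivity.md`: `fetch_token` stands in for however you call `generate-database-credential` (CLI or SDK), `TokenCache` is hypothetical, and the 5-minute margin is an arbitrary assumption.

```python
import time
from typing import Callable, Optional

TOKEN_LIFETIME_S = 3600  # tokens expire after 1 hour (per the table above)


class TokenCache:
    """Return a cached token, refreshing it `margin_s` seconds before expiry."""

    def __init__(
        self,
        fetch_token: Callable[[], str],
        margin_s: float = 300,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_token
        self._margin = margin_s
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch()
            # Assumes a full lifetime from fetch time; prefer the API's expiry if it returns one.
            self._expires_at = now + TOKEN_LIFETIME_S
        return self._token
```

Call `cache.get()` each time a new connection is opened (for example, as the password for a pool's connection factory) so long-lived processes never reuse an expired token.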

@@ -140,6 +140,42 @@ For production apps, combine with Pattern 2's token refresh loop and SQLAlchemy
- **Handle scale-to-zero reconnection** — first connection after idle may take ~100ms; implement retry
- **psycopg2 or psycopg3** — both work; psycopg3 recommended for new development (better async, pooling)
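The scale-to-zero bullet recommends retrying the first connection. A minimal sketch with exponential backoff follows; `connect_with_retry`, the attempt count, and the delays are illustrative choices rather than documented values, and with psycopg 3 you would pass `retry_on=(psycopg.OperationalError,)`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    attempts: int = 4,
    base_delay_s: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect`, retrying with exponential backoff on `retry_on` errors."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return connect()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay_s * (2 ** attempt))  # 0.2s, 0.4s, 0.8s, ...
    raise AssertionError("unreachable")


# Usage with psycopg 3 (hypothetical conninfo):
# conn = connect_with_retry(lambda: psycopg.connect(conninfo), retry_on=(psycopg.OperationalError,))
```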

## DNS Resolution (macOS)

Python's `socket.getaddrinfo()` can fail with long Lakebase hostnames on macOS. Workaround: resolve via `dig`, then pass the IP through `hostaddr` while keeping `host` for TLS SNI.

```bash
# Resolve the Lakebase hostname to an IP
dig +short <ENDPOINT_HOST>
```

```python
import ipaddress
import subprocess

import psycopg  # psycopg 3


def resolve_host(hostname: str) -> str:
    """Resolve `hostname` with `dig` and return the first IP address in the answer."""
    try:
        result = subprocess.run(
            ["dig", "+short", hostname], capture_output=True, text=True, check=False
        )
    except FileNotFoundError as e:
        raise RuntimeError("'dig' is not installed; install it (e.g. `apt-get install dnsutils`) or use socket.getaddrinfo() instead") from e
    # `dig +short` prints any CNAME targets before the A/AAAA records, so skip non-IP lines.
    for line in result.stdout.splitlines():
        try:
            return str(ipaddress.ip_address(line.strip()))
        except ValueError:
            continue
    raise RuntimeError(f"DNS resolution failed for {hostname}")


# Fill these in from `get-endpoint` and `generate-database-credential`.
endpoint_host = "<ENDPOINT_HOST>"
username = "<USERNAME>"
token = "<TOKEN>"

ip = resolve_host(endpoint_host)

conn = psycopg.connect(
    host=endpoint_host,  # kept for TLS SNI verification
    hostaddr=ip,  # bypasses getaddrinfo()
    dbname="databricks_postgres",
    user=username,
    password=token,
    sslmode="require",
)
```

Comment on lines +143 to +178

Member
It was a part of my previous PR but looks like it is not needed after all - could you please cherry-pick your commits on top of main to ensure only your changes are added here? Thanks!

## Data API

PostgREST-compatible HTTP API for CRUD operations on Postgres tables. **Autoscaling only.**

Same comment as above:
`database` isn't the right resource name