From 7e95540934e61a48f78bb93f3efbbe53dfb34e79 Mon Sep 17 00:00:00 2001 From: Atila Fassina Date: Thu, 7 May 2026 19:48:41 +0200 Subject: [PATCH 1/2] docs(AppKit): document Files plugin policy API Update the Files skill reference for AppKit v0.21.0+ per-volume policy enforcement (PR databricks/appkit#197): - Add Permission Model section explaining the three layers (UC grants, execution identity, file policies) - Add Access Policies section with built-ins, combinators, custom policies, and policy-input types - Fix HTTP Routes intro: routes always run as service principal, user identity comes from x-forwarded-user and is passed to the policy (not asUser(req)) - Reframe Server-Side API: asUser(req) switches policy identity, not UC credentials - Refresh troubleshooting with policy-related errors and the default-publicRead() write-denial gotcha Co-authored-by: Isaac --- .../references/appkit/files.md | 222 +++++++++++++++--- 1 file changed, 184 insertions(+), 38 deletions(-) diff --git a/skills/databricks-apps/references/appkit/files.md b/skills/databricks-apps/references/appkit/files.md index 3432f09..3b84cc8 100644 --- a/skills/databricks-apps/references/appkit/files.md +++ b/skills/databricks-apps/references/appkit/files.md @@ -6,12 +6,12 @@ Use the `files()` plugin when your app needs to **browse, upload, download, or m ## When to Use Files vs Other Patterns -| Pattern | Use Case | Data Source | -| --- | --- | --- | -| Analytics | Read-only dashboards, charts, KPIs | Databricks SQL Warehouse | -| Lakebase | CRUD operations, persistent state, forms | PostgreSQL (Lakebase) | -| Files | File uploads, downloads, browsing, previews | Unity Catalog Volumes | -| Files + Analytics | Upload CSVs then query warehouse tables | Volumes + SQL Warehouse | +| Pattern | Use Case | Data Source | +| ----------------- | ------------------------------------------- | ------------------------ | +| Analytics | Read-only dashboards, charts, KPIs | Databricks SQL Warehouse | +| Lakebase | CRUD operations, persistent state, forms | PostgreSQL (Lakebase) | +| Files | File uploads, downloads, browsing, previews | Unity Catalog Volumes | +| Files + Analytics | Upload CSVs then query warehouse tables | Volumes + SQL Warehouse | ## Scaffolding @@ -43,29 +43,169 @@ The env var suffix (after `DATABRICKS_VOLUME_`) becomes the volume key, lowercas import { createApp, files, server } from "@databricks/appkit"; await createApp({ - plugins: [ - server(), - files(), - ], + plugins: [server(), files()], }); ``` +### Configuration + +Plugin-level options inherit to every volume; per-volume config overrides them: + +```typescript +files({ + maxUploadSize: 5_000_000_000, // 5 GB default (all volumes) + customContentTypes: { ".avro": "application/avro" }, + volumes: { + uploads: { maxUploadSize: 100_000_000 }, // 100 MB override + exports: {}, // inherits plugin-level + }, +}); +``` + +Auto-discovered volumes merge with explicit config — `volumes: {}` is only needed for overrides. + +## Permission Model + +Three layers gate file access — understand all three before deploying: + +1. **Unity Catalog grants** — the SP needs `WRITE_VOLUME`. Set at deploy time via `app.yaml` resource bindings; the plugin auto-declares the requirement. +2. **Execution identity** — HTTP routes **always** run as the service principal. The programmatic API runs as SP by default; `asUser(req)` re-wraps with the request's user identity. +3. **File policies** — per-volume function `(action, resource, user) → boolean` evaluated **before** every operation. This is the only layer that distinguishes between users on HTTP routes (since HTTP always uses SP credentials). + +> Removing a user's UC `WRITE_VOLUME` grant has **no effect on HTTP access** — the SP's grant is what's used. Policies are the only way to restrict per-user access through HTTP routes. + +## Access Policies + +Volumes without an explicit `policy` default to `files.policy.publicRead()` (reads allowed, writes denied) and log a startup warning. Set an explicit policy on every volume that needs writes. + +```typescript +import { files } from "@databricks/appkit"; + +files({ + volumes: { + public_data: { policy: files.policy.publicRead() }, // reads only + uploads: { policy: files.policy.allowAll() }, // anyone can write + archive: { policy: files.policy.denyAll() }, // locked down + }, +}); +``` + +### Built-in policies + +| Helper | Allows | +| --------------------------- | ------------------------------------------------------------------ | +| `files.policy.publicRead()` | `list`, `read`, `download`, `raw`, `exists`, `metadata`, `preview` | +| `files.policy.allowAll()` | every action | +| `files.policy.denyAll()` | no action (yes — even `list`) | + +### Combinators + +- `files.policy.all(p1, p2, ...)` — AND, short-circuits on first deny +- `files.policy.any(p1, p2, ...)` — OR, short-circuits on first allow +- `files.policy.not(p)` — invert (e.g. `not(publicRead())` = write-only drop-box) + +### Custom policies + +A `FilePolicy` is `(action, resource, user) => boolean | Promise`. `READ_ACTIONS` and `WRITE_ACTIONS` are exported `Set` for action-class checks. + +```typescript +import { type FilePolicy, WRITE_ACTIONS } from "@databricks/appkit"; + +const ADMIN_IDS = ["admin@company.com"]; + +const adminWrite: FilePolicy = (action, _resource, user) => { + if (WRITE_ACTIONS.has(action)) return ADMIN_IDS.includes(user.id); + return true; // reads open to everyone +}; + +files({ + volumes: { reports: { policy: adminWrite } }, +}); +``` + +Mix custom logic with built-ins via combinators — e.g. SP can do anything, users can only read: + +```typescript +files.policy.any( + (_action, _resource, user) => !!user.isServicePrincipal, + files.policy.publicRead(), +); +``` + +### Policy inputs + +```typescript +type FileAction = + | "list" | "read" | "download" | "raw" | "exists" | "metadata" | "preview" + | "upload" | "mkdir" | "delete"; + +interface FileResource { + path: string; // relative path within the volume + volume: string; // volume key + size?: number; // content-length in bytes (uploads only) +} + +interface FilePolicyUser { + id: string; // from `x-forwarded-user` header (HTTP) or req + isServicePrincipal?: boolean; // true when programmatic API skipped asUser() +} +``` + +### Enforcement + +- **HTTP routes**: denied → `403 { error: "Policy denied \"{action}\" on volume \"{volume}\"", plugin: "files" }`. +- **Programmatic API**: denied → throws `PolicyDeniedError` (importable from `@databricks/appkit`) with `.action` and `.volumeKey` properties. Both `appkit.files("vol").list()` (SP, `isServicePrincipal: true`) and `appkit.files("vol").asUser(req).list()` (user) are gated. + ## Server-Side API (Programmatic) -Access volumes through the `files()` callable, which returns a `VolumeHandle`: +Access volumes through the `files()` callable, which returns a `VolumeHandle`. Every method runs as the service principal — `asUser(req)` only changes the user identity passed into the **policy** check (the underlying SDK call still uses SP credentials). ```typescript -// ✅ CORRECT — OBO access (recommended) -const entries = await appkit.files("uploads").asUser(req).list(); -const content = await appkit.files("exports").asUser(req).read("report.csv"); - -// ❌ WRONG — omitting .asUser(req) -const entries = await appkit.files("uploads").list(); -// In dev: silently falls back to service principal credentials, bypassing user-level UC permissions -// In production: throws an error +// User identity passed to policy → user.id from req +await appkit.files("uploads").asUser(req).list(); + +// SP identity passed to policy → user.isServicePrincipal === true; logs a warning +await appkit.files("uploads").list(); + +// Named accessor equivalent +await appkit.files.volume("uploads").asUser(req).list(); ``` -**ALWAYS use `.asUser(req)`** — without it, dev mode silently uses the app's service principal (masking permission issues that will crash in production). +**Use `.asUser(req)` in route handlers.** Without it the policy sees `isServicePrincipal: true` and a warning is logged — fine for background jobs, wrong for user-driven endpoints where per-user policy decisions matter. Policy denial throws `PolicyDeniedError`. + +**`VolumeAPI` methods**: `list`, `read`, `download`, `exists`, `metadata`, `preview`, `upload`, `createDirectory`, `delete`. `read()` caps files at 10 MB by default — pass `{ maxSize: }` or use `download()` for larger files. Paths are absolute (`/Volumes/...`) or relative to the volume root; `../` and null bytes are rejected. + +### Execution defaults + +Every operation runs through cache/retry/timeout interceptors: + +| Tier | Operations | Cache | Retry | Timeout | +| -------- | ------------------------------------- | ----- | ----- | ------- | +| Read | list, read, exists, metadata, preview | 60 s | 3× | 30 s | +| Download | download, raw | — | 3× | 30 s | +| Write | upload, mkdir, delete | — | — | 600 s | + +Cache keys include the volume key (no cross-volume collisions). Write operations auto-invalidate the parent directory's cached `list` entry. + +## HTTP Routes + +Mounted at `/api/files/*`. All routes execute as the service principal; user identity is read from the `x-forwarded-user` header and passed to the volume's policy. Policy denial → `403`. + +| Method | Path | Purpose | +| ------ | ---------------------------- | ------------------------------------------ | +| GET | `/volumes` | List configured volume keys | +| GET | `/:volumeKey/list?path=` | Directory listing | +| GET | `/:volumeKey/read?path=` | Read text content | +| GET | `/:volumeKey/download?path=` | Binary stream (attachment) | +| GET | `/:volumeKey/raw?path=` | Inline stream (attachment for unsafe MIME) | +| GET | `/:volumeKey/exists?path=` | Existence check | +| GET | `/:volumeKey/metadata?path=` | File metadata | +| GET | `/:volumeKey/preview?path=` | Preview (text + type flags) | +| POST | `/:volumeKey/upload?path=` | Upload (raw body) | +| POST | `/:volumeKey/mkdir` | Create directory (`body.path`) | +| DELETE | `/:volumeKey?path=` | Delete file | + +Path validation: non-empty, ≤ 4096 chars, no null bytes, no `../`. The `/raw` endpoint sets `X-Content-Type-Options: nosniff` and `Content-Security-Policy: sandbox`; HTML/JS/SVG MIME types are forced to attachment. ## Frontend Components @@ -184,8 +324,8 @@ export function FilesPage() { ```typescript const handleUpload = async (file: File) => { const uploadPath = currentPath ? `${currentPath}/${file.name}` : file.name; - const response = await fetch(apiUrl('upload', { path: uploadPath }), { - method: 'POST', + const response = await fetch(apiUrl("upload", { path: uploadPath }), { + method: "POST", body: file, }); if (!response.ok) { @@ -203,7 +343,7 @@ const handleUpload = async (file: File) => { const handleDelete = async (filePath: string) => { const response = await fetch( `/api/files/${volumeKey}?path=${encodeURIComponent(filePath)}`, - { method: 'DELETE' }, + { method: "DELETE" }, ); if (!response.ok) { const data = await response.json().catch(() => ({})); @@ -217,23 +357,28 @@ const handleDelete = async (filePath: string) => { ```typescript const handleCreateDirectory = async (name: string) => { const dirPath = currentPath ? `${currentPath}/${name}` : name; - const response = await fetch(apiUrl('mkdir'), { - method: 'POST', - headers: { 'Content-Type': 'application/json' }, + const response = await fetch(apiUrl("mkdir"), { + method: "POST", + headers: { "Content-Type": "application/json" }, body: JSON.stringify({ path: dirPath }), }); if (!response.ok) { const data = await response.json().catch(() => ({})); - throw new Error(data.error ?? `Create directory failed (${response.status})`); + throw new Error( + data.error ?? `Create directory failed (${response.status})`, + ); } }; ``` ## Resource Requirements -Each volume key requires a resource with `WRITE_VOLUME` permission. Declare in `databricks.yml`: +The plugin **auto-generates** volume resource requirements from `DATABRICKS_VOLUME_*` env vars — setting them in `app.yaml` is usually all you need. Each discovered volume key becomes a required `WRITE_VOLUME` resource validated at startup. + +Declare the volume explicitly in `databricks.yml` only when you need to pin it as a managed resource, then wire the env var via `valueFrom` in `app.yaml`: ```yaml +# databricks.yml resources: apps: my_app: @@ -244,9 +389,8 @@ resources: permission: WRITE_VOLUME ``` -Wire the env var in `app.yaml`: - ```yaml +# app.yaml env: - name: DATABRICKS_VOLUME_UPLOADS valueFrom: uploads-volume @@ -254,11 +398,13 @@ env: ## Troubleshooting -| Error | Cause | Solution | -| --- | --- | --- | -| `Unknown volume key "X"` | Volume env var not set or misspelled | Check `DATABRICKS_VOLUME_X` is set in `app.yaml` or `.env` | -| 413 on upload | File exceeds `maxUploadSize` | Increase `maxUploadSize` in plugin config or per-volume config | -| `read()` rejects large file | File > 10 MB default limit | Use `download()` for large files or pass `{ maxSize: }` | -| Blocked content type on `/raw` | Dangerous MIME type (html, js, svg) | Use `/download` instead — these types are forced to attachment | -| Service principal access blocked | Called volume method without `.asUser(req)` | Always use `appkit.files("key").asUser(req).method()` | -| `path traversal` error | Path contains `../` | Use relative paths from volume root or absolute `/Volumes/...` paths | +| Error | Cause | Solution | +| ------------------------------------------ | ---------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------- | +| `Unknown volume key "X"` | Volume env var not set or misspelled | Check `DATABRICKS_VOLUME_X` is set in `app.yaml` or `.env` | +| 413 on upload | File exceeds `maxUploadSize` | Increase `maxUploadSize` in plugin config or per-volume config | +| `read()` rejects large file | File > 10 MB default limit | Use `download()` for large files or pass `{ maxSize: }` | +| Blocked content type on `/raw` | Dangerous MIME type (html, js, svg) | Use `/download` instead — these types are forced to attachment | +| 403 on HTTP route | Volume's policy denied the action for the requesting user | Inspect `policy` config; user id comes from the `x-forwarded-user` header | +| Writes return 403 unexpectedly | Volume has no `policy` configured → defaults to `publicRead()` which denies writes | Set explicit `policy: files.policy.allowAll()` (or stricter) on volumes that accept writes | +| `PolicyDeniedError` from programmatic call | Volume's policy denied the action — SP identity used if `asUser(req)` was omitted | Call `.asUser(req)` for user-driven calls; gate trusted SP code with `policy.allowAll()` | +| Invalid path error | Path contains `../`, null bytes, or exceeds 4096 chars | Use relative paths from the volume root, or absolute `/Volumes/...` paths | From fc1a3fefb1c991df244cf7c044bbaf66a293316c Mon Sep 17 00:00:00 2001 From: Atila Fassina Date: Thu, 7 May 2026 20:28:42 +0200 Subject: [PATCH 2/2] docs(AppKit): trim Files skill to defer reference data to upstream docs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Align with the jobs.md/lakebase.md style: keep gotchas and idioms in-skill, push encyclopedic reference to `npx @databricks/appkit docs`. - Flatten Access Policies subsections (5 H3s → 0) - Drop Policy inputs types block (duplicate of upstream `## Types`) - Drop standalone SP-bypass snippet (mention inline instead) - Inline Execution defaults table as a single sentence in Server-Side API - Drop full HTTP Routes table; keep the 403/security gotchas - Add `npx @databricks/appkit docs` pointers from each trimmed section (defer-to-docs count: 2 → 5, matching jobs.md) Net effect: 410 → 346 lines (-15%); the policy concepts and HTTP-route execution model stay, the redundant tables go. Co-authored-by: Isaac --- .../references/appkit/files.md | 86 +++---------------- 1 file changed, 11 insertions(+), 75 deletions(-) diff --git a/skills/databricks-apps/references/appkit/files.md b/skills/databricks-apps/references/appkit/files.md index 3b84cc8..3f6043a 100644 --- a/skills/databricks-apps/references/appkit/files.md +++ b/skills/databricks-apps/references/appkit/files.md @@ -90,7 +90,7 @@ files({ }); ``` -### Built-in policies +Built-in policies: | Helper | Allows | | --------------------------- | ------------------------------------------------------------------ | @@ -98,63 +98,25 @@ files({ | `files.policy.allowAll()` | every action | | `files.policy.denyAll()` | no action (yes — even `list`) | -### Combinators +Combinators: `policy.all(...)` (AND, short-circuits on deny), `policy.any(...)` (OR, short-circuits on allow), `policy.not(p)` (e.g. `not(publicRead())` = write-only drop-box). -- `files.policy.all(p1, p2, ...)` — AND, short-circuits on first deny -- `files.policy.any(p1, p2, ...)` — OR, short-circuits on first allow -- `files.policy.not(p)` — invert (e.g. `not(publicRead())` = write-only drop-box) - -### Custom policies - -A `FilePolicy` is `(action, resource, user) => boolean | Promise`. `READ_ACTIONS` and `WRITE_ACTIONS` are exported `Set` for action-class checks. +A `FilePolicy` is `(action, resource, user) => boolean | Promise`. Exported `READ_ACTIONS` / `WRITE_ACTIONS` are `Set` for action-class checks. `user.id` comes from the `x-forwarded-user` header (HTTP) or `req` (`asUser(req)`); `user.isServicePrincipal === true` when the programmatic API skipped `asUser()`. Full `FileAction` / `FileResource` / `FilePolicyUser` shape: `npx @databricks/appkit docs Files plugin`. ```typescript import { type FilePolicy, WRITE_ACTIONS } from "@databricks/appkit"; const ADMIN_IDS = ["admin@company.com"]; +// Writes admin-only; reads open. Wrap with `policy.any(spBypass, adminWrite)` for SP bypass. const adminWrite: FilePolicy = (action, _resource, user) => { if (WRITE_ACTIONS.has(action)) return ADMIN_IDS.includes(user.id); - return true; // reads open to everyone + return true; }; -files({ - volumes: { reports: { policy: adminWrite } }, -}); +files({ volumes: { reports: { policy: adminWrite } } }); ``` -Mix custom logic with built-ins via combinators — e.g. SP can do anything, users can only read: - -```typescript -files.policy.any( - (_action, _resource, user) => !!user.isServicePrincipal, - files.policy.publicRead(), -); -``` - -### Policy inputs - -```typescript -type FileAction = - | "list" | "read" | "download" | "raw" | "exists" | "metadata" | "preview" - | "upload" | "mkdir" | "delete"; - -interface FileResource { - path: string; // relative path within the volume - volume: string; // volume key - size?: number; // content-length in bytes (uploads only) -} - -interface FilePolicyUser { - id: string; // from `x-forwarded-user` header (HTTP) or req - isServicePrincipal?: boolean; // true when programmatic API skipped asUser() -} -``` - -### Enforcement - -- **HTTP routes**: denied → `403 { error: "Policy denied \"{action}\" on volume \"{volume}\"", plugin: "files" }`. -- **Programmatic API**: denied → throws `PolicyDeniedError` (importable from `@databricks/appkit`) with `.action` and `.volumeKey` properties. Both `appkit.files("vol").list()` (SP, `isServicePrincipal: true`) and `appkit.files("vol").asUser(req).list()` (user) are gated. +**Enforcement**: HTTP denial → `403 { error: "Policy denied \"{action}\" on volume \"{volume}\"", plugin: "files" }`. Programmatic denial → throws `PolicyDeniedError` (exported from `@databricks/appkit`, has `.action` / `.volumeKey`). Both SP and `asUser(req)` calls are gated. ## Server-Side API (Programmatic) @@ -173,39 +135,13 @@ await appkit.files.volume("uploads").asUser(req).list(); **Use `.asUser(req)` in route handlers.** Without it the policy sees `isServicePrincipal: true` and a warning is logged — fine for background jobs, wrong for user-driven endpoints where per-user policy decisions matter. Policy denial throws `PolicyDeniedError`. -**`VolumeAPI` methods**: `list`, `read`, `download`, `exists`, `metadata`, `preview`, `upload`, `createDirectory`, `delete`. `read()` caps files at 10 MB by default — pass `{ maxSize: }` or use `download()` for larger files. Paths are absolute (`/Volumes/...`) or relative to the volume root; `../` and null bytes are rejected. - -### Execution defaults - -Every operation runs through cache/retry/timeout interceptors: - -| Tier | Operations | Cache | Retry | Timeout | -| -------- | ------------------------------------- | ----- | ----- | ------- | -| Read | list, read, exists, metadata, preview | 60 s | 3× | 30 s | -| Download | download, raw | — | 3× | 30 s | -| Write | upload, mkdir, delete | — | — | 600 s | - -Cache keys include the volume key (no cross-volume collisions). Write operations auto-invalidate the parent directory's cached `list` entry. +**`VolumeAPI` methods**: `list`, `read`, `download`, `exists`, `metadata`, `preview`, `upload`, `createDirectory`, `delete`. `read()` caps files at 10 MB by default — pass `{ maxSize: }` or use `download()` for larger files. Paths are absolute (`/Volumes/...`) or relative to the volume root; `../` and null bytes are rejected. Reads cache 60 s with 3 retries; writes have a 600 s timeout, no retry, no cache; cache keys include the volume key, and writes auto-invalidate the parent directory's `list` cache. Full method signatures: `npx @databricks/appkit docs Files plugin`. ## HTTP Routes -Mounted at `/api/files/*`. All routes execute as the service principal; user identity is read from the `x-forwarded-user` header and passed to the volume's policy. Policy denial → `403`. - -| Method | Path | Purpose | -| ------ | ---------------------------- | ------------------------------------------ | -| GET | `/volumes` | List configured volume keys | -| GET | `/:volumeKey/list?path=` | Directory listing | -| GET | `/:volumeKey/read?path=` | Read text content | -| GET | `/:volumeKey/download?path=` | Binary stream (attachment) | -| GET | `/:volumeKey/raw?path=` | Inline stream (attachment for unsafe MIME) | -| GET | `/:volumeKey/exists?path=` | Existence check | -| GET | `/:volumeKey/metadata?path=` | File metadata | -| GET | `/:volumeKey/preview?path=` | Preview (text + type flags) | -| POST | `/:volumeKey/upload?path=` | Upload (raw body) | -| POST | `/:volumeKey/mkdir` | Create directory (`body.path`) | -| DELETE | `/:volumeKey?path=` | Delete file | - -Path validation: non-empty, ≤ 4096 chars, no null bytes, no `../`. The `/raw` endpoint sets `X-Content-Type-Options: nosniff` and `Content-Security-Policy: sandbox`; HTML/JS/SVG MIME types are forced to attachment. +Mounted at `/api/files/*`. All routes execute as the service principal; user identity is read from the `x-forwarded-user` header and passed to the volume's policy (denial → `403`). Reads are GET (`list`, `read`, `download`, `raw`, `exists`, `metadata`, `preview`); writes are POST (`upload`, `mkdir`) and DELETE — full route shape, request bodies, and response types: `npx @databricks/appkit docs Files plugin`. + +The `/raw` endpoint sets `X-Content-Type-Options: nosniff` and `Content-Security-Policy: sandbox`; HTML/JS/SVG MIME types are forced to attachment. ## Frontend Components