diff --git a/.claude/skills/agent-openai-memory/SKILL.md b/.claude/skills/agent-openai-memory/SKILL.md
index 12f5d0b1..0258fede 100644
--- a/.claude/skills/agent-openai-memory/SKILL.md
+++ b/.claude/skills/agent-openai-memory/SKILL.md
@@ -1,32 +1,24 @@
 ---
 name: agent-openai-memory
-description: "Add memory capabilities to your agent. Use when: (1) User asks about 'memory', 'state', 'remember', 'conversation history', (2) Want to persist conversations or user preferences, (3) Adding checkpointing or long-term storage."
+description: "Add session-based memory to an OpenAI Agents SDK agent using AsyncDatabricksSession and Lakebase. Use when: (1) User asks about 'memory', 'state', 'remember', 'conversation history', (2) Want to persist conversations or user preferences, (3) Adding session-based checkpointing."
 ---
 
 # Stateful Memory with OpenAI Agents SDK Sessions
 
-This template uses OpenAI Agents SDK [Sessions](https://openai.github.io/openai-agents-python/sessions/) with `AsyncDatabricksSession` to persist conversation history to a Databricks Lakebase instance.
+Uses `AsyncDatabricksSession` to persist conversation history to Lakebase, enabling multi-turn interactions where the agent remembers prior messages within a session.
 
-## How Sessions Work
-
-Sessions automatically manage conversation history for multi-turn interactions:
-
-1. **Before each run**: The session retrieves prior conversation history and prepends it to input
-2. **During the run**: New items (user messages, responses, tool calls) are generated
-3. **After each run**: All new items are automatically stored in the session
-
-This eliminates the need to manually manage conversation state between runs.
+## Prerequisites
 
-## Key Concepts
+1. **Dependency**: `databricks-openai[memory]` in `pyproject.toml` (already included in memory templates)
+2. **Lakebase instance**: See the **lakebase-setup** skill for creating and configuring one
+3. **Environment variable**: Set `LAKEBASE_INSTANCE_NAME` in `.env`:
+   ```bash
+   LAKEBASE_INSTANCE_NAME=<instance-name>
+   ```
 
-| Concept | Description |
-|---------|-------------|
-| **Session** | Stores conversation history for a specific `session_id` |
-| **`session_id`** | Unique identifier linking requests to the same conversation |
-| **`AsyncDatabricksSession`** | Session implementation backed by Databricks Lakebase |
-| **`LAKEBASE_INSTANCE_NAME`** | Environment variable specifying the Lakebase instance |
+---
 
-## How This Template Uses Sessions
+## Implementation
 
 ### Session Creation (`agent_server/agent.py`)
 
@@ -43,8 +35,6 @@ result = await Runner.run(agent, messages, session=session)
 ```
 
 ### Session ID Extraction (`agent_server/agent.py`)
 
-The `session_id` is extracted from `custom_inputs` or auto-generated:
-
 ```python
 def get_session_id(request: ResponsesAgentRequest) -> str:
     if hasattr(request, "custom_inputs") and request.custom_inputs:
@@ -55,8 +45,6 @@ def get_session_id(request: ResponsesAgentRequest) -> str:
 
 ### Lakebase Instance Resolution (`agent_server/utils.py`)
 
-The `LAKEBASE_INSTANCE_NAME` env var can be either an instance name or a hostname. The `resolve_lakebase_instance_name()` function handles both cases:
-
 ```python
 _LAKEBASE_INSTANCE_NAME_RAW = os.environ.get("LAKEBASE_INSTANCE_NAME")
 LAKEBASE_INSTANCE_NAME = resolve_lakebase_instance_name(_LAKEBASE_INSTANCE_NAME_RAW)
@@ -64,78 +52,50 @@ LAKEBASE_INSTANCE_NAME = resolve_lakebase_instance_name(_LAKEBASE_INSTANCE_NAME_
 ```
 
 ---
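+### End-to-End Sketch
+
+A minimal sketch of how the pieces above fit together. The `AsyncDatabricksSession` constructor arguments and import path are assumptions based on the snippets in this skill; check the template source for the exact signature:
+
+```python
+from agents import Agent, Runner
+from databricks_openai import AsyncDatabricksSession  # import path assumed from the databricks-openai[memory] extra
+
+async def respond(request):
+    # Same session_id across requests == same conversation
+    session_id = get_session_id(request)
+    # Hypothetical wiring: session storage backed by LAKEBASE_INSTANCE_NAME
+    session = AsyncDatabricksSession(
+        session_id=session_id,
+        instance_name=LAKEBASE_INSTANCE_NAME,
+    )
+    agent = Agent(name="assistant", instructions="You are a helpful assistant.")
+    # Prior history is loaded before the run and new items are stored after it
+    return await Runner.run(agent, request.input, session=session)
+```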
-## Prerequisites
-
-1. **Dependency**: `databricks-openai[memory]` must be in `pyproject.toml` (already included)
-
-2. **Lakebase instance**: You need a Databricks Lakebase instance. See the **lakebase-setup** skill for creating and configuring one.
-
-3. **Environment variable**: Set `LAKEBASE_INSTANCE_NAME` in your `.env` file:
-   ```bash
-   LAKEBASE_INSTANCE_NAME=<instance-name>
-   ```
-
----
-
-## Configuration Files
+## Configuration
 
 ### databricks.yml (Lakebase Resource)
 
-Add the Lakebase database resource to your app:
-
 ```yaml
 resources:
-  apps:
-    agent_openai_advanced:
-      name: "your-app-name"
-      source_code_path: ./
-
-      resources:
-        # ... other resources (experiment, etc.) ...
-
-        # Lakebase instance for session storage
-        - name: 'database'
-          database:
-            instance_name: '<instance-name>'
-            database_name: 'databricks_postgres'
-            permission: 'CAN_CONNECT_AND_CREATE'
+  - name: 'database'
+    database:
+      instance_name: '<instance-name>'
+      database_name: 'databricks_postgres'
+      permission: 'CAN_CONNECT_AND_CREATE'
 ```
 
-### databricks.yml config block (Environment Variables)
-
-The `LAKEBASE_INSTANCE_NAME` env var is resolved from the database resource at deploy time. Add to your app's `config.env` in `databricks.yml`:
-
 ```yaml
-  config:
-    env:
-      - name: LAKEBASE_INSTANCE_NAME
-        value_from: "database"
+config:
+  env:
+    - name: LAKEBASE_INSTANCE_NAME
+      value_from: "database"
 ```
 
-### .env (Local Development)
+---
 
-```bash
-LAKEBASE_INSTANCE_NAME=<instance-name>
-```
+## Testing
 
----
+### Verify Lakebase Connectivity
 
-## Testing Sessions
+```bash
+databricks lakebase instances get <instance-name> --profile <profile>
+```
 
-### Test Multi-Turn Conversation Locally
+### Test Multi-Turn Conversation
 
 ```bash
 # Start the server
 uv run start-app
 
-# First message - starts a new session
+# First message -- starts a new session
 curl -X POST http://localhost:8000/invocations \
   -H "Content-Type: application/json" \
   -d '{"input": [{"role": "user", "content": "Hello, I live in SF!"}]}'
 
 # Note the session_id from custom_outputs in the response
 
-# Second message - continues the same session
+# Second message -- continues the same session (should remember SF)
 curl -X POST http://localhost:8000/invocations \
   -H "Content-Type: application/json" \
   -d '{
@@ -144,28 +104,19 @@ curl -X POST http://localhost:8000/invocations \
   }'
 ```
 
-### Test Streaming
-
-```bash
-curl -X POST http://localhost:8000/invocations \
-  -H "Content-Type: application/json" \
-  -d '{
-    "input": [{"role": "user", "content": "Hello!"}],
-    "stream": true
-  }'
-```
+If the agent responds with "SF" or "San Francisco", session memory is working.
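+
+The same check from Python (a sketch using `requests`; the `custom_outputs.session_id` key is assumed from the curl note above):
+
+```python
+import requests
+
+URL = "http://localhost:8000/invocations"
+
+# First turn: a new session is created server-side
+first = requests.post(URL, json={
+    "input": [{"role": "user", "content": "Hello, I live in SF!"}],
+}).json()
+session_id = first["custom_outputs"]["session_id"]  # assumed response shape
+
+# Second turn: reuse the session_id so the agent remembers the first turn
+second = requests.post(URL, json={
+    "input": [{"role": "user", "content": "Where do I live?"}],
+    "custom_inputs": {"session_id": session_id},
+}).json()
+print(second)
+```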
 ---
 
 ## Troubleshooting
 
-| Issue | Cause | Solution |
-|-------|-------|----------|
-| **"LAKEBASE_INSTANCE_NAME environment variable is required"** | Missing env var | Set `LAKEBASE_INSTANCE_NAME` in `.env` |
-| **SSL connection closed unexpectedly** | Network/instance issue | Verify Lakebase instance is running: `databricks lakebase instances get <instance-name>` |
-| **Agent doesn't remember previous messages** | Different session_id | Pass the same `session_id` via `custom_inputs` across requests |
-| **"Unable to resolve hostname"** | Hostname doesn't match any instance | Verify the hostname or use the instance name directly |
-| **Permission denied** | Missing Lakebase access | Add `database` resource to `databricks.yml` with `CAN_CONNECT_AND_CREATE` |
+| Issue | Solution |
+|-------|----------|
+| "LAKEBASE_INSTANCE_NAME environment variable is required" | Set `LAKEBASE_INSTANCE_NAME` in `.env` |
+| SSL connection closed unexpectedly | Verify instance is running: `databricks lakebase instances get <instance-name>` |
+| Agent doesn't remember previous messages | Pass same `session_id` via `custom_inputs` across requests |
+| "Unable to resolve hostname" | Use instance name directly instead of hostname |
+| Permission denied | Add `database` resource to `databricks.yml` with `CAN_CONNECT_AND_CREATE` |
 
 ---
diff --git a/.claude/skills/lakebase-setup/SKILL.md b/.claude/skills/lakebase-setup/SKILL.md
index 0458be93..42105087 100644
--- a/.claude/skills/lakebase-setup/SKILL.md
+++ b/.claude/skills/lakebase-setup/SKILL.md
@@ -1,195 +1,110 @@
 ---
 name: lakebase-setup
-description: "Configure Lakebase for agent memory storage. Use when: (1) Adding memory capabilities to the agent, (2) 'Failed to connect to Lakebase' errors, (3) Permission errors on checkpoint/store tables, (4) User says 'lakebase', 'memory setup', or 'add memory'."
+description: "Configure Lakebase (provisioned or autoscaling) for agent memory and chat UI persistence. Use when: (1) Adding memory capabilities, (2) 'Failed to connect to Lakebase' errors, (3) Permission errors on checkpoint/store tables, (4) User says 'lakebase', 'memory setup', or 'add memory'."
 ---
 
 # Lakebase Setup for Agent Persistence
 
-> **Profile reminder:** All `databricks` CLI commands must include the profile from `.env`: `databricks <command> --profile <profile>` or `DATABRICKS_CONFIG_PROFILE=<profile> databricks <command>`
+> **Profile reminder:** All `databricks` CLI commands require `--profile <profile>` from `.env`.
 
-> **Two types of Lakebase:** Databricks supports **provisioned** instances (with instance name) and **autoscaling** instances (project/branch model). This skill covers both. Make sure you know which Lakebase instance the user is using; ask the user which type they are using if unclear.
+> **Two types of Lakebase:** Databricks supports **provisioned** instances (with instance name) and **autoscaling** instances (project/branch model). Ask the user which type they are using if unclear.
 
-## Use Cases
-
-Lakebase is used for three distinct purposes across the agent templates:
-
-| Use case | Templates | Description |
-|----------|-----------|-------------|
-| **Chat UI conversation history** | All templates | The built-in chat UI (`e2e-chatbot-app-next`) can persist conversations across page refreshes and browser sessions. This is purely UI-side persistence — the agent itself is stateless. |
-| **Agent short-term memory** | `agent-langgraph-advanced`, `agent-openai-advanced` | Conversation threads within a session via `AsyncCheckpointSaver` (LangGraph) or `AsyncDatabricksSession` (OpenAI SDK). The agent remembers what was said earlier in the same conversation. |
-| **Agent long-term memory** | `agent-langgraph-advanced` | User facts across sessions via `AsyncDatabricksStore`. The agent remembers things about a user from previous conversations. |
-
-> **Note:** When the quickstart prompts for Lakebase on a non-memory template, it's for **chat UI history** only — not for the agent. Memory templates always require Lakebase.
-
-## Overview
-
-Lakebase provides persistent PostgreSQL storage for agents:
-- **Short-term memory** (LangGraph): Conversation history within a thread (`AsyncCheckpointSaver`)
-- **Long-term memory** (LangGraph): User facts across sessions (`AsyncDatabricksStore`)
-- **Short-term memory** (OpenAI SDK): Conversation history via `AsyncDatabricksSession`
-- **Long-running agent persistence** (OpenAI SDK): Background task state via custom SQLAlchemy tables (`agent_server` schema)
-
-> **Note:** For pre-configured memory templates, see:
-> - `agent-langgraph-advanced` - Short-term memory, long-term memory, and long-running background tasks (LangGraph)
-> - `agent-openai-advanced` - Short-term memory and long-running background tasks (OpenAI SDK)
-
-## Complete Setup Workflow
+## Setup Workflow
 
 ```
-┌───────────────────────────────────────────────────────────────────────────┐
-│ 1. Add dependency → 2. Get instance → 3. Configure DAB                    │
-│ 4. Configure .env → 5. Deploy → 6. Grant SP permissions → 7. Run          │
-└───────────────────────────────────────────────────────────────────────────┘
+1. Add dependency → 2. Get instance → 3. Configure databricks.yml
+4. Configure .env → 5. Deploy → 6. Grant SP permissions → 7. Run
 ```
 
-> **Shortcut:** If using a pre-configured memory template, `uv run quickstart` with Lakebase flags handles steps 2-4 automatically. You still need to do steps 5-7 manually.
+> **Shortcut:** In a pre-configured memory template, `uv run quickstart` with Lakebase flags handles steps 2-4 automatically. Steps 5-7 are still manual.
 
 ---
 
 ## Step 1: Add Memory Dependency
 
-Add the memory extra to your `pyproject.toml`:
-
 ```toml
+# pyproject.toml
 dependencies = [
     "databricks-langchain[memory]",
-    # ... other dependencies
 ]
 ```
 
-Then sync dependencies:
-```bash
-uv sync
-```
+Then: `uv sync`
 
 ---
 
 ## Step 2: Create or Get Lakebase Instance
 
-### Option A: Provisioned Instance
-
-1. Go to your Databricks workspace
-2. Navigate to **Compute** → **Lakebase**
-3. Click **Create Instance** (or use an existing one)
-4. Note the **instance name**
-
-### Option B: Autoscaling Instance
+**Provisioned:** Navigate to **Compute > Lakebase** in your workspace. Note the **instance name**.
 
-Autoscaling uses a **project/branch** model. You need three values:
-- **Project name** (e.g., `my-project`)
-- **Branch name** (e.g., `my-branch`)
-- **Database ID** (e.g., `db-xxxx-xxxxxxxxxx`)
-
-Find these via the postgres API:
+**Autoscaling:** Uses a project/branch model. You need the project name, branch name, and database ID.
 
 ```bash
-# List projects
 databricks api get /api/2.0/postgres/projects --profile <profile>
-
-# List branches for a project
-databricks api get /api/2.0/postgres/projects/<project>/branches --profile <profile>
-
-# List databases for a branch
-databricks api get /api/2.0/postgres/projects/<project>/branches/<branch>/databases --profile <profile>
+databricks api get /api/2.0/postgres/projects/<project>/branches --profile <profile>
+databricks api get /api/2.0/postgres/projects/<project>/branches/<branch>/databases --profile <profile>
 ```
 
-**Important:** The database ID is the internal ID (e.g., `db-xxxx-xxxxxxxxxx`), NOT `databricks_postgres`.
+**Important:** The database ID is the internal ID (e.g., `db-xxxx-xxxxxxxxxx`), NOT `databricks_postgres`.
 
 ---
 
-## Step 3: Configure databricks.yml (Lakebase Resource)
-
-> **Note:** If you ran `uv run quickstart` with Lakebase flags (`--lakebase-provisioned-name` or `--lakebase-autoscaling-project`/`--lakebase-autoscaling-branch`), the quickstart already configured `databricks.yml` for you — including fetching the database ID for autoscaling. Manual configuration is only needed if you didn't use quickstart or need to change values.
+## Step 3: Configure databricks.yml
 
-### Option A: Provisioned
+> If `uv run quickstart` was used with Lakebase flags, this is already configured.
 
-Add the `database` resource to your app in `databricks.yml`:
+**Provisioned** -- add the `database` resource and env vars:
 
 ```yaml
 resources:
-  apps:
-    your_app:
-      name: "your-app-name"
-      source_code_path: ./
-      resources:
-        # ... other resources (experiment, UC functions, etc.) ...
-
-        # Lakebase instance for long-term memory
-        - name: 'database'
-          database:
-            instance_name: '<instance-name>'
-            database_name: 'databricks_postgres'
-            permission: 'CAN_CONNECT_AND_CREATE'
+  - name: 'database'
+    database:
+      instance_name: '<instance-name>'
+      database_name: 'databricks_postgres'
+      permission: 'CAN_CONNECT_AND_CREATE'
 ```
 
-**Important:**
-- The `instance_name: '<instance-name>'` must match the actual Lakebase instance name
-- Using the `database` resource type automatically grants the app's service principal access to Lakebase
-See `.claude/skills/add-tools/examples/lakebase.yaml` for the YAML snippet.
-
-### Option B: Autoscaling
-
-Add the `postgres` resource to your app in `databricks.yml`:
-
 ```yaml
-resources:
-  apps:
-    your_app:
-      name: "your-app-name"
-      source_code_path: ./
-      resources:
-        # ... other resources (experiment, UC functions, etc.) ...
-
-        # Autoscaling Lakebase instance for long-term memory
-        - name: 'postgres'
-          postgres:
-            branch: "projects/<project>/branches/<branch>"
-            database: "projects/<project>/branches/<branch>/databases/<database-id>"
-            permission: 'CAN_CONNECT_AND_CREATE'
+config:
+  env:
+    - name: LAKEBASE_INSTANCE_NAME
+      value_from: "database"
+    - name: EMBEDDING_ENDPOINT
+      value: "databricks-gte-large-en"
+    - name: EMBEDDING_DIMS
+      value: "1024"
 ```
 
+**Autoscaling** -- add the `postgres` resource and env vars:
+
-**Important:** The `branch` and `database` fields use full resource path format.
-
-See `.claude/skills/add-tools/examples/lakebase-autoscaling.yaml` for the YAML snippet.
-
-### Add Environment Variables to databricks.yml config block
-
-**Provisioned:**
-```yaml
-  config:
-    env:
-      # Lakebase instance name - resolved from database resource at deploy time
-      - name: LAKEBASE_INSTANCE_NAME
-        value_from: "database"
-      # Static values for embedding configuration
-      - name: EMBEDDING_ENDPOINT
-        value: "databricks-gte-large-en"
-      - name: EMBEDDING_DIMS
-        value: "1024"
-```
+```yaml
+resources:
+  - name: 'postgres'
+    postgres:
+      branch: "projects/<project>/branches/<branch>"
+      database: "projects/<project>/branches/<branch>/databases/<database-id>"
+      permission: 'CAN_CONNECT_AND_CREATE'
+```
 
-**Autoscaling:**
-```yaml
-  config:
-    env:
-      # Autoscaling Lakebase config
-      - name: LAKEBASE_AUTOSCALING_PROJECT
-        value: "<project>"
-      - name: LAKEBASE_AUTOSCALING_BRANCH
-        value: "<branch>"
-      # Static values for embedding configuration
-      - name: EMBEDDING_ENDPOINT
-        value: "databricks-gte-large-en"
-      - name: EMBEDDING_DIMS
-        value: "1024"
-```
+```yaml
+config:
+  env:
+    - name: LAKEBASE_AUTOSCALING_PROJECT
+      value: "<project>"
+    - name: LAKEBASE_AUTOSCALING_BRANCH
+      value: "<branch>"
+    - name: EMBEDDING_ENDPOINT
+      value: "databricks-gte-large-en"
+    - name: EMBEDDING_DIMS
+      value: "1024"
+```
+
+See `.claude/skills/add-tools/examples/lakebase.yaml` and `lakebase-autoscaling.yaml` for full YAML snippets.
 
 ---
 
 ## Step 4: Configure .env (Local Development)
 
-For local development, add to `.env`:
-
 **Provisioned:**
 ```bash
 LAKEBASE_INSTANCE_NAME=<instance-name>
@@ -205,22 +120,10 @@ EMBEDDING_ENDPOINT=databricks-gte-large-en
 EMBEDDING_DIMS=1024
 ```
 
-**Important:** `embedding_dims` must match the embedding endpoint:
-
-| Endpoint | Dimensions |
-|----------|------------|
-| `databricks-gte-large-en` | 1024 |
-| `databricks-bge-large-en` | 1024 |
-
-> **Note:** `.env` is only for local development. When deployed, the app gets values from `databricks.yml` config env.
-
 ---
 
-## Step 5: Initialize Tables
 ## Step 5: Deploy
 
-Deploy the app so the service principal and resources are created:
-
 ```bash
 DATABRICKS_CONFIG_PROFILE=<profile> databricks bundle deploy
 ```
@@ -229,23 +132,20 @@ DATABRICKS_CONFIG_PROFILE=<profile> databricks bundle deploy
 
 ---
 
 ## Step 6: Grant SP Permissions (CRITICAL)
 
-> **WARNING:** You MUST complete this step before running the app. Without it, the app will fail with database migration errors like `CREATE TABLE IF NOT EXISTS "drizzle"."__drizzle_migrations"` — permission denied.
-
-After deploying, the app's service principal needs Postgres roles to access Lakebase tables. The DAB resource grants basic connectivity, but you must also grant Postgres-level schema and table permissions.
+> **WARNING:** Without this step, the app fails with `CREATE TABLE IF NOT EXISTS "drizzle"."__drizzle_migrations"` -- permission denied.
 
-**Step 1:** Get the app's service principal client ID:
+**Get the app's service principal client ID:**
 ```bash
 DATABRICKS_CONFIG_PROFILE=<profile> databricks apps get <app-name> --output json | jq -r '.service_principal_client_id'
 ```
 
-**Step 2:** Grant permissions using the grant script:
-
+**Grant permissions:**
 ```bash
 # Provisioned:
 DATABRICKS_CONFIG_PROFILE=<profile> uv run python scripts/grant_lakebase_permissions.py \
   --memory-type <memory-type> --instance-name <instance-name>
 
-# Autoscaling (endpoint — reads LAKEBASE_AUTOSCALING_ENDPOINT from .env by default):
+# Autoscaling (endpoint):
 DATABRICKS_CONFIG_PROFILE=<profile> uv run python scripts/grant_lakebase_permissions.py \
   --memory-type <memory-type> --autoscaling-endpoint <endpoint>
@@ -256,134 +156,20 @@ DATABRICKS_CONFIG_PROFILE=<profile> uv run python scripts/grant_lakebase_permiss
 
 **Memory type by template:**
 
-| Template | `--memory-type` value |
-|----------|-----------------------|
+| Template | `--memory-type` |
+|----------|-----------------|
 | `agent-langgraph-advanced` | `langgraph` |
 | `agent-openai-advanced` | `openai` |
 
-The script handles fresh branches gracefully (warns but doesn't fail if tables don't exist yet — they'll be created on first app startup).
-
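+If you need to grant permissions manually instead of using the script, here is a sketch using `LakebaseClient` from `databricks_ai_bridge.lakebase` (the grantee UUID and table list are illustrative placeholders):
+
+```python
+from databricks_ai_bridge.lakebase import LakebaseClient, SchemaPrivilege, TablePrivilege
+
+client = LakebaseClient(instance_name="<instance-name>")  # or project=/branch= for autoscaling
+
+# Create a Postgres role for the app's service principal (use the client ID UUID)
+client.create_role("<service-principal-client-id>", "SERVICE_PRINCIPAL")
+
+# Grant schema-level access
+client.grant_schema(
+    grantee="<service-principal-client-id>",
+    schemas=["public"],
+    privileges=[SchemaPrivilege.USAGE, SchemaPrivilege.CREATE],
+)
+
+# Grant table-level access (table names include the schema prefix)
+client.grant_table(
+    grantee="<service-principal-client-id>",
+    tables=["public.store"],
+    privileges=[TablePrivilege.SELECT, TablePrivilege.INSERT],
+)
+```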
"" - permission: 'CAN_MANAGE' - - name: 'postgres' - postgres: - branch: "projects//branches/" - database: "projects//branches//databases/" - permission: 'CAN_CONNECT_AND_CREATE' - -targets: - dev: - mode: development - default: true -``` +> `bundle deploy` uploads files. `bundle run` starts the app. --- @@ -391,72 +177,15 @@ targets: | Issue | Cause | Solution | |-------|-------|----------| -| **"embedding_dims is required when embedding_endpoint is specified"** | Missing `embedding_dims` parameter | Add `embedding_dims=1024` to AsyncDatabricksStore | -| **"relation 'store' does not exist"** | Tables not initialized | The app creates tables on first use; ensure SP has CREATE permission | -| **"Unable to resolve Lakebase instance 'None'"** | Missing env var in deployed app | Add `LAKEBASE_INSTANCE_NAME` to databricks.yml `config.env` | -| **"permission denied for table store"** | Missing grants | Run `uv run python scripts/grant_lakebase_permissions.py ` to grant permissions | -| **"Failed to connect to Lakebase"** | Wrong instance name or project/branch | Verify values in databricks.yml and .env | -| **Connection pool errors on exit** | Python cleanup race | Ignore `PythonFinalizationError` - it's harmless | -| **App not updated after deploy** | Forgot to run bundle | Run `databricks bundle run ` after deploy | -| **value_from not resolving** | Resource name mismatch | Ensure `value_from` value matches `name` in databricks.yml resources | -| **"Invalid postgres resource parameters"** | Missing `database` field in postgres resource | Add full `database` path: `projects//branches//databases/` | -| **`CREATE TABLE IF NOT EXISTS "drizzle"."__drizzle_migrations"` fails** | Grant step was skipped — SP lacks Postgres permissions | Run `grant_lakebase_permissions.py` with `--memory-type`, then restart the app | - ---- - -## LakebaseClient API (for reference) - -```python -from databricks_ai_bridge.lakebase import LakebaseClient, SchemaPrivilege, TablePrivilege - -# Provisioned: -client = LakebaseClient(instance_name="...") -# Autoscaling: -client = LakebaseClient(project="...", branch="...") - -# Create role (must do first) -client.create_role(identity_name, "SERVICE_PRINCIPAL") - -# Grant schema (note: schemas is a list, grantee not role) -client.grant_schema( - grantee="...", - schemas=["public"], - privileges=[SchemaPrivilege.USAGE, SchemaPrivilege.CREATE], -) - -# Grant tables (note: tables includes schema prefix) -client.grant_table( - grantee="...", - tables=["public.store"], - privileges=[TablePrivilege.SELECT, TablePrivilege.INSERT, ...], -) - -# Execute raw SQL -client.execute("SELECT * FROM pg_tables WHERE schemaname = 'public'") -``` - -### Service Principal Identifiers - -When granting permissions manually, note that Databricks apps have multiple identifiers: - -| Field | Format | Example | -|-------|--------|---------| -| `service_principal_id` | Numeric ID | `1234567890123456` | -| `service_principal_client_id` | UUID | `a1b2c3d4-e5f6-7890-abcd-ef1234567890` | -| `service_principal_name` | String name | `my-app-service-principal` | - -**Get all identifiers:** -```bash -DATABRICKS_CONFIG_PROFILE= databricks apps get --output json | jq '{ - id: .service_principal_id, - client_id: .service_principal_client_id, - name: .service_principal_name -}' -``` - -**Which to use:** -- `LakebaseClient.create_role()` - Use `service_principal_client_id` (UUID) or `service_principal_name` -- Raw SQL grants - Use `service_principal_client_id` (UUID) +| "embedding_dims is required" | Missing parameter 
+| "relation 'store' does not exist" | Tables not initialized | Ensure SP has CREATE permission; tables auto-create on first use |
+| "Unable to resolve Lakebase instance 'None'" | Missing env var | Add `LAKEBASE_INSTANCE_NAME` to `databricks.yml` `config.env` |
+| "permission denied for table store" | Missing grants | Run `grant_lakebase_permissions.py <args>` |
+| "Failed to connect to Lakebase" | Wrong instance/project/branch | Verify values in `databricks.yml` and `.env` |
+| App not updated after deploy | Forgot bundle run | Run `databricks bundle run <app-name>` after deploy |
+| `value_from` not resolving | Resource name mismatch | Ensure `value_from` matches resource `name` in `databricks.yml` |
+| "Invalid postgres resource parameters" | Missing `database` field | Add full path: `projects/<project>/branches/<branch>/databases/<database-id>` |
+| `CREATE TABLE "drizzle"` fails | Grant step skipped | Run `grant_lakebase_permissions.py`, then restart app |
 
 ---
diff --git a/.claude/skills/long-running-server/SKILL.md b/.claude/skills/long-running-server/SKILL.md
index 694e7631..d9ef5608 100644
--- a/.claude/skills/long-running-server/SKILL.md
+++ b/.claude/skills/long-running-server/SKILL.md
@@ -1,79 +1,45 @@
 ---
 name: long-running-server
-description: "Enable long-running background task support with LongRunningAgentServer. Use when: (1) Agent tasks may exceed HTTP timeout (~120s), (2) User wants background/async execution, (3) User says 'long running', 'background tasks', or 'async agent'."
+description: "Upgrade to LongRunningAgentServer for background task execution that survives HTTP timeouts. Configures task queuing, status polling, and stream resumption. Use when: (1) Agent tasks may exceed HTTP timeout (~120s), (2) User wants background/async execution, (3) User says 'long running', 'background tasks', or 'async agent'."
 ---
 
 # Enable Long-Running Agent Server
 
-> **Prerequisite:** Lakebase must be configured. If not already set up, follow the **lakebase-setup** skill first.
+> **Prerequisite:** Lakebase must be configured. Follow the **lakebase-setup** skill first if not done.
 
-Upgrades from `AgentServer` to `LongRunningAgentServer`, enabling background task execution that survives HTTP timeouts. Long-running tasks are persisted to Lakebase PostgreSQL so clients can poll or stream results.
-
-## What It Enables
+Upgrades from `AgentServer` to `LongRunningAgentServer`, enabling background task execution persisted to Lakebase PostgreSQL.
 
 | Request pattern | Description |
 |---|---|
-| **Standard** | `POST /responses` — blocks until complete (queries ≤ 120s) |
-| **Background + Poll** | `POST /responses { background: true }` → `GET /responses/{id}` |
+| **Standard** | `POST /responses` -- blocks until complete (queries <= 120s) |
+| **Background + Poll** | `POST /responses { background: true }` then `GET /responses/{id}` |
 | **Background + Stream** | `POST /responses { background: true, stream: true }` with cursor-based resumption via `starting_after` |
 
 ---
 
 ## Step 1: Add Dependency
 
-Add `databricks-ai-bridge[agent-server]` to `pyproject.toml`:
-
 ```toml
-dependencies = [
-    # ... existing dependencies ...
-    "databricks-ai-bridge[agent-server]>=0.18.0",
-]
+dependencies = ["databricks-ai-bridge[agent-server]>=0.18.0"]
 ```
 
-Run `uv sync` to install.
+Verify: `uv sync && python -c "from databricks_ai_bridge.long_running import LongRunningAgentServer"`
 
 ---
 
 ## Step 2: Update `start_server.py`
 
-Replace the basic `AgentServer` with `LongRunningAgentServer`. Key changes:
-
-1. Import `LongRunningAgentServer` instead of `AgentServer`
-2. Subclass it to override `transform_stream_event` (replaces placeholder IDs in streamed events)
-3. Pass Lakebase connection config and timeout settings
-4. Add a lifespan hook to initialize database tables at startup
-
-### OpenAI SDK
+Replace `AgentServer` with `LongRunningAgentServer`. Key changes from the base `start_server.py`:
 
 ```python
-"""Agent server entry point. load_dotenv must run before agent imports (auth config)."""
-
-# ruff: noqa: E402
-import os
-from contextlib import asynccontextmanager
-from pathlib import Path
-
-from dotenv import load_dotenv
-
-load_dotenv(dotenv_path=Path(__file__).parent.parent / ".env", override=True)
-
-import logging
-
 from databricks_ai_bridge.long_running import LongRunningAgentServer
-from mlflow.genai.agent_server import setup_mlflow_git_based_version_tracking
-
 from agent_server.utils import lakebase_config, replace_fake_id
-
-import agent_server.agent  # noqa: F401
-
-logger = logging.getLogger(__name__)
-
+# LangGraph uses: from agent_server.utils import LAKEBASE_CONFIG as lakebase_config, replace_fake_id
 
 class AgentServer(LongRunningAgentServer):
     def transform_stream_event(self, event, response_id):
         return replace_fake_id(event, response_id)
 
-
 agent_server = AgentServer(
     "ResponsesAgent",
     enable_chat_proxy=True,
@@ -84,82 +50,15 @@ agent_server = AgentServer(
     task_timeout_seconds=float(os.getenv("TASK_TIMEOUT_SECONDS", "3600")),
     poll_interval_seconds=float(os.getenv("POLL_INTERVAL_SECONDS", "1.0")),
 )
-
-log_level = os.getenv("LOG_LEVEL", "INFO")
-logging.getLogger("agent_server").setLevel(getattr(logging, log_level.upper(), logging.INFO))
-
-_original_lifespan = agent_server.app.router.lifespan_context
-
-
-@asynccontextmanager
-async def _lifespan(app):
-    # Initialize session/long-running tables at startup.
-    # If using AsyncDatabricksSession, create a throwaway session and call _ensure_tables().
-    async with _original_lifespan(app):
-        yield
-
-
-agent_server.app.router.lifespan_context = _lifespan
-
-app = agent_server.app  # noqa: F841
-setup_mlflow_git_based_version_tracking()
-
-
-def main():
-    agent_server.run(app_import_string="agent_server.start_server:app")
 ```
 
-### LangGraph
+Keep the existing `load_dotenv`, `setup_mlflow_git_based_version_tracking()`, and `main()` boilerplate (it provides the `os` and `logging` imports used above). Add a lifespan hook to initialize Lakebase tables at startup:
 
 ```python
-"""Agent server entry point. load_dotenv must run before agent imports (auth config)."""
-
-# ruff: noqa: E402
-import os
-from contextlib import asynccontextmanager
-from pathlib import Path
-
-from dotenv import load_dotenv
-
-load_dotenv(dotenv_path=Path(__file__).parent.parent / ".env", override=True)
-
-import logging
-
-from databricks_ai_bridge.long_running import LongRunningAgentServer
-from mlflow.genai.agent_server import setup_mlflow_git_based_version_tracking
-
-from agent_server.utils import replace_fake_id, LAKEBASE_CONFIG
-
-import agent_server.agent  # noqa: F401
-
-logger = logging.getLogger(__name__)
-
-
-class AgentServer(LongRunningAgentServer):
-    def transform_stream_event(self, event, response_id):
-        return replace_fake_id(event, response_id)
-
-
-agent_server = AgentServer(
-    "ResponsesAgent",
-    enable_chat_proxy=True,
-    db_instance_name=LAKEBASE_CONFIG.instance_name,
-    db_autoscaling_endpoint=LAKEBASE_CONFIG.autoscaling_endpoint,
-    db_project=LAKEBASE_CONFIG.autoscaling_project,
-    db_branch=LAKEBASE_CONFIG.autoscaling_branch,
-    task_timeout_seconds=float(os.getenv("TASK_TIMEOUT_SECONDS", "3600")),
-    poll_interval_seconds=float(os.getenv("POLL_INTERVAL_SECONDS", "1.0")),
-)
-
-app = agent_server.app  # noqa: F841
-setup_mlflow_git_based_version_tracking()
-
 _original_lifespan = app.router.lifespan_context
 
-
 @asynccontextmanager
 async def _lifespan(app):
-    # Initialize Lakebase tables at startup (e.g. run_lakebase_setup)
     try:
         async with _original_lifespan(app):
             yield
@@ -167,177 +66,80 @@ async def _lifespan(app):
         logger.warning("Long-running DB init failed: %s. Background mode disabled.", exc)
         yield
 
-
 app.router.lifespan_context = _lifespan
-
-
-def main():
-    agent_server.run(app_import_string="agent_server.start_server:app")
 ```
 
+Verify: `uv run start-app` -- the server should start without import errors.
+
 ---
 
 ## Step 3: Add `replace_fake_id` Utility
 
-Add to `utils.py` if not already present. The implementation differs by SDK:
-
-### OpenAI SDK
+Add to `utils.py`. Only the match condition differs by SDK:
 
 ```python
+# OpenAI SDK: match the exact constant
 try:
     from agents.models.fake_id import FAKE_RESPONSES_ID
 except ImportError:
     FAKE_RESPONSES_ID = "__fake_id__"
 
+_match = lambda s: s == FAKE_RESPONSES_ID
+# LangGraph: match the prefix instead
+# _match = lambda s: s.startswith("resp_placeholder_")
 
 def replace_fake_id(obj, real_id: str):
-    """Recursively replace FAKE_RESPONSES_ID with real_id."""
+    """Recursively replace placeholder IDs with real_id."""
     if isinstance(obj, dict):
         return {k: replace_fake_id(v, real_id) for k, v in obj.items()}
     elif isinstance(obj, list):
         return [replace_fake_id(item, real_id) for item in obj]
-    elif isinstance(obj, str) and obj == FAKE_RESPONSES_ID:
+    elif isinstance(obj, str) and _match(obj):
         return real_id
     return obj
 ```
 
-### LangGraph
-
-```python
-_FAKE_ID_PREFIX = "resp_placeholder_"
-
-
-def replace_fake_id(obj, real_id: str):
-    """Recursively replace any resp_placeholder_* ID with real_id."""
-    if isinstance(obj, dict):
-        return {k: replace_fake_id(v, real_id) for k, v in obj.items()}
-    elif isinstance(obj, list):
-        return [replace_fake_id(item, real_id) for item in obj]
-    elif isinstance(obj, str) and obj.startswith(_FAKE_ID_PREFIX):
-        return real_id
-    return obj
-```
-
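+Quick check of the behavior (derived directly from the function above):
+
+```python
+event = {"id": FAKE_RESPONSES_ID, "output": [{"id": FAKE_RESPONSES_ID, "text": "hi"}]}
+print(replace_fake_id(event, "resp_123"))
+# -> {'id': 'resp_123', 'output': [{'id': 'resp_123', 'text': 'hi'}]}
+```
+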
 ---
 
-## Step 4: Add Lakebase Config
-
-Add to `utils.py` if not already present. This reads Lakebase connection parameters from environment variables:
-
-```python
-import os
-from dataclasses import dataclass
-from typing import Optional
-
-
-@dataclass(frozen=True)
-class LakebaseConfig:
-    instance_name: Optional[str]
-    autoscaling_endpoint: Optional[str]
-    autoscaling_project: Optional[str]
-    autoscaling_branch: Optional[str]
-
-
-def init_lakebase_config() -> LakebaseConfig:
-    """Read lakebase env vars. Priority: endpoint > project+branch > instance_name."""
-    endpoint = os.getenv("LAKEBASE_AUTOSCALING_ENDPOINT") or None
-    raw_name = os.getenv("LAKEBASE_INSTANCE_NAME") or None
-    project = os.getenv("LAKEBASE_AUTOSCALING_PROJECT") or None
-    branch = os.getenv("LAKEBASE_AUTOSCALING_BRANCH") or None
-
-    has_autoscaling = project and branch
-    if not endpoint and not raw_name and not has_autoscaling:
-        raise ValueError(
-            "Lakebase configuration is required. Set one of:\n"
-            "  LAKEBASE_AUTOSCALING_ENDPOINT=<endpoint>\n"
-            "  LAKEBASE_AUTOSCALING_PROJECT + LAKEBASE_AUTOSCALING_BRANCH\n"
-            "  LAKEBASE_INSTANCE_NAME=<instance-name>\n"
-        )
-
-    if endpoint:
-        return LakebaseConfig(instance_name=None, autoscaling_endpoint=endpoint,
-                              autoscaling_project=None, autoscaling_branch=None)
-    elif has_autoscaling:
-        return LakebaseConfig(instance_name=None, autoscaling_endpoint=None,
-                              autoscaling_project=project, autoscaling_branch=branch)
-    else:
-        return LakebaseConfig(instance_name=raw_name, autoscaling_endpoint=None,
-                              autoscaling_project=None, autoscaling_branch=None)
-
-
-# Module-level singleton
-lakebase_config = init_lakebase_config()
-```
-
 ---
 
-## Step 5: Configure `databricks.yml`
+## Step 4: Configure Environment
 
-Add Lakebase resource and env vars per the **lakebase-setup** skill. The long-running server additionally uses these optional env vars:
+Add to `databricks.yml` config env (Lakebase vars per the **lakebase-setup** skill, plus):
 
 ```yaml
-config:
-  env:
-    # ... existing env vars ...
-    - name: TASK_TIMEOUT_SECONDS
-      value: "3600"
-    - name: POLL_INTERVAL_SECONDS
-      value: "1.0"
-    - name: LOG_LEVEL
-      value: "INFO"
+- name: TASK_TIMEOUT_SECONDS
+  value: "3600"
+- name: POLL_INTERVAL_SECONDS
+  value: "1.0"
 ```
 
----
-
-## Step 6: Configure `.env` for Local Development
-
-Add Lakebase connection vars (see **lakebase-setup** skill for all options):
-
-```bash
-# Pick ONE mode:
-# Option 1: Autoscaling endpoint
-LAKEBASE_AUTOSCALING_ENDPOINT=<endpoint>
-# Option 2: Autoscaling project/branch
-LAKEBASE_AUTOSCALING_PROJECT=<project>
-LAKEBASE_AUTOSCALING_BRANCH=<branch>
-# Option 3: Provisioned instance
-LAKEBASE_INSTANCE_NAME=<instance-name>
-
-# Optional tuning
-TASK_TIMEOUT_SECONDS=3600
-POLL_INTERVAL_SECONDS=1.0
-LOG_LEVEL=INFO
-```
+Add to `.env`: `TASK_TIMEOUT_SECONDS=3600` and `POLL_INTERVAL_SECONDS=1.0`
 
 ---
 
-## Step 7: Deploy and Grant Permissions
+## Step 5: Deploy, Grant Permissions, and Verify
 
-Follow the **lakebase-setup** skill Steps 5-7 to deploy, grant SP permissions, and run the app.
+Follow **lakebase-setup** skill Steps 5-7 to deploy, grant SP permissions, and run the app.
 
----
 
+**Verify background mode:**
+```bash
+# Submit a background task
+curl -X POST http://localhost:8000/invocations \
+  -H "Content-Type: application/json" \
+  -d '{"input": [{"role": "user", "content": "Hello!"}], "background": true}'
 
+# Poll for the result (use the response ID from above)
+curl http://localhost:8000/responses/<response-id>
+```
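+
+Or from Python (a sketch using `requests`; the `id` and `status` fields are assumptions about the response shape -- adjust to what your server actually returns):
+
+```python
+import time
+import requests
+
+BASE = "http://localhost:8000"
+
+# Submit a background task
+task = requests.post(f"{BASE}/invocations", json={
+    "input": [{"role": "user", "content": "Hello!"}],
+    "background": True,
+}).json()
+response_id = task["id"]  # assumed field carrying the response ID
+
+# Poll until the task leaves the in-progress states
+while True:
+    result = requests.get(f"{BASE}/responses/{response_id}").json()
+    if result.get("status") not in ("queued", "in_progress"):  # assumed status values
+        break
+    time.sleep(1.0)
+print(result)
+```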
`"ResponsesAgent"`) | -| `enable_chat_proxy` | `bool` | `False` | Enable chat UI proxy endpoint | -| `db_instance_name` | `str \| None` | `None` | Provisioned Lakebase instance name | -| `db_autoscaling_endpoint` | `str \| None` | `None` | Autoscaling endpoint hostname | -| `db_project` | `str \| None` | `None` | Autoscaling project name | -| `db_branch` | `str \| None` | `None` | Autoscaling branch name | -| `task_timeout_seconds` | `float` | `3600` | Max background task time before timeout | -| `poll_interval_seconds` | `float` | `1.0` | Stream event poll interval | +If polling returns the agent's response, long-running mode is working. --- ## Troubleshooting -| Issue | Cause | Solution | -|---|---|---| -| `ImportError: cannot import LongRunningAgentServer` | Missing dependency | Add `databricks-ai-bridge[agent-server]>=0.18.0` and `uv sync` | -| `background=true` returns but no result | Lakebase not configured | Set Lakebase env vars in `.env` / `databricks.yml` | -| Task times out | Long agent execution | Increase `TASK_TIMEOUT_SECONDS` | -| Stream events have placeholder IDs | Missing `transform_stream_event` | Ensure `AgentServer` subclass overrides it | -| DB initialization failed warning | Lakebase connection error | Check env vars and permissions (see **lakebase-setup** skill) | +| Issue | Solution | +|---|---| +| `ImportError: cannot import LongRunningAgentServer` | Add `databricks-ai-bridge[agent-server]>=0.18.0` and `uv sync` | +| `background=true` returns but no result | Lakebase not configured -- set env vars in `.env` / `databricks.yml` | +| Task times out | Increase `TASK_TIMEOUT_SECONDS` | +| Stream events have placeholder IDs | Ensure `AgentServer` subclass overrides `transform_stream_event` | +| DB initialization failed warning | Check Lakebase env vars and permissions (see **lakebase-setup** skill) | diff --git a/.claude/skills/quickstart/SKILL.md b/.claude/skills/quickstart/SKILL.md index 9763bff4..0d4ccbc8 100644 --- a/.claude/skills/quickstart/SKILL.md +++ b/.claude/skills/quickstart/SKILL.md @@ -1,6 +1,6 @@ --- name: quickstart -description: "Set up Databricks agent development environment. Use when: (1) First time setup, (2) Configuring Databricks authentication, (3) User says 'quickstart', 'set up', 'authenticate', or 'configure databricks', (4) No .env file exists." +description: "Set up Databricks agent development environment: configure authentication, create MLflow experiment, and initialize .env. Use when: (1) First time setup, (2) Configuring Databricks authentication, (3) User says 'quickstart', 'set up', 'authenticate', or 'configure databricks', (4) No .env file exists." --- # Quickstart & Authentication @@ -9,13 +9,7 @@ description: "Set up Databricks agent development environment. 
@@ -9,13 +9,7 @@ description: "Set up Databricks agent development environment. Use when: (1) Fir
 - **uv** (Python package manager)
 - **nvm** with Node 20 (for frontend)
-- **Databricks CLI v0.283.0+**
-
-Check CLI version:
-```bash
-databricks -v            # Must be v0.283.0 or above
-brew upgrade databricks  # If version is too old
-```
+- **Databricks CLI v0.283.0+**: `databricks -v` (upgrade: `brew upgrade databricks`)
 
 ## Run Quickstart
 
@@ -23,14 +17,7 @@ brew upgrade databricks  # If version is too old
 uv run quickstart
 ```
 
-**Options:**
-- `--profile NAME`: Use specified profile (non-interactive)
-- `--host URL`: Workspace URL for initial setup
-{{LAKEBASE_OPTIONS}}- `--skip-lakebase`: Skip Lakebase setup (non-interactive / CI use)
-- `--app-name NAME`: Existing Databricks app name to bind this bundle to
-- `-h, --help`: Show help
-
-**Examples:**
+**Common options:**
 ```bash
 # Interactive (prompts for profile selection)
 uv run quickstart
@@ -41,78 +28,55 @@ uv run quickstart --profile DEFAULT
 
 # New workspace setup
 uv run quickstart --host https://your-workspace.cloud.databricks.com
 
-# Bind to an existing app created via the Databricks UI
+# Bind to an existing app created via the Databricks UI
 uv run quickstart --app-name my-existing-app
 
 # Skip Lakebase setup (CI / non-interactive)
 uv run quickstart --profile DEFAULT --skip-lakebase
-{{LAKEBASE_EXAMPLES}}```
+```
 
-## What Quickstart Configures
+Run `uv run quickstart -h` for all options.
 
-Creates/updates `.env` with:
-- `DATABRICKS_CONFIG_PROFILE` - Selected CLI profile
-- `MLFLOW_TRACKING_URI` - Set to `databricks://<profile>` for local auth
-- `MLFLOW_EXPERIMENT_ID` - Auto-created experiment ID
-{{LAKEBASE_CONFIGURES_ENV}}
-Updates `databricks.yml`:
-- Sets `experiment_id` in the app's experiment resource
-- Updates app `name` field if `--app-name` is provided
-{{LAKEBASE_CONFIGURES_YML}}
+## Verify Setup
 
-## Existing App
+```bash
+# Confirm auth works
+databricks auth env --profile $(grep DATABRICKS_CONFIG_PROFILE .env | cut -d= -f2)
 
-If you created an app via the Databricks UI before cloning a template, use `--app-name` to bind the bundle to it:
+# Confirm .env was created with the required values
+grep -E 'DATABRICKS_CONFIG_PROFILE|MLFLOW_EXPERIMENT_ID' .env
+```
+
+## Existing App Binding
+
+If an app was created via the Databricks UI before cloning, use `--app-name` to bind the bundle:
 
 ```bash
 uv run quickstart --app-name my-existing-app
 ```
 
-Quickstart will update `databricks.yml` with the app name and print the binding command:
+Then run the printed binding command:
 
 ```bash
 databricks bundle deployment bind my-existing-app --auto-approve
 databricks bundle deploy
 ```
 
-This avoids the "An app with the same name already exists" error on first deploy.
-
 ## Idempotency
 
-Re-running quickstart is safe:
-- **Experiment**: If `MLFLOW_EXPERIMENT_ID` is already in `.env` and the experiment still exists, it is reused (no duplicate created).
-- **Lakebase**: If Lakebase config is already in `.env`, the interactive prompt is skipped and the existing config is reused.
+Re-running quickstart is safe -- existing experiments and Lakebase configs are reused, not duplicated.
 
 ## Manual Authentication (Fallback)
 
 If quickstart fails:
 
 ```bash
-# Create new profile
 databricks auth login --host https://your-workspace.cloud.databricks.com
-
-# Verify
-databricks auth profiles
+databricks auth profiles  # Verify
 ```
 
-Then manually create `.env` (copy from `.env.example`):
-```bash
-# Authentication (choose one method)
-DATABRICKS_CONFIG_PROFILE=DEFAULT
-# DATABRICKS_HOST=https://<workspace>.databricks.com
-# DATABRICKS_TOKEN=dapi....
-
-# MLflow configuration
-MLFLOW_EXPERIMENT_ID=<experiment-id>
-MLFLOW_TRACKING_URI="databricks://DEFAULT"
-MLFLOW_REGISTRY_URI="databricks-uc"
-
-# Frontend proxy settings
-CHAT_APP_PORT=3000
-CHAT_PROXY_TIMEOUT_SECONDS=300
-```
+Then create `.env` from `.env.example` and fill in `DATABRICKS_CONFIG_PROFILE`, `MLFLOW_EXPERIMENT_ID`, and `MLFLOW_TRACKING_URI`.
 
 ## Next Steps
 
-After quickstart completes:
 1. Run `uv run discover-tools` to find available workspace resources (see **discover-tools** skill)
 2. Run `uv run start-app` to test locally (see **run-locally** skill)
diff --git a/agent-langchain-ts/.claude/skills/deploy/SKILL.md b/agent-langchain-ts/.claude/skills/deploy/SKILL.md
index afbfb0e3..37c01068 100644
--- a/agent-langchain-ts/.claude/skills/deploy/SKILL.md
+++ b/agent-langchain-ts/.claude/skills/deploy/SKILL.md
@@ -1,61 +1,27 @@
 ---
 name: deploy
-description: "Deploy TypeScript LangChain agent to Databricks. Use when: (1) User wants to deploy, (2) User says 'deploy', 'push to databricks', 'production', (3) After making changes that need deployment."
+description: "Build, validate, and deploy a TypeScript LangChain agent to Databricks Apps. Use when: (1) User wants to deploy, (2) User says 'deploy', 'push to databricks', 'production', (3) After making changes that need deployment."
 ---
 
 # Deploy to Databricks
 
-## Quick Deploy
-
-```bash
-# Validate configuration
-databricks bundle validate -t dev
-
-# Deploy to dev environment
-databricks bundle deploy -t dev
-
-# Start the app
-databricks bundle run agent_langchain_ts
-```
-
 ## Deployment Targets
 
-### Development (dev)
-```bash
-databricks bundle deploy -t dev
-```
-
-**Characteristics:**
-- Default target
-- User-scoped naming: `db-agent-langchain-ts-<username>`
-- Development mode permissions
-- Auto-created resources
+| Target | Command | Naming | Notes |
+|--------|---------|--------|-------|
+| **dev** (default) | `databricks bundle deploy -t dev` | `db-agent-langchain-ts-<username>` | User-scoped, auto-created resources |
+| **prod** | `databricks bundle deploy -t prod` | `db-agent-langchain-ts-prod` | Stricter permissions, fixed naming |
 
-### Production (prod)
-```bash
-databricks bundle deploy -t prod
-```
-
-**Characteristics:**
-- Production mode
-- Stricter permissions
-- Fixed naming: `db-agent-langchain-ts-prod`
-- Requires explicit configuration
+---
 
-## Step-by-Step Deployment
+## Deploy Workflow
 
-### 1. Prepare Code
+### 1. Build and Test Locally
 
-Ensure code is committed and tested:
 ```bash
-# Test locally first
-npm run dev
-
-# Run tests
-npm test
-
-# Verify build works
-npm run build
+npm run dev      # Test locally first
+npm test         # Run tests
+npm run build    # Verify build
 ```
 
 ### 2. Validate Bundle
 
@@ -64,370 +30,73 @@ npm run build
 databricks bundle validate -t dev
 ```
 
-This checks:
-- `databricks.yml` syntax
-- `app.yaml` configuration
-- Resource references
-- Variable interpolation
+Checks `databricks.yml` syntax, `app.yaml` config, resource references, and variable interpolation.
 
-### 3. Deploy Bundle
+### 3. Deploy
 
 ```bash
 databricks bundle deploy -t dev
 ```
 
-This will:
-- Create MLflow experiment if needed
-- Upload source code
-- Configure app environment
-- Grant resource permissions
-- Create app instance
-
 ### 4. Start App
 
 ```bash
 databricks bundle run agent_langchain_ts
 ```
 
-Or manually:
-```bash
-databricks apps start db-agent-langchain-ts-<username>
-```
-
 ### 5. Verify
 
 ```bash
-# Check app status
+# Check status
 databricks apps get db-agent-langchain-ts-<username>
 
-# View logs
+# Follow logs
 databricks apps logs db-agent-langchain-ts-<username> --follow
 
 # Test health endpoint
 curl https://<workspace-host>/apps/db-agent-langchain-ts-<username>/health
-```
-
-## Managing Existing Apps
-
-### Bind Existing App
-
-If app already exists:
-```bash
-# Get app details
-databricks apps get db-agent-langchain-ts-<username>
-
-# Bind to bundle
-databricks bundle deploy -t dev --force-bind
-```
-
-### Delete and Recreate
-
-```bash
-# Delete existing app
-databricks apps delete db-agent-langchain-ts-<username>
-
-# Deploy fresh
-databricks bundle deploy -t dev
-```
-
-## Configuration Files
-
-### databricks.yml
-
-Main bundle configuration:
-
-```yaml
-bundle:
-  name: agent-langchain-ts
-
-variables:
-  serving_endpoint_name:
-    default: "databricks-claude-sonnet-4-5"
-
-resources:
-  experiments:
-    agent_experiment:
-      name: /Users/${workspace.current_user.userName}/agent-langchain-ts
-
-  apps:
-    agent_langchain_ts:
-      name: db-agent-langchain-ts-${var.resource_name_suffix}
-      source_code_path: ./
-      resources:
-        - name: serving-endpoint
-          serving_endpoint:
-            name: ${var.serving_endpoint_name}
-            permission: CAN_QUERY
-```
-
-### app.yaml
-
-Runtime configuration:
-
-```yaml
-command:
-  - npm
-  - start
-
-env:
-  - name: DATABRICKS_MODEL
-    value: "databricks-claude-sonnet-4-5"
-  - name: MLFLOW_TRACKING_URI
-    value: "databricks"
-  - name: MLFLOW_EXPERIMENT_ID
-    value_from: "experiment"
-
-resources:
-  - name: serving-endpoint
-    serving_endpoint:
-      name: ${var.serving_endpoint_name}
-      permission: CAN_QUERY
-```
-
-## Viewing Deployed App
-
-### Get App URL
-
-```bash
-databricks apps get db-agent-langchain-ts-<username> --output json | jq -r .url
-```
-
-### Access App
-
-Navigate to:
-```
-https://<workspace-host>/apps/db-agent-langchain-ts-<username>
-```
-
-### Test Deployed App
-
-```bash
-# Health check
-curl https://<workspace-host>/apps/db-agent-langchain-ts-<username>/health
-
-# Chat request
+
+# Test chat
 curl -X POST https://<workspace-host>/apps/db-agent-langchain-ts-<username>/api/chat \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer <token>" \
-  -d '{
-    "messages": [
-      {"role": "user", "content": "Hello!"}
-    ]
-  }'
-```
-
-## Monitoring
-
-### View Logs
-
-```bash
-# Follow logs in real-time
-databricks apps logs db-agent-langchain-ts-<username> --follow
-
-# Get last 100 lines
-databricks apps logs db-agent-langchain-ts-<username> --tail 100
-
-# Filter logs
-databricks apps logs db-agent-langchain-ts-<username> | grep ERROR
-```
-
-### View MLflow Traces
-
-See [MLflow Tracing Guide](../_shared/MLFLOW.md) for viewing traces in your workspace.
-
-### App Metrics
-
-```bash
-# Get app details
-databricks apps get db-agent-langchain-ts-<username> --output json
-
-# Check app state
-databricks apps get db-agent-langchain-ts-<username> --output json | jq -r .state
-```
-
-## Updating Deployed App
-
-### Update Code
-
-```bash
-# Make changes to code
-# Then redeploy
-databricks bundle deploy -t dev
-
-# Restart app
-databricks apps restart db-agent-langchain-ts-<username>
+  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
 ```
 
-### Update Configuration
+---
 
-Edit `app.yaml` or `databricks.yml`, then:
+## Update or Redeploy
 
 ```bash
+# After code/config changes:
 databricks bundle deploy -t dev
 databricks apps restart db-agent-langchain-ts-<username>
 ```
 
-## Adding Resources
-
-### Add Serving Endpoint Permission
-
-Edit `app.yaml`:
-
-```yaml
-resources:
-  - name: serving-endpoint
-    serving_endpoint:
-      name: "your-endpoint-name"
-      permission: CAN_QUERY
-```
-
-Then redeploy:
-```bash
-databricks bundle deploy -t dev
-```
-
-### Add Unity Catalog Function
-
-Edit `databricks.yml`:
-
-```yaml
-resources:
-  - name: uc-function
-    function:
-      name: "catalog.schema.function_name"
-      permission: EXECUTE
-```
-
-Update `app.yaml` to pass function config:
-
-```yaml
-env:
-  - name: UC_FUNCTION_CATALOG
-    value: "catalog"
-  - name: UC_FUNCTION_SCHEMA
-    value: "schema"
-  - name: UC_FUNCTION_NAME
-    value: "function_name"
-```
-
-Redeploy:
-```bash
-databricks bundle deploy -t dev
-```
-
-## Troubleshooting
-
-### "App with same name already exists"
+### Bind Existing App
 
-Either bind existing app:
 ```bash
 databricks bundle deploy -t dev --force-bind
 ```
 
-Or delete and recreate:
+### Delete and Recreate
+
 ```bash
 databricks apps delete db-agent-langchain-ts-<username>
 databricks bundle deploy -t dev
 ```
 
-### "Permission denied on serving endpoint"
-
-Ensure endpoint is listed in `app.yaml` resources:
-```yaml
-resources:
-  - name: serving-endpoint
-    serving_endpoint:
-      name: "databricks-claude-sonnet-4-5"
-      permission: CAN_QUERY
-```
-
-### "Experiment not found"
-
-Create experiment:
-```bash
-databricks experiments create \
-  --experiment-name "/Users/$(databricks current-user me --output json | jq -r .userName)/agent-langchain-ts"
-```
-
-Or update `databricks.yml` to auto-create:
-```yaml
-resources:
-  experiments:
-    agent_experiment:
-      name: /Users/${workspace.current_user.userName}/agent-langchain-ts
-```
-
-### "App failed to start"
-
-Check logs:
-```bash
-databricks apps logs db-agent-langchain-ts-<username>
-```
-
-Common issues:
-- Missing dependencies in `package.json`
-- Incorrect `npm start` command in `app.yaml`
-- Missing environment variables
-- Build errors
-
-### "Cannot reach app URL"
-
-Verify:
-1. App is running: `databricks apps get <app-name> | jq -r .state`
-2. URL is correct: `databricks apps get <app-name> | jq -r .url`
-3. Authentication token is valid
-
-## CI/CD Integration
-
-### GitHub Actions Example
-
-```yaml
-name: Deploy to Databricks
-
-on:
-  push:
-    branches: [main]
-
-jobs:
-  deploy:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v3
-
-      - name: Set up Node.js
-        uses: actions/setup-node@v3
-        with:
-          node-version: '18'
-
-      - name: Install dependencies
-        run: npm install
-
-      - name: Run tests
-        run: npm test
-
-      - name: Install Databricks CLI
-        run: |
-          curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
-
-      - name: Deploy to Databricks
-        env:
-          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
-          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
-        run: |
-          databricks bundle deploy -t prod
-          databricks bundle run agent_langchain_ts
-```
+---
 
-## Best Practices
+## Troubleshooting
 
-1. **Test Locally First**: Always test with `npm run dev` before deploying
-2. **Use Dev Environment**: Test deployments in dev before prod
-3. **Monitor Logs**: Check logs after deployment
-4. **Version Control**: Commit changes before deploying
-5. **Resource Permissions**: Verify all required resources are granted in `app.yaml`
-6. **MLflow Traces**: Monitor traces to debug issues
-7. **Incremental Updates**: Make small changes and test frequently
+| Issue | Solution |
+|-------|----------|
+| "App with same name already exists" | `databricks bundle deploy -t dev --force-bind` or delete first |
+| "Permission denied on serving endpoint" | Add endpoint to `app.yaml` resources with `permission: CAN_QUERY` |
+| "Experiment not found" | Add experiment to `databricks.yml` resources (auto-creates on deploy) |
+| "App failed to start" | Check logs: `databricks apps logs <app-name>`. Common: missing deps, bad start command, missing env vars |
+| "Cannot reach app URL" | Verify app state: `databricks apps get <app-name> \| jq -r .state` and URL: `jq -r .url` |
 
 ## Related Skills