feat(agent-server): add docker runtime mode for per-conversation containers #3403
rbren wants to merge 3 commits into
…ainers
When Config.conversation_runtime == 'docker', every conversation runs
in its own Docker container hosting another agent-server (in local mode).
The outer agent-server acts as a thin reverse proxy in front of the
per-conversation containers:
* POST /api/conversations spawns a fresh container, mints a session
key, and forwards the create request.
* All other /api/conversations/{cid}/... HTTP routes — including
/run, /pause, /events/..., /workspace/..., the
trajectory download, secrets, etc. — are forwarded verbatim to the
matching container via a catch-all proxy route.
* The /sockets/events/{cid} WebSocket is bridged to the inner
container with the same session key.
* DELETE /api/conversations/{cid} proxies the delete and then stops
the container.
* GET /api/conversations, /search and /count fan out across
the registered containers.
Default behavior (conversation_runtime == 'local') is unchanged.
No container pre-warming, no pools: each conversation gets a fresh
container at first use and an in-memory registry tracks the host port +
session key. Implementation lives entirely in docker_runtime/;
api.py only learns how to install the routers and start/stop the
container manager.
Co-authored-by: openhands <openhands@all-hands.dev>
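The in-memory registry described in this commit message can be pictured with a minimal sketch. The class and method names below (`ContainerRecord`, `InMemoryRegistry`, `upstream_url`) are hypothetical and are not taken from the PR; the sketch only assumes that the outer server needs a host port and a session key per conversation and that it reaches the inner server over localhost.

```python
from __future__ import annotations

import secrets
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerRecord:
    """What the outer server needs to reach one inner agent-server."""

    host_port: int
    session_key: str


class InMemoryRegistry:
    """Hypothetical registry: conversation id -> ContainerRecord."""

    def __init__(self) -> None:
        self._records: dict[uuid.UUID, ContainerRecord] = {}

    def register(self, cid: uuid.UUID, host_port: int) -> ContainerRecord:
        # Mint a fresh random session key for this container.
        record = ContainerRecord(host_port, secrets.token_urlsafe(32))
        self._records[cid] = record
        return record

    def lookup(self, cid: uuid.UUID) -> ContainerRecord | None:
        return self._records.get(cid)

    def remove(self, cid: uuid.UUID) -> None:
        # Removing an unknown id is a no-op, so DELETE can be retried safely.
        self._records.pop(cid, None)

    def upstream_url(self, cid: uuid.UUID, path: str) -> str | None:
        # Assumption: the inner server is reachable on localhost.
        record = self.lookup(cid)
        if record is None:
            return None
        return f"http://127.0.0.1:{record.host_port}{path}"
```

Because the registry lives only in memory, a restart of the outer server loses the mapping; the sketch makes that trade-off visible rather than solving it.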
Python API breakage checks: ✅ PASSED
REST API breakage checks (OpenAPI): ✅ PASSED
all-hands-bot
left a comment
⚠️ QA Report: PASS WITH ISSUES
Docker runtime mode works for the core create/proxy/WebSocket/delete flow, but I found two conversation API compatibility regressions in docker mode.
Does this PR achieve its stated goal?
Partially. I verified a real outer uvicorn agent-server in OH_CONVERSATION_RUNTIME=docker mode pulled the documented ghcr.io/openhands/agent-server:latest-python image, created a per-conversation Docker container, rewrote the workspace to /workspace, proxied GET /api/conversations/{id}, bridged /sockets/events/{id}, and removed the container on DELETE. However, two claimed preserved endpoints do not match local-mode behavior: GET /api/conversations?ids=<id> returns 500 in docker mode, and /api/conversations/count changes the response shape from a raw number to an object.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build succeeded; Docker daemon was available (28.0.4) and the documented runtime image pulled successfully. |
| CI Status | pre-commit was failing and several jobs were still pending; multiple tests/checks were green. I did not rerun CI tests. |
| Functional Verification |
Functional Verification
Test 1: Baseline local-mode API contract
Step 1 — Establish baseline (local mode):
Started the server with OH_CONVERSATION_RUNTIME=local and created a conversation using the normal HTTP API. Then queried the existing list/count endpoints:
curl "http://127.0.0.1:18081/api/conversations?ids=$LCID"
# HTTP 200, body: [{"id":"526d00e9-fefa-45a2-b355-dfdc9f53802f", ...}]
curl "http://127.0.0.1:18081/api/conversations/count"
# HTTP 200, body: 1
This establishes the existing client-visible contract: ids lookup returns a JSON array, and count returns a raw JSON number.
Test 2: PR docker runtime core flow
Step 2 — Apply PR behavior:
Started the PR server with:
OH_CONVERSATION_RUNTIME=docker OH_CONVERSATION_CONTAINER_STARTUP_TIMEOUT=90 uv run uvicorn openhands.agent_server.api:create_app --factory --host 127.0.0.1 --port 18080
Created a conversation through the outer server:
curl -H 'Content-Type: application/json' --data @/tmp/pr-start.json http://127.0.0.1:18080/api/conversations
# HTTP 201, id=1e74d784-b1c0-4fad-b142-27e7c1bc7343,
# workspace.working_dir=/workspace
docker ps --filter 'name=oh-conv-'
# ebbdda2c8f99 oh-conv-1e74d784b1c04fadb14227e7c1bc7343-79ca09c1 ... 0.0.0.0:30450->8000/tcp
This confirms the PR creates a real per-conversation container and rewrites the workspace path into the container.
Step 3 — Exercise proxied traffic:
curl "http://127.0.0.1:18080/api/conversations/$CID"
# HTTP 200, returned the created conversation with workspace.working_dir=/workspace
curl "http://127.0.0.1:18080/api/conversations/search"
# HTTP 200, returned items containing id=1e74d784-b1c0-4fad-b142-27e7c1bc7343
uv run python /tmp/qa_ws_check.py
# connected
# {"id":"5d0e05be-01b2-441e-9f76-975d9f00673c","timestamp":"2026-05-27T14...
curl -X DELETE "http://127.0.0.1:18080/api/conversations/$CID"
# HTTP 200, body: {"success":true}
docker ps --filter 'name=oh-conv-'
# no remaining QA containers
This confirms root HTTP proxying, search aggregation, WebSocket bridging, and DELETE cleanup work in a real Docker-backed run.
Test 3: Reproduced docker-mode compatibility regressions
Step 1 — Baseline: local mode returned HTTP 200 with a JSON array for GET /api/conversations?ids=<id> and raw 1 for /count.
Step 2 — PR docker mode: the same user-facing endpoints behaved differently:
curl "http://127.0.0.1:18080/api/conversations?ids=$CID"
# HTTP 500
# {"detail":"Internal Server Error","exception":"'list' object has no attribute 'get'"}
curl "http://127.0.0.1:18080/api/conversations/count"
# HTTP 200
# {"count":1}
This shows docker mode does not fully preserve the existing conversation endpoint contract promised in the PR description.
Issues Found
- 🟠 Issue: GET /api/conversations?ids=<conversation_id> returns 500 in docker mode instead of the local-mode JSON array response.
- 🟠 Issue: GET /api/conversations/count changes response shape from raw JSON number (1) to object ({"count":1}).
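A small shape check makes the two regressions concrete. This is only an illustrative sketch: the helper names `is_bare_count` and `is_id_lookup_array` are made up here, and the sample bodies are the ones quoted in the curl output above.

```python
import json


def is_bare_count(body: str) -> bool:
    """True when the body is a bare JSON integer such as `1`."""
    value = json.loads(body)
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def is_id_lookup_array(body: str) -> bool:
    """True when the body is a JSON array, as the local-mode ids lookup returns."""
    return isinstance(json.loads(body), list)


# Bodies observed in the QA run above.
local_count_body = "1"
docker_count_body = '{"count":1}'
```

A client that does `int(response.json())` works against the local-mode body and raises against the docker-mode body, which is why the shape change counts as a contract break even though both responses are HTTP 200.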
This review was created by an AI agent (OpenHands) on behalf of the user.
all-hands-bot
left a comment
🟡 Acceptable direction, but I found a few docker-mode issues that need attention before this is safe to merge: auth bypass, exposed inner servers, and REST/auth contract regressions.
This review was created by an AI agent (OpenHands) on behalf of the user.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🔴 HIGH — this is opt-in, but it changes request routing/authentication and starts network-reachable per-conversation servers.
VERDICT: ❌ Needs rework before merging.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26516796937
Six fixes for the per-conversation docker runtime, driven by reviewer findings on PR #3403:
1. Bind inner container ports to loopback only (-p 127.0.0.1:HOST:8000) so the per-conversation agent-servers can only be reached through the outer server's authenticated proxy. (R3311480573)
2. Authenticate the WebSocket bridge against the OUTER server's session keys before opening the upstream connection. Reuses the existing sockets.py helper (header / query / first-message auth), and the bridge no longer calls accept() a second time. (R3311480598)
3. Preserve the local GET /api/conversations?ids=... contract: route is batch-get-by-id, requires the ids query param, returns list[ConversationInfo | None]. Looks each id up in the registry and fetches from its container (None for missing). (R3311480555, R3311480542)
4. Preserve the local /api/conversations/count contract: returns a bare JSON integer (not {"count": N}), honors ?status= by forwarding the query to each inner container and summing their integers. (R3311480576, R3311480571)
5. ContainerManager.start() now returns (running, is_new). The POST route only tears down the container on inner 4xx / connection error when is_new=True, so a retried create against an existing conversation can no longer kill the live container. (R3311480570)
6. Workspace static-file routes mount under the workspace-cookie auth group in docker mode via a new docker_workspace_router. The workspace router is now registered before the header-only api_router so the more specific path wins; browser iframe/<img> embeds with the oh_workspace_session_key cookie continue to work. (R3311480585)
Tests:
* test_container_manager: assert loopback port binding; updated for the (running, is_new) return tuple, plus an explicit is_new=False assert on the idempotent second start.
* test_docker_routers: new tests for batch-get-by-ids (incl. 422 on missing ids, null slots for unknown ids), bare-int /count contract, WS rejects wrong key, WS rejects missing first-message auth, WS accepts with valid outer key, POST retry preserves existing container on inner 4xx, fresh-create cleans up on inner 4xx, workspace route registered before the catch-all. Fake inner app reordered so /search and /count aren't shadowed by /{cid}.
22 docker_runtime tests pass; 144 tests in api / conversation / workspace / docker_runtime all green.
Co-authored-by: openhands <openhands@all-hands.dev>
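Fix 5 is easiest to see in isolation. The sketch below is not the PR's code: `FakeManager` and `create_conversation` are hypothetical stand-ins, the container is reduced to a port number, and mapping a connection error to 502 is an assumption made only for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeManager:
    """Stand-in for a container manager; a 'container' is just a port here."""

    running: dict[str, int] = field(default_factory=dict)
    next_port: int = 30000

    def start(self, cid: str) -> tuple[int, bool]:
        # Idempotent: a second start for the same id reports is_new=False.
        if cid in self.running:
            return self.running[cid], False
        port = self.next_port
        self.next_port += 1
        self.running[cid] = port
        return port, True

    def stop(self, cid: str) -> None:
        self.running.pop(cid, None)


def create_conversation(manager: FakeManager, cid: str, inner_status: int | None) -> int:
    """Simulate the POST route; inner_status=None models a connection error."""
    _, is_new = manager.start(cid)
    failed = inner_status is None or 400 <= inner_status < 500
    # Only clean up containers this request created; never kill a live one.
    if failed and is_new:
        manager.stop(cid)
    # Assumption: a connection error surfaces to the client as 502.
    return 502 if inner_status is None else inner_status
```

The key design point is that cleanup responsibility follows ownership: the request that spawned the container may tear it down, while a retried create against an existing conversation leaves it alone.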
Pushed e7ec1a7 addressing all 8 review threads (now resolved). Summary:
Critical (security)
Important (API contracts)
Tests
This comment was posted by an AI agent (OpenHands) on behalf of the user.
…n-out for shared-disk read-only metadata
* Drop the 339-line ContainerManager and its bespoke docker run
wrapping; replace with DockerConversationRegistry (190 lines)
that's a thin shell around DockerWorkspace. Image pulls, GPU,
network, port allocation, log streaming, healthchecks, lifecycle
cleanup are all delegated.
* Add bind_host to DockerWorkspace so the outer can publish the
inner agent-server on 127.0.0.1 only (defense-in-depth: only the
outer reaches the inner; other hosts on the network can't bypass
outer auth).
* Replace the docker fan-out across containers (batch_get / count /
search) with shared-disk metadata reads. Outer's ConversationService
runs in a new read_only_metadata mode: skips lease acquisition
and EventService startup; get / search / count /
batch_get re-read meta.json / base_state.json off disk on
every call so sub-container writes show up immediately.
* Bind-mount layout: per-cid conversations/{cid_hex} is the only
conversation dir each sub-container can see (the outer sees all of
them and reads on-disk metadata). Settings/secrets dir is shared
via OH_PERSISTENCE_DIR so cipher keys match.
* Global per-host routers (bash/git/file/vscode/desktop/hooks/mcp/
skills/tools/llm) are reverse-proxied via a required ?cid=…
query parameter — registered one route per prefix so the catch-all
doesn't shadow /api/conversations, /api/settings, etc.
* Auth: outer and inner share OH_SESSION_API_KEYS_0 via
conversation_container_forward_env. The proxy synthesizes the
X-Session-API-Key header from the shared workspace key when the
inbound request authenticated via the workspace-session cookie (so
iframe/<img> embeds still reach the inner static file server).
* Drop the fan-out tests; add tests for the read-only mode + ?cid=
routing. Net diff: -132 LoC across the package while adding new
behaviour-level tests.
Co-authored-by: openhands <openhands@all-hands.dev>
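The required ?cid= parameter for the global routers can be sketched with the standard library alone. The function name `extract_cid`, the error strings, and the `/bash/run` path in the usage note are hypothetical; only the "missing ?cid= yields a 400" behaviour and the hex form of the conversation id come from the text above.

```python
from __future__ import annotations

import uuid
from urllib.parse import parse_qs, urlsplit


def extract_cid(url: str) -> tuple[int, str]:
    """Return (200, cid_hex) or (400, error message) for a global-router URL."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs drops blank values, so "?cid=" counts as missing.
    values = query.get("cid", [])
    if len(values) != 1:
        return 400, "pass exactly one ?cid=<conversation_id> query parameter"
    try:
        cid = uuid.UUID(values[0])
    except ValueError:
        return 400, "cid is not a valid conversation id"
    # The hex form (no dashes) matches the conversations/{cid_hex} layout.
    return 200, cid.hex
```

For example, `extract_cid("/bash/run?cid=1e74d784-b1c0-4fad-b142-27e7c1bc7343")` yields the dash-free hex id, which the proxy could then use to pick the target sub-container.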
Pushed 1. Drop
| Concern | ContainerManager | DockerWorkspace |
|---|---|---|
| docker run wrapping | bespoke argv builder | yes |
| Free port allocation | bespoke | yes |
| Image pulls & cleanup | bespoke | yes (cleanup_image) |
| Network / GPU / platform | partial | yes |
| Volume mounts | bespoke | yes |
| Forwarded env | bespoke | yes (forward_env + extra_env) |
| Log streaming | bespoke | yes (detach_logs) |
| Healthcheck wait | bespoke urlopen loop | yes (health_check_timeout) |
| Lifecycle / cleanup | bespoke | yes (cleanup) |
I added one small field to DockerWorkspace to cover the one capability that wasn't already there:
- bind_host (str): host interface to publish on. Default "" keeps -p HOST_PORT:8000; setting "127.0.0.1" gives -p 127.0.0.1:HOST_PORT:8000. The docker registry pins this to 127.0.0.1 so only the outer agent-server can reach the inner, as defense-in-depth around the proxy auth.
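The effect of bind_host on the publish flag can be shown with a short sketch. `publish_args` is a hypothetical helper written for this illustration, not the DockerWorkspace implementation; it only reproduces the two -p forms quoted above.

```python
def publish_args(host_port: int, bind_host: str = "", container_port: int = 8000) -> list[str]:
    """Build the `docker run` publish arguments for one container."""
    mapping = f"{host_port}:{container_port}"
    if bind_host:
        # Restrict the published port to one host interface, e.g. loopback.
        mapping = f"{bind_host}:{mapping}"
    return ["-p", mapping]
```

With the default empty bind_host, Docker publishes on all host interfaces, which is what the earlier QA run showed as 0.0.0.0:30450->8000/tcp; passing "127.0.0.1" keeps the inner server reachable only from the host itself.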
2. Drop fan-out across containers; read shared disk instead
Per the review pushback, fan-out was the wrong shape — it was N container hops for what's fundamentally a cheap directory walk. The outer's ConversationService now has a read_only_metadata mode that:
- Skips EventService startup in __aenter__ (no leases acquired, no in-memory state, no lease-renewal task).
- On every get / search / count / batch_get, re-walks conversations_path and reads meta.json + base_state.json straight off disk. Falls back to a synthesized state for conversations whose base_state.json hasn't been flushed yet.
- Mutation methods aren't expected to be called (the docker proxy router intercepts them before they reach ConversationService).
Bind-mount layout is per-cid: each sub-container only sees its own conversations/{cid_hex} subdirectory. The outer sees all of them. The .openhands settings/secrets dir is shared so OH_SECRET_KEY round-trips correctly.
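The read-on-every-call idea can be sketched as a plain directory walk. This is an illustration under stated assumptions, not the ConversationService code: the function names and the shape of the synthesized state (a dict with `id` and `synthesized` keys) are invented for the example; only the file names meta.json and base_state.json come from the description above.

```python
import json
import tempfile
from pathlib import Path


def read_conversations(conversations_path: Path) -> list[dict]:
    """Re-read every conversation's metadata from disk; nothing is cached."""
    results = []
    for conv_dir in sorted(conversations_path.iterdir()):
        meta_file = conv_dir / "meta.json"
        if not conv_dir.is_dir() or not meta_file.is_file():
            continue
        meta = json.loads(meta_file.read_text())
        state_file = conv_dir / "base_state.json"
        if state_file.is_file():
            state = json.loads(state_file.read_text())
        else:
            # Not flushed yet: fall back to a synthesized placeholder state.
            state = {"id": conv_dir.name, "synthesized": True}
        results.append({"meta": meta, "state": state})
    return results


def count_conversations(conversations_path: Path) -> int:
    return len(read_conversations(conversations_path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "abc").mkdir()
        (root / "abc" / "meta.json").write_text("{}")
        print(count_conversations(root))
```

Because nothing is held in memory, a directory written by a sub-container after the outer server started is picked up by the very next call, at the cost of one directory walk per request.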
Other changes asked for in the review
- ?cid= for global routers (bash/git/file/vscode/desktop/hooks/mcp/skills/tools/llm): registered one specific route per prefix in docker_global_proxy_router so the catch-all doesn't shadow /api/conversations / /api/settings / etc. Missing ?cid= → clear 400 telling the client what they need to do.
- X-Session-API-Key forwarding: outer and inner share the same OH_SESSION_API_KEYS_0 via conversation_container_forward_env (now includes that key in the default list). The proxy passes through whatever header the client sent; for cookie-authed workspace static files it synthesizes the header from workspace.api_key (read out of the outer's env) so the inner static file server is happy.
- No outer-side services touched for the simpler approach. The outer still runs tmux / vscode / desktop / sockets / settings / profiles in-process; those just aren't conversation-scoped.
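The header rule for proxied requests can be written down as a small pure function. `upstream_headers` and its parameters are hypothetical names for this sketch; the only behaviour taken from the description is "pass through the client's header, and synthesize X-Session-API-Key from the workspace key when the request was cookie-authenticated".

```python
from __future__ import annotations

SESSION_HEADER = "X-Session-API-Key"


def upstream_headers(
    inbound: dict[str, str],
    cookie_authenticated: bool,
    workspace_api_key: str | None,
) -> dict[str, str]:
    """Headers to send to the inner server for one proxied request."""
    headers = dict(inbound)
    # HTTP header names are case-insensitive, so compare in lower case.
    has_key = any(name.lower() == SESSION_HEADER.lower() for name in headers)
    if not has_key and cookie_authenticated and workspace_api_key:
        # Browser embeds carry only the cookie; add the shared key for them.
        headers[SESSION_HEADER] = workspace_api_key
    return headers
```

Keeping this as a pure function makes the two auth paths (header and cookie) easy to test without starting either server.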
Verification
The new architecture still answers every API the user asked about:
| Endpoint group | Where it runs in docker mode |
|---|---|
| POST /api/conversations | proxy → sub-container (spawns it first) |
| GET /api/conversations[/count \| /search \| …] | outer, from shared-disk metadata (read_only_metadata mode) |
| Per-cid mutations (/run, /pause, /events, …) | proxy → sub-container |
| Workspace static files (/conversations/{cid}/workspace/…) | proxy → sub-container, cookie-auth preserved |
| Global routers (/bash, /git, /file, …) | proxy → sub-container, requires ?cid= |
| WS /sockets/events/{cid} | outer authenticates, then bridges to sub-container |
Stats
13 files changed, 1083 insertions(+), 1215 deletions(-)
Net -132 LoC even though new tests were added. Locally:
tests/agent_server/test_conversation_service.py 80 passed (4 new read-only-mode tests)
tests/agent_server/test_conversation_router.py 69 passed (no changes)
tests/agent_server/docker_runtime/test_docker_routers 17 passed (rewritten for new registry)
ruff check, ruff format, and pyright all clean on the changed files.
This comment was created by an AI agent (OpenHands) on behalf of the PR author.
Motivation
Today every conversation runs in-process on the agent-server, sharing the same filesystem, tmux server, environment, and (most importantly) blast radius. There's no isolation between conversations and no way to give each conversation a clean, throwaway environment without restarting the whole server.
This PR adds an opt-in docker runtime mode to openhands-agent-server: the outer server keeps serving the frontend, auth, settings, profiles, etc., but conversation-scoped work is offloaded into a fresh per-conversation Docker container running a second agent-server (in local mode). The outer server just reverse-proxies HTTP and WebSocket traffic to the right container.
What changes
- Config.conversation_runtime: Literal["local", "docker"] (default "local", so existing deployments are untouched).
- Config.conversation_image, conversation_container_network, conversation_container_volumes, conversation_container_forward_env, conversation_container_platform, conversation_container_startup_timeout knobs.
- openhands.agent_server.docker_runtime:
  - container_manager.ContainerManager: owns the in-memory conversation_id -> RunningContainer registry, spawns containers via docker run, allocates host ports in the same 30000–39999 range DockerWorkspace uses, mints a per-container session API key, polls /health until ready, tears down on failure.
  - proxy.proxy_http: streams HTTP request bodies and response bodies between the outer and inner servers, replacing the client's X-Session-API-Key with the container's.
  - proxy.bridge_websocket: bidirectional WebSocket bridge using websockets, honoring text/binary frames and either-side close.
  - routers.py: docker-mode replacements for conversation_router, event_router, workspace_router, and the conversation half of sockets_router.
- api.py learns to install the docker routers (and skip the in-process conversation/event/workspace/bash/git/file/vscode/desktop/skills/hooks/mcp routers) when in docker mode, and to start/stop a ContainerManager in the lifespan.
Endpoint coverage
Every conversation-scoped endpoint is preserved by the catch-all proxy route, so all of these continue to work in docker mode (just executed inside the container):
- POST /api/conversations: … workspace.working_dir to /workspace, forward
- GET /api/conversations / /search / /count: … items
- GET/PATCH /api/conversations/{cid}
- DELETE /api/conversations/{cid}
- /api/conversations/{cid}/... (run, pause, interrupt, secrets, confirmation_policy, switch_profile, switch_llm, condense, fork, agent_final_response, events/, workspace/ including the trajectory download and the static workspace file server): … /{cid}/{tail:path} route streams request and response
- WS /sockets/events/{cid}: … /sockets/events/{cid}
Non-goals (intentionally)
- … docker run; cold-start is whatever your image's cold-start is.
Tests
- tests/agent_server/docker_runtime/test_container_manager.py (6 tests): stubs subprocess.run, exercises real urlopen against a tiny localhost HTTP server, covers happy-path start, idempotency, Docker-unavailable, container-died-during-startup cleanup, single stop, and shutdown() of multiple containers.
- tests/agent_server/docker_runtime/test_docker_routers.py (9 tests): boots a real FastAPI "inner" app on an ephemeral port via uvicorn and a stub ContainerManager that points every conversation at it, then exercises the outer app's HTTP and WebSocket surface end-to-end (including the catch-all subpath, fan-out list, count, delete teardown, missing-container 404, and a websocket round-trip). Also asserts local-mode routes are unchanged.
All 1111 existing agent-server tests still pass (verified with the env-isolated run; two pre-existing flakes in test_webhook_subscriber and one host-env contamination in test_terminal_service are unrelated to this PR, and each passes in isolation).
Running it
The host needs Docker available; the outer server will fail fast on POST /api/conversations with a 503 if it isn't.
Co-authored-by: openhands <openhands@all-hands.dev>
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.13-nodejs22-slim
- golang:1.21-bookworm
Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:151cd52-python
Run
All tags pushed for this build
About Multi-Architecture Support
151cd52-python) is a multi-arch manifest supporting both amd64 and arm64
151cd52-python-amd64) are also available if needed