docs(sdk): document FIFOLock held across arun()'s await (Q4 of #3341)#3400
docs(sdk): document FIFOLock held across arun()'s await (Q4 of #3341)#3400VascoSch92 wants to merge 1 commit into
Conversation
arun() holds the state lock (FIFOLock) across 'await astep'. That is safe only because every other state mutator is dispatched via run_in_executor onto a worker thread; FIFOLock is reentrant by OS thread id, so a state-mutating coroutine awaited on the event-loop thread would silently re-enter the lock and corrupt history. Make this load-bearing invariant explicit with a comment at the await site and a note on FIFOLock. Comment-only; no behavior change.
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable. The warning is useful, but one sentence overstates the executor invariant; I left one inline comment.
[RISK ASSESSMENT] 🟢 LOW — documentation-only, no runtime behavior change.
VERDICT: Comment until the wording is clarified.
This review was generated by an AI agent (OpenHands) on behalf of the requester.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26503939638
| ) | ||
|
|
||
| # The state lock is held across this await. This is safe | ||
| # only because every other state mutator is dispatched via |
There was a problem hiding this comment.
🟠 Important: This says every other state mutator is dispatched via run_in_executor, but native async Agent.astep() and ACPAgent.astep() both run state/on-event mutations on the event-loop thread after async awaits. That makes the documented invariant too broad and contradictory. Please narrow this to the actual rule: no unrelated/concurrent same-loop state mutator may run while arun() holds the lock; intentional mutations inside the awaited astep() are part of this critical section.
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Runtime SDK verification confirms the PR documents the FIFO lock/arun() async invariant without changing observed lock behavior.
Does this PR achieve its stated goal?
Yes. The stated goal was documentation-only: make the lock-across-await invariant and FIFOLock's thread-based reentrancy explicit. Running the SDK on main showed the underlying same-thread coroutine reentry behavior existed but both warnings were absent; running the same verification on 830ed290 showed both warnings present while imports and runtime lock behavior remained unchanged.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully; SDK imports worked via uv run python. |
| CI Status | 🟡 At time checked: 15 successful, 12 in progress, 3 skipped; no failures observed. |
| Functional Verification | ✅ Before/after runtime introspection showed the new warnings are present and behavior is unchanged. |
Functional Verification
Test 1: Documentation warning appears for the actual SDK objects while lock behavior remains unchanged
Step 1 — Establish baseline without the PR:
Checked out origin/main and ran a Python script that imported FIFOLock and LocalConversation, demonstrated same-event-loop-thread reentry while a lock holder was suspended across await, and inspected the runtime doc/source text exposed by Python introspection:
git checkout origin/main >/tmp/qa_checkout_base.log 2>&1 && uv run python - <<'PY'
import asyncio, inspect, subprocess
from openhands.sdk.conversation.fifo_lock import FIFOLock
from openhands.sdk.conversation.impl.local_conversation import LocalConversation
# creates one holder coroutine and one contender coroutine sharing one FIFOLock
# then prints import status, observed lock events, and whether the new warnings exist
PYOutput excerpt:
commit=5c11fece
import_ok=FIFOLock,LocalConversation
runtime_reentry_events=holder acquired | contender acquired while holder awaited | holder releasing
fifolock_doc_asyncio_warning_present=False
arun_lock_across_await_warning_present=False
This confirms the invariant the PR describes is real: another coroutine on the same event-loop thread can acquire the thread-reentrant lock while the holder is suspended. It also establishes the baseline documentation gap: neither warning was present on main.
Step 2 — Apply the PR's changes:
Checked out docs/fifolock-await-invariant and reset to the PR commit 830ed2909e2cad7f4fa5e63efb040bf0e8f8b12f.
Step 3 — Re-run with the PR in place:
Ran the same SDK import/introspection/lock-behavior script:
git checkout docs/fifolock-await-invariant >/tmp/qa_checkout_pr.log 2>&1 && git reset --hard 830ed2909e2cad7f4fa5e63efb040bf0e8f8b12f >/tmp/qa_reset_pr.log 2>&1 && uv run python - <<'PY'
import asyncio, inspect, subprocess
from openhands.sdk.conversation.fifo_lock import FIFOLock
from openhands.sdk.conversation.impl.local_conversation import LocalConversation
# same script as baseline
PYOutput excerpt:
commit=830ed290
import_ok=FIFOLock,LocalConversation
runtime_reentry_events=holder acquired | contender acquired while holder awaited | holder releasing
fifolock_doc_asyncio_warning_present=True
arun_lock_across_await_warning_present=True
This shows the PR achieves its documentation goal: the FIFOLock asyncio/task warning and the arun() lock-across-await warning are now visible through the actual SDK source/doc objects. The identical runtime_reentry_events output also supports the PR's claim of no behavior change.
Issues Found
None.
This review was created by an AI agent (OpenHands) on behalf of the user.
What
Q4 of #3341, documentation only.
arun()holds the state lock (FIFOLock) acrossawait self.agent.astep(...). Two comments make that implicit, load-bearing invariant explicit:awaitsite inarun()— why holding the lock across the await is safe, and the rule it depends on;FIFOLock— that reentrancy is keyed by OS thread id, not by asyncio task.Why
The lock-across-
awaitis safe today only because every other state mutator is dispatched viarun_in_executoronto a worker thread. SinceFIFOLockis thread-reentrant, a state-mutating coroutine awaited on the event-loop thread would silently re-enter the lock — no block, no error — and corrupt history. Nothing in the code recorded that rule; this turns it into a greppable warning so a future async mutator (or the Q5 refactor) doesn't trip over it.No behavior change.
Verification
ruff format --checkandruff checkcleanAgent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:830ed29-pythonRun
All tags pushed for this build
About Multi-Architecture Support
830ed29-python) is a multi-arch manifest supporting both amd64 and arm64830ed29-python-amd64) are also available if needed