fix(acp): keep current_model_id honest on resume and unapplied overrides#3407
Conversation
Addresses two correctness gaps in the model-state surfacing from #3347 that were flagged in review but went in unaddressed (review landed ~3 min before merge). Both make `ConversationInfo.current_model_id` (the picker/chip) lie about the model the ACP server is actually running. 1. Stale id on resume (asymmetric persistence gating). `init_state` cleared the persisted `acp_available_models` when the server reported a `models` block with no usable models, but gated `acp_current_model_id` only on the value being non-None — so a true resume reporting `currentModelId: ""` / `availableModels: []` cleared the list while keeping the old id, leaving a chip pointing at a model absent from the (now-empty) picker. The clear branch now mirrors the list gating (`_available_models is not None`). 2. Unapplied override surfaced as current. `current_model_id = self.acp_model or reported_model_id` trusted the override even on paths where it never reached the server: a fresh unknown/custom provider (no `_meta`, no `set_session_model`), or a resume whose `set_session_model` the server rejected (swallowed `ACPRequestError`). `_maybe_set_session_model` and `_reapply_session_model_on_resume` now return whether the override was actually applied, and the resolution prefers `acp_model` only then — otherwise it surfaces the server-reported id (or None). Tests: extend the helper unit tests with the new bool return, strengthen the explicit-empty-models resume test to assert the id clears too, and add integration coverage for unknown-provider / rejected-resume / applied-override resolution. All four bug-path tests fail on the pre-fix code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Acceptable overall, but I found one correctness gap that can still let an unapplied ACP override leak into ConversationInfo on resume when load_session omits models. See inline.
Risk: 🟡 Medium (ACP model metadata correctness; low security/safety risk).
This review was generated by an AI agent (OpenHands) on behalf of the user.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26521857355
Coverage Report •
|
||||||||||||||||||||||||||||||
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
The ACPAgent model picker/chip now stays honest across stale resume state and unapplied/rejected overrides, while the applied override path still works.
Does this PR achieve its stated goal?
Yes. I exercised the SDK as a user would by starting an ACPAgent against a real stdio ACP server shim and comparing base main against this PR. On main, the stale persisted model id survived an explicit empty models resume response, unknown-provider fresh overrides were shown even though no override reached the server, and rejected resume overrides still appeared as current. On the PR branch, those same flows surface the server-reported model or None, and the known-provider override path still reports the applied override.
| Phase | Result |
|---|---|
| Environment Setup | ✅ Existing uv project environment was usable; SDK/ACP runtime paths executed successfully. |
| CI Status | ✅ 32 checks successful, 4 skipped; only this qa-changes run was still in progress when checked. |
| Functional Verification | ✅ Reproduced the base bugs and verified the PR fixes them with real ACP stdio runtime execution. |
Functional Verification
I used temporary scripts outside the repo (/tmp/acp_fake_server.py, /tmp/acp_qa_client.py) to launch ACPAgent with acp_command=[sys.executable] and acp_args=[SERVER], then drove agent.init_state(...) against an actual ACP stdio subprocess. The fake server recorded real ACP calls (initialize, new_session, load_session, set_session_model) and returned controlled models blocks.
Test 1: Resume with explicit empty models clears stale current id
Step 1 — Reproduce on base main:
Ran git checkout main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/acp_qa_client.py stale_resume_empty_models:
base-stale_resume_empty_models.json
scenario: stale_resume_empty_models
agent: some-custom-acp
current_model_id: None
available_model_ids: []
persisted acp_current_model_id: stale-model
persisted acp_available_model_ids: []
server calls: [('load_session', None)]
This confirms the bug: the server reported an explicit empty model state on resume, but persisted acp_current_model_id remained stale-model while the picker list was cleared.
Step 2 — Apply PR changes:
Checked out fix-acp-model-id-accuracy at 5d0d6313.
Step 3 — Re-run with the fix:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/acp_qa_client.py stale_resume_empty_models:
pr-stale_resume_empty_models.json
scenario: stale_resume_empty_models
agent: some-custom-acp
current_model_id: None
available_model_ids: []
persisted acp_current_model_id: <missing>
persisted acp_available_model_ids: []
server calls: [('load_session', None)]
This shows the stale persisted id is now removed in lock-step with the explicit empty model list.
Test 2: Fresh unknown-provider override is not surfaced unless applied
Step 1 — Reproduce on base main:
Ran git checkout main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/acp_qa_client.py unknown_fresh_override:
base-unknown_fresh_override.json
scenario: unknown_fresh_override
agent: some-custom-acp
current_model_id: caller-model
available_model_ids: ['server-model']
persisted acp_current_model_id: caller-model
persisted acp_available_model_ids: ['server-model']
server calls: [('new_session', None)]
This confirms the bug: no set_session_model call occurred for the unknown provider, but the SDK still claimed the caller override was current.
Step 2 — Apply PR changes:
Checked out fix-acp-model-id-accuracy at 5d0d6313.
Step 3 — Re-run with the fix:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/acp_qa_client.py unknown_fresh_override:
pr-unknown_fresh_override.json
scenario: unknown_fresh_override
agent: some-custom-acp
current_model_id: server-model
available_model_ids: ['server-model']
persisted acp_current_model_id: server-model
persisted acp_available_model_ids: ['server-model']
server calls: [('new_session', None)]
This shows the SDK now surfaces the server-reported model when the caller override never reached the server.
Test 3: Rejected resume override falls back to server model
Step 1 — Reproduce on base main:
Ran git checkout main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/acp_qa_client.py resume_rejected_override:
base-resume_rejected_override.json
scenario: resume_rejected_override
agent: codex-acp
current_model_id: caller-model
available_model_ids: ['server-resumed']
persisted acp_current_model_id: caller-model
persisted acp_available_model_ids: ['server-resumed']
server calls: [('load_session', None), ('set_session_model', 'caller-model')]
The fake server rejected set_session_model with method not found; base still reported caller-model, confirming the misleading current-model behavior.
Step 2 — Apply PR changes:
Checked out fix-acp-model-id-accuracy at 5d0d6313.
Step 3 — Re-run with the fix:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/acp_qa_client.py resume_rejected_override:
pr-resume_rejected_override.json
scenario: resume_rejected_override
agent: codex-acp
current_model_id: server-resumed
available_model_ids: ['server-resumed']
persisted acp_current_model_id: server-resumed
persisted acp_available_model_ids: ['server-resumed']
server calls: [('load_session', None), ('set_session_model', 'caller-model')]
This confirms that after a rejected resume reapply, the SDK no longer claims the unapplied override as current.
Test 4: Applied known-provider override still wins
Step 1 — Establish baseline on base main:
Ran git checkout main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/acp_qa_client.py known_fresh_override:
base-known_fresh_override.json
scenario: known_fresh_override
agent: codex-acp
current_model_id: caller-model
server calls: [('new_session', None), ('set_session_model', 'caller-model')]
This establishes the intended happy path: the known provider receives set_session_model, and the override is shown as current.
Step 2 — Apply PR changes:
Checked out fix-acp-model-id-accuracy at 5d0d6313.
Step 3 — Re-run with the fix:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/acp_qa_client.py known_fresh_override:
pr-known_fresh_override.json
scenario: known_fresh_override
agent: codex-acp
current_model_id: caller-model
server calls: [('new_session', None), ('set_session_model', 'caller-model')]
This confirms the fix did not regress the applied override path.
Issues Found
None.
Final verdict: PASS.
This QA review was created by an AI agent (OpenHands) on behalf of the user.
…lock Addresses review feedback on this PR: the first commit still let an unapplied override leak through on one path — a true resume where `load_session` omits the (UNSTABLE) `models` block AND the server rejects `set_session_model`. There `_reapply_session_model_on_resume` returns False, `_current_model_id` is None, but `truly_resumed` is true and `_available_models` is None, so the persisted `acp_current_model_id` (which named the rejected override) was preserved — and even after clearing it, `ConversationInfo`'s final `acp_model` fallback would re-assert the override for a live agent. Two coordinated changes: - ACPAgent now carries an explicit `_model_override_applied` signal out of `_init`. `init_state` clears the stale persisted id when an override was attempted but not applied (`acp_model` set yet `_model_override_applied` False), in addition to the existing replaced-session / reported-empty cases. - `_compose_conversation_info` gates the `acp_model` fallback on the agent not being live+initialized. Once `init_state` has fired, `current_model_id` is authoritative (including None); the `acp_model` fallback is only for cold reads. The persisted-id fallback is unchanged (kept honest by `init_state`), so the legitimate preserve-on-resume case still surfaces the last-known model. Tests: regression for rejected `set_session_model` + absent `models` (id cleared, override-not-applied), plus live-agent-no-fallback and cold-read- fallback coverage for `_compose_conversation_info`. Both bug-path tests fail on the pre-fix code. 255 passed; ruff + pyright clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔍 Review in progress… We are performing the review through OpenHands Cloud Automation. You can log in and view the conversation here. |
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Verified the ACP model-id accuracy fixes with a real ACP stdio subprocess and the agent-server ConversationInfo composition path; the PR corrects the stale/unapplied model reporting cases I reproduced on main.
Does this PR achieve its stated goal?
Yes. On main, an unknown/custom ACP provider with acp_model=caller-model reported agent_current_model_id: caller-model even though no set_session_model call was made and the server reported server-model; with this PR the same run reports server-model. On resume, the PR also clears a stale persisted current id when the server reports an empty models block, and falls back to the server-reported model when reapplying an override is rejected.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully and installed the uv environment. |
| CI Status | 🟡 Most checks were green when observed; a few Build & Push arm64 jobs plus automation review/QA checks were still in progress. |
| Functional Verification | ✅ Reproduced the bug on origin/main, then verified the fixed behavior at aa8f24541f42a2c5e51b37628509e24159592fbc. |
Functional Verification
Test 1: ACPAgent model state via real ACP stdio subprocess
Step 1 — Reproduce / establish baseline (without the fix):
Ran git checkout --detach origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_acp_driver.py.
Relevant output:
[
{
"scenario": "unknown_fresh",
"agent_current_model_id": "caller-model",
"agent_available_models": ["server-model"],
"persisted_current_model_id": "caller-model",
"set_session_model_calls": []
},
{
"scenario": "resume_empty_models",
"agent_current_model_id": null,
"agent_available_models": [],
"persisted_current_model_id": "stale-model",
"persisted_available_models": []
},
{
"scenario": "resume_reject_with_reported_model",
"agent_current_model_id": "caller-model",
"agent_available_models": ["server-resumed"],
"persisted_current_model_id": "caller-model",
"set_session_model_calls": [{"model_id": "caller-model", "session_id": "stored-session"}]
}
]This confirms both reported bugs on the base branch: the fresh unknown provider advertised an override that never reached the server, and resume paths retained stale/incorrect current ids despite the server reporting different/no model state.
Step 2 — Apply the PR's changes:
Checked out aa8f24541f42a2c5e51b37628509e24159592fbc.
Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_acp_driver.py.
Relevant output:
[
{
"scenario": "unknown_fresh",
"agent_current_model_id": "server-model",
"agent_available_models": ["server-model"],
"persisted_current_model_id": "server-model",
"set_session_model_calls": []
},
{
"scenario": "resume_empty_models",
"agent_current_model_id": null,
"agent_available_models": [],
"persisted_current_model_id": null,
"persisted_available_models": []
},
{
"scenario": "resume_reject_with_reported_model",
"agent_current_model_id": "server-resumed",
"agent_available_models": ["server-resumed"],
"persisted_current_model_id": "server-resumed",
"set_session_model_calls": [{"model_id": "caller-model", "session_id": "stored-session"}]
}
]This shows the ACPAgent now reports the model the live server actually claims, clears the stale persisted id alongside an explicitly empty models block, and stops surfacing an override after rejected reapply.
Test 2: Agent-server ConversationInfo current_model_id fallback
Step 1 — Reproduce / establish baseline (without the fix):
Ran git checkout --detach origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_conversation_info_driver.py.
Output:
{
"cold_read_current_model_id": "caller-model",
"live_initialized_unapplied_override_current_model_id": "caller-model"
}This shows the server-side ConversationInfo composition still fell back to acp_model for a live initialized ACPAgent whose resolved current model was None.
Step 2 — Apply the PR's changes:
Checked out aa8f24541f42a2c5e51b37628509e24159592fbc.
Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_conversation_info_driver.py.
Output:
{
"cold_read_current_model_id": "caller-model",
"live_initialized_unapplied_override_current_model_id": null
}This confirms ConversationInfo no longer reasserts an unapplied live override, while preserving the intended cold-read fallback.
Issues Found
None.
This review was created by an AI agent (OpenHands) on behalf of the user.
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review: fix(acp): keep current_model_id honest on resume and unapplied overrides
🟢 Good taste — This is a well-scoped, correctness-critical fix. The two bugs described in the PR are real, the root cause analysis is sharp, and the solution is minimal.
Summary
This PR addresses two distinct paths where ConversationInfo.current_model_id could advertise a model the live ACP session wasn't actually running:
- Stale id on resume —
init_statenow treatsacp_current_model_idandacp_available_modelswith symmetric clearing logic, addingoverride_attempted_not_appliedas a third clear condition. Previously only the list was cleared when the server reported emptymodels:{}, leaving the id to silently survive. - Unapplied override surfaced —
_maybe_set_session_modeland_reapply_session_model_on_resumenow returnbool, threading_model_override_appliedback through_start_acp_serversocurrent_model_idresolution can prefer the override only when it actually reached the server.
Both fixes are defense-in-depth: the init_state persistence cleanup and the conversation_service.py agent_initialized gate together ensure neither the live attr nor the persisted hint can misrepresent the live model.
Observations
bool(session_meta) is safe as an "applied via _meta" signal (acp_agent.py, line 1528). build_session_model_meta returns {} when acp_model is None, when the provider is unknown, or when the provider uses the set_session_model protocol instead of _meta. So bool(session_meta) == True exclusively means "this is a _meta-based provider (e.g. claude) AND a model override was requested AND it was embedded in the session kwargs" — the semantics are tight.
_initialized is reliably set at acp_agent.py:1151 after _start_acp_server completes and before the agent_state update, so getattr(agent, "_initialized", False) in conversation_service.py correctly distinguishes live agents from cold-read deserialized ones.
Asymmetry between the two module functions (pre-existing, not introduced here): _maybe_set_session_model guards unknown providers with an early return, so no protocol call is made on a fresh session. _reapply_session_model_on_resume falls through to the try block for unknown providers, making an opportunistic set_session_model attempt on resume. This is intentional — resumes give unknown servers a chance to accept the protocol — but with both functions now returning bool and feeding _model_override_applied, the difference is more prominent. The return-value semantics are correct for each path; just worth keeping in mind when adding new providers.
override_attempted_not_applied placement (acp_agent.py, lines 1216-1218): computed after _start_acp_server assigns all PrivateAttrs (_model_override_applied, _current_model_id, _available_models), so it reads fresh state. No ordering issue.
Testing
The four new test cases are solid real-path tests (not mock-only assertions):
test_resume_loads_models_clears_stale_current_id— server reports emptycurrentModelId, stale id must be dropped.test_resume_rejected_override_with_absent_models_clears_stale_id— the case that would slip through both thenot truly_resumedand_available_models is not Noneguards without the newoverride_attempted_not_appliedcondition. Validates the exact gap being fixed.test_unknown_provider_surfaces_server_model_not_unapplied_override— guards the fresh-unknown-provider path.test_resume_rejected_override_surfaces_server_model— guards the resume-rejection path.
The non-regression test_known_provider_surfaces_applied_override (passes before and after) is a nice explicit guard that the happy path is not disturbed.
The 238-passed baseline plus the ruff/pyright clean run in the PR description closes the verification loop.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW — targeted internal state-resolution fix, no public API surface changes, purely corrects whatConversationInfo.current_model_idreports. The happy override path (claude_meta, codex/geminiset_session_model) is explicitly tested and unchanged.
VERDICT:
✅ Worth merging — Core logic is sound, test coverage is thorough, no blocking issues.
KEY INSIGHT:
Making both helper functions return bool is the cleanest possible interface: it converts a "did the server actually get this?" question from an implicit assumption into an explicit, testable contract.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Taste Rating: Acceptable — the main direction is good, but I found two correctness holes where current_model_id can still claim an unverified/cleared override.
Risk: 🟡 Medium (ACP model metadata correctness; no security impact).
This review was generated by an AI agent (OpenHands) on behalf of the user.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26522977005
| getattr(agent, "current_model_id", None) | ||
| or agent_state.get("acp_current_model_id") | ||
| or getattr(agent, "acp_model", None) | ||
| or (None if agent_initialized else getattr(agent, "acp_model", None)) |
There was a problem hiding this comment.
🟠 Important: This fallback can undo the new stale-id clearing after an agent-server restart. If ACPAgent.init_state cleared acp_current_model_id because an override was rejected/unapplied, the persisted state still has acp_session_id and the serialized agent still has acp_model; after restart the PrivateAttrs reset and _initialized is false, so this cold path falls back to acp_model and re-advertises the model that was deliberately cleared. Please gate the acp_model fallback to pre-session/no persisted ACP session, or persist an explicit applied/tombstone flag so a cleared current id stays cleared across cold reads.
| # reported in ``models.currentModelId`` (``None`` for older agents | ||
| # that don't surface the field). Trusting ``acp_model`` on paths | ||
| # where it never reached the server would mislabel the chip/picker. | ||
| current_model_id = ( |
There was a problem hiding this comment.
🟠 Important: For the session _meta path (Claude), new_session has already returned models.currentModelId after processing the override, but this still reports self.acp_model whenever metadata was sent. If the server normalizes the requested id, falls back without failing, or reports an opaque id that matches availableModels, the chip becomes less honest than the server response. Prefer reported_model_id when present for the _meta path, falling back to self.acp_model only when the server omitted model state; the protocol-call path can keep using self.acp_model because set_session_model happens after the new_session response.
Summary
Follow-up to #3347, which surfaced
current_model_id/available_modelsfrom ACP session responses. The review on #3347 raised two 🟠 Important correctness issues, but it landed ~3 minutes before the PR merged and went in unaddressed. Both letConversationInfo.current_model_id(the picker/chip) claim a model the ACP server isn't actually running.1. Stale
acp_current_model_idon resume (asymmetric persistence gating)init_stategated the two persisted fields differently:acp_available_models— cleared when the server reported amodelsblock (_available_models is not None), even an empty one.acp_current_model_id— kept whenever the value was non-None, with no "did the server report?" check.So on a true resume where the server reports
models:{currentModelId:"", availableModels:[]}(both map toNone/[]via_extract_session_models), the list was cleared but the stale id survived — rendering a chip that points at a model absent from the now-empty picker.Fix: the clear branch now mirrors the list gating:
2. Unapplied override surfaced as the current model
current_model_id = self.acp_model or reported_model_idtrusted the override even on paths where it never reached the server:_metanorset_session_model(both provider-gated), soacp_modelis set on the agent but the server runs its own default;set_session_modelthe server rejected —_reapply_session_model_on_resumeswallows theACPRequestErrorand logs "the live session may run on the server default", yet the surface still claimed the override.Fix:
_maybe_set_session_modeland_reapply_session_model_on_resumenow return whether the override was actually applied; resolution prefersacp_modelonly then, otherwise surfaces the server-reported id (orNone). The happy override path (claude_meta, codex/geminiset_session_model) is unchanged.Verification
@google/gemini-cli@0.38.0through a realACPAgent: happy-path surfacing is correct (current_model_id, 8available_models); confirmed servers do reportmodelsblocks onnew_session.tests/sdk/agent/test_acp_agent.py: 238 passed.ruff check/ruff formatclean;pyright0 errors.🤖 Generated with Claude Code
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:df86f87-pythonRun
All tags pushed for this build
About Multi-Architecture Support
df86f87-python) is a multi-arch manifest supporting both amd64 and arm64df86f87-python-amd64) are also available if needed