feat(autofix): aider harness behind AUTOFIX_HARNESS flag (Phase 2a) #7
dnplkndll wants to merge 8 commits into
… coding components
Each component now resolves AUTOFIX_HARNESS via select_orchestrator instead of constructing AutofixAgent directly. Default behavior unchanged (builtin → AutofixAgent); aider path activates only when AUTOFIX_HARNESS=aider.
Skipped by default; opt in with SEER_AIDER_DOGFOOD=1 to run inside the seer container after rebuilding :lightweight. Confirms aider can clone the repo and answer in --ask mode against Vertex Gemini.
Two issues found during VM live test against ghcr.io/ledoent/seer:7-merge:
* `--ask` is not a flag in aider 0.65.0 — use `--chat-mode ask` instead.
* Add `--no-check-update --no-show-release-notes` so aider doesn't
auto-pip-install upgrades mid-run (the auto-upgrade clobbered seer's
pinned tokenizers when it ran against the system Python).
* Install `google-auth` and `google-cloud-aiplatform` into
/opt/aider-venv so litellm's vertex_ai/* model routing can
authenticate via the existing seer-vertex ADC. Without these,
aider hits `ModuleNotFoundError: No module named 'google'` from
litellm.llms.vertex_ai_and_google_ai_studio.
Verified end-to-end on sentry-seer-1: clone + ask-mode aider call
against vertex_ai/gemini-2.5-flash completes in ~9s, returns real
Gemini output, costs ~$0.0008 per invocation.
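The flag set described in this commit can be sketched as a small argv builder. This is a minimal sketch, not the PR's `harness/aider.py`: the function name `build_aider_argv`, the `bin/aider` path under `/opt/aider-venv`, and the use of aider's `--model` and `--message` options to pass the model and prompt are assumptions. Only `--chat-mode ask`, `--no-check-update`, `--no-show-release-notes`, and `--auto-commits` for the coding step (per the PR description) come from the source.

```python
from typing import List

# Assumed binary location; the source only says aider lives in /opt/aider-venv.
AIDER_BIN = "/opt/aider-venv/bin/aider"


def build_aider_argv(step: str, model: str, prompt: str) -> List[str]:
    """Build the aider command line for one autofix step (hypothetical helper)."""
    argv = [
        AIDER_BIN,
        "--model", model,
        # Stop aider from checking for / installing upgrades mid-run.
        "--no-check-update",
        "--no-show-release-notes",
        "--message", prompt,
    ]
    if step == "coding":
        # The coding step lets aider commit so the diff can be read afterwards.
        argv.append("--auto-commits")
    else:
        # aider 0.65.0 has no `--ask` flag; ask mode is selected this way.
        argv += ["--chat-mode", "ask"]
    return argv
```

Keeping the argv construction in a pure function makes regression guards cheap: a test can assert on the list without spawning a subprocess.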
Live verification on sentry-seer-1 ✅
Image pulled and tested against real Vertex Gemini:
Two issues found and fixed in 5a36248
Note on diagnostic mode (deferred to Phase 2b)
The "I need the file" response is correct aider behavior: in …
This PR delivers Phase 2a (orchestrator wiring, sandboxed invocation, Vertex auth) and is ready to merge behind the default …
Bugbot review found the standalone smoke test bypassed real component
integration. The autofix components in root_cause / solution / coding
touch the agent in five ways that AiderHarness was missing or breaking:
* `agent.usage` summed into per-step totals via `cur.usage += agent.usage`
→ would AttributeError. Now zero-inits `Usage()`.
* `agent.add_user_message(prompt)` called *before* `agent.run(...)` in
coding + solution → no method existed. Added with same signature as
`AutofixAgent.add_user_message` and a `_resolve_prompt` fallback
that picks up the last user message when `run_config.prompt` is empty.
* `agent.tools = []` set mid-flow in root_cause before the reasoning
pass → `_unused_tools` private name made it un-settable. Now a public
`self.tools` attribute.
* `agent.memory` fed to `llm_client.generate_structured` as the
formatter LLM's context → empty memory means the formatter has nothing
to extract from. Now synthesizes an assistant `Message` with aider's
stdout after each run.
* Constructor accepts a pre-existing memory list (CodingComponent's
`_prefill_initial_memory` seeds expand_document tool calls).
Also drops the broken Pydantic attribute assignment in `_record_diff`:
`BaseStep` has no `extra="allow"` config, so the runtime assignment
would fail in Pydantic v2. Phase 2a logs the diff at INFO; Phase 2b adds
proper `ChangesStep.changes` persistence via FilePatch parsing.
New tests cover each contract method plus regression guards for the
flag fixes that landed in 5a36248 (`--chat-mode ask`, `--no-check-update`).
A new test_component_wiring.py source-checks each of the three components
still routes through `select_orchestrator` and forwards
`AUTOFIX_HARNESS_STRICT`.
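The contract surface listed in this commit message can be illustrated with a stripped-down harness. This is a sketch under stated assumptions, not the PR's `AiderHarness`: `Usage` and `Message` are stand-in dataclasses with assumed fields, `AiderHarnessSketch` is a hypothetical name, `runner` is a hypothetical callable standing in for the aider subprocess, and raising `ValueError` on an empty prompt is an assumed behavior.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Usage:
    """Stand-in for seer's Usage; the field names are assumptions."""
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def __add__(self, other: "Usage") -> "Usage":
        # Lets callers write `cur.usage += agent.usage`.
        return Usage(
            self.prompt_tokens + other.prompt_tokens,
            self.completion_tokens + other.completion_tokens,
        )


@dataclass
class Message:
    """Stand-in for seer's Message; only role and content are modeled."""
    role: str
    content: str


class AiderHarnessSketch:
    def __init__(self, memory: Optional[List[Message]] = None) -> None:
        # Accept pre-seeded memory, as the coding component does.
        self.memory: List[Message] = list(memory or [])
        # Public attribute so components can set `agent.tools = []` mid-flow.
        self.tools: list = []
        # Zero-initialized so per-step totals can be summed.
        self.usage = Usage()

    def add_user_message(self, content: str) -> None:
        self.memory.append(Message(role="user", content=content))

    def _resolve_prompt(self, prompt: Optional[str]) -> str:
        # Prefer the explicit prompt; otherwise fall back to the last user message.
        if prompt:
            return prompt
        for message in reversed(self.memory):
            if message.role == "user":
                return message.content
        raise ValueError("no prompt and no user message in memory")

    def run(self, prompt: Optional[str], runner: Callable[[str], str]) -> str:
        stdout = runner(self._resolve_prompt(prompt))
        # Synthesize an assistant message so a formatter LLM has output to read.
        self.memory.append(Message(role="assistant", content=stdout))
        return stdout
```

Injecting `runner` keeps the sketch testable without aider installed; a real harness would presumably call the subprocess at that point.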
Pytest test isolation and module reloads can re-import harness modules,
which would trip the existing duplicate-registration guard and crash at
import time even though the registration is a no-op (same name, same
class). Now register_harness:
* Returns early when name → cls is already in the registry (same cls).
* Still raises ValueError when name → *different* cls, which is the
real bug it was guarding against (two modules silently clobbering
each other's registration).
This is the only path the duplicate-error guard was protecting; the
existing test_duplicate_registration_raises still holds because that
test registers two distinct classes.
Self-review pass: bugbot findings + fixes
Re-reading the branch as if it were someone else's PR surfaced real integration bugs the standalone smoke test missed. Addressed in 2 follow-up commits:
🔴 Critical (would fail at first real autofix run)
🟡 Medium
Docstring inaccuracy: claimed …
Redundant …
🟢 Low (left as-is, documented)
Test coverage added
8 new tests (… New …
All tests pass in the seer container:
No UI screenshots
This PR is server-side only (Celery worker code path + a feature flag). No frontend changes; no Sentry UI changes. The "UI surface" affected is the autofix step log in the worker, which is verified via the inline aider stdout shown in the prior live-verification comment above.
PR description refreshed to reflect the 8-commit shape + Phase 2b backlog.
What changed
Adds aider as an alternate autofix orchestrator behind the new `AUTOFIX_HARNESS` feature flag. Default remains `builtin` (existing `AutofixAgent`), so live behavior is unchanged until an operator opts in.
This is Phase 2a of the coding-harness rollout; see `docs/coding-harnesses.md` (merged in #6) for the comparison matrix and constraint summary that drove the design.
Scope (8 commits, in order)
* `feat(harness): scaffold harness package + AUTOFIX_HARNESS flag` (478f0c2): adds `seer.automation.harness` package with `select_orchestrator()` registry + factory and the `AUTOFIX_HARNESS` / `AUTOFIX_HARNESS_STRICT` / `AUTOFIX_HARNESS_MODEL` config flags.
* `feat(harness): AiderHarness subprocess wrapper + diff adapter` (53ede97): new `harness/aider.py` with sandboxed subprocess invocation, plus a minimal `diff_adapter.py`.
* `build(docker): bake aider-chat into Lightweight.Dockerfile` (43a5bbf): `aider-chat==0.65.0` installed into an isolated `/opt/aider-venv` so its sprawling dep tree (litellm, prompt-toolkit, gitpython) can't fight seer's pinned versions.
* `feat(autofix): wire orchestrator selection in components` (3a8b834): root_cause / solution / coding components each resolve the orchestrator via `select_orchestrator(config)` instead of constructing `AutofixAgent` directly.
* `test(autofix): gated dogfood integration test` (d305c73): opt-in (`SEER_AIDER_DOGFOOD=1`) live test runnable inside the container after deploy.
* `fix(harness): aider 0.65.0 flag set + vertex deps in venv` (5a36248): fallout from live VM verification: `--chat-mode ask` (not `--ask`), `--no-check-update` to stop aider clobbering seer's `tokenizers` via mid-session pip-upgrade, `google-auth` + `google-cloud-aiplatform` in `/opt/aider-venv` for litellm Vertex routing.
* `fix(harness): align AiderHarness with AutofixAgent contract surface` (01c9625): bugbot review caught that the standalone smoke test bypassed real component integration. Adds `usage` (so `cur.usage += agent.usage` works), `add_user_message` (coding/solution push prompt this way), `_resolve_prompt` fallback to last user message, `tools` as public attribute (settable to `[]` mid-flow), constructor-seeded memory, and synthesized assistant `Message` after each run so the formatter LLMs have aider's output to extract from. Drops the broken Pydantic attribute assignment in `_record_diff` (BaseStep has no `extra="allow"`). New tests cover each contract method + regression guards for the flag fixes.
* `refactor(harness): idempotent re-registration` (11b3bb8): same `name → cls` is a no-op, only different class under same name still raises. Survives pytest module reloads.
Design highlights
* … `/tmp/aider-<run_id>/` with a shallow `git clone`, cleaned up in a `finally:` block. Walltime capped at 600s.
* … `ledoent/seer` on `feature/explorer-endpoints` since the benchmark issues are all seer-side. Phase 2b switches to Sentry code-mapping RPC lookup.
* `root_cause_analysis` and `solution` steps run aider in `--chat-mode ask` (no commits). `coding` step gets `--auto-commits`; the resulting `git diff HEAD~1 HEAD` is logged at INFO (UI persistence deferred to Phase 2b; see below).
* `vertex_ai/gemini-2.5-flash`; uses the existing seer-vertex ADC at `/etc/sentry-extra/seer-vertex-key.json`. No new env vars.
* `AUTOFIX_HARNESS_STRICT=True` raises `HarnessNotAvailableError` on unknown harness; default false falls back to builtin.
Tests
* `test_aider.py`: 19 unit tests patching `subprocess.run` + `tempfile` + `shutil`. Covers ask vs coding mode, diff capture, clone failure, clone timeout, aider timeout, aider non-zero, missing binary, empty prompt, `should_continue`, registry lookup, AutofixAgent contract surface (`usage`, `add_user_message`, `tools`, memory pass-through, assistant-message synthesis), regression guards for `--chat-mode ask` and `--no-check-update`.
* `test_select_orchestrator.py`: 6 tests covering registry, fallback, strict mode, duplicate-class registration error, idempotent same-class re-registration.
* `test_diff_adapter.py`: golden tests for the file-count + path-extraction helpers.
* `test_component_wiring.py`: source-level checks that all three autofix components still route through `select_orchestrator` and forward `AUTOFIX_HARNESS_STRICT`.
* `test_aider_dogfood.py`: gated live test (`SEER_AIDER_DOGFOOD=1`) for in-container verification.
Run in seer container:
35 passed, 2 skipped in 7.14s.
Live verification on sentry-seer-1
Clone + Vertex Gemini round-trip in ~9s, $0.0008 per invocation. ADC auth via `/etc/sentry-extra/seer-vertex-key.json` works. `select_orchestrator("aider", strict=True)` resolves correctly.
Known Phase 2a gaps (Phase 2b backlog)
* `ChangesStep.changes` needs a unified-diff → `FilePatch` + `Hunk` parser.
* … `ledoent/seer` on `feature/explorer-endpoints`. Phase 2b resolves from Sentry code-mappings.
* … `--chat-mode ask`, aider doesn't auto-load files from the repo map. For real autofix root-cause/solution, pass relevant files via `--read <path>` extracted from the autofix request's `relevant_code_files`.
* `Usage()` is zero-init; aider's token-count output (`Tokens: 2.5k sent, 31 received`) is currently discarded. Phase 2b parses this for accurate per-step token totals.
Rollback
Set / leave `AUTOFIX_HARNESS=builtin` (default); the `AutofixAgent` path is untouched. If a wider revert is needed, `git revert` the 8 commits in reverse order; no schema or migration changes.
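The rollback path relies on the flag resolution described under Design highlights: an unknown harness name falls back to the builtin agent unless strict mode is on. A minimal sketch of that behavior follows. The registry dict, the placeholder classes, and the choice to return a class rather than an instance are assumptions; the PR's `select_orchestrator` may take a config object and differ in shape.

```python
from typing import Dict


class HarnessNotAvailableError(Exception):
    """Raised in strict mode when the requested harness is not registered."""


class AutofixAgent:
    """Placeholder for the existing builtin agent."""


class AiderHarness:
    """Placeholder for the aider-backed harness."""


# Hypothetical registry; the real one is populated via register_harness.
_REGISTRY: Dict[str, type] = {"builtin": AutofixAgent, "aider": AiderHarness}


def select_orchestrator(name: str = "builtin", strict: bool = False) -> type:
    """Resolve a harness name to a class, falling back to builtin if allowed."""
    cls = _REGISTRY.get(name)
    if cls is not None:
        return cls
    if strict:
        # AUTOFIX_HARNESS_STRICT=True: surface the misconfiguration.
        raise HarnessNotAvailableError(name)
    # Default (non-strict): quietly use the builtin agent.
    return _REGISTRY["builtin"]
```

With this shape, leaving the flag at `builtin` or setting it to an unregistered name both resolve to `AutofixAgent` in non-strict mode.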