feat(autofix): aider harness behind AUTOFIX_HARNESS flag (Phase 2a) by dnplkndll · Pull Request #7 · ledoent/seer

dnplkndll · 2026-05-20T01:24:18Z

What changed

Adds aider as an alternate autofix orchestrator behind the new AUTOFIX_HARNESS feature flag. Default remains builtin (existing AutofixAgent), so live behavior is unchanged until an operator opts in.

This is Phase 2a of the coding-harness rollout — see docs/coding-harnesses.md (merged in #6) for the comparison matrix and constraint summary that drove the design.

Scope (8 commits, in order)

feat(harness): scaffold harness package + AUTOFIX_HARNESS flag (478f0c2) — adds seer.automation.harness package with select_orchestrator() registry + factory and the AUTOFIX_HARNESS / AUTOFIX_HARNESS_STRICT / AUTOFIX_HARNESS_MODEL config flags.
feat(harness): AiderHarness subprocess wrapper + diff adapter (53ede97) — new harness/aider.py with sandboxed subprocess invocation, plus a minimal diff_adapter.py.
build(docker): bake aider-chat into Lightweight.Dockerfile (43a5bbf) — aider-chat==0.65.0 installed into an isolated /opt/aider-venv so its sprawling dep tree (litellm, prompt-toolkit, gitpython) can't fight seer's pinned versions.
feat(autofix): wire orchestrator selection in components (3a8b834) — root_cause / solution / coding components each resolve the orchestrator via select_orchestrator(config) instead of constructing AutofixAgent directly.
test(autofix): gated dogfood integration test (d305c73) — opt-in (SEER_AIDER_DOGFOOD=1) live test runnable inside the container after deploy.
fix(harness): aider 0.65.0 flag set + vertex deps in venv (5a36248) — fallout from live VM verification: --chat-mode ask (not --ask), --no-check-update to stop aider clobbering seer's tokenizers via mid-session pip-upgrade, google-auth + google-cloud-aiplatform in /opt/aider-venv for litellm Vertex routing.
fix(harness): align AiderHarness with AutofixAgent contract surface (01c9625): bugbot review caught that the standalone smoke test bypassed real component integration. Adds usage (so cur.usage += agent.usage works), add_user_message (coding/solution push their prompts this way), a _resolve_prompt fallback to the last user message, tools as a public attribute (settable to [] mid-flow), constructor-seeded memory, and a synthesized assistant Message after each run so the formatter LLMs have aider's output to extract from. Drops the broken Pydantic attribute assignment in _record_diff (BaseStep has no extra="allow"). New tests cover each contract method plus regression guards for the flag fixes.
refactor(harness): idempotent re-registration (11b3bb8) — same name → cls is a no-op, only different class under same name still raises. Survives pytest module reloads.
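The registry rules described above (idempotent same-class registration, a hard error for a different class under the same name, strict vs. fallback lookup) can be sketched as follows. This is a minimal illustration, not the code in the PR: the plain-dict registry and the `register_harness(name, cls)` signature are assumptions, and only `select_orchestrator(name, strict=...)` mirrors the call shape shown in the live verification.

```python
from typing import Dict


class HarnessNotAvailableError(Exception):
    """Raised in strict mode when the requested harness name is unknown."""


# Hypothetical module-level registry: harness name -> orchestrator class.
_REGISTRY: Dict[str, type] = {}


def register_harness(name: str, cls: type) -> None:
    """Register cls under name; re-registering the same class object is a no-op."""
    existing = _REGISTRY.get(name)
    if existing is cls:
        return  # same name -> same class: nothing to do
    if existing is not None:
        # Same name -> different class: two modules would clobber each other.
        raise ValueError(f"harness {name!r} is already registered to {existing.__name__}")
    _REGISTRY[name] = cls


def select_orchestrator(name: str, strict: bool = False) -> type:
    """Resolve a harness by name; unknown names raise in strict mode, else fall back."""
    cls = _REGISTRY.get(name)
    if cls is not None:
        return cls
    if strict:
        raise HarnessNotAvailableError(f"unknown harness {name!r}")
    return _REGISTRY["builtin"]  # non-strict default: the existing AutofixAgent path
```

Note that this identity check only helps when the same class object is registered twice; an `importlib.reload` creates a new class object, so a real implementation may need to compare by module and qualified name instead.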

Design highlights

Sandbox: each invocation runs in /tmp/aider-<run_id>/ with a shallow git clone, cleaned up in a finally: block. Walltime capped at 600s.
Hardcoded repo (Phase 2a): ledoent/seer on feature/explorer-endpoints since the benchmark issues are all seer-side. Phase 2b switches to Sentry code-mapping RPC lookup.
Mode selection: root_cause_analysis and solution steps run aider in --chat-mode ask (no commits). coding step gets --auto-commits; the resulting git diff HEAD~1 HEAD is logged at INFO (UI persistence deferred to Phase 2b — see below).
Vertex Gemini only: aider model defaults to vertex_ai/gemini-2.5-flash; uses the existing seer-vertex ADC at /etc/sentry-extra/seer-vertex-key.json. No new env vars.
Strict mode: AUTOFIX_HARNESS_STRICT=True raises HarnessNotAvailableError on unknown harness; default false falls back to builtin.
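A rough sketch of the sandbox and mode selection above, assuming the harness shells out to the aider CLI. The flags are the ones named in this PR (`--chat-mode ask`, `--auto-commits`, `--no-check-update`, `--no-show-release-notes`, `--no-pretty`, `--message`) plus aider's standard `--model`; the binary path `/opt/aider-venv/bin/aider` is an assumption based on the standard venv layout, and the shallow clone step is omitted. Function names are hypothetical.

```python
import os
import shutil
import subprocess
import tempfile
from contextlib import contextmanager
from typing import Iterator, List

AIDER_BIN = "/opt/aider-venv/bin/aider"  # assumption: standard venv layout
AIDER_TIMEOUT_SECONDS = 600  # walltime cap from the design notes


@contextmanager
def aider_sandbox(run_id: str) -> Iterator[str]:
    """Create <tmpdir>/aider-<run_id>/ and always remove it on exit."""
    path = os.path.join(tempfile.gettempdir(), f"aider-{run_id}")
    os.makedirs(path)  # raises FileExistsError if a stale sandbox is left over
    try:
        yield path
    finally:
        # Runs on success, exception, or timeout alike.
        shutil.rmtree(path, ignore_errors=True)


def build_aider_argv(prompt: str, model: str, coding: bool) -> List[str]:
    """Build the aider command line for an ask-mode or coding-mode step."""
    if not prompt.strip():
        raise ValueError("refusing to invoke aider with an empty --message")
    argv = [
        AIDER_BIN,
        "--model", model,
        "--no-pretty",
        "--no-check-update",  # prevents the mid-session pip upgrade
        "--no-show-release-notes",
    ]
    if coding:
        argv.append("--auto-commits")  # coding step: aider commits, diff is read afterwards
    else:
        argv += ["--chat-mode", "ask"]  # root_cause / solution: no commits
    argv += ["--message", prompt]
    return argv


def run_aider(argv: List[str], cwd: str) -> "subprocess.CompletedProcess[str]":
    """Run aider inside the sandbox; raises subprocess.TimeoutExpired after 600 s."""
    return subprocess.run(
        argv, cwd=cwd, capture_output=True, text=True,
        timeout=AIDER_TIMEOUT_SECONDS, check=False,
    )
```

Keeping argv construction separate from the subprocess call makes the flag regression guards testable without the aider binary.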

Tests

test_aider.py — 19 unit tests patching subprocess.run + tempfile + shutil. Covers ask vs coding mode, diff capture, clone failure, clone timeout, aider timeout, aider non-zero, missing binary, empty prompt, should_continue, registry lookup, AutofixAgent contract surface (usage, add_user_message, tools, memory pass-through, assistant-message synthesis), regression guards for --chat-mode ask and --no-check-update.
test_select_orchestrator.py — 6 tests covering registry, fallback, strict mode, duplicate-class registration error, idempotent same-class re-registration.
test_diff_adapter.py — golden tests for the file-count + path-extraction helpers.
test_component_wiring.py — source-level checks that all three autofix components still route through select_orchestrator and forward AUTOFIX_HARNESS_STRICT.
test_aider_dogfood.py — gated live test (SEER_AIDER_DOGFOOD=1) for in-container verification.
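The file-count and path-extraction helpers that test_diff_adapter.py exercises could look roughly like this. It is a hypothetical sketch (the real diff_adapter.py API is not shown here) that reads the `diff --git a/... b/...` header git emits for each changed file and reports the post-change (b/) path, so renames report the new name.

```python
import re
from typing import List

# One "diff --git a/<old> b/<new>" header per changed file.
# Limitation: paths containing spaces are quoted by git and are not handled here.
_DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$", re.MULTILINE)


def changed_paths(diff: str) -> List[str]:
    """Return post-change paths in the order they appear in the unified diff."""
    return [m.group(2) for m in _DIFF_HEADER.finditer(diff)]


def changed_file_count(diff: str) -> int:
    """Number of files touched by the diff."""
    return len(changed_paths(diff))
```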

Run in seer container: 35 passed, 2 skipped in 7.14s.

Live verification on sentry-seer-1

Aider v0.65.0
Model: vertex_ai/gemini-2.5-flash with ask edit format
Git repo: .git with 471 files
Repo-map: using 1024 tokens, auto refresh
[... Gemini response ...]
Tokens: 2.5k sent, 31 received. Cost: $0.00082 message, $0.00082 session.

Clone + Vertex Gemini round-trip in ~9s, $0.0008 per invocation. ADC auth via /etc/sentry-extra/seer-vertex-key.json works. select_orchestrator("aider", strict=True) resolves correctly.

Known Phase 2a gaps (Phase 2b backlog)

Diff UI persistence: captured diff is logged at INFO only. Real persistence into ChangesStep.changes needs a unified-diff → FilePatch + Hunk parser.
Dynamic repo resolution: hardcoded to ledoent/seer on feature/explorer-endpoints. Phase 2b resolves from Sentry code-mappings.
Diagnostic context: in --chat-mode ask, aider doesn't auto-load files from the repo map. For real autofix root-cause/solution, pass relevant files via --read <path> extracted from the autofix request's relevant_code_files.
Usage attribution: Usage() is zero-init; aider's token-count output (Tokens: 2.5k sent, 31 received) is currently discarded. Phase 2b parses this for accurate per-step token totals.
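For the usage-attribution item, a minimal parser for aider's `Tokens: ... sent, ... received` lines might look like this. It assumes the line format shown in the live verification output; the `k`/`M` suffix handling is an assumption, and because aider prints rounded counts (e.g. `2.5k`), the totals are approximate. The function name is hypothetical.

```python
import re
from typing import Tuple

_MULTIPLIER = {"": 1, "k": 1_000, "M": 1_000_000}
_TOKENS = re.compile(r"Tokens:\s*([\d.]+)([kM]?)\s+sent,\s*([\d.]+)([kM]?)\s+received")


def parse_token_usage(stdout: str) -> Tuple[int, int]:
    """Sum (sent, received) token counts across every Tokens: line in aider's stdout."""
    sent = received = 0
    for m in _TOKENS.finditer(stdout):
        sent += round(float(m.group(1)) * _MULTIPLIER[m.group(2)])
        received += round(float(m.group(3)) * _MULTIPLIER[m.group(4)])
    return sent, received
```

For the live run above, `parse_token_usage("Tokens: 2.5k sent, 31 received. Cost: $0.00082 message, $0.00082 session.")` returns `(2500, 31)`.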

Rollback

Set / leave AUTOFIX_HARNESS=builtin (default) — the AutofixAgent path is untouched. If a wider revert is needed, git revert the 8 commits in reverse order; no schema or migration changes.


… coding components

Each component now resolves AUTOFIX_HARNESS via select_orchestrator instead of constructing AutofixAgent directly. Default behavior unchanged (builtin → AutofixAgent); aider path activates only when AUTOFIX_HARNESS=aider.

Skipped by default; opt in with SEER_AIDER_DOGFOOD=1 to run inside the seer container after rebuilding :lightweight. Confirms aider can clone the repo and answer in --ask mode against Vertex Gemini.

Three issues found during VM live test against ghcr.io/ledoent/seer:7-merge:

* `--ask` is not a flag in aider 0.65.0; use `--chat-mode ask` instead.
* Add `--no-check-update --no-show-release-notes` so aider doesn't auto-pip-install upgrades mid-run (the auto-upgrade clobbered seer's pinned tokenizers when it ran against the system Python).
* Install `google-auth` and `google-cloud-aiplatform` into /opt/aider-venv so litellm's vertex_ai/* model routing can authenticate via the existing seer-vertex ADC. Without these, aider hits `ModuleNotFoundError: No module named 'google'` from litellm.llms.vertex_ai_and_google_ai_studio.

Verified end-to-end on sentry-seer-1: clone + ask-mode aider call against vertex_ai/gemini-2.5-flash completes in ~9s, returns real Gemini output, costs ~$0.0008 per invocation.

dnplkndll · 2026-05-20T13:12:28Z

Live verification on sentry-seer-1 ✅

Image pulled and tested against real Vertex Gemini:

Aider v0.65.0
Model: vertex_ai/gemini-2.5-flash with ask edit format
Git repo: .git with 471 files
Repo-map: using 1024 tokens, auto refresh
Initial repo scan can be slow in larger repos, but only happens once.
I cannot answer that question without seeing the contents of
`src/seer/automation/harness/__init__.py`. Please add it to the chat.

Tokens: 2.5k sent, 31 received. Cost: $0.00082 message, $0.00082 session.

Clone + Gemini round-trip: ~9 seconds
Per-invocation cost: ~$0.0008
ADC auth via /etc/sentry-extra/seer-vertex-key.json works
select_orchestrator("aider", strict=True) resolved correctly under AUTOFIX_HARNESS_STRICT=1

Three issues found and fixed in `5a36248`

--ask is not a real flag in aider 0.65.0 — must use --chat-mode ask.
Aider auto-upgrade clobbered the system Python's tokenizers when it ran pip install --upgrade --upgrade-strategy only-if-needed aider-chat mid-session. Suppressed via --no-check-update --no-show-release-notes.
google-auth and google-cloud-aiplatform were missing from /opt/aider-venv; litellm needs them for vertex_ai/* model routing. Both were added to the Dockerfile pip install line.

Note on diagnostic mode (deferred to Phase 2b)

The "I need the file" response is correct aider behavior — in --chat-mode ask it doesn't auto-load files from the repo map for diagnostic questions. For real autofix root-cause/solution steps, we'll need to pass relevant files via --read <path> (extracted from the autofix request's relevant_code_files). This refinement belongs in Phase 2b alongside the dynamic-repo-resolution work.
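One possible shape for the Phase 2b `--read <path>` plumbing: turn the request's relevant files into repeated `--read` flags (aider accepts `--read` multiple times). This sketch assumes `relevant_code_files` has already been reduced to repo-relative path strings, which is a guess about its structure; `read_flags` is a hypothetical name. It normalizes paths, drops anything absolute or escaping the sandboxed clone, and de-duplicates while keeping order.

```python
import posixpath
from typing import Iterable, List


def read_flags(relevant_paths: Iterable[str]) -> List[str]:
    """Build ["--read", path, ...] for repo-relative paths inside the clone."""
    flags: List[str] = []
    seen = set()
    for raw in relevant_paths:
        path = posixpath.normpath(raw)
        # Drop anything absolute or that would escape the sandboxed clone.
        if path.startswith("/") or path in (".", "..") or path.startswith("../"):
            continue
        if path in seen:  # de-duplicate while preserving order
            continue
        seen.add(path)
        flags += ["--read", path]
    return flags
```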

This PR delivers Phase 2a — orchestrator wiring + sandboxed invocation + Vertex auth — and is ready to merge behind the default AUTOFIX_HARNESS=builtin flag.

Bugbot review found the standalone smoke test bypassed real component integration. The autofix components in root_cause / solution / coding touch the agent in five ways that AiderHarness was missing or breaking:

* `agent.usage` summed into per-step totals via `cur.usage += agent.usage` → would AttributeError. Now zero-inits `Usage()`.
* `agent.add_user_message(prompt)` called *before* `agent.run(...)` in coding + solution → no method existed. Added with same signature as `AutofixAgent.add_user_message` and a `_resolve_prompt` fallback that picks up the last user message when `run_config.prompt` is empty.
* `agent.tools = []` set mid-flow in root_cause before the reasoning pass → `_unused_tools` private name made it un-settable. Now a public `self.tools` attribute.
* `agent.memory` fed to `llm_client.generate_structured` as the formatter LLM's context → empty memory means the formatter has nothing to extract from. Now synthesizes an assistant `Message` with aider's stdout after each run.
* Constructor accepts a pre-existing memory list (CodingComponent's `_prefill_initial_memory` seeds expand_document tool calls).

Also drops the broken Pydantic attribute assignment in `_record_diff`: `BaseStep` has no `extra="allow"` config, so the runtime assignment would fail in Pydantic v2. Phase 2a logs the diff at INFO; Phase 2b adds proper `ChangesStep.changes` persistence via FilePatch parsing.

New tests cover each contract method plus regression guards for the flag fixes that landed in 5a36248 (`--chat-mode ask`, `--no-check-update`). A new test_component_wiring.py source-checks each of the three components still routes through `select_orchestrator` and forwards `AUTOFIX_HARNESS_STRICT`.
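The contract surface listed above can be illustrated with a small stand-in class. `Message`, `Usage`, and `AiderContractSketch` here are hypothetical dataclasses, not seer's real models (whose field names may differ); the point is the behavior: zero-initialized `usage` that supports `+=`, a public settable `tools`, a defensively copied seed `memory`, `add_user_message`, the last-user-message prompt fallback, and an assistant message synthesized from aider's stdout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:  # hypothetical stand-in for seer's Message model
    role: str
    content: str


@dataclass
class Usage:  # hypothetical stand-in; zero-initialised by default
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0

    def __add__(self, other: "Usage") -> "Usage":
        return Usage(
            self.prompt_tokens + other.prompt_tokens,
            self.completion_tokens + other.completion_tokens,
            self.total_tokens + other.total_tokens,
        )


class AiderContractSketch:
    """Hypothetical stand-in for the AutofixAgent-facing surface of AiderHarness."""

    def __init__(self, memory: Optional[List[Message]] = None, tools: Optional[list] = None):
        self.usage = Usage()  # zero-init so cur.usage += agent.usage works
        self.tools = list(tools or [])  # public, so components can set agent.tools = []
        self.memory = list(memory or [])  # copy: appends must not mutate the caller's list

    def add_user_message(self, content: str) -> None:
        self.memory.append(Message(role="user", content=content))

    def _resolve_prompt(self, prompt: Optional[str]) -> str:
        """Prefer run_config.prompt; otherwise fall back to the last user message."""
        if prompt:
            return prompt
        for message in reversed(self.memory):
            if message.role == "user":
                return message.content
        return ""

    def _record_output(self, stdout: str) -> None:
        """Synthesize an assistant message so the formatter LLM can read aider's output."""
        self.memory.append(Message(role="assistant", content=stdout))
```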

Pytest test isolation and module reloads can re-import harness modules, which would trip the existing duplicate-registration guard and crash at import time even though the registration is a no-op (same name, same class). Now register_harness:

* Returns early when name → cls is already in the registry (same cls).
* Still raises ValueError when name → *different* cls, which is the real bug it was guarding against (two modules silently clobbering each other's registration).

This is the only path the duplicate-error guard was protecting; the existing test_duplicate_registration_raises still holds because that test registers two distinct classes.

dnplkndll · 2026-05-20T13:47:40Z

Self-review pass: bugbot findings + fixes

Re-reading the branch as if it were someone else's PR surfaced real integration bugs the standalone smoke test missed. Addressed in 2 follow-up commits:

🔴 Critical (would fail at first real autofix run)

agent.usage missing on AiderHarness — all three components do cur.usage += agent.usage. Would AttributeError on every autofix invocation. → Fixed in 01c9625 (zero-init Usage()).

agent.add_user_message(...) missing — coding/component.py:161 and solution/component.py:152 push their formatted prompts this way before calling run(). Without this method, the prompt was lost and aider got an empty --message. → Fixed: added the method, plus a _resolve_prompt fallback that picks up the last user message when run_config.prompt is empty.

agent.memory was empty for the formatter LLM — root_cause and solution both call llm_client.generate_structured(messages=agent.memory, ...) after agent.run(). With our empty memory list, the formatter had nothing to extract from. → Fixed: synthesize an assistant Message with aider's stdout after each run.

agent.tools = [] set mid-flow in root_cause/component.py:119 — was stored as _unused_tools, can't be reassigned cleanly. → Fixed: public self.tools attribute.

_record_diff assignment to Pydantic BaseStep would fail — BaseStep has no extra="allow" config, so setting an unknown attribute fails in Pydantic v2. Test mocks hid this because they mocked state.update() entirely. → Fixed: log the diff at INFO; Phase 2b adds ChangesStep.changes persistence via FilePatch parsing.

🟡 Medium

register_harness raised on idempotent re-import — pytest module reload would trip the duplicate guard even for the same class. → Fixed in 11b3bb8 (same name → cls is a no-op; only different class under same name still raises).

Docstring inaccuracy — claimed --ask mode; the actual flag set has been --chat-mode ask since 5a36248. → Fixed.

Redundant AIDER_NO_PRETTY=1 env var — already passing --no-pretty flag. → Removed.

🟢 Low (left as-is, documented)

_resolve_model_name reads os.environ directly rather than from the AppConfig instance. The harness only gets the AgentConfig (not AppConfig) at construction time, and Pydantic populates AUTOFIX_HARNESS_MODEL from env anyway, so this is fine. Documented.
Diff capture timeout exceptions propagate instead of returning an empty string. Acceptable: if git diff HEAD~1 HEAD on a shallow clone with a single new commit hits the 30s timeout, something is seriously wrong.
Test imports private underscored constants (_AIDER_TIMEOUT_SECONDS). Convention violation but the constants are intentionally importable; tests prefer it over magic numbers.

Test coverage added

8 new tests (test_aider.py now 19 total) covering the contract surface: usage default, tools settable, add_user_message appends, constructor memory seeding (with defensive-copy check), _resolve_prompt fallback, assistant-message synthesis, --chat-mode ask regression, --no-check-update regression.

New test_component_wiring.py source-checks each of the three components still routes through select_orchestrator and forwards AUTOFIX_HARNESS_STRICT — guards against a future refactor silently reverting to direct AutofixAgent(...).
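A source-level wiring check like the one described can be done with Python's `ast` module rather than string matching. This is one possible approach, not necessarily how test_component_wiring.py is written; `wiring_problems` is a hypothetical helper that reports whether a module calls `select_orchestrator`, constructs `AutofixAgent` directly, or never references `AUTOFIX_HARNESS_STRICT`.

```python
import ast
from typing import List, Set


def _called_names(tree: ast.AST) -> Set[str]:
    """Names of called functions, e.g. foo(...) -> "foo", mod.foo(...) -> "foo"."""
    names: Set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                names.add(node.func.id)
            elif isinstance(node.func, ast.Attribute):
                names.add(node.func.attr)
    return names


def _referenced_names(tree: ast.AST) -> Set[str]:
    """Every bare name and attribute name referenced anywhere in the module."""
    names: Set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, ast.Attribute):
            names.add(node.attr)
    return names


def wiring_problems(source: str) -> List[str]:
    """Return human-readable wiring violations for one component module's source."""
    tree = ast.parse(source)
    called = _called_names(tree)
    problems: List[str] = []
    if "select_orchestrator" not in called:
        problems.append("does not call select_orchestrator")
    if "AutofixAgent" in called:
        problems.append("constructs AutofixAgent directly")
    if "AUTOFIX_HARNESS_STRICT" not in _referenced_names(tree):
        problems.append("does not forward AUTOFIX_HARNESS_STRICT")
    return problems
```

Parsing the source means comments or docstrings that merely mention `AutofixAgent(...)` do not trigger false positives.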

All tests pass in the seer container: 35 passed, 2 skipped in 7.14s.

No UI screenshots

This PR is server-side only (Celery worker code path + a feature flag). No frontend changes; no Sentry UI changes. The "UI surface" affected is the autofix step log in the worker, which is verified via the inline aider stdout shown in the prior live-verification comment above.

PR description refreshed to reflect the 8-commit shape + Phase 2b backlog.

dnplkndll added 6 commits May 19, 2026 17:53

feat(harness): scaffold harness package + AUTOFIX_HARNESS flag

478f0c2

feat(harness): AiderHarness subprocess wrapper + diff adapter

53ede97

build(docker): bake aider-chat into Lightweight.Dockerfile (isolated …venv)

43a5bbf

test(autofix): gated dogfood integration test for AiderHarness

d305c73


dnplkndll added 2 commits May 20, 2026 09:44


dnplkndll wants to merge 8 commits into feature/explorer-endpoints from feat/aider-harness
