feat(core): verification gate wrapper with quick fixes and escalation#434
feat(core): verification gate wrapper with quick fixes and escalation#434
Conversation
…nd escalation (#415) Brings the VerificationWrapper (used by all external engines) to parity with ReactAgent's comprehensive self-correction: - Quick fix integration: try pattern-based fixes before re-invoking adapter - Fix attempt tracking: FixAttemptTracker prevents repeated failed fixes - Escalation blocker: creates blocker when tracker recommends or retries exhausted - Default max_correction_rounds bumped from 3 to 5 - Shared build_escalation_question() used by both wrapper and ReactAgent - 22 tests covering all new behavior + integration scenario
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
- Record fix_tracker outcome AFTER quick fix attempt (not before), preventing premature escalation from inflated failure counts - Record SUCCESS outcome when quick fix works - Add EscalationDecision type annotation to _create_escalation_blocker - Replace __import__ hack with direct FixOutcome import in test
|
PR 434 review - see next comment for details |
|
Code Review: Verification Gate Wrapper Enhancement This is a solid PR that extends external engines with the same self-correction capabilities ReactAgent already has. Test coverage is substantially improved (11 to 22 tests). A few issues worth addressing before merge: Bug: output field dropped from blocker results Both _create_escalation_blocker and _create_exhaustion_blocker return AgentResult without preserving the adapter last output. The previous failed path had output=result.output and that output is now silently dropped when escalating to a blocker. Both methods would need to receive the last adapter result or output as a parameter to preserve it. Layering concern: core depending on adapters react_agent.py (core layer) now imports from codeframe.core.adapters.verification_wrapper at line 29. The adapters subdirectory is for external agent adapter implementations and having core reach into it inverts the expected dependency direction. build_escalation_question is a pure utility that only depends on FixAttemptTracker, so fix_tracker.py would be a more natural home for it. Minor: off-by-one in escalation timing Escalation is checked at the start of each round against the previous round recorded failures since record_outcome has not run yet for the current round. This means MAX_SAME_ERROR_ATTEMPTS=3 actually triggers escalation on round 4 not round 3. The test in TestEscalationIntegration pre-fills the tracker with 4 failures to compensate. Worth a comment in the loop clarifying the check-before-record ordering. Minor: fix_tracker is a public attribute self.fix_tracker is intentionally public for test inspection. A brief comment noting it is exposed for observability would help future readers. Quick fix failures not individually tracked When find_quick_fix returns a fix but apply_quick_fix returns False, the code falls through to record FixOutcome.FAILED with fix_description=verification_gate. The specific quick fix name is not recorded so the tracker cannot prevent retrying the same failing quick fix in a subsequent round. What is working well
The output field issue is the most impactful item. The layering concern is worth addressing before merge. All other items are minor. |
|
Review summary for feat(core): verification gate wrapper with quick fixes and escalation - see full review in next comment |
|
Full review posted - please see the detailed review comment below. |
Review: feat(core): verification gate wrapper with quick fixes and escalationThis is a solid implementation that closes the parity gap between VerificationWrapper (used by external engines) and ReactAgent internal self-correction loop. The structure is clean, the test expansion is thorough, and the build_escalation_question refactor is a good DRY improvement. A few issues worth addressing before merging. Bug - FixOutcome used via dynamic import hack in test: In the escalation test, FixOutcome is accessed via a dynamic import call instead of a proper top-level import. FixOutcome is already exported from codeframe.core.fix_tracker. Just add it to the top-level import alongside FixAttemptTracker: from codeframe.core.fix_tracker import FixAttemptTracker, FixOutcome. This is the single most important fix - the dynamic import approach is fragile and hard to read. Architecture concern - coupling direction in react_agent.py: react_agent.py now imports from codeframe.core.adapters.verification_wrapper to get build_escalation_question. This creates a dependency from core/react_agent.py to core/adapters/verification_wrapper.py. If verification_wrapper.py ever needs to reference ReactAgent, it would create a circular import. A better home for build_escalation_question would be core/blockers.py or a small core/self_correction.py utility - both ReactAgent and VerificationWrapper would then import from a neutral location rather than one depending on the other. Lost adapter output in blocker results: Both _create_escalation_blocker and _create_exhaustion_blocker return AgentResult without the output field from the most recent adapter run. The old failure path preserved result.output. The adapter last response before escalation is valuable for debugging. Consider passing last_output through to the blocker helpers. Fix tracker records FAILED before quick fix is attempted: In the correction loop, the outcome is recorded as FAILED before the quick fix is tried. If a quick fix then succeeds on the next gate check, the tracker still counted this round as a failure. This is semantically odd and could cause premature escalation on long runs where quick fixes are intermittently successful. Minor - get_error_summary() is a new call on gate results: The old code only called self._format_gate_errors(gate_result). The new code also calls gate_result.get_error_summary(). The or fallback only handles None/empty return, not a missing method. Worth confirming this method exists on all GateResult implementations, or guarding with hasattr. What is working well: Test structure with separate test classes per concern (TestQuickFixIntegration, TestFixTrackerIntegration, TestEscalationIntegration, TestBuildEscalationQuestion) makes failures easy to diagnose. Using blocked instead of failed on exhaustion is better UX - tasks surface for human review rather than silently failing. The _verbose_print helper is consistent with ReactAgent pattern. The fallback error formatting with the or chain between get_error_summary() and _format_gate_errors() is good defensive coding. Max correction rounds at 5 matches ReactAgent and is appropriate for external engine round-trips. Summary: The two items that should definitely be fixed before merge are the dynamic import hack in tests and verifying get_error_summary() compatibility on real GateResult objects. The architecture coupling concern and lost adapter output are worth addressing but are lower severity. |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/core/adapters/test_verification_wrapper.py (1)
3-13:⚠️ Potential issue | 🟠 MajorMark this module as v2.
Please add
pytestmark = pytest.mark.v2below the import block. Without it, these new verification-wrapper tests can be excluded from v2-selective runs.As per coding guidelines, "New v2 Python tests must be marked with
@pytest.mark.v2decorator orpytestmark = pytest.mark.v2".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/core/adapters/test_verification_wrapper.py` around lines 3 - 13, Add the module-level pytest marker by inserting "pytestmark = pytest.mark.v2" immediately below the import block in this test module (tests/core/adapters/test_verification_wrapper.py); this ensures the new verification-wrapper tests (which exercise VerificationWrapper and build_escalation_question and related fixtures) are included in v2-selective runs. Place the single line after the last import so the file-level marker applies to all tests in the module.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@codeframe/core/adapters/verification_wrapper.py`:
- Around line 216-223: The wrapper currently creates escalation blockers itself
via the blockers.create call inside the try/except blocks (the two occurrences
that catch Exception and log "Failed to create escalation blocker"); remove
those local blockers.create calls (and their surrounding try/except) so the
runtime remains the single owner of blocker creation when a wrapped adapter
returns status="blocked" with blocker_question, leaving the wrapper to only
return the status/blocker_question to be handled by the runtime.
- Around line 147-159: The quick-fix, escalation-to-blocker, and
retries-exhausted branches in VerificationWrapper currently only print/return
and must emit events via the core/events.py API; update the branches around
_try_quick_fix, the call to _create_escalation_blocker, and the
retries-exhausted path to emit explicit events (e.g.,
VerificationQuickFixApplied, VerificationEscalatedToBlocker,
VerificationRetriesExhausted) including relevant metadata (task_id,
error_summary, escalation, round_num, outcome) and keep the existing calls to
fix_tracker.record_outcome and _verbose_print; ensure events are imported from
core/events.py and emitted before returning or continuing so the runtime bridge
can observe the state transitions.
---
Outside diff comments:
In `@tests/core/adapters/test_verification_wrapper.py`:
- Around line 3-13: Add the module-level pytest marker by inserting "pytestmark
= pytest.mark.v2" immediately below the import block in this test module
(tests/core/adapters/test_verification_wrapper.py); this ensures the new
verification-wrapper tests (which exercise VerificationWrapper and
build_escalation_question and related fixtures) are included in v2-selective
runs. Place the single line after the last import so the file-level marker
applies to all tests in the module.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e5f256cc-7f90-4811-91a9-c13575f48bfb
📒 Files selected for processing (4)
codeframe/core/adapters/verification_wrapper.pycodeframe/core/react_agent.pycodeframe/core/runtime.pytests/core/adapters/test_verification_wrapper.py
| return self._create_escalation_blocker( | ||
| task_id, error_summary, escalation, | ||
| ) | ||
|
|
||
| # 3. Try quick fix first (no adapter re-invocation needed) | ||
| if self._try_quick_fix(error_summary): | ||
| self.fix_tracker.record_outcome( | ||
| error_summary, "verification_gate", FixOutcome.SUCCESS, | ||
| ) | ||
| self._verbose_print( | ||
| f"[VerificationWrapper] Quick fix applied (round {round_num + 1})" | ||
| ) | ||
| continue # Re-run gates without re-invoking adapter |
There was a problem hiding this comment.
Emit events for the new quick-fix and blocker transitions.
These branches only print() or return an AgentResult; they never emit a dedicated event for quick-fix success, escalation-to-blocker, or retries-exhausted. That means the runtime bridge has nothing to forward for the most important new verification state changes.
As per coding guidelines, "All core modules must emit events for state transitions via core/events.py for audit and observability".
Also applies to: 190-192, 208-259
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@codeframe/core/adapters/verification_wrapper.py` around lines 147 - 159, The
quick-fix, escalation-to-blocker, and retries-exhausted branches in
VerificationWrapper currently only print/return and must emit events via the
core/events.py API; update the branches around _try_quick_fix, the call to
_create_escalation_blocker, and the retries-exhausted path to emit explicit
events (e.g., VerificationQuickFixApplied, VerificationEscalatedToBlocker,
VerificationRetriesExhausted) including relevant metadata (task_id,
error_summary, escalation, round_num, outcome) and keep the existing calls to
fix_tracker.record_outcome and _verbose_print; ensure events are imported from
core/events.py and emitted before returning or continuing so the runtime bridge
can observe the state transitions.
…utput - Move build_escalation_question to core/fix_tracker.py (neutral location), fixing the layering concern where core/react_agent imported from adapters - Preserve adapter last output in blocker results (was silently dropped) - Consolidate test imports
Follow-up ReviewGood progress since the last review — several issues from earlier rounds have been addressed:
Three items remain: 1. Missing pytestmark = pytest.mark.v2 in test file (actionable)
2. Failed quick-fix attempts not recorded with specific description (minor) From the previous review — still present. When 3.
Overall the implementation is solid and earlier structural concerns are resolved. Items 1 and 2 are worth fixing before merge; item 3 is a quality improvement. |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/core/adapters/test_verification_wrapper.py (1)
1-10:⚠️ Potential issue | 🟡 MinorMark this test module as
v2.These new verification-wrapper tests are missing the required
pytest.mark.v2marker, so marker-based CI runs can skip them.As per coding guidelines, "Test files must use the `@pytest.mark.v2` decorator or module-level `pytestmark = pytest.mark.v2` for v2 functionality tests".Suggested fix
import pytest from pathlib import Path from unittest.mock import MagicMock, patch +pytestmark = pytest.mark.v2 + from codeframe.core.adapters.verification_wrapper import VerificationWrapper from codeframe.core.adapters.agent_adapter import AgentAdapter, AgentEvent, AgentResult from codeframe.core.fix_tracker import FixAttemptTracker, FixOutcome, build_escalation_question🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/core/adapters/test_verification_wrapper.py` around lines 1 - 10, The test module for VerificationWrapper is missing the v2 pytest marker; add the marker by either setting a module-level pytestmark = pytest.mark.v2 or decorating the tests/class with `@pytest.mark.v2` so CI will include them as v2 tests—update the test module (test_verification_wrapper.py) to import pytest if needed and apply pytest.mark.v2 via pytestmark or decorators around the tests that reference VerificationWrapper, AgentAdapter/AgentEvent/AgentResult, FixAttemptTracker, FixOutcome and GateStatus.
♻️ Duplicate comments (2)
codeframe/core/adapters/verification_wrapper.py (2)
127-140:⚠️ Potential issue | 🟠 MajorEmit explicit events for quick-fix, escalation, and exhaustion transitions.
The new branches change verification state in important ways, but they only
continueor return anAgentResult. Without dedicated events here, observers cannot distinguish “quick fix applied”, “escalated to blocker”, and “retries exhausted”.As per coding guidelines, "All core modules must emit events for state transitions via
core/events.pyfor audit and observability".Also applies to: 171-175, 203-255
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@codeframe/core/adapters/verification_wrapper.py` around lines 127 - 140, The verification branches currently return or continue without emitting transition events, so instrument the VerificationWrapper to publish explicit events (using the event API in core/events.py) whenever a quick-fix is applied, an escalation blocker is created, or retries are exhausted: call the event emitter in the code paths that call _try_quick_fix (after fix_tracker.record_outcome and before continue) to emit a "verification.quick_fix" event with task_id, error_summary and round_num; in the branch that returns the result from _create_escalation_blocker emit a "verification.escalation_blocker" event including task_id, escalation and last_output; and in the final exhaustion path (where retries are exceeded) emit a "verification.exhausted" event with task_id, error_summary and total rounds—use the existing symbols (_try_quick_fix, _create_escalation_blocker, fix_tracker.record_outcome, AgentResult/return path) to locate insertion points and follow the event payload conventions in core/events.py.
203-240:⚠️ Potential issue | 🟠 MajorLet the runtime stay the single owner of blocker persistence.
Both blocker helpers call
blockers.create(...)and then also returnstatus="blocked"with ablocker_question. If the runtime already persists blockers for blocked adapter results, this creates duplicate blocker records and the locally created blocker ID is discarded anyway.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@codeframe/core/adapters/verification_wrapper.py` around lines 203 - 240, The code currently calls blockers.create(...) inside the blocker helper methods and also returns an AgentResult(status="blocked", blocker_question=...), which duplicates persistence if the runtime already persists blockers; remove the local persistence so the runtime remains the single owner. Concretely, delete the try/except wrappers that call blockers.create in _create_exhaustion_blocker and in the corresponding escalation helper (the block that calls blockers.create then returns AgentResult with status="blocked" and blocker_question) and simply return the AgentResult with blocker_question and error info; keep the question construction and logging behavior but do not call blockers.create or swallow its exceptions here.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@codeframe/core/adapters/verification_wrapper.py`:
- Around line 59-60: Reset the instance's FixAttemptTracker at the start of each
run() to avoid bleed-over from prior tasks: inside the VerificationWrapper.run
method (or whichever class defines run), reassign self.fix_tracker =
FixAttemptTracker() at the start of the method (and similarly before any
per-task loop that may call escalation logic) so each invocation begins with a
fresh tracker and previous failures won't trigger immediate escalation or
exhaustion; update references to self.fix_tracker accordingly.
- Around line 118-145: The quick-fix path in verification_wrapper.py currently
marks a fix as FixOutcome.SUCCESS as soon as _try_quick_fix(error_summary)
returns true; instead, leave only the initial attempt recorded (use
fix_tracker.record_attempt) and defer calling fix_tracker.record_outcome(...,
FixOutcome.SUCCESS) until after the subsequent gate re-run actually succeeds; if
the re-run still fails, record FixOutcome.FAILED for that same error_summary and
verification_gate so the fix_tracker can prevent repeating bad fixes or
escalate; apply the same change to the identical logic around the other
quick-fix block referenced (the block at lines ~177-189) so both quick-fix
branches only record SUCCESS after a verified successful re-run.
---
Outside diff comments:
In `@tests/core/adapters/test_verification_wrapper.py`:
- Around line 1-10: The test module for VerificationWrapper is missing the v2
pytest marker; add the marker by either setting a module-level pytestmark =
pytest.mark.v2 or decorating the tests/class with `@pytest.mark.v2` so CI will
include them as v2 tests—update the test module (test_verification_wrapper.py)
to import pytest if needed and apply pytest.mark.v2 via pytestmark or decorators
around the tests that reference VerificationWrapper,
AgentAdapter/AgentEvent/AgentResult, FixAttemptTracker, FixOutcome and
GateStatus.
---
Duplicate comments:
In `@codeframe/core/adapters/verification_wrapper.py`:
- Around line 127-140: The verification branches currently return or continue
without emitting transition events, so instrument the VerificationWrapper to
publish explicit events (using the event API in core/events.py) whenever a
quick-fix is applied, an escalation blocker is created, or retries are
exhausted: call the event emitter in the code paths that call _try_quick_fix
(after fix_tracker.record_outcome and before continue) to emit a
"verification.quick_fix" event with task_id, error_summary and round_num; in the
branch that returns the result from _create_escalation_blocker emit a
"verification.escalation_blocker" event including task_id, escalation and
last_output; and in the final exhaustion path (where retries are exceeded) emit
a "verification.exhausted" event with task_id, error_summary and total
rounds—use the existing symbols (_try_quick_fix, _create_escalation_blocker,
fix_tracker.record_outcome, AgentResult/return path) to locate insertion points
and follow the event payload conventions in core/events.py.
- Around line 203-240: The code currently calls blockers.create(...) inside the
blocker helper methods and also returns an AgentResult(status="blocked",
blocker_question=...), which duplicates persistence if the runtime already
persists blockers; remove the local persistence so the runtime remains the
single owner. Concretely, delete the try/except wrappers that call
blockers.create in _create_exhaustion_blocker and in the corresponding
escalation helper (the block that calls blockers.create then returns AgentResult
with status="blocked" and blocker_question) and simply return the AgentResult
with blocker_question and error info; keep the question construction and logging
behavior but do not call blockers.create or swallow its exceptions here.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2ff383c6-4649-413d-9437-a4033f1efdf9
📒 Files selected for processing (4)
codeframe/core/adapters/verification_wrapper.pycodeframe/core/fix_tracker.pycodeframe/core/react_agent.pytests/core/adapters/test_verification_wrapper.py
| self.fix_tracker = FixAttemptTracker() | ||
|
|
There was a problem hiding this comment.
Reset the fix tracker at the start of each run().
FixAttemptTracker is stored on the wrapper instance, so failures from a previous task can bleed into the next run() and trigger an immediate escalation or early exhaustion on a fresh task.
Suggested fix
def run(
self,
task_id: str,
prompt: str,
workspace_path: Path,
on_event: Callable[[AgentEvent], None] | None = None,
) -> AgentResult:
"""Run the inner adapter, then verify with gates. Self-correct on failure."""
+ self.fix_tracker.reset()
# Initial run
result = self._inner.run(task_id, prompt, workspace_path, on_event)Also applies to: 65-75
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@codeframe/core/adapters/verification_wrapper.py` around lines 59 - 60, Reset
the instance's FixAttemptTracker at the start of each run() to avoid bleed-over
from prior tasks: inside the VerificationWrapper.run method (or whichever class
defines run), reassign self.fix_tracker = FixAttemptTracker() at the start of
the method (and similarly before any per-task loop that may call escalation
logic) so each invocation begins with a fresh tracker and previous failures
won't trigger immediate escalation or exhaustion; update references to
self.fix_tracker accordingly.
| # 1. Record the attempt (outcome deferred until we know if fix works) | ||
| self.fix_tracker.record_attempt(error_summary, "verification_gate") | ||
|
|
||
| # 2. Check escalation based on prior history | ||
| escalation = self.fix_tracker.should_escalate(error_summary) | ||
| if escalation.should_escalate: | ||
| self.fix_tracker.record_outcome( | ||
| error_summary, "verification_gate", FixOutcome.FAILED, | ||
| ) | ||
| return self._create_escalation_blocker( | ||
| task_id, error_summary, escalation, | ||
| last_output=result.output, | ||
| ) | ||
|
|
||
| # 3. Try quick fix first (no adapter re-invocation needed) | ||
| if self._try_quick_fix(error_summary): | ||
| self.fix_tracker.record_outcome( | ||
| error_summary, "verification_gate", FixOutcome.SUCCESS, | ||
| ) | ||
| self._verbose_print( | ||
| f"[VerificationWrapper] Quick fix applied (round {round_num + 1})" | ||
| ) | ||
| continue # Re-run gates without re-invoking adapter | ||
|
|
||
| # 4. No quick fix — record failure and re-invoke adapter with error context | ||
| self.fix_tracker.record_outcome( | ||
| error_summary, "verification_gate", FixOutcome.FAILED, | ||
| ) |
There was a problem hiding this comment.
Don't mark a quick-fix path successful before the rerun actually passes.
Here a quick fix is counted as SUCCESS as soon as it applies, even though the next gate run may still fail with the same error. That leaves no failed record for the attempted quick fix, so the tracker can't reliably prevent retrying the same bad fix or escalate on repeated failures.
Based on learnings, self-correction attempts must use core/fix_tracker.py to prevent repeating failed fixes.
Also applies to: 177-189
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@codeframe/core/adapters/verification_wrapper.py` around lines 118 - 145, The
quick-fix path in verification_wrapper.py currently marks a fix as
FixOutcome.SUCCESS as soon as _try_quick_fix(error_summary) returns true;
instead, leave only the initial attempt recorded (use
fix_tracker.record_attempt) and defer calling fix_tracker.record_outcome(...,
FixOutcome.SUCCESS) until after the subsequent gate re-run actually succeeds; if
the re-run still fails, record FixOutcome.FAILED for that same error_summary and
verification_gate so the fix_tracker can prevent repeating bad fixes or
escalate; apply the same change to the identical logic around the other
quick-fix block referenced (the block at lines ~177-189) so both quick-fix
branches only record SUCCESS after a verified successful re-run.
Summary
Closes #415
Enhances the
VerificationWrapper(used by all external engines: claude-code, opencode, codex) to match ReactAgent's comprehensive self-correction capabilities:FixAttemptTrackerprevents repeating the same failed fix and detects escalation patternsblockedinstead offailed)build_escalation_question()used by bothVerificationWrapperandReactAgentfor consistent blocker messagesmax_correction_roundsfrom 3 → 5, matching ReactAgentFiles Changed
codeframe/core/adapters/verification_wrapper.pycodeframe/core/react_agent.py_create_escalation_blockerto use sharedbuild_escalation_questioncodeframe/core/runtime.pymax_correction_rounds=5for external enginestests/core/adapters/test_verification_wrapper.pyTest plan
blockedstatusbuild_escalation_questionshared helper (3 tests)Summary by CodeRabbit