add test-after-edit eval condition by orban · Pull Request #36 · orban/intent-layer

orban · 2026-04-15T20:49:21Z

Summary

Adds a fifth experimental condition test_after_edit to the eval harness. The condition injects a preamble that forces the agent to run relevant tests after every source-file edit before making further changes, isolating the effect of tight test-feedback loops from context-injection effects (none/flat_llm/intent_layer).

Wired through:

Condition enum (task_runner.py)
preamble map in TaskRunner._build_prompt
YAML-tasks condition list in run() (cli.py)
reporter display label (reporter.py)
new preamble string (prompt_builder.py)
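
A minimal sketch of how the pieces listed above could fit together. The PR names only three of the four pre-existing conditions (none, flat_llm, intent_layer), so the unnamed one is left out here. The preamble wording, the PREAMBLES dict, and the standalone build_prompt function are hypothetical stand-ins for the real code in prompt_builder.py and TaskRunner._build_prompt.

```python
from enum import Enum


class Condition(Enum):
    # Only the conditions named in this PR; the harness has one more
    # pre-existing condition that the description does not name.
    NONE = "none"
    FLAT_LLM = "flat_llm"
    INTENT_LAYER = "intent_layer"
    TEST_AFTER_EDIT = "test_after_edit"


# Hypothetical preamble text; the real string lives in prompt_builder.py.
TEST_AFTER_EDIT_PREAMBLE = (
    "After every edit to a source file, run the relevant tests and "
    "read the results before making any further changes.\n\n"
)

# Conditions without an entry get no preamble.
PREAMBLES = {Condition.TEST_AFTER_EDIT: TEST_AFTER_EDIT_PREAMBLE}


def build_prompt(condition: Condition, task_prompt: str) -> str:
    """Prepend the condition's preamble (if any) to the task prompt."""
    return PREAMBLES.get(condition, "") + task_prompt
```

Keeping the preamble in a lookup keyed by Condition means the task prompt itself is identical across conditions, which is what lets the comparison isolate the feedback-loop effect.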

Test plan

pytest eval-harness/tests/test_task_runner.py::test_condition_enum
Full run against one repo to confirm reports render the new condition label correctly
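
The label check in the second step could also be covered by a cheap unit test, sketched below. The enum is a stand-in for the one in task_runner.py, and DISPLAY_LABELS with its label strings is an assumption about how reporter.py maps conditions to labels, not the actual mapping.

```python
from enum import Enum


class Condition(Enum):  # stand-in for the enum in task_runner.py
    NONE = "none"
    FLAT_LLM = "flat_llm"
    INTENT_LAYER = "intent_layer"
    TEST_AFTER_EDIT = "test_after_edit"


# Hypothetical display labels; the real mapping lives in reporter.py.
DISPLAY_LABELS = {
    Condition.NONE: "None",
    Condition.FLAT_LLM: "Flat LLM",
    Condition.INTENT_LAYER: "Intent Layer",
    Condition.TEST_AFTER_EDIT: "Test After Edit",
}


def test_condition_enum():
    # The new variant round-trips from its string value.
    assert Condition("test_after_edit") is Condition.TEST_AFTER_EDIT


def test_every_condition_has_label():
    # Catches a condition added to the enum but not to the reporter.
    missing = [c for c in Condition if c not in DISPLAY_LABELS]
    assert missing == []
```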

Notes

Dropped an unrelated bash -lc → sh -c revert in docker_runner.py that had slipped into the same working tree — that change would have reintroduced exit 127 for repos using uv (see commit 9ab8632).
Replaces add test-after-edit eval condition #35, which picked up 17 unrelated commits from nightshift/bus-factor.
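
For context on the first note: exit status 127 is the shell's "command not found" code. The sketch below shows the difference between the two invocations, assuming uv reaches PATH through a shell profile that only a login shell reads (the PR does not state this). The helper name shell_argv is hypothetical and is not taken from docker_runner.py.

```python
def shell_argv(command: str, login: bool = True) -> list[str]:
    """Wrap a command string for execution inside the container."""
    if login:
        # Login shell: reads profile files, so PATH additions made there apply.
        return ["bash", "-lc", command]
    # Plain sh: no profile files are read.
    return ["sh", "-c", command]
```

For example, shell_argv("uv run pytest") yields ["bash", "-lc", "uv run pytest"], the form this PR keeps.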

adds a fifth experimental condition that constrains the agent to run tests after every source-file edit before making further changes. isolates the effect of tight test-driven feedback loops from context-injection effects (none/flat_llm/intent_layer). wired through Condition enum, prompt builder, reporter display labels, and the run() CLI YAML_CONDITIONS list. test_condition_enum updated to cover the new variant.

orban wants to merge 1 commit into main from eval/test-after-edit-v2
