workflow: enforce pre-commit test verification and evidence-based fix claims for coding agents #476

@oocx

Context

From the Feature 072 retrospective (PR #469, workflow rating 3/10). This issue groups retrospective improvement opportunities #1, #2, and #3.

Problem

The coding agent committed code at least four times without running dotnet test, and each of those commits triggered CI failures. When it attempted fixes, it claimed the tests were fixed without actually running them. Some fixes also introduced regressions (e.g. the template spacing "fix" increased the number of failing tests from 5 to 9).

Evidence from PR #469

  • Maintainer at 09:33: "some tests failed in PR validation. fix them."
  • Maintainer at 11:02: "now there are 9 failing tests instead of just 5"
  • Maintainer at 11:18: "I asked you several times to fix all unit tests AND VALIDATE THAT YOU FIXED THEM BY RUNNING ALL TESTS"
  • Maintainer at 11:25: "stop claiming that you fixed the tests"
  • 10 consecutive CI failures, PR merged with CI still failing

Proposed Changes

1. Mandatory pre-commit test step (Improvement #1)

Update .github/agents/developer-coding-agent.agent.md to require:

  • Run scripts/test-with-timeout.sh -- dotnet test --solution src/tfplan2md.slnx --no-build --configuration Release --verbosity normal before every report_progress call
  • If tests fail, fix the issue before committing — never commit with known test failures
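The rule above could be enforced with a small gate function that the agent calls before report_progress. This is a minimal sketch, not the repository's actual tooling: the helper name run_tests_before_commit is hypothetical, and it takes the test command as arguments so it makes no assumptions about how scripts/test-with-timeout.sh behaves beyond returning a non-zero exit code on failure.

```shell
# Sketch of a pre-commit gate (hypothetical helper, not part of the repository).
# Runs the given test command and reports whether it is safe to commit.
run_tests_before_commit() {
  if [ "$#" -eq 0 ]; then
    echo "usage: run_tests_before_commit <test command...>" >&2
    return 2
  fi

  # Capture the exit status without aborting a caller that uses `set -e`.
  _gate_status=0
  "$@" || _gate_status=$?

  if [ "$_gate_status" -ne 0 ]; then
    echo "Tests failed (exit code $_gate_status): do not commit." >&2
    return 1
  fi

  echo "Tests passed: safe to commit."
}
```

With the command from this issue, a call might look like `run_tests_before_commit scripts/test-with-timeout.sh -- dotnet test --solution src/tfplan2md.slnx --no-build --configuration Release --verbosity normal`. The gate assumes the wrapped command signals test failures through its exit code.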

2. Evidence-based fix claims (Improvement #2)

Update agent instructions to require:

  • Include actual test output (pass count, failure count) in PR comments when claiming a fix
  • Never claim "tests are fixed" without including the dotnet test output as proof
  • Pattern: "Fixed in commit X — test results: 1,007 passed, 0 failed"
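One way to keep fix claims honest is to generate the claim text from the counts of a real test run and to refuse to emit a "Fixed" line when any test failed. The sketch below assumes the passed and failed counts have already been read from the dotnet test output; parsing is left out because the summary format depends on the test runner. The helper name format_fix_claim and the commit id in the usage note are hypothetical, and the output uses a plain hyphen in place of the dash in the pattern above.

```shell
# Sketch of an evidence-based claim formatter (hypothetical helper).
# $1 = commit id, $2 = passed count, $3 = failed count (both from a real test run).
format_fix_claim() {
  if [ "$#" -ne 3 ]; then
    echo "usage: format_fix_claim <commit> <passed> <failed>" >&2
    return 2
  fi

  # Reject anything that is not a non-negative integer.
  for _claim_count in "$2" "$3"; do
    case "$_claim_count" in
      ''|*[!0-9]*)
        echo "counts must be non-negative integers" >&2
        return 2
        ;;
    esac
  done

  # A fix may only be claimed when no test failed.
  if [ "$3" -ne 0 ]; then
    echo "Not fixed as of commit $1 - test results: $2 passed, $3 failed"
    return 1
  fi

  echo "Fixed in commit $1 - test results: $2 passed, $3 failed"
}
```

For example, `format_fix_claim abc1234 1007 0` prints `Fixed in commit abc1234 - test results: 1007 passed, 0 failed`, while any non-zero failure count produces a "Not fixed" line and a non-zero exit code.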

3. Regression prevention for template changes (Improvement #3)

Add specific instruction:

  • After modifying Scriban templates (.sbn files), ALWAYS run the full test suite
  • If test failures increase after a fix, revert the change immediately
  • Use scripts/update-test-snapshots.sh when template changes are intentional
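The "revert if failures increase" rule comes down to comparing the failure count before and after a change. A minimal sketch, assuming both counts were recorded from full test runs; the helper name check_failure_regression is hypothetical:

```shell
# Sketch of a regression check (hypothetical helper).
# $1 = failing tests before the change, $2 = failing tests after the change.
check_failure_regression() {
  if [ "$#" -ne 2 ]; then
    echo "usage: check_failure_regression <failures_before> <failures_after>" >&2
    return 2
  fi

  # More failures than before means the change made things worse.
  if [ "$2" -gt "$1" ]; then
    echo "regression: failures rose from $1 to $2, revert the change"
    return 1
  fi

  echo "ok: failures went from $1 to $2"
}
```

Applied to the incident in PR #469, `check_failure_regression 5 9` reports a regression and returns a non-zero exit code. How to revert is left to the caller: `git restore .` would discard uncommitted edits, and `git revert <commit>` would undo a change that is already committed.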

Files to Update

  • .github/agents/developer-coding-agent.agent.md
  • .github/copilot-instructions.md (Terminal Command Guidelines section)

Verification

  • No CI test failures caused by untested commits
  • Every fix claim includes test pass count
  • Template changes never increase test failure count
