agent-completion-gate

English · 中文

Make AI coding agents prove they are done. The agent can only propose done; a check reads your real output files and only then grants complete.

AI coding agents are goal-driven. Give Codex or Claude Code a goal and it optimizes hard toward the main line — build the page, fix the bug, produce the run. On longer tasks it often skips the user-visible details that were implied, scattered through the thread, or never written as a test.

The goal is not the acceptance criteria.

Example — "add a monthly sales report page" can end with a page that exists and tests that pass, while the CSV export is missing, the chart has one data point, the title still says "Untitled", and the empty state is broken. The agent honestly believes it's done. That's the problem.

agent-completion-gate turns "done" into an external acceptance check. The agent can only propose candidate_complete; a protected gate reads the real artifacts and grants complete only when a human-written acceptance manifest passes. Plain files + one Python script — no service, no account, no lock-in.

The gate does not infer what the user meant; a human distills the acceptance criteria into the manifest, and the gate prevents the agent from self-certifying against anything less. It complements your tests and CI — it checks user-visible acceptance surfaces teams rarely unit-test (a missing export, a degenerate chart, a renamed run), not code correctness.

OpenSpec helps you define what to build before coding; agent-completion-gate checks whether the finished artifacts satisfy acceptance before the agent can call the task done.

See it in action (30 seconds)

Watch the gate read real files and disagree with an agent that says "done":

pip install pyyaml
git clone https://github.com/zhjai/agent-completion-gate && cd agent-completion-gate
sh examples/minimal-project/run.sh

The everyday case — "add a monthly sales report page". The agent reports candidate_complete both times; only the real artifacts differ:

===== BEFORE — agent did the headline task, missed the details (expect BLOCKED) =====
FAIL report_has_multiple_points: rows points=1 (min 2)
FAIL csv_export_present:         file exports/monthly.csv exists=False
  -> BLOCKED (exit 1). The agent could NOT call this done.

===== AFTER — agent fixed the real artifacts (expect COMPLETE-OK) =====
PASS report_has_multiple_points: rows points=3 (min 2)
PASS csv_export_present:         file exports/monthly.csv exists=True
  -> COMPLETE-OK (exit 0).

More: examples/run.sh (overstep / blocked / granted), examples/diff_demo.sh (catch a worker under-reporting what it touched), examples/swanlab/ (the real ML incident that motivated this kit).

Quick start — add the gate to your repo

cd your-project
npx skills add zhjai/agent-completion-gate -g -a claude-code   # or -a codex, cursor, … any host

Then you (a human) scaffold the gate into your repo — one command, no manual copying:

# the engine + scaffolder live in the repo (the skill teaches the procedure; it doesn't ship the engine)
git clone https://github.com/zhjai/agent-completion-gate /tmp/acg
cd your-project && sh /tmp/acg/scripts/init.sh --dest .

It creates gate/ (the engine + an empty, passable manifest), control/surface_inventory.yaml, state/, .github/workflows/completion-gate.yml, and a CODEOWNERS example. Idempotent; never clobbers your edited specs without --force. (Prefer typing one line to your agent? Ask it to "set up the completion gate" — the completion-gate-init skill runs this same script. The script is the source of truth.)

A fresh install is intentionally permissive. Empty specs pass: at this point the gate only stops the agent from self-declaring complete; it does not yet know your project's artifacts. To make it useful, add at least one surface and one check.

1 — Define what "done" means (the human distills intent into checks; the gate doesn't infer it). Edit control/surface_inventory.yaml:

surfaces:
  - id: report
    user_visible: true
    paths: ["artifacts/report.json"]

…and gate/acceptance_manifest.yaml:

checks:
  - id: report_has_multiple_points
    surface: report
    type: min_series_points
    artifact: "artifacts/report.json"
    series: "rows"
    min_points: 2
review_items: []

Built-in check types: file_exists, config_not_disabled, min_series_points, max_chart_count, identity_in_name (extend run_machine_check() for your own). A fuller worked spec: examples/swanlab/.
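As a rough illustration of what a deterministic check such as min_series_points might do, the sketch below counts the entries under a named series in a JSON artifact. The function name check_min_series_points, its return shape, and the assumed artifact layout (a top-level key holding a list) are assumptions made for illustration; the real logic lives behind run_machine_check() in gate/check_acceptance.py and may differ.

```python
import json
from pathlib import Path


def check_min_series_points(repo, artifact, series, min_points):
    """Hypothetical check: pass only if the artifact's series has enough points.

    Assumes the artifact is JSON with a top-level key `series` holding a list.
    A missing file, unreadable JSON, or an unexpected shape fails closed.
    """
    path = Path(repo) / artifact
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return False, f"{series} unreadable artifact {artifact}"
    points = data.get(series) if isinstance(data, dict) else None
    if not isinstance(points, list):
        return False, f"{series} missing or not a list"
    ok = len(points) >= min_points
    return ok, f"{series} points={len(points)} (min {min_points})"
```

The point of the sketch is that the check reads the file on disk rather than anything the agent reported, and that every unexpected condition returns a failure instead of a pass.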

2 — Run it locally:

printf 'status: candidate_complete\ntouched_surfaces: [report]\nreview_queue: []\n' > state/completion_candidate.yaml
python3 -E gate/check_acceptance.py --manifest gate/acceptance_manifest.yaml \
  --inventory control/surface_inventory.yaml --candidate state/completion_candidate.yaml --repo .

If the artifact is missing the required data points, the gate reports BLOCKED. Once the real artifact is right, it reports COMPLETE-OK.
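If you want to call the gate from your own tooling, the command above can be wrapped in a few lines of Python. This is a minimal sketch: the helper names build_gate_command, verdict_from_exit_code, and run_gate are hypothetical, and treating every non-zero exit code as BLOCKED is a fail-closed assumption (the example output earlier shows only exit 0 and exit 1).

```python
import subprocess


def build_gate_command():
    """Build the argv for the gate invocation, relative to the repo root."""
    return [
        "python3", "-E", "gate/check_acceptance.py",
        "--manifest", "gate/acceptance_manifest.yaml",
        "--inventory", "control/surface_inventory.yaml",
        "--candidate", "state/completion_candidate.yaml",
        "--repo", ".",
    ]


def verdict_from_exit_code(code):
    """Exit 0 means COMPLETE-OK; anything else is treated as BLOCKED (assumption)."""
    return "COMPLETE-OK" if code == 0 else "BLOCKED"


def run_gate(repo_root="."):
    """Run the gate with repo_root as working directory; return (verdict, output)."""
    proc = subprocess.run(
        build_gate_command(), cwd=repo_root, capture_output=True, text=True
    )
    return verdict_from_exit_code(proc.returncode), proc.stdout + proc.stderr
```

Calling run_gate("your-project") returns the verdict together with the PASS/FAIL lines, which a script can print or attach to a PR comment.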

3 — Make it the authority. The scaffolded .github/workflows/completion-gate.yml runs this on every PR. Mark the verify-completion job a required status check, and CODEOWNERS-protect gate/, control/, and the workflow (see the generated .github/CODEOWNERS.completion-gate.example). Now complete means exactly one thing: that check is green. Trust model + the agent Stop-hook option: integrations/README.md.

What the agent does after you install the skill

The completion-audit skill instructs the agent: at task wrap-up, write state/completion_candidate.yaml (status: candidate_complete, plus the surfaces it touched), then run the gate. The agent can reach at most candidate_complete — only the external verifier (CI / a hook) ever writes complete. If the gate blocks, it fixes the real artifacts and re-audits. Your loop: do the work → audit completion → CI verdict → fix blocked reasons or merge.
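The wrap-up step described above amounts to writing a small file. The sketch below shows one way a worker could produce it, using the same three fields as the printf example in the quick start. The function name write_candidate is hypothetical, and the sketch assumes surface ids are plain identifiers that need no YAML quoting.

```python
from pathlib import Path


def write_candidate(repo, touched_surfaces):
    """Write state/completion_candidate.yaml with the worker's proposal.

    The status is fixed at candidate_complete: a worker never writes complete.
    Assumes surface ids are simple identifiers (no quoting or escaping needed).
    """
    surfaces = ", ".join(touched_surfaces)
    body = (
        "status: candidate_complete\n"
        f"touched_surfaces: [{surfaces}]\n"
        "review_queue: []\n"
    )
    path = Path(repo) / "state" / "completion_candidate.yaml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(body, encoding="utf-8")
    return path
```

Hard-coding the status keeps the helper from ever being used to claim complete; that word is reserved for the external verifier.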

How it works — the four states

in_progress ──► candidate_complete ──►(EXTERNAL verifier)──► complete
     │                                                     └─► blocked
     └────────► blocked  (needs-review / unknown surface / missing evidence)

The worker can only reach candidate_complete or blocked. Only an external verifier writes complete. needs-review is equivalent to blocked: it is not an annotation the agent can set and then move past. The kit ships the check, the contract, and the wiring: check_acceptance.py returns a verdict; gate/verify_completion.sh enforces the state machine around it (it rejects a worker that wrote complete itself and grants only on a clean pass); integrations/ attaches it as a CI job or hook. Full contract: STATE_MACHINE.md.
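The transitions in the diagram can be modeled as a small lookup table keyed by who is acting. This is an illustrative sketch only, not the code in gate/verify_completion.sh; the names ALLOWED and transition and the actor labels "worker" and "verifier" are assumptions, and only the arrows drawn in the diagram are modeled.

```python
ALLOWED = {
    # (actor, current state) -> states that actor may move to
    ("worker", "in_progress"): {"candidate_complete", "blocked"},
    ("verifier", "candidate_complete"): {"complete", "blocked"},
}


def transition(actor, current, target):
    """Return the new state, or raise if the actor may not make this move."""
    if target not in ALLOWED.get((actor, current), set()):
        raise PermissionError(f"{actor} may not move {current} -> {target}")
    return target
```

With this table, transition("worker", "in_progress", "complete") raises PermissionError, which mirrors the rule that a worker writing complete itself is rejected.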

Why a gate, not a rule / skill / memory

A rule is advisory — a goal rationalizes past it.
A skill can be skipped — the agent chooses not to invoke it.
A memory records belief, not verified truth.
Only a gate the agent can't edit, on a path it can't skip, reading artifacts it can't fake, reliably stops "looks done but isn't."

Where it fits

OpenSpec               — planning before coding (agree on what to build)
agent-lessonbook       — capture corrections & drift lessons during the work
agent-completion-gate  — acceptance before "done"

agent-lessonbook is an optional companion that captures process lessons during execution. This gate is standalone — it never reads lessonbook (or any memory) at runtime; it reads only its own --manifest/--inventory. Only a human may translate a recurring lesson into the gate's protected manifest.

Security model & invariants

Hardened across multiple heterogeneous (Codex × Claude) review rounds; each invariant closed a reproduced bypass. The claim is "external and fail-closed, given a trusted base branch and runner", not "unbypassable":

Gate + manifest + inventory are protected (read-only, outside the agent-writable workspace, maintained only through human/CI-reviewed changes). check_acceptance.py --agent-writable-root DIR enforces this at runtime.
Inspect real artifacts, never run_state.
Unknowns fail closed — a touched user-visible surface with no passing check → blocked. The touched_surfaces list is a worker self-report; use --strict-surfaces or --diff-base <ref> / --touched to derive it from the real git diff instead of trusting the worker.
One canonical completion signal (the gate's verdict); chat / PR / dashboard derive from it, never become an independent "complete".
Artifact content is hostile data, not instructions — deterministic checks first; an LLM verifier treats artifacts as untrusted.
Hermetic execution — the gate runs as python3 -E (ignores PYTHON* env / repo-planted yaml.py), and CI runs it from the trusted base branch so a PR can't edit the gate that judges it.
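The "unknowns fail closed" invariant above can be sketched as a small decision function. The name gate_verdict and both input shapes are hypothetical; the sketch also assumes a surface counts as passing only when it has at least one check and all of its checks pass, and that a surface with no user_visible flag is treated as user-visible. The real gate may decide differently.

```python
def gate_verdict(touched_surfaces, inventory, check_results):
    """Hypothetical fail-closed verdict over the touched surfaces.

    inventory: dict of surface id -> {"user_visible": bool}
    check_results: dict of surface id -> list of bools, one per check
    """
    reasons = []
    for surface in touched_surfaces:
        entry = inventory.get(surface)
        results = check_results.get(surface, [])
        if entry is None:
            # A surface the inventory does not know about is never waved through.
            reasons.append(f"unknown surface: {surface}")
        elif entry.get("user_visible", True) and not (results and all(results)):
            reasons.append(f"no passing check for user-visible surface: {surface}")
    return ("blocked", reasons) if reasons else ("complete", reasons)
```

Note that an empty result list blocks just like a failing one: the absence of evidence is treated as a failure rather than a pass.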

Docs

scripts/init.sh — scaffold the gate into your project (the authoritative setup path).
STATE_MACHINE.md — the completion contract (states, transitions, wiring).
integrations/README.md — CI / agent-hook / pre-push wiring + the trust model.
examples/ — runnable: minimal-project/ (everyday web task), run.sh, diff_demo.sh, diff_rename_test.sh, swanlab/ (the ML incident).
CHANGELOG.md · self-tests in tests/.

Status

v0.3.1 preview. MIT. Agent-agnostic, file-based, fail-closed. Optional companion: agent-lessonbook.
