The engineering pipeline: soundness and stall points #16

@kyau

The pipeline (brainstorming → prototype? → writing-plans → @tdd per task → verification → /check → @code-review, with @architect inserted for cross-cutting work and @debug prepended for bugs) is sound, and the AGENTS.md summary plus the build-agent prompt's "enforcement posture" list give it two independent anchors. Specific gaps, roughly in pipeline order:

  1. No execution/orchestration skill between plan and tasks. writing-plans ends by offering "task-by-task via @tdd (recommended)" or "inline execution," but there is no skill defining either mode: what the parent does between tasks, what review happens per task (spec-compliance vs code-quality), what to do when a task's tests won't go green after N attempts, when to halt vs re-plan, how context is managed across a 15-task plan. Superpowers ships this as two skills (executing-plans for batch-with-checkpoints, subagent-driven-development for dispatch-with-two-stage-review), and it's the single most consequential content gap in your harness — it's where multi-hour autonomous runs either hold together or wander. Direct content port, no mechanics involved (§2.1).

  2. Enforcement is prompt-level only. The hard-gate lives in a skill the model must choose to load, backed by the build prompt. Nothing structural prevents a fresh or degraded session from editing source directly — the hooks only catch lint/format/secrets at commit time, which is after the damage. Superpowers attacks this with (a) a session-start hook that injects the using-superpowers bootstrap (and re-injects after compaction) and (b) that bootstrap's rationalization red-flags table ("'This is just a simple question' → Questions are tasks; 'I'll just do this one thing first' → Check BEFORE doing anything"), which is empirically the highest-leverage anti-drift text in the ecosystem. Your build prompt covers the triggers; it doesn't inoculate against the rationalizations. Port the table into the build prompt or a tiny always-loaded doc (low effort), and consider the session-start injection via an OpenCode plugin later (§2.1).

  3. The /check coverage gate is under-specified. "Gate: ≥ 80% line coverage on changed files" — but pest --coverage reports per-file totals, not changed-file coverage; computing the gate as written requires intersecting git diff --name-only with the coverage report, which the command never spells out. An agent will either approximate (global ≥80%) or hand-wave. Spell out the mechanics (e.g., --coverage --min=80 for the global floor plus an explicit per-changed-file table the agent assembles from git diff + the coverage output) or relax the wording to match what the tool measures.

  4. Verification vs /check duplication is acceptable but should be named. Both run lint + tests; one is per-task, one is pre-push. A single sentence in each ("verification is the per-task gate; /check is the aggregate pre-push gate — both run because task-level green can rot by push time") prevents a future "optimization" that removes one.

  5. The handoff/resume loop is complete and good. The combination of /handoff, the context-management doc's degradation thresholds, and the "read the handoff and continue" resume protocol is better than most comparison repos. The context-management doc's rewind-over-correct and compact-with-a-hint guidance is state of the art.

  6. Eval framework is honest scaffolding, but its runner is fictional. opencode eval is not an OpenCode command; the README says execution is "pending API access," but the command block reads as if it exists. For 🌍 credibility, either mark it clearly aspirational, or (better, §2.1/§2.5) replace it with a real scripts/run-evals.sh that drives opencode run non-interactively against each case and greps for the expected behaviors — Superpowers' tests/ tree (which includes an opencode/ suite) is a working reference implementation you can crib directly.
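For gap 3, the changed-file intersection could be spelled out roughly as follows. This is a minimal sketch, not something /check does today. It assumes the coverage run also writes a Clover XML report (for example via a `--coverage-clover` option), that the report lists absolute paths while git lists repo-relative ones, and that the base ref is `origin/main`. The function names and the "unmeasured" status are hypothetical.

```python
import subprocess
import xml.etree.ElementTree as ET


def collect_changed(base="origin/main"):
    """List files changed relative to an assumed base ref, via git diff --name-only."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def parse_clover(xml_text):
    """Map each file path in a Clover report to (covered, total) statement counts."""
    coverage = {}
    for file_el in ET.fromstring(xml_text).iter("file"):
        metrics = file_el.find("metrics")
        if metrics is None:
            continue
        coverage[file_el.get("name")] = (
            int(metrics.get("coveredstatements", 0)),
            int(metrics.get("statements", 0)),
        )
    return coverage


def changed_file_gate(changed, coverage, threshold=80.0):
    """Return one (path, percent, status) row per changed file.

    status is "pass", "fail", or "unmeasured" (file absent from the report,
    e.g. docs or config; what to do with those is left to the caller).
    """
    rows = []
    for path in changed:
        match = None
        for name, counts in coverage.items():
            norm = name.replace("\\", "/")
            # Assumption: report paths are absolute, git paths are repo-relative.
            if norm == path or norm.endswith("/" + path):
                match = counts
                break
        if match is None:
            rows.append((path, None, "unmeasured"))
            continue
        covered, total = match
        # A file with no executable statements cannot be under-covered.
        percent = 100.0 if total == 0 else 100.0 * covered / total
        status = "pass" if percent >= threshold else "fail"
        rows.append((path, round(percent, 1), status))
    return rows
```

An agent would call `changed_file_gate(collect_changed(), parse_clover(report_text))` and print the rows as the per-changed-file table, failing the gate if any row is "fail". Whether "unmeasured" files should block is a policy choice the command text would still need to state.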
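For gap 6, the shape of a real runner is small enough to sketch. The version below is in Python rather than the shell script suggested above, purely so the matching logic is easy to test. It assumes `opencode run` accepts the prompt as a positional argument and writes the transcript to stdout. The case schema (`name`, `prompt`, `expect`) is hypothetical and not taken from your eval README.

```python
import re
import subprocess


def run_opencode(prompt):
    """Run one prompt through `opencode run` and return whatever it prints to stdout."""
    result = subprocess.run(
        ["opencode", "run", prompt],
        capture_output=True, text=True, timeout=600,
    )
    return result.stdout


def run_evals(cases, runner=run_opencode):
    """Run each case and report which expected patterns are missing from the transcript.

    Each case is a dict with hypothetical keys: name, prompt, expect (list of regexes).
    The runner is injectable so the matching logic can be checked without the CLI.
    """
    results = []
    for case in cases:
        transcript = runner(case["prompt"])
        missing = [
            pattern for pattern in case["expect"]
            if not re.search(pattern, transcript, re.IGNORECASE)
        ]
        results.append({"name": case["name"], "passed": not missing, "missing": missing})
    return results
```

Regex matching over a transcript is a blunt instrument: it confirms the agent mentioned a behavior, not that it performed it. It is still a real, runnable check, which is the point of replacing the fictional command.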
