The engineering pipeline: soundness and stall points

The pipeline (`brainstorming → prototype? → writing-plans → @tdd per task → verification → /check → @code-review`, with `@architect` inserted for cross-cutting work and `@debug` prepended for bugs) is sound, and the AGENTS.md summary plus the build-agent prompt's "enforcement posture" list give it two independent anchors. Specific gaps, roughly in pipeline order:

1. **No execution/orchestration skill between plan and tasks.** `writing-plans` ends by offering "task-by-task via @tdd (recommended)" or "inline execution," but no skill defines either mode: what the parent does between tasks, what review happens per task (spec-compliance vs code-quality), what to do when a task's tests won't go green after N attempts, when to halt vs re-plan, and how context is managed across a 15-task plan. Superpowers ships this as two skills (`executing-plans` for batch-with-checkpoints, `subagent-driven-development` for dispatch-with-two-stage-review). It's the single most consequential content gap in your harness, because this is where multi-hour autonomous runs either hold together or wander. This is a direct content port with no mechanics involved (§2.1).

2. **Enforcement is prompt-level only.** The hard-gate lives in a skill the model must choose to load, backed by the build prompt. Nothing structural prevents a fresh or degraded session from editing source directly: the hooks only catch lint/format/secrets at commit time, by which point the damage is done. Superpowers attacks this with (a) a session-start hook that injects the `using-superpowers` bootstrap (and re-injects after compaction) and (b) that bootstrap's **rationalization red-flags table** ("'This is just a simple question' → Questions are tasks; 'I'll just do this one thing first' → Check BEFORE doing anything"), which is empirically the highest-leverage anti-drift text in the ecosystem. Your build prompt covers the *triggers*; it doesn't inoculate against the *rationalizations*. Port the table into the build prompt or a tiny always-loaded doc (low effort), and consider the session-start injection via an OpenCode plugin later (§2.1).

3. **The `/check` coverage gate is under-specified.** "Gate: ≥ 80% line coverage **on changed files**" — but `pest --coverage` reports per-file totals, not changed-file coverage; computing the gate as written requires intersecting `git diff --name-only` with the coverage report, which the command never spells out. An agent will either approximate (global ≥80%) or hand-wave. Spell out the mechanics (e.g., `--coverage --min=80` for the global floor plus an explicit per-changed-file table the agent assembles from `git diff` + the coverage output) or relax the wording to match what the tool measures.

4. **Verification vs `/check` duplication is acceptable but should be named.** Both run lint + tests; one is per-task, one is pre-push. A single sentence in each ("verification is the per-task gate; /check is the aggregate pre-push gate — both run because task-level green can rot by push time") prevents a future "optimization" that removes one.

5. **The handoff/resume loop is complete and good.** `/handoff`, the context-management doc's degradation thresholds, and the "read the handoff and continue" resume protocol together are better than most comparison repos. The context-management doc's rewind-over-correct and compact-with-a-hint guidance is state of the art.

6. **Eval framework is honest scaffolding, but its runner is fictional.** `opencode eval` is not an OpenCode command; the README says execution is "pending API access," but the command block reads as if it exists. For 🌍 credibility, either mark it clearly aspirational, or (better, §2.1/§2.5) replace it with a real `scripts/run-evals.sh` that drives `opencode run` non-interactively against each case and greps for the expected behaviors — Superpowers' `tests/` tree (which includes an `opencode/` suite) is a working reference implementation you can crib directly.
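
To make item 1 concrete, here is a minimal sketch of the control flow an execution skill would need to pin down: per-task retries, staged review, and a halt that hands control back for re-planning. It is not how Superpowers implements `executing-plans` or `subagent-driven-development`; the names `execute_plan`, `run_task`, `reviews`, and `max_attempts` are hypothetical, and the callbacks stand in for whatever the parent agent actually dispatches.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class TaskResult:
    task: str
    status: str    # "done" or "halted"
    attempts: int


def execute_plan(
    tasks: Sequence[str],
    run_task: Callable[[str], bool],            # hypothetical: True when the task's tests go green
    reviews: Sequence[Callable[[str], bool]],   # hypothetical: e.g. spec-compliance, then code-quality
    max_attempts: int = 3,                      # assumed value for the "N attempts" limit
) -> list[TaskResult]:
    """Run tasks in order; stop at the first task that cannot pass."""
    results: list[TaskResult] = []
    for task in tasks:
        for attempt in range(1, max_attempts + 1):
            # A task counts as done only if it goes green AND passes every review stage.
            if run_task(task) and all(review(task) for review in reviews):
                results.append(TaskResult(task, "done", attempt))
                break
        else:
            # Attempts exhausted: halt so the parent can re-plan instead of
            # pushing on to later tasks that may depend on this one.
            results.append(TaskResult(task, "halted", max_attempts))
            return results
    return results
```

The point of the sketch is that every branch here (the retry limit, what counts as "done", whether a failure stops the run) is a decision the missing skill has to state explicitly.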
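
For item 3, the intersection of `git diff --name-only` with the coverage report could look roughly like the sketch below. It assumes a Clover XML report is available (for example via PHPUnit's `--coverage-clover` option, which this sketch assumes Pest forwards) and that `origin/main` is the comparison base; both are assumptions, as are the helper names. Treating a changed file that is absent from the report as a failure is a policy choice, not something `/check` currently states.

```python
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

THRESHOLD = 80.0  # percent, taken from the /check gate wording


def changed_files(base: str = "origin/main") -> list[str]:
    """Added/copied/modified/renamed PHP files relative to an assumed base ref."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line.endswith(".php")]


def clover_coverage(clover_xml: str, repo_root: str) -> dict[str, tuple[int, int]]:
    """Map repo-relative path -> (covered statements, total statements)."""
    coverage: dict[str, tuple[int, int]] = {}
    for file_el in ET.fromstring(clover_xml).iter("file"):
        metrics = file_el.find("metrics")
        if metrics is None:
            continue
        path = Path(file_el.get("name", ""))
        try:
            rel = path.relative_to(repo_root).as_posix()
        except ValueError:
            rel = path.as_posix()  # already relative, or outside the repo
        coverage[rel] = (
            int(metrics.get("coveredstatements", 0)),
            int(metrics.get("statements", 0)),
        )
    return coverage


def gate(
    changed: list[str],
    coverage: dict[str, tuple[int, int]],
    threshold: float = THRESHOLD,
) -> list[tuple[str, Optional[float], bool]]:
    """One row per changed file: (path, percent or None, passed)."""
    rows: list[tuple[str, Optional[float], bool]] = []
    for path in changed:
        if path not in coverage:
            rows.append((path, None, False))  # assumed policy: unreported file fails
            continue
        covered, total = coverage[path]
        pct = 100.0 if total == 0 else 100.0 * covered / total
        rows.append((path, pct, pct >= threshold))
    return rows
```

The rows returned by `gate(changed_files(), clover_coverage(xml_text, repo_root))` are the per-changed-file table the agent would report; the global `--min=80` floor remains a separate check.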
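
For item 6, a real runner does not need much. The sketch below is in Python rather than the `scripts/run-evals.sh` shell script suggested above, and it assumes that `opencode run <prompt>` executes non-interactively and prints the session output to stdout. The `EvalCase` shape, the regex-based expectations, and the ten-minute timeout are all hypothetical; your existing case files would need a small loader to map onto it.

```python
import re
import subprocess
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    prompt: str
    expect: list[str]  # regexes that must each match somewhere in the output


def opencode_runner(prompt: str) -> str:
    """Drive one non-interactive session (assumes `opencode run` prints to stdout)."""
    proc = subprocess.run(
        ["opencode", "run", prompt],
        capture_output=True, text=True, timeout=600,  # timeout is an assumed value
    )
    return proc.stdout


def missing_behaviors(output: str, expect: list[str]) -> list[str]:
    """Return the expected patterns that did NOT appear in the output."""
    flags = re.IGNORECASE | re.MULTILINE
    return [pattern for pattern in expect if not re.search(pattern, output, flags)]


def run_evals(
    cases: list[EvalCase],
    runner: Callable[[str], str] = opencode_runner,
) -> dict[str, list[str]]:
    """Map case name -> list of unmet expectations (empty list means pass)."""
    return {case.name: missing_behaviors(runner(case.prompt), case.expect) for case in cases}
```

Injecting `runner` keeps the matching logic testable without a live model; grepping output is a coarse check, so a case passing here only shows the expected phrases appeared, not that the behavior was correct.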
