gen-plan: drop go/no-go gates (human judgment) and default to lightweight p<0.05 statistics #209

## Summary

Two policy changes to the `gen-plan` command (`commands/gen-plan.md`) so generated plans match a human-in-the-loop, low-ceremony workflow.

### 1. No go/no-go gates — humans judge

Generated plans should not contain automated go/no-go gates, pass/fail thresholds, or stopping rules that decide success based on hitting a number. Quantitative metrics should be recorded as **reference targets and measured evidence**; a human reviews the evidence and makes every accept / proceed / pivot decision.

Affected today:
- **Step 3 (Confirm Quantitative Metrics)** currently frames each metric as either a "hard requirement that must be achieved for the implementation to be considered successful" or an optimization trend. The hard-requirement framing is what produces go/no-go gates in the acceptance criteria.
- **Rule 7 (TDD-Style Tests)** says tests "enable deterministic verification", which for quantitative/statistical criteria reads as an auto pass/fail gate.
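
To make the intended shape concrete, here is a minimal sketch of how a generated plan could record a metric as evidence rather than as a gate. The `MetricEvidence` class and `format_evidence` helper are hypothetical names invented for illustration; nothing like them exists in `commands/gen-plan.md` today.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MetricEvidence:
    """One quantitative metric, recorded as evidence (hypothetical structure)."""
    name: str
    reference_target: float  # what the plan aimed for; NOT a pass/fail threshold
    measured: float          # what was actually observed
    p_value: Optional[float] = None  # filled in only when a comparison was run
    # Deliberately no `passed` field: the accept / proceed / pivot call is human.


def format_evidence(rows: List[MetricEvidence]) -> str:
    """Render metrics as a plain report that ends with an open human decision."""
    lines = ["metric | reference target | measured | p-value"]
    for r in rows:
        p = "n/a" if r.p_value is None else f"{r.p_value:.3f}"
        lines.append(f"{r.name} | {r.reference_target:g} | {r.measured:g} | {p}")
    lines.append("Decision (accept / proceed / pivot): pending human review")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_evidence([MetricEvidence("accuracy", 0.80, 0.78, 0.031)]))
```

The point of the sketch is what it leaves out: the report shows the target next to the measurement, but no code path compares the two and declares success or failure.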

### 2. Lightweight statistics — p < 0.05 is enough

For any statistical comparison, a single significance test at **p < 0.05 is sufficient**. The command should NOT require or generate:
- bootstrap confidence intervals (per-item resampling, 10000 draws, 95% CI)
- per-item McNemar tests
- minimum-effect-size sidebars (e.g. "+1pp")
- separately reported robustness seeds

...unless the user explicitly asks for that extra rigor.
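
As a rough illustration of what "a single significance test" can look like, the sketch below computes a two-sided p-value with a pooled two-proportion z-test using only the standard library. The choice of test is an assumption made for this example (it treats the two runs as independent samples); this issue does not prescribe a particular test, and `two_proportion_p_value` is a hypothetical helper name.

```python
import math


def two_proportion_p_value(successes_a: int, n_a: int,
                           successes_b: int, n_b: int) -> float:
    """Two-sided p-value for a pooled two-proportion z-test (normal approximation)."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled success rate under the null hypothesis that both rates are equal.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Every trial succeeded or every trial failed: no evidence of a difference.
        return 1.0
    z = (p_b - p_a) / se
    # For a standard normal, 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Illustrative counts only, not real results.
    p = two_proportion_p_value(60, 100, 75, 100)
    print(f"baseline 60/100 vs candidate 75/100: p = {p:.3f}")
```

The p-value is reported alongside the measured rates and left for the reviewer to interpret; the sketch intentionally adds no confidence intervals, effect-size floors, or per-seed breakdowns.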

## Proposed changes

- Reword **Step 3** so metrics are reference targets / evidence for human judgment, never auto-gates; add the p<0.05 statistics note.
- Amend **Rule 7** to clarify that quantitative/statistical criteria describe how to measure and report results, not how to auto-gate on them.
- Add **Rule 15 (No Go/No-Go Gates — Human Judgment)** and **Rule 16 (Lightweight Statistics)**.

The same policy should be mirrored into the Codex-side copy of the skill (`skills/humanize-gen-plan/SKILL.md`) for consistency.

## Notes

I've applied this locally to validate it; happy to send a PR if the direction looks good.

