Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
133 changes: 133 additions & 0 deletions docs/proposals/RFC-023-hook-bundle-hardening-consensus-plan.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,133 @@
# RFC-023: Hook Bundle Hardening — Consensus Execution Plan

Status: Draft
Plan type: Non-normative execution plan for the reference-implementation hook bundle
Created: 2026-05-21
Authors: Codex + Claude (claude-opus-4-7), iterated to a fixed point and owner-decided

Inputs strictly audited:

- A multi-round Codex × Claude consensus series (Codex r1–r5 × Claude v1–v6) stored as local scratch under `.agent-protocol-runtime/` (gitignored; not part of this tree).
- Direct empirical probing of the reference hook scripts under `reference-implementations/hooks-claude-code/hooks/`.

Scope: capture the findings of a risk audit of this repository's reference-implementation hook bundle, the dispositions both reviewers agreed on after error-correction, and the six owner decisions that resolved the open policy questions. This proposal does **not** modify canonical methodology and does not itself change any hook. It is an execution-ready plan that should be promoted to implementation wave-by-wave per the sequencing below.

This proposal is non-normative per `docs/proposals/README.md`. The hook bundle it plans to change is itself a reference implementation (`docs/runtime-hook-contract.md` is the canonical contract; the scripts are one conforming realization).

---

## 0. Verdict and consensus

Two independent agents reviewed the audit, corrected each other's factual errors over several rounds, and reached a fixed point (two consecutive rounds in which neither found a substantive factual error in the other). The owner then decided the six open policy questions.

- Fact / repro / wiring consensus: ~99% (every claim below was verified by running the actual hooks).
- Disposition consensus: ~97%.
- Decision-complete executable consensus: ~98%. The residual ~2% is execution-time validation that cannot close in a planning document — an implementer must add a failing fixture for each false-block first, and an independent reviewer must confirm every repro is fixed and every destructive positive control still blocks.

---

## 1. Two-layer wiring model

A recurring source of confusion in the audit was the word "wired." It has two distinct meanings in this repo, and every statement about a hook must name which layer it refers to:

- **Layer A — repo plugin entry point**: `hooks/hooks.json`, consumed by Claude Code plugin auto-discovery. This is the set that is active simply by installing the plugin. It wires 10 hook invocations.
- **Layer B — shipped reference / adopter examples**: the five `settings.example.*` files (`reference-implementations/hooks-claude-code/settings.example.json` plus the Codex / Cursor / Gemini-CLI [TOML] / Windsurf settings examples). This is the set a user gets after copying an example settings file, and it is broader than Layer A.

| Hook | Layer A | Layer B | Notes |
|---|---|---|---|
| `manifest-required.sh` | block | block | Blocks non-doc staged commits even when discovery finds no manifest (absent manifest **is** the violation). |
| `evidence-artifact-exists.sh` | block | block | No-op when no manifest is discovered or supplied. |
| `consumer-registry-check.sh` | warn | warn | **Only hook that invokes the network**; only after a manifest is discovered/supplied with an `external_registry_url`. |
| `cckn-canonical-sync-check.sh` | warn | warn | Local git + frontmatter drift; **does not** touch the network. |
| `risky-bash-block.sh` | block | block | Subject of P3 / P4 / P5 / P6. |
| `risky-file-block.sh` | block | block | P4 consistency reference. |
| `risky-mcp-block.sh` | block | block | Subject of P5 (MCP SQL literals). |
| `sot-drift-check.sh` | warn | warn | Post-edit; no-op without a manifest. |
| `drift-doc-refresh.sh` | warn | warn | Post-edit; no-op without a manifest. |
| `completion-audit.sh` | block | block | **Only Layer A Stop blocker**; no-op without a manifest. |
| `handoff-prompt-validator.sh` | not wired | block/warn | **Layer B** Stop / session-end blocker (`on-stop`, block on hard cap), **not** Layer A. |
| `context-budget-warn.sh` | not wired | warn | Layer B `pre-tool-use:Read,Grep`. |
| `resume-mode-attestation.sh` | not wired | warn | Layer B `pre-tool-use:Read,Grep,Bash`. |
| `router-justification-warn.sh` | not wired | warn | Layer B `pre-tool-use:Read,Grep`. |
| `file-emission-budget-warn.sh` | not wired | warn | Layer B `post-tool-use:Edit|Write|MultiEdit`. |
| `manifest-size-warn.sh` | not wired | warn | Layer B `pre-commit`. |

---

## 2. Findings

Each finding was confirmed by running the hook against the literal command. False-block repros below are verified to block on the current `main`.

- **P1 — `consumer-registry-check.sh` manifest-controlled network egress.** The hook reads `external_registry_url` values out of a manifest and probes them with `curl` on `git push`. It never blocks (warn-level), so this is a network-egress / transparency concern, not a correctness one. Default timeout is configurable via `AGENT_PROTOCOL_NET_TIMEOUT` (default 5). The bundle README (lines ~92 and ~153) incorrectly states the bundle has "no network" / "never ... invoke network", which contradicts this script.
- **P2 — `manifest-required.sh` docs-grade allowlist is narrow.** Allowlist is `*.md|docs/*|*.txt|CHANGELOG*|LICENSE*|.gitignore`. Non-doc edits require either a manifest or the existing Lean escape (`AGENT_PROTOCOL_LEAN_SKIP_MANIFEST=1` **and** a repo-root `lean-mode.flag`).
- **P3 — protected-branch force-push matcher is too broad.** Group 1's unqualified `\b$br\b` alternative matches a protected name anywhere in the command. Confirmed: `git push --force-with-lease origin release-1.5` blocks (false-block — a release-prefixed feature branch), and `git push --force origin feature/release-notes` blocks (false-block). Naively removing the broad alternative would *under*-block real protected destinations such as `feature:main` and `:main`, so the fix must add destination-refspec matching, not just delete.
- **P4 — Bash `.pem` read has no `tests/` exception.** `risky-bash-block.sh` Group 9 blocks `cat tests/fixtures/test.pem`, while `risky-file-block.sh` Matcher 5 already exempts `.pem` under `tests/` and `test/`. Inconsistent.
- **P5 — destructive verbs inside SQL string literals false-block.** `psql -c "SELECT 'DROP TABLE x' AS warning"` blocks; MCP `SELECT ' DROP TABLE x' AS warning` blocks (leading whitespace inside the quoted literal satisfies the matcher boundary). The originally-suspected examples (`'DELETE_USER'`, `aws lambda invoke --function-name delete-old-items`) are **not** repros — they already pass.
- **P6 — `eval` blocks common shell init.** Group 8 blocks `eval "$(direnv hook bash)"` and similar (`ssh-agent`, `starship`, `rbenv`, …).
- **P7 — completion / evidence gates have no bypass.** `completion-audit.sh` and `evidence-artifact-exists.sh` have no `*_ALLOW` escape. This asymmetry vs the risky-action hooks is by design — these gates enforce truthfulness, not operator-approved risk.
- **NEW-A — `evidence-artifact-exists.sh`** (commit-time block hook) was missing from the original audit's scope; folded into P7 / P8.
- **NEW-B — `sot-drift-check.sh` + `drift-doc-refresh.sh`** fire warn-level on every `Edit|Write|MultiEdit`; potential noise, observe-only for now.
- **NEW-C — shared manifest discovery.** Fourteen hooks use `MANIFEST_PATH=$(git ls-files 'change-manifest*.yaml' | head -1)`. This pathspec is root-anchored and returns nothing in this repo, so manifest-consuming hooks no-op via the fallback (and `manifest-required.sh` *blocks* non-doc commits because absent manifest is its violation). A nested manifest is invisible to the fallback; `AGENT_PROTOCOL_MANIFEST_PATH` is the reliable binding. This can stack with P2 friction.

---

## 3. Owner decisions (binding for this plan)

| # | Question | Decision |
|---|---|---|
| 1 | P1 execution mode | **Full** (default network-behavior change on a Layer A wired hook + cross-runtime adopter behavior). |
| 2 | P2 docs-grade allowlist | **Keep narrow (Option A)**; document the existing Lean-skip better. No hook change. |
| 3 | P3 release-prefixed branches | **Treat as normal feature branches** (`release-1.5`, `feature/release-notes` pass); only the protected name as a refspec **destination** blocks. |
| 4 | P7 / NEW-A bypass | **Keep both gates non-bypassable.** Recovery from a stuck gate is to fix the manifest, not to skip the check. |
| 5 | P5 cloud-CLI scope | **First wave does SQL/MCP literals only.** Cloud-CLI Group 6 argument/path narrowing is deferred. |
| 6 | NEW-C discovery hardening | **Document only.** No discovery change now. |

Decisions 2, 4, and 6 mean the corresponding work is documentation-only or closed; they do not open behavior-changing waves.

---

## 4. Execution waves

All acceptance uses the selftest harness (`reference-implementations/hooks-claude-code/selftests/`), never live `git commit` / `git push` / network / packet capture. Every false-block fix begins with a fixture that fails on the current `main`; every destructive positive control must keep blocking.

### Wave 0 — re-baseline (read-only)
Confirm the suite is green (`151 cases, 0 failed` for the Claude bundle; `4 cases, 0 failed` for each of the Codex/Cursor/Gemini-CLI/Windsurf adapters), record NEW-C current behavior (`test -z "$(git ls-files 'change-manifest*.yaml')"`), and have a reviewer accept the P3 token/refspec fixture matrix before any code change.

### Wave 1 — behavior-disclosure docs (Lean, docs-only) — do first
Files: `reference-implementations/hooks-claude-code/README.md`, `docs/runtime-hooks-in-practice.md`, possibly `reference-implementations/hooks-claude-code/DEVIATIONS.md`. Do **not** edit `AGENTS.md`.
Content: the two-layer wiring model; the only network hook (consumer-registry-check); the Stop blockers (Layer A = completion-audit; Layer B adds handoff-prompt-validator); commit blockers (manifest-required blocks on absent manifest, evidence-artifact-exists); each warn hook's verified trigger; the existing Lean-skip mechanism (decision 2); that completion/evidence gates are intentionally non-bypassable (decision 4); the NEW-C discovery behavior and `AGENT_PROTOCOL_MANIFEST_PATH` binding (decision 6); and a correction of the README "no network" contradiction.

### Wave 2 — Bash false-block narrowing (Lean) — P3 first (risk driver)
File: `risky-bash-block.sh` + its fixtures.
- **P3**: remove the broad whole-command `\b$br\b`; match a protected name only as a refspec **destination** — `<repo> main` (2nd positional, any remote), `src:dst`, `:dst` (delete), `HEAD:dst`, `refs/heads/dst`; a 1st-positional token (the repository, e.g. lone `release`) and a path-segment substring pass. Specification-first + fixture-driven; if the matcher grows beyond a local token/refspec helper, P3 becomes its own sub-wave.
- **P4**: add the `tests/` / `test/` exception to Group 9 to match `risky-file-block.sh` Matcher 5; do not touch `.env` in this wave.
- **P6**: narrow whitelist for `eval "$(<init-cmd> ...)"` over `{direnv, ssh-agent, starship, rbenv, pyenv, nodenv, goenv, fnm, nvm, zoxide, atuin, mise}`; do not downgrade all `eval` to warn.

### Wave 3 — network probe opt-in (Full) — P1
Files: `consumer-registry-check.sh` + fixtures + README + `docs/runtime-hooks-in-practice.md` + `DEVIATIONS.md`.
Add `AGENT_PROTOCOL_CONSUMER_PROBE` (default 0): no outbound probe unless set; on opt-in, print one stderr disclosure of the URLs before probing; preserve warn-only behavior. Fixtures use the `curl-exit` stub (default-off + `curl-exit=99` → exit 0 proves curl was not invoked; opt-in + `curl-exit=0` → exit 0 with disclosure; opt-in + `curl-exit=7` → exit 2). Full is justified by the Layer A / cross-runtime behavior change; the README correction is incidental, not the trigger.

### Wave 4 — SQL / MCP literal narrowing (Lean+) — P5, SQL/MCP only
Files: `risky-bash-block.sh` (Group 5) + `risky-mcp-block.sh` (Matcher 3) + fixtures.
Add failing fixtures first, then fix via fixture-proven literal masking (strip single/double-quoted literals before destructive-verb matching; statement-anchoring alone cannot fix leading whitespace inside a quote). Keep literal handling deliberately narrow. **Do not touch cloud-CLI Group 6** (decision 5). A shared parser helper across Bash and MCP would upgrade ceremony to Full.

### Closed by decision
P2 hook change (decision 2 → docs only), P7/NEW-A bypass (decision 4 → none), P5 cloud-CLI Group 6 (decision 5 → deferred), NEW-C discovery hardening (decision 6 → docs only). If any decision is later reversed, the corresponding wave reopens (NEW-C hardening would be Full and touch all fourteen scripts or introduce a shared helper).

---

## 5. Stop rules

Stop and escalate if: a baseline selftest is red; a fixture for a claimed false-block already passes on `main`; a destructive positive control flips from block to pass; acceptance would require live network / commit / push / packet capture; a Lean wave would change `docs/runtime-hook-contract.md` normative semantics; a patch silently broadens `.env` / credential / protected-branch / manifest-discovery / bypass policy beyond the named item; adapter settings drift from the shared hook script paths; or any plan / summary says "wired" without naming Layer A or Layer B.

---

## 6. Relationship to canonical content

Per `docs/proposals/README.md`, this RFC is never cited as a normative source. The canonical contract for these hooks is `docs/runtime-hook-contract.md`; if a Wave 1 doc correction reveals a contract gap (e.g. the network-degradation clause vs the README "no network" claim), the canonical layer is the source of truth and is updated through the standard `docs/file-role-map.md` flow — not by citing this RFC.

---

## Changelog

- 2026-05-21: Created (Status: Draft). Promoted from the Codex × Claude consensus scratch series after the series reached a fixed point and the owner decided the six open policy questions. Wave 1 Change Manifest drafted alongside at `docs/proposals/RFC-023-wave1.change-manifest.yaml`.
99 changes: 99 additions & 0 deletions docs/proposals/RFC-023-wave1.change-manifest.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
# Draft Change Manifest for RFC-023 Wave 1 (behavior-disclosure docs).
# Status: draft / phase: plan — not yet executed. Stored beside the RFC, NOT at
# repo root, so the root-anchored manifest-discovery fallback (RFC-023 §NEW-C)
# does not auto-bind the wired hooks to it prematurely. When Wave 1 execution
# begins, the implementer either moves this to the active location or exports
# AGENT_PROTOCOL_MANIFEST_PATH to point at it.

change_id: 2026-05-21-rfc023-w1-hook-disclosure-docs
title: "RFC-023 Wave 1 — hook bundle behavior-disclosure docs"
phase: plan
status: draft
execution_mode: lean
manifest_role: monolithic

authors:
- name: owner
role: human
- name: claude-opus-4-7
role: ai
identifier: claude-opus-4-7

last_updated: "2026-05-21T00:00:00Z"

strategic_parent:
kind: rfc
location: docs/proposals/RFC-023-hook-bundle-hardening-consensus-plan.md
summary: >-
RFC-023 Wave 1 is the docs-only first wave: disclose the two-layer wiring
model, the only network hook, the Stop/commit blockers, the existing
Lean-skip, the intentionally non-bypassable gates, and NEW-C discovery
behavior; and correct the README "no network" contradiction.

surfaces_touched:
- surface: information
role: primary
notes: >-
README + runtime-hooks-in-practice docs are the information surface that
describes hook behavior; they currently drift from the scripts.

sot_map:
- info_name: "hook bundle behavior (wiring + network/Stop/commit/bypass semantics)"
pattern: 2
source: "hooks/hooks.json + reference-implementations/hooks-claude-code/hooks/*.sh"
consumers:
- name: "reference-implementations/hooks-claude-code/README.md"
kind: documentation
sync_lag: manual
- name: "docs/runtime-hooks-in-practice.md"
kind: documentation
sync_lag: manual
sync_mechanism: "manual doc update on hook behavior change"
desync_risk: medium
notes: >-
README (~lines 92/153) claims the bundle has "no network" / "never
invoke network", which contradicts consumer-registry-check.sh's curl
probe. Wave 1 reconciles the docs to the scripts.

breaking_change:
level: L0
self_assessed_vs_worst_case: matches

rollback:
overall_mode: 1
per_surface_modes:
- surface: information
mode: 1
rationale: "Docs-only; revert the commit to restore prior text."

evidence_plan:
- type: runbook_update
surface: information
status: planned
summary: >-
README §Behavior summary + docs/runtime-hooks-in-practice.md disclosure:
two-layer (A/B) wiring table, only-network-hook = consumer-registry-check,
Stop blockers (A=completion-audit; B adds handoff-prompt-validator),
commit blockers, each warn hook's verified trigger, existing Lean-skip,
non-bypassable completion/evidence gates, and NEW-C discovery behavior +
AGENT_PROTOCOL_MANIFEST_PATH binding.
- type: changelog_entry
surface: information
status: planned
summary: >-
Correct the README "no network" / "never invoke network" claim to reflect
that consumer-registry-check.sh performs an outbound curl probe.

assumptions:
- statement: >-
Wave 1 is docs-only and changes no hook behavior, so the Claude bundle
selftest (151/0) and the four adapter selftests (4/0 each) remain green
with no fixture changes.
confidence: high
validation_plan: "Run all five selftest suites + git diff --check + make verify-local."
- statement: >-
Correcting the README "no network" text does not alter docs/runtime-hook-contract.md
normative semantics; if it surfaces a contract gap, the contract is the SoT
and is updated via the docs/file-role-map.md flow, not this wave.
confidence: medium
validation_plan: "Reviewer confirms no normative contract edit is required for Wave 1."
Loading
Loading