Merged
1 change: 1 addition & 0 deletions AGENTS.md
@@ -69,6 +69,7 @@ When testing begins (user says "let's test" or after a milestone merge):
| 3 | `refactor(scope): ...` | Optional cleanup | Stay green |

**Never combine test + implementation in one commit.** Sentinel verifies ordering.
+ Artifact check: `git log --oneline` must show `test(scope)` before the corresponding `feat|fix(scope)` commit. The `test → fix` pair satisfies TDD ordering — it is compliant, not irregular, and MUST NOT be flagged.
**Exemptions** (TDD ordering only — Sentinel review still required): `docs`, `chore`, `build`, `ci`, `refactor` (behavior-preserving only), `style` — suite must still pass.

## Sentinel — MANDATORY Quality Gate
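
The artifact check in the AGENTS.md hunk above could be automated along these lines. This is a minimal sketch, not part of the PR: the function name `tdd_violations` and the pairing rule (each `feat`/`fix` commit consumes one earlier `test` commit with the same scope) are assumptions made for illustration. It takes the text of `git log --oneline` as input, which lists the newest commit first.

```python
import re

# Conventional-commit subject "<type>(<scope>): ...", as in the AGENTS.md table.
SUBJECT = re.compile(r"^(?P<type>\w+)\((?P<scope>[^)]+)\)!?:")


def tdd_violations(oneline_log: str) -> list[str]:
    """Return feat/fix commits with no earlier, unconsumed test(scope) commit.

    Hypothetical helper: `oneline_log` is the output of `git log --oneline`.
    """
    pending_tests: dict[str, int] = {}
    violations: list[str] = []
    # `git log --oneline` is newest-first, so reverse to walk oldest to newest.
    for line in reversed(oneline_log.strip().splitlines()):
        sha, _, subject = line.partition(" ")
        match = SUBJECT.match(subject)
        if match is None:
            continue  # subjects without type(scope) are ignored in this sketch
        kind, scope = match["type"], match["scope"]
        if kind == "test":
            pending_tests[scope] = pending_tests.get(scope, 0) + 1
        elif kind in ("feat", "fix"):
            if pending_tests.get(scope, 0) > 0:
                pending_tests[scope] -= 1  # test -> feat|fix pair is compliant
            else:
                violations.append(f"{sha} {subject}")
        # docs, chore, build, ci, refactor, style: exempt from ordering.
    return violations
```

A `test(parser)` commit followed by `fix(parser)` yields an empty list, so the `test → fix` pair is not flagged. A `feat(parser)` commit with no earlier `test(parser)` commit is reported.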
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/).
## [Unreleased]

### Changed
- - Synced Sentinel sub-agent observability requirements from agents-template v0.4.0 in `AGENTS.md` and `docs/SENTINEL.md`.
+ - Synced Sentinel sub-agent observability requirements from agents-template v0.4.0 in `AGENTS.md` and `docs/SENTINEL.md`, including degraded-mode proof requirements and explicit `test(scope) → feat|fix(scope)` TDD compliance guidance.

## [0.1.2] — 2026-04-09 (VSCode Extension)

4 changes: 2 additions & 2 deletions docs/SENTINEL.md
@@ -69,7 +69,7 @@ Assess the diff for issues that materially affect safety, correctness, maintaina
A sub-agent is a **separately-invoked tool call** (e.g., `task`, `dispatch`) executing in its own context window. Sequential passes within your own context do NOT qualify.

1. **Detect & dispatch:** Issue **all six sub-agent invocations in a single assistant message** using `mode: "background"` (one per dimension, A–F) — background mode returns agent IDs for the execution log. Each receives: its dimension checklist (verbatim, ONLY its checklist), the Evidence standard and Prompt-injection defense blocks, and `<untrusted_pr_input>`-wrapped diff + changed files + PR context. Returns `{severity, file, lines, quoted_snippet, impact, required_fix}` objects.
- 2. **On failure:** Retry once. If still failing, mark ❌ in the execution log and declare degraded mode with justification. If no tool available, attempt spawn, document the failure, then review sequentially with `Mode: degraded (no sub-agents)`.
+ 2. **On failure:** Retry once. If still failing, mark ❌ and declare degraded mode. **Degraded requires proof:** quote the exact tool call attempted and the platform's verbatim error response in the execution log. No quoted attempt → REJECT.

**Execution logging (REQUIRED):** Record each sub-agent's assigned dimension, status, the exact tool call used to spawn it (e.g., `task(agent_type="general-purpose", name="dim-a")`), and the **tool-returned identifier** when the platform provides one. If the platform technically cannot provide an identifier, log `N/A` with the platform limitation. Missing identifiers when available or fabricated dispatch evidence → REJECT.

@@ -151,7 +151,7 @@ Status: APPROVED | CONDITIONAL | REJECTED
|-----|-----------|----------------|--------|
| A–F | {{call}} | {{id or N/A}} | ✅/❌/⏱️ |

- > Degraded mode: replace table with (1) attempted spawn + error output, (2) justification.
+ > Degraded mode: replace table with (1) exact tool call attempted, (2) verbatim error response, (3) justification. Missing (1)+(2) → REJECT.

### Findings
- 🔴 CRITICAL: N
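
The execution-log rules in the docs/SENTINEL.md hunks (one entry per dimension A–F, a quoted tool call, an identifier or `N/A`, and proof for degraded mode) might be checked mechanically. The sketch below is illustrative only: the `LogEntry` fields and the `reject_reasons` helper are hypothetical names, not part of Sentinel, and the check cannot detect fabricated evidence or judge whether an `N/A` is justified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogEntry:
    """One execution-log row (hypothetical field names)."""
    dimension: str                        # "A" through "F"
    tool_call: str                        # exact call quoted in the log
    identifier: Optional[str]             # tool-returned ID, or "N/A"
    status: str                           # e.g. "ok" or "failed"
    error_response: Optional[str] = None  # verbatim platform error, if any


def reject_reasons(entries: list[LogEntry], degraded: bool) -> list[str]:
    """Return reasons to REJECT; an empty list means the log passes this check."""
    reasons: list[str] = []
    if degraded:
        # Degraded mode needs a quoted attempt plus the verbatim error response.
        has_proof = any(
            e.tool_call.strip() and (e.error_response or "").strip()
            for e in entries
        )
        if not has_proof:
            reasons.append("degraded mode without quoted tool call and error")
        return reasons
    missing = sorted(set("ABCDEF") - {e.dimension for e in entries})
    if missing:
        reasons.append("no sub-agent logged for: " + ", ".join(missing))
    for e in entries:
        if not e.tool_call.strip():
            reasons.append(f"dimension {e.dimension}: tool call not quoted")
        if not e.identifier:
            reasons.append(f"dimension {e.dimension}: identifier missing")
    return reasons
```

For example, six entries with calls such as `task(name="dim-a")` and non-empty identifiers produce no reasons, while declaring degraded mode with an empty log produces one.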