Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
215 changes: 71 additions & 144 deletions .claude/skills/lading-optimize-review/SKILL.md
Original file line number Diff line number Diff line change
@@ -1,16 +1,18 @@
---
name: lading-optimize-review
description: Reviews optimization patches for lading using a 5-persona peer review system. Requires unanimous approval backed by benchmarks. Bugs discovered during review are valuable - invoke /lading-optimize-validate to validate them.
allowed-tools: Bash(cat:*) Bash(ci/*:*) Read Write Edit Glob Grep Skill
allowed-tools: Bash(cat:*) Bash(sample:*) Bash(samply:*) Bash(cargo:*) Bash(ci/*:*) Bash(hyperfine:*) Bash(*/payloadtool:*) Bash(tee:*) Read Write Edit Glob Grep Skill
---

# Optimization Patch Review

A rigorous 5-persona peer review system for optimization patches in lading. Requires unanimous approval backed by concrete benchmark data. **Duplicate Hunter persona prevents redundant work.**

## CRITICAL: Recording is MANDATORY
## Role: Judge

**EVERY review outcome MUST be recorded in db.yaml.**
**Review is the decision-maker. It does NOT record results.**

Review judges using benchmarks and 5-persona review, then returns a structured report.

## Valuable Outcomes

Expand All @@ -24,62 +26,43 @@ A rigorous 5-persona peer review system for optimization patches in lading. Requ

---

## Phase 0: Pre-flight

Run `/lading-preflight` first. Then check for duplicate work:

```bash
# Check previous reviews
cat .claude/skills/lading-optimize-review/assets/db.yaml
## Phase 1: Benchmark Execution

# Check previous hunts
cat .claude/skills/lading-optimize-hunt/assets/db.yaml
```
### Step 1: Read Baseline Data

**Check for:**
- Same optimization already reviewed -> REJECT as "DUPLICATE"
- Same file + technique already approved -> REJECT as "DUPLICATE"
Read the baseline benchmark files captured:

---
- `/tmp/criterion-baseline.log` — micro-benchmark baseline
- `/tmp/baseline.json` — macro-benchmark timing baseline
- `/tmp/baseline-mem.txt` — macro-benchmark memory baseline

## Phase 1: Validation Gate
**If baseline data is missing -> REJECT. Baselines must be captured before any code change and before this gets invoked.**

**MANDATORY: ci/validate must pass before any review proceeds.**
### Step 2: Run Post-Change Micro-benchmarks

```bash
ci/validate
cargo criterion 2>&1 | tee /tmp/criterion-optimized.log
```

**If ci/validate fails -> REJECT immediately. No exceptions.**

---
Note: Criterion automatically compares against the last run and reports percentage changes.

## Phase 2: Measurement
Compare results — look for "change:" lines showing improvement/regression.

**Review existing benchmark data.** The optimization should already have benchmark results. If missing, request them or REJECT.
Example output: `time: [1.2345 ms 1.2456 ms 1.2567 ms] change: [-5.1234% -4.5678% -4.0123%]`

**The optimization being reviewed MUST provide benchmark data.** Review should focus on analyzing existing data, not generating new benchmarks.
### Step 3: Run Post-Change Macro-benchmarks

### Expected Benchmark Data
Use the config provided by the caller (stored in `$PAYLOADTOOL_CONFIG`):

The optimization should include:

**Micro-benchmarks (criterion):**
- Benchmark name and throughput change
- Example: `syslog_100MiB: +42.0% throughput (481 -> 683 MiB/s)`

**Macro-benchmarks (payloadtool with hyperfine):**
- Time change with absolute values
- Memory change with absolute values
- Allocation count change with absolute values
- Example:
```
Time: -14.5% (8.3 ms -> 7.1 ms)
Memory: -35.8% (6.17 MiB -> 3.96 MiB)
Allocations: -49.3% (67,688 -> 34,331)
```
```bash
cargo build --release --bin payloadtool
hyperfine --warmup 3 --runs 30 --export-json /tmp/optimized.json \
"./target/release/payloadtool $PAYLOADTOOL_CONFIG"
./target/release/payloadtool "$PAYLOADTOOL_CONFIG" --memory-stats 2>&1 | tee /tmp/optimized-mem.txt
```

### Statistical Requirements

- Minimum 30 runs for hyperfine (`--runs 30`)
- Criterion handles statistical significance internally
- Time improvement >= 5% for significance
Expand All @@ -88,31 +71,33 @@ The optimization should include:

### NO EXCEPTIONS

**If benchmark data is missing -> REJECT. Period.**

- "Test dependencies don't work" -> REJECT. Fix deps first.
- "Test dependencies don't work" -> REJECT. Fix dependencies first.
- "Theoretically better" -> REJECT. Prove it with numbers.
- "Obviously an improvement" -> REJECT. Obvious is not measured.
- "Will benchmark later" -> REJECT. Benchmark now.

---

## Phase 3: Five-Persona Review
## Phase 2: Five-Persona Review

### 1. Duplicate Hunter (Checks for Redundant Work)
- [ ] Optimization not already in reviews.yaml
- [ ] Optimization not already in db.yaml
- [ ] File + technique combo not already approved
- [ ] No substantially similar optimization exists
- [ ] If duplicate found -> REJECT with "DUPLICATE: see <existing entry>"

### 2. Skeptic (Demands Proof)
- [ ] Baseline data verified present
- [ ] Post-change benchmarks executed with identical methodology
- [ ] Hot path verified via profiling (not just guessed)
- [ ] Benchmark improvement meets thresholds (5% time, 10% mem, 20% allocs)
- [ ] Micro threshold met (>=5% time)
- [ ] Macro threshold met (>=5% time OR >=10% mem OR >=20% allocs)
- [ ] Statistical significance confirmed (p<0.05 or criterion "faster")
- [ ] Improvement is real, not measurement noise
- [ ] Benchmark methodology sound (same config, same machine)

### 3. Conservative (Guards Correctness)
- [ ] `ci/validate` passes completely
- [ ] Run `ci/validate` and validate that it passes completely
- [ ] No semantic changes to output
- [ ] **Determinism preserved** (same seed -> same output)
- [ ] No `.unwrap()` or `.expect()` added (lading MUST NOT panic)
Expand All @@ -136,7 +121,7 @@ The optimization should include:

---

## Phase 4: Kani/Property Test Check
## Phase 3: Kani/Property Test Check

If the optimization touches critical code:

Expand All @@ -162,14 +147,26 @@ ci/kani lading_payload

---

## Phase 5: Decision
## Phase 4: Decision

| Outcome | Votes | Action |
|---------|-------|--------|
| **APPROVED** | 5/5 APPROVE | Record success |
| **REJECTED** | Any REJECT | Record lesson |
| **DUPLICATE** | Duplicate Hunter REJECT | Record as DUPLICATE |
| **BUG FOUND** | Correctness issue | Invoke `/lading-optimize-validate` |
| **APPROVED** | 5/5 APPROVE | Return APPROVED report |
| **REJECTED** | Any REJECT | Return REJECTED report |
| **DUPLICATE** | Duplicate Hunter REJECT | Return DUPLICATE report |
| **BUG FOUND** | Correctness issue | Invoke `/lading-optimize-validate`, return BUG_FOUND report |

### Bug Triage

When issues are discovered, determine the correct action:

| Signal | Is It a Bug? | Action |
| --------------------------------------- | ------------ | ---------------------------------- |
| ci/validate fails after optimization | Possible bug | Invoke `/lading-optimize-validate` |
| Determinism broken | Possible bug | Invoke `/lading-optimize-validate` |
| Conservative finds correctness issue | Possible bug | Invoke `/lading-optimize-validate` |
| Benchmark regression but correct output | NOT a bug | REJECT optimization |
| No improvement but correct output | NOT a bug | REJECT optimization |

### When Bug Is Found

Expand All @@ -180,101 +177,31 @@ If review discovers a bug instead of an optimization:
```

The validate skill will:
1. Attempt Kani proof (if feasible)
2. Create property test reproducing the bug
3. Verify the fix works
4. Record in its db.yaml
1. Confirm this is actually a bug (not just a failed optimization)
2. Attempt Kani proof (if feasible)
3. Create property test reproducing the bug
4. Verify the fix works
5. Record in its db.yaml

Then return here to record the finding as BUG_FOUND in Phase 6.
Then return here to return a BUG_FOUND report in Phase 6.

---

## Phase 6: MANDATORY Recording

### Update db.yaml

1. Add entry to `assets/db.yaml` index
2. Create detailed file in `assets/db/` directory
## Phase 5: Return Report

**assets/db.yaml entry:**
```yaml
entries:
- id: <descriptive-name>
verdict: <approved|rejected|duplicate|bug_found>
file: assets/db/<review-name>.yaml
```

**assets/db/<review-name>.yaml** for APPROVED:
```yaml
id: <descriptive-name>
target: <file:function>
technique: <technique>
verdict: approved
date: <YYYY-MM-DD>
votes:
duplicate_hunter: approve
skeptic: approve
conservative: approve
rust_expert: approve
greybeard: approve
measurements:
time: <-X%>
memory: <-X%>
allocations: <-X%>
reason: |
<summary of why approved>
lessons: |
<pattern learned>
```
**Review does NOT record results and does NOT create files. Return a structured YAML report to the caller.**

**assets/db/<review-name>.yaml** for REJECTED:
```yaml
id: <descriptive-name>
target: <file:function>
technique: <technique>
verdict: rejected
date: <YYYY-MM-DD>
votes:
duplicate_hunter: <approve|reject>
skeptic: <approve|reject>
conservative: <approve|reject>
rust_expert: <approve|reject>
greybeard: <approve|reject>
reason: |
<why rejected - which persona(s) and why>
lessons: |
<what NOT to do next time>
```
Fill in the appropriate template and return the completed YAML:

**assets/db/<review-name>.yaml** for DUPLICATE:
```yaml
id: <descriptive-name>
target: <file:function>
technique: <technique>
verdict: duplicate
date: <YYYY-MM-DD>
duplicate_of: <existing entry>
reason: |
<explanation of duplication>
```
| Verdict | Template |
| --------- | -------------------------------- |
| approved | `assets/approved.template.yaml` |
| rejected | `assets/rejected.template.yaml` |
| duplicate | `assets/duplicate.template.yaml` |
| bug_found | `assets/bug-found.template.yaml` |

**assets/db/<review-name>.yaml** for BUG_FOUND:
```yaml
id: <descriptive-name>
target: <file:function>
verdict: bug_found
date: <YYYY-MM-DD>
validation_file: <path to validate db entry>
reason: |
<what bug was found>
lessons: |
<what was learned>
```
1. Read the appropriate template from the `assets/` directory
2. Fill in all placeholder values with actual data from the review
3. Return the filled-in report

---

## Usage

```
/lading-optimize-review
```
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
id: <descriptive-name>
target: <file:function>
technique: <technique>
date: <YYYY-MM-DD>
status: success
verdict: approved
votes:
duplicate_hunter: approve
skeptic: approve
conservative: approve
rust_expert: approve
greybeard: approve
measurements:
benchmarks:
micro:
# Flat keys: {name}_{size}_{metric}
# With raw values: {name}_{size}_{metric}: <baseline> -> <optimized> (<improvement%>)
# Without raw values: {name}_{size}_{metric}: <improvement%>
macro:
# Required: time, memory, allocations
# Optional: peak_live
# With raw values: <category>: <baseline> -> <optimized> (<improvement%>)
# Without raw values: <category>: <improvement%>
time:
memory:
allocations:
reason: |
<summary of why approved>
lessons: |
<pattern learned if any>
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
id: <descriptive-name>
target: <file:function>
technique: <technique>
date: <YYYY-MM-DD>
status: failure
verdict: bug_found
reason: |
<what bug was found>
7 changes: 0 additions & 7 deletions .claude/skills/lading-optimize-review/assets/db.yaml

This file was deleted.

Loading
Loading