diff --git a/.claude/skills/lading-optimize-review/SKILL.md b/.claude/skills/lading-optimize-review/SKILL.md index d9dcdee56..f5e92998b 100644 --- a/.claude/skills/lading-optimize-review/SKILL.md +++ b/.claude/skills/lading-optimize-review/SKILL.md @@ -1,16 +1,18 @@ --- name: lading-optimize-review description: Reviews optimization patches for lading using a 5-persona peer review system. Requires unanimous approval backed by benchmarks. Bugs discovered during review are valuable - invoke /lading-optimize-validate to validate them. -allowed-tools: Bash(cat:*) Bash(ci/*:*) Read Write Edit Glob Grep Skill +allowed-tools: Bash(cat:*) Bash(sample:*) Bash(samply:*) Bash(cargo:*) Bash(ci/*:*) Bash(hyperfine:*) Bash(*/payloadtool:*) Bash(tee:*) Read Write Edit Glob Grep Skill --- # Optimization Patch Review A rigorous 5-persona peer review system for optimization patches in lading. Requires unanimous approval backed by concrete benchmark data. **Duplicate Hunter persona prevents redundant work.** -## CRITICAL: Recording is MANDATORY +## Role: Judge -**EVERY review outcome MUST be recorded in db.yaml.** +**Review is the decision-maker. It does NOT record results.** + +Review judges using benchmarks and 5-persona review, then returns a structured report. ## Valuable Outcomes @@ -24,62 +26,43 @@ A rigorous 5-persona peer review system for optimization patches in lading. Requ --- -## Phase 0: Pre-flight - -Run `/lading-preflight` first. 
Then check for duplicate work: - -```bash -# Check previous reviews -cat .claude/skills/lading-optimize-review/assets/db.yaml +## Phase 1: Benchmark Execution -# Check previous hunts -cat .claude/skills/lading-optimize-hunt/assets/db.yaml -``` +### Step 1: Read Baseline Data -**Check for:** -- Same optimization already reviewed -> REJECT as "DUPLICATE" -- Same file + technique already approved -> REJECT as "DUPLICATE" +Read the baseline benchmark files captured by the caller: ---- +- `/tmp/criterion-baseline.log` — micro-benchmark baseline +- `/tmp/baseline.json` — macro-benchmark timing baseline +- `/tmp/baseline-mem.txt` — macro-benchmark memory baseline -## Phase 1: Validation Gate +**If baseline data is missing -> REJECT. Baselines must be captured before any code change and before this skill is invoked.** -**MANDATORY: ci/validate must pass before any review proceeds.** ```bash -ci/validate +cargo criterion 2>&1 | tee /tmp/criterion-optimized.log ``` -**If ci/validate fails -> REJECT immediately. No exceptions.** - --- +### Step 2: Run Post-Change Micro-benchmarks -## Phase 2: Measurement +Note: Criterion automatically compares against the last run and reports percentage changes. -**Review existing benchmark data.** The optimization should already have benchmark results. If missing, request them or REJECT. +Compare results — look for "change:" lines showing improvement/regression. -**The optimization being reviewed MUST provide benchmark data.** Review should focus on analyzing existing data, not generating new benchmarks. 
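The percent-change and 5% time-threshold arithmetic applied to the hyperfine exports (`/tmp/baseline.json` vs the post-change export) can be sketched as follows — a minimal Python illustration, not part of the skill's tooling. It assumes hyperfine's JSON export shape (`results[0].mean`, in seconds); the inline dicts and their numbers are hypothetical stand-ins for the real files:

```python
def pct_change(baseline_mean: float, optimized_mean: float) -> float:
    """Percent change from baseline to optimized; negative means faster."""
    return (optimized_mean - baseline_mean) / baseline_mean * 100.0

def meets_time_threshold(baseline: dict, optimized: dict, threshold: float = -5.0) -> bool:
    """Apply the >=5% time-improvement rule to two hyperfine JSON exports."""
    return pct_change(baseline["results"][0]["mean"],
                      optimized["results"][0]["mean"]) <= threshold

# Inline stand-ins for the two hyperfine JSON files (hypothetical values):
baseline = {"results": [{"mean": 0.0083}]}   # 8.3 ms
optimized = {"results": [{"mean": 0.0071}]}  # 7.1 ms

print(round(pct_change(0.0083, 0.0071), 1))       # -14.5 (% time change)
print(meets_time_threshold(baseline, optimized))  # True: clears the 5% bar
```

A 1% improvement would return `False` here, which is exactly the "no improvement but correct output" REJECT case later in this skill.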
+### Step 3: Run Post-Change Macro-benchmarks -### Expected Benchmark Data +Use the config provided by the caller (stored in `$PAYLOADTOOL_CONFIG`): -The optimization should include: - -**Micro-benchmarks (criterion):** -- Benchmark name and throughput change -- Example: `syslog_100MiB: +42.0% throughput (481 -> 683 MiB/s)` - -**Macro-benchmarks (payloadtool with hyperfine):** -- Time change with absolute values -- Memory change with absolute values -- Allocation count change with absolute values -- Example: - ``` - Time: -14.5% (8.3 ms -> 7.1 ms) - Memory: -35.8% (6.17 MiB -> 3.96 MiB) - Allocations: -49.3% (67,688 -> 34,331) - ``` +```bash +cargo build --release --bin payloadtool +hyperfine --warmup 3 --runs 30 --export-json /tmp/optimized.json \ + "./target/release/payloadtool $PAYLOADTOOL_CONFIG" +./target/release/payloadtool "$PAYLOADTOOL_CONFIG" --memory-stats 2>&1 | tee /tmp/optimized-mem.txt +``` ### Statistical Requirements + - Minimum 30 runs for hyperfine (`--runs 30`) - Criterion handles statistical significance internally - Time improvement >= 5% for significance @@ -88,31 +71,33 @@ The optimization should include: ### NO EXCEPTIONS -**If benchmark data is missing -> REJECT. Period.** - -- "Test dependencies don't work" -> REJECT. Fix deps first. +- "Test dependencies don't work" -> REJECT. Fix dependencies first. - "Theoretically better" -> REJECT. Prove it with numbers. - "Obviously an improvement" -> REJECT. Obvious is not measured. - "Will benchmark later" -> REJECT. Benchmark now. --- -## Phase 3: Five-Persona Review +## Phase 2: Five-Persona Review ### 1. Duplicate Hunter (Checks for Redundant Work) -- [ ] Optimization not already in reviews.yaml +- [ ] Optimization not already in db.yaml - [ ] File + technique combo not already approved - [ ] No substantially similar optimization exists - [ ] If duplicate found -> REJECT with "DUPLICATE: see " ### 2. 
Skeptic (Demands Proof) +- [ ] Baseline data verified present +- [ ] Post-change benchmarks executed with identical methodology - [ ] Hot path verified via profiling (not just guessed) -- [ ] Benchmark improvement meets thresholds (5% time, 10% mem, 20% allocs) +- [ ] Micro threshold met (>=5% time) +- [ ] Macro threshold met (>=5% time OR >=10% mem OR >=20% allocs) - [ ] Statistical significance confirmed (p<0.05 or criterion "faster") - [ ] Improvement is real, not measurement noise +- [ ] Benchmark methodology sound (same config, same machine) ### 3. Conservative (Guards Correctness) -- [ ] `ci/validate` passes completely +- [ ] Run `ci/validate` and confirm that it passes completely - [ ] No semantic changes to output - [ ] **Determinism preserved** (same seed -> same output) - [ ] No `.unwrap()` or `.expect()` added (lading MUST NOT panic) @@ -136,7 +121,7 @@ The optimization should include: --- -## Phase 4: Kani/Property Test Check +## Phase 3: Kani/Property Test Check If the optimization touches critical code: @@ -162,14 +147,26 @@ ci/kani lading_payload --- -## Phase 5: Decision +## Phase 4: Decision | Outcome | Votes | Action | |---------|-------|--------| -| **APPROVED** | 5/5 APPROVE | Record success | -| **REJECTED** | Any REJECT | Record lesson | -| **DUPLICATE** | Duplicate Hunter REJECT | Record as DUPLICATE | -| **BUG FOUND** | Correctness issue | Invoke `/lading-optimize-validate` | +| **APPROVED** | 5/5 APPROVE | Return APPROVED report | +| **REJECTED** | Any REJECT | Return REJECTED report | +| **DUPLICATE** | Duplicate Hunter REJECT | Return DUPLICATE report | +| **BUG FOUND** | Correctness issue | Invoke `/lading-optimize-validate`, return BUG_FOUND report | + +### Bug Triage + +When issues are discovered, determine the correct action: + +| Signal | Is It a Bug? 
| Action | +| --------------------------------------- | ------------ | ---------------------------------- | +| ci/validate fails after optimization | Possible bug | Invoke `/lading-optimize-validate` | +| Determinism broken | Possible bug | Invoke `/lading-optimize-validate` | +| Conservative finds correctness issue | Possible bug | Invoke `/lading-optimize-validate` | +| Benchmark regression but correct output | NOT a bug | REJECT optimization | +| No improvement but correct output | NOT a bug | REJECT optimization | ### When Bug Is Found @@ -180,101 +177,31 @@ If review discovers a bug instead of an optimization: ``` The validate skill will: -1. Attempt Kani proof (if feasible) -2. Create property test reproducing the bug -3. Verify the fix works -4. Record in its db.yaml +1. Confirm this is actually a bug (not just a failed optimization) +2. Attempt Kani proof (if feasible) +3. Create property test reproducing the bug +4. Verify the fix works +5. Record in its db.yaml -Then return here to record the finding as BUG_FOUND in Phase 6. +Then return here and produce a BUG_FOUND report in Phase 5. --- -## Phase 6: MANDATORY Recording - -### Update db.yaml - -1. Add entry to `assets/db.yaml` index -2. Create detailed file in `assets/db/` directory +## Phase 5: Return Report -**assets/db.yaml entry:** -```yaml -entries: - - id: - verdict: - file: assets/db/.yaml -``` - -**assets/db/.yaml** for APPROVED: -```yaml -id: -target: -technique: -verdict: approved -date: -votes: - duplicate_hunter: approve - skeptic: approve - conservative: approve - rust_expert: approve - greybeard: approve -measurements: - time: <-X%> - memory: <-X%> - allocations: <-X%> -reason: | - -lessons: | - -``` +**Review does NOT record results and does NOT create files. 
Return a structured YAML report to the caller.** -**assets/db/.yaml** for REJECTED: -```yaml -id: -target: -technique: -verdict: rejected -date: -votes: - duplicate_hunter: - skeptic: - conservative: - rust_expert: - greybeard: -reason: | - -lessons: | - -``` +Fill in the appropriate template and return the completed YAML: -**assets/db/.yaml** for DUPLICATE: -```yaml -id: -target: -technique: -verdict: duplicate -date: -duplicate_of: -reason: | - -``` +| Verdict | Template | +| --------- | -------------------------------- | +| approved | `assets/approved.template.yaml` | +| rejected | `assets/rejected.template.yaml` | +| duplicate | `assets/duplicate.template.yaml` | +| bug_found | `assets/bug-found.template.yaml` | -**assets/db/.yaml** for BUG_FOUND: -```yaml -id: -target: -verdict: bug_found -date: -validation_file: -reason: | - -lessons: | - -``` +1. Read the appropriate template from the `assets/` directory +2. Fill in all placeholder values with actual data from the review +3. Return the filled-in report --- - -## Usage - -``` -/lading-optimize-review -``` diff --git a/.claude/skills/lading-optimize-review/assets/approved.template.yaml b/.claude/skills/lading-optimize-review/assets/approved.template.yaml new file mode 100644 index 000000000..443159e2b --- /dev/null +++ b/.claude/skills/lading-optimize-review/assets/approved.template.yaml @@ -0,0 +1,30 @@ +id: +target: +technique: +date: +status: success +verdict: approved +votes: + duplicate_hunter: approve + skeptic: approve + conservative: approve + rust_expert: approve + greybeard: approve +measurements: + benchmarks: + micro: + # Flat keys: {name}_{size}_{metric} + # With raw values: {name}_{size}_{metric}: -> () + # Without raw values: {name}_{size}_{metric}: + macro: + # Required: time, memory, allocations + # Optional: peak_live + # With raw values: : -> () + # Without raw values: : + time: + memory: + allocations: +reason: | + +lessons: | + diff --git 
a/.claude/skills/lading-optimize-review/assets/bug-found.template.yaml b/.claude/skills/lading-optimize-review/assets/bug-found.template.yaml new file mode 100644 index 000000000..6212082e7 --- /dev/null +++ b/.claude/skills/lading-optimize-review/assets/bug-found.template.yaml @@ -0,0 +1,8 @@ +id: +target: +technique: +date: +status: failure +verdict: bug_found +reason: | + diff --git a/.claude/skills/lading-optimize-review/assets/db.yaml b/.claude/skills/lading-optimize-review/assets/db.yaml deleted file mode 100644 index c4b6256e3..000000000 --- a/.claude/skills/lading-optimize-review/assets/db.yaml +++ /dev/null @@ -1,7 +0,0 @@ -entries: - - id: fluent-on-demand-serialization - verdict: approved - file: assets/db/opt-fluent-on-demand-serialization.yaml - - id: payload-cache-inline - verdict: approved - file: assets/db/opt-payload-cache-inline.yaml diff --git a/.claude/skills/lading-optimize-review/assets/db/opt-fluent-on-demand-serialization.yaml b/.claude/skills/lading-optimize-review/assets/db/opt-fluent-on-demand-serialization.yaml deleted file mode 100644 index ae337afdc..000000000 --- a/.claude/skills/lading-optimize-review/assets/db/opt-fluent-on-demand-serialization.yaml +++ /dev/null @@ -1,107 +0,0 @@ -id: fluent-on-demand-serialization -verdict: approved -date: 2026-01-22 -target: lading_payload/src/fluent.rs::to_bytes -technique: on-demand-serialization - -votes: - duplicate_hunter: approve - skeptic: approve - conservative: approve - rust_expert: approve - greybeard: approve - -measurements: - time: -82.6% - memory: -90.7% - allocations: -82.6% - peak_live: -53.6% - - micro_benchmarks: - fluent_1MiB: +132% throughput (34.27 -> 79.65 MiB/s) - fluent_10MiB: +21% throughput (97.57 -> 117.97 MiB/s) - fluent_100MiB: +3% throughput (118.68 -> 122.22 MiB/s) - - macro_benchmarks: - time_baseline: 2347 ms - time_optimized: 409.8 ms - allocations_baseline: 3,622,824 - allocations_optimized: 629,596 - memory_baseline: 1.499 GiB - memory_optimized: 143.1 MiB - 
-reason: | - Unanimous approval from all five personas. - - Duplicate Hunter: No duplicate work detected. This is a unique optimization of fluent.rs - using on-demand serialization technique. - - Skeptic: Outstanding benchmark data exceeding all thresholds by wide margins: - - Time improvement: -82.6% (threshold: 5%) - 16x better than required - - Memory improvement: -90.7% (threshold: 10%) - 9x better than required - - Allocation reduction: -82.6% (threshold: 20%) - 4x better than required - Micro-benchmarks show 132% throughput improvement at 1 MiB. Statistical significance - confirmed with Criterion (100 samples) and hyperfine (10 runs). - - Conservative: All correctness requirements met: - - ci/validate passes completely (60/60 tests) - - No semantic changes to output format - - Determinism preserved (same seed produces same output) - - No unwrap/expect added (uses ? for error propagation) - - Property test payload_not_exceed_max_bytes validates core invariant - - Rust Expert: Excellent Rust patterns demonstrated: - - Use statements properly at file top - - Pre-computation in initialization (buffer with 4096 capacity) - - Proper borrow checker handling with std::mem::take idiom - - No unnecessary allocations in hot path - - Eliminated Vec> pattern completely - - Greybeard: Simplicity achieved: - - Actually simpler than original (one loop vs two-loop binary search) - - Clear comments explain non-obvious borrow checker pattern - - Minimal scope (41 lines removed, 35 added) - - Classic streaming serialization pattern ("obviously fast") - - Complexity fully justified by 90.7% memory reduction - -validation: - ci_validate: passed - tests_passed: 60/60 - cargo_fmt: passed - cargo_clippy: passed - shellcheck: passed - kani: not_applicable (lading_payload has no Kani proofs, property tests sufficient) - property_tests: payload_not_exceed_max_bytes validates core invariant - -lessons: | - This is a textbook example of a successful optimization: - - 1. 
**Clear hot path identification**: Vec> pre-serialization was obvious bottleneck - - 2. **Classic optimization pattern**: Stream serialization instead of batch-then-serialize - - 3. **Overwhelming performance gains**: - - 5.7x speedup (2347ms -> 409ms) - - 10x memory reduction (1.5 GiB -> 143 MiB) - - Eliminated 3 million allocations per run - - 4. **No correctness compromises**: All tests pass, determinism preserved - - 5. **Proper Rust idioms**: std::mem::take for borrow checker, ? for errors - - 6. **Actually simpler**: Replaced complex two-loop algorithm with single streaming loop - - Pattern to replicate: - - Identify Vec> pre-collection patterns - - Replace with on-demand processing + reusable buffer - - Use std::mem::take when buffer and data source share ownership - - Stream output incrementally rather than batch processing - - This optimization should be considered the gold standard for lading payload optimizations. - It demonstrates that the biggest wins come from eliminating algorithmic inefficiencies - (batch processing) rather than micro-optimizing hot loops. 
- - Future work: - - Survey other payload formats for similar Vec> patterns - - Consider if buffer size (4096) is optimal or should be tuned per format - - Document this pattern in contribution guidelines for future optimizers diff --git a/.claude/skills/lading-optimize-review/assets/db/opt-payload-cache-inline.yaml b/.claude/skills/lading-optimize-review/assets/db/opt-payload-cache-inline.yaml deleted file mode 100644 index f744a7b14..000000000 --- a/.claude/skills/lading-optimize-review/assets/db/opt-payload-cache-inline.yaml +++ /dev/null @@ -1,47 +0,0 @@ -id: payload-cache-inline -verdict: approved -date: 2026-01-20 -target: lading_payload/src/block.rs:Cache methods -technique: inline -votes: - duplicate_hunter: approve - skeptic: approve - conservative: approve - rust_expert: approve - greybeard: approve -measurements: - time: -7.2% - memory: ~0% - allocations: ~0% -reason: | - Added #[inline] attributes to five Cache methods (handle, total_size, peek_next_size, - peek_next_metadata, advance) to enable cross-crate inlining. These are small, hot-path - methods called 23+ times across 11 generator files. - - Without #[inline], Rust cannot inline public methods across crate boundaries, forcing - function call overhead on every generator iteration. The advance() method is particularly - critical, called for every block transmitted. - - All five personas approved unanimously: - - Duplicate Hunter: Confirmed new optimization, no duplicates - - Skeptic: 7.2% time improvement with statistical significance (6.914ms → 6.413ms) - - Conservative: ci/validate passed, determinism preserved, no semantic changes - - Rust Expert: Standard Rust performance practice, follows lading patterns - - Greybeard: Minimal change (5 one-line additions), justified by 7.2% improvement - - Kani proofs ran successfully (no harnesses exist, which is acceptable). -lessons: | - Cross-crate inlining in Rust requires explicit #[inline] attributes on public methods. 
- The compiler cannot inline across crate boundaries without this hint, even for trivial - methods. This optimization pattern is particularly valuable for: - - 1. Small accessor methods (1-3 lines) - 2. Methods called in hot loops - 3. Public APIs in library crates consumed by application crates - - The 7.2% improvement came from eliminating just 5 function call overheads, demonstrating - that cross-crate call overhead is non-trivial. The 58% reduction in variance suggests - more predictable performance. - - Key principle: When optimizing library crates, always consider cross-crate call patterns, - not just within-crate optimizations. diff --git a/.claude/skills/lading-optimize-review/assets/duplicate.template.yaml b/.claude/skills/lading-optimize-review/assets/duplicate.template.yaml new file mode 100644 index 000000000..4da4dd981 --- /dev/null +++ b/.claude/skills/lading-optimize-review/assets/duplicate.template.yaml @@ -0,0 +1,8 @@ +id: +target: +technique: +date: +status: failure +verdict: duplicate +reason: | + diff --git a/.claude/skills/lading-optimize-review/assets/rejected.template.yaml b/.claude/skills/lading-optimize-review/assets/rejected.template.yaml new file mode 100644 index 000000000..abd39eb38 --- /dev/null +++ b/.claude/skills/lading-optimize-review/assets/rejected.template.yaml @@ -0,0 +1,30 @@ +id: +target: +technique: +date: +status: failure +verdict: rejected +votes: + duplicate_hunter: approve|reject + skeptic: approve|reject + conservative: approve|reject + rust_expert: approve|reject + greybeard: approve|reject +measurements: + benchmarks: + micro: + # Flat keys: {name}_{size}_{metric} + # With raw values: {name}_{size}_{metric}: -> () + # Without raw values: {name}_{size}_{metric}: + macro: + # Required: time, memory, allocations + # Optional: peak_live + # With raw values: : -> () + # Without raw values: : + time: + memory: + allocations: +reason: | + +lessons: | +
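
For concreteness, the flat-key comment conventions in the `measurements.benchmarks` sections of the templates above might be filled in as follows — all names and figures are hypothetical, echoing the example numbers used earlier in this skill:

```yaml
measurements:
  benchmarks:
    micro:
      syslog_100MiB_throughput: +42.0% (481 -> 683 MiB/s)
    macro:
      time: -14.5% (8.3 ms -> 7.1 ms)
      memory: -35.8% (6.17 MiB -> 3.96 MiB)
      allocations: -49.3% (67,688 -> 34,331)
```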