Commit 9f99fa8

rijnb and claude committed
docs: record final perf results for feat/optimize branch
Appends results table to design spec. Final speedup: ~12.6% on `time ./unittest` at -O3. Target was 20-50%; achieved 12.6%. Primary driver: B4 companion-table hot-loop reads. A2 reverted (regression), A4/A5 no-ops at -O3. Binary size delta: +2240 bytes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 2cdf52b commit 9f99fa8

1 file changed

Lines changed: 41 additions & 0 deletions

docs/superpowers/specs/2026-05-28-mapcodelib-optimize-design.md

@@ -192,3 +192,44 @@ Each commit ends with:
- **Target:** ≥20% wall-time reduction on `time ./unittest` at `-O3` after commit #8.
- **Stretch:** ≥30% wall-time reduction.
- **Stop condition:** if Approach A alone delivers ≥30%, B is optional. If A+B underdeliver vs. 20%, re-evaluate before any further restructuring.

## 8. Results

Measurements taken on branch `feat/optimize`, `-O3` build, `time ./unittest`
(best `user` time of 3 back-to-back runs; the test suite is multithreaded, so `user` ≫ `real`).

| Commit | Description | Best user time | Delta vs. baseline |
|--------|-------------|----------------|--------------------|
| f1cb736 | baseline | 114.13s | 0.0% |
| 430bbba | A1: cache flags per iteration | 112.21s | +1.7% |
| b9b3f2c | A2: reorder fitsInsideBoundaries (REVERTED) | 121.10s | −6.1% |
| 5246c79 | revert A2 | ~112s | recovered |
| 1b01b82 | A3: length-tracked result assembly | 113.19s | +0.8% |
| 753c337 | A4: encodeBase31 single division (no gain) | 114.35s | noise |
| | A5: repackIfAllDigits (already optimized, skipped) | | |
| 554af29 | B3: companion tables init (hot loops unchanged) | 109.99s | +3.6% |
| 3416f87 | B4: hot loops read from companion tables | 99.40s | +12.9% |
| 2cdf52b | B5: DEBUG sanity check | 99.73s | +12.6% |

**Final cumulative speedup: ~12.6%** (best: 12.9% at B4 commit)
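
The percentages in the results table can be reproduced from the raw `user` times. The helper below is a minimal sketch for checking that arithmetic; the function name `reduction_pct` is hypothetical and not part of the codebase. A positive result means the commit is faster than the baseline.

```c
/* Percentage reduction in run time relative to a baseline.
 * Positive = faster than baseline, negative = regression.
 * Hypothetical helper for checking the table, not library code. */
static double reduction_pct(double baseline_s, double measured_s)
{
    return (baseline_s - measured_s) / baseline_s * 100.0;
}
```

With the baseline of 114.13s, `reduction_pct(114.13, 99.73)` gives about 12.6 (B5) and `reduction_pct(114.13, 121.10)` gives about −6.1 (A2), matching the table.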

**Binary size delta:** +2240 bytes (constraint: ≤ 100,000 bytes)

**Success criteria check:**
- ✅ All unit tests passing after every commit
- ✅ Binary size growth ≤ 100 KB (+2240 bytes)
- ⚠️ ≥20% wall-time reduction: NOT achieved (12.6% reduction, measured as `user` time)
- ❌ ≥30% stretch target: not achieved

**Notes:**
- A2 (longitude-first boundary check) caused a regression because `isInRange()` has
  non-trivial overhead (it handles longitude wrap-around) and pushed the cheaper
  lat-min comparison to second position, so the expensive check ran first on every call.
- A4 (encodeBase31) showed no gain at `-O3` because the compiler already performs
  strength reduction on division by 31.
- A5 was already implemented in the codebase (unconditional break in else branch).
- The 12.6% improvement comes primarily from B4: replacing per-iteration mask/shift
  operations on a 4-byte flags field with single byte loads from the precomputed
  companion tables.
- Further gains would require Approach C (loop restructuring) or SIMD, which were
  out of scope for this branch.
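
The B4 note describes trading mask/shift work in the hot loop for byte loads from tables filled once at startup. The sketch below illustrates that general technique only. The bit layout, the sample data, and every name in it (`packed_flags`, `rec_type_table`, `codex_table`, `init_companion_tables`, and the two counting functions) are assumptions made for illustration and do not reflect the library's actual data or API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout of a 4-byte flags field (NOT the real one). */
#define CODEX_MASK     0x1Fu  /* assumed: low 5 bits   */
#define REC_TYPE_SHIFT 7      /* assumed: bits 7 and 8 */
#define REC_TYPE_MASK  0x3u

enum { RECORD_COUNT = 4 };

/* Hypothetical sample records standing in for the real data table. */
static const uint32_t packed_flags[RECORD_COUNT] = {
    0x00000085u, 0x0000010Au, 0x00000093u, 0x00000000u
};

/* Companion tables: one byte per record, filled once at startup. */
static uint8_t rec_type_table[RECORD_COUNT];
static uint8_t codex_table[RECORD_COUNT];

static void init_companion_tables(void)
{
    for (size_t i = 0; i < RECORD_COUNT; ++i) {
        rec_type_table[i] =
            (uint8_t)((packed_flags[i] >> REC_TYPE_SHIFT) & REC_TYPE_MASK);
        codex_table[i] = (uint8_t)(packed_flags[i] & CODEX_MASK);
    }
}

/* Before: every iteration shifts and masks the packed 4-byte field. */
static unsigned count_type_masked(unsigned wanted)
{
    unsigned n = 0;
    for (size_t i = 0; i < RECORD_COUNT; ++i) {
        if (((packed_flags[i] >> REC_TYPE_SHIFT) & REC_TYPE_MASK) == wanted) {
            ++n;
        }
    }
    return n;
}

/* After: every iteration is a single byte load from the companion table. */
static unsigned count_type_table(unsigned wanted)
{
    unsigned n = 0;
    for (size_t i = 0; i < RECORD_COUNT; ++i) {
        if (rec_type_table[i] == wanted) {
            ++n;
        }
    }
    return n;
}
```

Both counting functions return the same result once `init_companion_tables()` has run; the second one simply moves the decoding cost out of the loop. The trade-off is extra memory for the tables, which is consistent with the small binary size growth reported above.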
