- **Target:** ≥20% wall-time reduction on `time ./unittest` at `-O3` after commit #8.
- **Stretch:** ≥30% wall-time reduction.
- **Stop condition:** if Approach A alone delivers ≥30%, B is optional. If A+B together fall short of 20%, re-evaluate before any further restructuring.
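
## 8. Results

Measurements were taken on branch `feat/optimize` with an `-O3` build, using `time ./unittest`.
Each figure is the best `user` time of 3 back-to-back runs. The test binary is multithreaded, so `user` time is much larger than `real` (wall) time; the percentages below are therefore reductions in user time.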
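
As a rough sketch of the "best user time of N runs" methodology, the C functions below fork and exec a binary several times and keep the smallest per-run user time, measured as the change in `getrusage(RUSAGE_CHILDREN)` around each run. This is an illustration only, not the tooling behind the numbers here (those came from `time ./unittest`); the function names are hypothetical and the code assumes a POSIX system.

```c
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Cumulative user CPU time (seconds) of all terminated, waited-for children. */
static double children_user_seconds(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_CHILDREN, &ru) != 0) return -1.0;
    return (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6;
}

/* Runs `path` (no arguments) `runs` times back to back and returns the
 * smallest user time of any single run. User time is summed over all of
 * the child's threads, which is why a multithreaded binary shows user >> real.
 * Returns -1.0 if a run cannot be started or exits non-zero. */
static double best_user_seconds(const char *path, int runs) {
    double best = -1.0;
    for (int i = 0; i < runs; i++) {
        double before = children_user_seconds();
        if (before < 0.0) return -1.0;
        pid_t pid = fork();
        if (pid < 0) return -1.0;
        if (pid == 0) {
            char *const argv[] = { (char *)path, NULL };
            execv(path, argv);
            _exit(127);                 /* only reached if execv fails */
        }
        int status;
        if (waitpid(pid, &status, 0) < 0) return -1.0;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1.0;
        double user = children_user_seconds() - before;  /* this run only */
        if (best < 0.0 || user < best) best = user;
    }
    return best;
}
```

For example, `best_user_seconds("./unittest", 3)` would mirror the three back-to-back runs described above.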

| Commit | Description | Best user time | Reduction vs. baseline (positive = faster) |
|--------|-------------|----------------|--------------------------------------------|
| f1cb736 | Baseline | 114.13s | 0.0% |
| 430bbba | A1: cache flags per iteration | 112.21s | +1.7% |
| b9b3f2c | A2: reorder `fitsInsideBoundaries` (REVERTED) | 121.10s | −6.1% (regression) |
| 5246c79 | Revert A2 | ~112s | recovered |
| 1b01b82 | A3: length-tracked result assembly | 113.19s | +0.8% |
| 753c337 | A4: `encodeBase31` single division (no gain) | 114.35s | −0.2% (noise) |
| n/a | A5: `repackIfAllDigits` (already optimized, skipped) | n/a | n/a |
| 554af29 | B3: companion tables init (hot loops unchanged) | 109.99s | +3.6% |
| 3416f87 | B4: hot loops read from companion tables | 99.40s | +12.9% |
| 2cdf52b | B5: DEBUG sanity check | 99.73s | +12.6% |
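
The reduction column is (baseline − measured) / baseline × 100, relative to the f1cb736 baseline of 114.13s. A minimal C sketch of that arithmetic follows (helper names are hypothetical). It also computes the equivalent speedup factor, since a 12.6% cut in time corresponds to roughly 1.14× faster, not 1.126×.

```c
/* Percentage reduction in time relative to a baseline; positive means faster. */
static double reduction_pct(double baseline_s, double measured_s) {
    return (baseline_s - measured_s) / baseline_s * 100.0;
}

/* Equivalent speedup factor: how many times faster than the baseline. */
static double speedup_factor(double baseline_s, double measured_s) {
    return baseline_s / measured_s;
}
```

For example, `reduction_pct(114.13, 99.40)` gives about 12.9 and `speedup_factor(114.13, 99.73)` about 1.144.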

**Final cumulative user-time reduction: ~12.6%** (best: 12.9% at the B4 commit)

**Binary size delta:** +2240 bytes (constraint: ≤ 100,000 bytes)

**Success criteria check:**
- ✅ All unit tests pass after every commit
- ✅ Binary size growth ≤ 100,000 bytes
- ⚠️ ≥20% wall-time reduction: NOT achieved (12.6% reduction, measured in user time)
- ❌ ≥30% stretch target: not achieved

**Notes:**
- A2 (longitude-first boundary check) caused a regression. `isInRange()` has
  non-trivial overhead (it handles longitude wrap-around), and moving it first pushed
  the cheaper lat-min comparison into second position, so the expensive check now ran
  before the cheap one could reject a candidate.
- A4 (`encodeBase31`) showed no gain at `-O3` because the compiler already replaces
  division by the constant 31 with a cheaper multiply-and-shift sequence (strength reduction).
- A5 was already implemented in the codebase (an unconditional `break` in the `else` branch),
  so it was skipped.
- The 12.6% improvement comes primarily from B4, which replaced per-iteration mask/shift
  operations on a 4-byte flags field with single-byte loads from the precomputed
  companion tables.
- Further gains would require Approach C (loop restructuring) or SIMD, both of which were
  out of scope for this branch.
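
The companion-table idea behind B4 can be sketched as follows. The record layout, bit positions, and every name here are hypothetical stand-ins, since the real tables and flags layout are not shown in this document. A field packed into a 32-bit `flags` word is decoded with a shift and mask on every visit, whereas the companion table stores the pre-decoded value as one byte per record, built once at init (as in B3). The last function is in the spirit of the B5 DEBUG sanity check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed record: several small fields share one 32-bit word.
 * Here bits 5-7 hold a "kind" field. */
typedef struct {
    uint32_t flags;
} Record;

#define KIND_SHIFT 5
#define KIND_MASK  0x07u

/* Before: decode the field with shift + mask on every hot-loop iteration. */
static unsigned count_kind_packed(const Record *recs, size_t n, unsigned kind) {
    unsigned hits = 0;
    for (size_t i = 0; i < n; i++)
        if (((recs[i].flags >> KIND_SHIFT) & KIND_MASK) == kind) hits++;
    return hits;
}

/* Init step: decode each record once into a one-byte companion entry. */
static void build_kind_table(const Record *recs, size_t n, uint8_t *kind_tab) {
    for (size_t i = 0; i < n; i++)
        kind_tab[i] = (uint8_t)((recs[i].flags >> KIND_SHIFT) & KIND_MASK);
}

/* After: the hot loop is a single byte load per record, no bit manipulation. */
static unsigned count_kind_table(const uint8_t *kind_tab, size_t n, unsigned kind) {
    unsigned hits = 0;
    for (size_t i = 0; i < n; i++)
        if (kind_tab[i] == kind) hits++;
    return hits;
}

/* Debug-only check: every companion entry must match the packed field. */
static int kind_table_consistent(const Record *recs, size_t n, const uint8_t *kind_tab) {
    for (size_t i = 0; i < n; i++)
        if (kind_tab[i] != ((recs[i].flags >> KIND_SHIFT) & KIND_MASK)) return 0;
    return 1;
}
```

The trade-off is one extra byte per record in exchange for a hot loop with no decoding work; the consistency check only needs to run in debug builds.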
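
Why operand order mattered for A2 can be illustrated with a call counter, assuming the boundary test is a chain of `&&` comparisons (which short-circuit left to right). `Bounds`, `lon_in_range`, and both `fits_*` functions are hypothetical; `lon_in_range` is a simplified stand-in for `isInRange()`, whose real wrap-around handling is not shown here.

```c
#include <stdbool.h>

/* Hypothetical bounding box in integer units; minx > maxx means the box
 * crosses the longitude wrap-around seam. */
typedef struct { int minx, maxx, miny, maxy; } Bounds;

static unsigned range_calls;  /* counts invocations of the costly check */

/* Simplified stand-in for a wrap-aware longitude test. */
static bool lon_in_range(int x, int minx, int maxx) {
    range_calls++;
    if (minx <= maxx) return x >= minx && x < maxx;
    return x >= minx || x < maxx;   /* box wraps around the seam */
}

/* Cheap latitude comparisons first: && short-circuits, so the costly
 * check only runs for points that already passed them. */
static bool fits_lat_first(const Bounds *b, int x, int y) {
    return y >= b->miny && y < b->maxy && lon_in_range(x, b->minx, b->maxx);
}

/* A2-style order: the costly check runs on every call. */
static bool fits_lon_first(const Bounds *b, int x, int y) {
    return lon_in_range(x, b->minx, b->maxx) && y >= b->miny && y < b->maxy;
}
```

Over 100 points of which only 5 pass the latitude test, the latitude-first order calls the costly check 5 times and the longitude-first order 100 times, while both return the same answers.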