onebrc-probe: lane T — HHTL trie group-by + trie-vs-RAM measurement (honest negative result)#638
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1c820fa104
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| | T8 256-ary byte trie | 58.3 | 54.5 | 66.5 | 3.8 | 1.9× | | ||
| | F flat Morton table | 84.6 | 75.1 | 86.3 | 3.6 | 2.7× | | ||
| | R flat radix table | 87.7 | 61.2 | 89.1 | 8.6 | 2.8× | | ||
| | **S SWAR + flat table** | **103.9** | 76.6 | 105.5 | 7.9 | **3.3×** | |
There was a problem hiding this comment.
Remove measurements for the missing S lane
In this commit the crate does not define a SWAR lane (rg 'lane_s|"s"' crates/onebrc-probe only hits this report, and main.rs has no "s" match arm), so this table and the reproduction loop below publish results for a lane users cannot run and whose claimed parity test does not exist. Either add the lane implementation/CLI case or drop the S rows from this T-vs-RAM report to keep the benchmark reproducible.
Useful? React with 👍 / 👎.
…duce loop Addresses codex P2 (#638): on this T-only branch lane S is not defined, so the report published results + a parity test for a lane users cannot run here. Adds a 'Lane S provenance' note (S ships in the companion lane-S PR; runnable lanes on this branch are a c r f t8 t) and drops `s` from the reproduce loop with a note to re-add it once the companion PR merges. S rows are kept in the ladder — the report's purpose is the full RAM-vs-trie comparison and S is the fastest RAM-table method. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM
…eport Adds the HHTL trie group-by lane (name-as-prefix-descent instead of hash+slot) in two variants — 16-ary nibble trie (contract::hhtl NiblePath faithful) and 256-ary byte trie — plus a measurement report documenting the full RAM-table-vs-trie ladder. Measured, honest NEGATIVE result (10M rows, 4 workers, n=11 median, mrows/s): the trie is SLOWER than the flat table at ~400-station cardinality — T(16-ary)=54.2, T8(256-ary)=58.3 vs F(flat Morton)=84.6, R(flat radix)=87.7. At this cardinality a single hash + linear probe into a contiguous SoA table beats the trie's dependent-load descent. Parity: both tries produce aggregates identical to lane A (test both_tries_agree_with_lane_a_on_generated_corpus). The report (RESULTS_TRIE_VS_RAM.md) was corrected after adversarial review to report median/spread (n=11), not best-of-3; it flags the trie result as confounded by this arena implementation (runtime-field fanout + in-loop realloc), NOT the trie idea falsified, and marks the "trie wins at routing" claim as unmeasured CONJECTURE. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM
…duce loop Addresses codex P2 (#638): on this T-only branch lane S is not defined, so the report published results + a parity test for a lane users cannot run here. Adds a 'Lane S provenance' note (S ships in the companion lane-S PR; runnable lanes on this branch are a c r f t8 t) and drops `s` from the reproduce loop with a note to re-add it once the companion PR merges. S rows are kept in the ladder — the report's purpose is the full RAM-vs-trie comparison and S is the fastest RAM-table method. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM
#637 (lane S) merged to main; this branch is rebased on top, so lane_s and its parity test are present and all ladder lanes (a c r f t8 t s) run here. Restores `s` to the reproduce loop and rewrites the provenance note. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM
086daa8 to
6d624ca
Compare
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 6d624ca. Configure here.
|
Codex Review: Didn't find any major issues. Hooray! Reviewed commit: ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Idea
Test the HHTL trie as the group-by structure — name-as-prefix-descent instead of hash+slot+probe — in two variants:
lane_t_trie) —contract::hhtl::NiblePath-faithful, 2 levels per name byte.lane_t_byte) — one level per byte, the depth control.Same scalar scan +
chunk_bounds/merge_mapsthreading as lanes A/C/F/R; the only variable vs lane F is the accumulator (trie descent instead of hash+slot). std-only.Measured — honest NEGATIVE result
10M rows, 4 workers, n=11 median, mrows/s:
The trie is slower than the flat table at ~400-station cardinality — a single hash + linear probe into a contiguous SoA table beats the trie's dependent-load descent. Parity: both tries produce aggregates identical to lane A (
both_tries_agree_with_lane_a_on_generated_corpus).Honesty guardrails (post-adversarial-review)
The bundled report (
RESULTS_TRIE_VS_RAM.md) was corrected after three review agents flagged the first draft's statistics:Value: a measured, parity-checked negative result that promotes the flat-table path and demotes the trie-for-aggregation hypothesis for this workload, with the confounds named.
Generated by Claude Code