onebrc-probe: lane S — SWAR delimiter scan + branchless temp parse (+23% v3, +31% native)#637
Adds a SWAR group-by lane on top of lane F's flat open-addressed table (its group-by was already right; only scan/parse varies):
- (b) SWAR delimiter find: haszero bit trick over u64 (`x = word ^ needle; (x - ONES) & !x & HIGH`), 8 bytes/step, scalar tail. Replaces the byte-by-byte `while data[i] != b';'` loop.
- (b) branchless temp parse: `-?\d?\d.\d` -> fixed-point tenths.
- (c) name compare: kept as `&[u8] == &[u8]` (LLVM lowers to memcmp).

Reuses lane F's SoaTable / fnv1a64 / morton_slot / table_to_map verbatim and the same chunk_bounds/merge_maps threading; std-only, zero-dep.

Measured (10M rows, 4 workers, n=11 median): +23% at v3 (103.9 vs 84.6), +31% native (96.9 vs 74.0) over the plain-scalar flat table F.

Parity: lane S produces aggregates identical to lane A (test lane_s_agrees_with_lane_a).

Compute-only metric; mmap (read is outside the timer) is not measurable here.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM
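For readers unfamiliar with the two scan/parse levers named in this description, here is a minimal sketch of how they might look. It is not the PR's code. The names `match_mask` and `find` appear in the PR text, but the bodies below are assumptions, and `parse_tenths` is a hypothetical name. The sketch assumes a little-endian load, uses `wrapping_sub` so the subtraction cannot panic in debug builds, and expects well-formed temperatures matching `-?\d?\d.\d`.

```rust
const ONES: u64 = 0x0101_0101_0101_0101; // 0x01 in every byte lane
const HIGH: u64 = 0x8080_8080_8080_8080; // 0x80 in every byte lane

/// Sets the high bit of each byte lane of `word` that equals `needle`.
/// Lanes above a true match may also be flagged (borrow propagation),
/// but the lowest flagged lane is always a true match.
fn match_mask(word: u64, needle: u8) -> u64 {
    let x = word ^ (ONES * needle as u64); // matching lanes become 0x00
    x.wrapping_sub(ONES) & !x & HIGH
}

/// Index of the first `needle` in `data`: 8 bytes per step, then a scalar tail.
fn find(data: &[u8], needle: u8) -> Option<usize> {
    let mut i = 0;
    while i + 8 <= data.len() {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&data[i..i + 8]);
        let mask = match_mask(u64::from_le_bytes(buf), needle);
        if mask != 0 {
            // Little-endian load: the lowest set bit belongs to the earliest byte.
            return Some(i + (mask.trailing_zeros() / 8) as usize);
        }
        i += 8;
    }
    // Fewer than 8 bytes left: fall back to a byte-by-byte scan.
    data[i..].iter().position(|&b| b == needle).map(|p| i + p)
}

/// Parses `-?\d?\d.\d` into tenths (b"-12.3" becomes -123) without
/// data-dependent `if`s. Slice bounds checks remain; input must be well-formed.
fn parse_tenths(s: &[u8]) -> i32 {
    let n = s.len();
    let neg = (s[0] == b'-') as usize;
    let has_tens = (n - neg == 4) as usize;
    // With no tens digit this re-reads the units digit and multiplies it by 0.
    let tens = s[n - 3 - has_tens].wrapping_sub(b'0') as i32 * has_tens as i32;
    let units = s[n - 3].wrapping_sub(b'0') as i32;
    let tenths = s[n - 1].wrapping_sub(b'0') as i32;
    let v = tens * 100 + units * 10 + tenths;
    let neg = neg as i32;
    (v ^ -neg) + neg // two's-complement conditional negate
}
```

With these definitions, `find(b"ab;:efgh", b';')` still locates the `;` at index 2 even though the `:` lane next to it is flagged as a false positive, because only the lowest set bit of the mask is used.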
📝 Walkthrough
Adds a new SWAR-accelerated probe lane (
Changes
Lane S SWAR Implementation
Estimated code review effort: 3 (Moderate) | ~25 minutes
Sequence Diagram(s)
sequenceDiagram
participant CLI
participant lane_s_swar
participant WorkerThread
participant accumulate_swar
participant SoaTable
CLI->>lane_s_swar: lane_s_swar(data, workers)
lane_s_swar->>lane_s_swar: compute chunk_bounds
loop per chunk
lane_s_swar->>WorkerThread: spawn scoped thread(chunk)
WorkerThread->>accumulate_swar: scan & parse chunk
accumulate_swar->>SoaTable: observe(name, temp)
WorkerThread-->>lane_s_swar: chunk map
end
lane_s_swar->>lane_s_swar: merge chunk maps
lane_s_swar-->>CLI: BTreeMap<String, Stats>
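The fan-out and merge flow in the diagram can be sketched with `std::thread::scope`. This is a simplified stand-in under stated assumptions, not the PR's implementation. The `Stats` fields, the bodies of `chunk_bounds`, `accumulate`, and `merge_maps`, and the name `lane_sketch` are all hypothetical. `accumulate` here is a plain scalar loop writing straight into a `BTreeMap`, where the PR uses `accumulate_swar` and `SoaTable`.

```rust
use std::collections::BTreeMap;
use std::thread;

// Hypothetical per-station aggregate; temperatures are fixed-point tenths.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Stats { min: i32, max: i32, sum: i64, count: u64 }

// Splits `data` into roughly `workers` ranges, each extended to end after a b'\n'.
fn chunk_bounds(data: &[u8], workers: usize) -> Vec<(usize, usize)> {
    let step = (data.len() / workers.max(1)).max(1);
    let (mut bounds, mut start) = (Vec::new(), 0);
    while start < data.len() {
        let mut end = (start + step).min(data.len());
        while end < data.len() && data[end - 1] != b'\n' {
            end += 1;
        }
        bounds.push((start, end));
        start = end;
    }
    bounds
}

// Scalar stand-in for the per-chunk scan/parse/observe step (assumes well-formed rows).
fn accumulate(chunk: &[u8]) -> BTreeMap<String, Stats> {
    let mut map: BTreeMap<String, Stats> = BTreeMap::new();
    for line in chunk.split(|&b| b == b'\n').filter(|l| !l.is_empty()) {
        let semi = line.iter().position(|&b| b == b';').expect("missing ';'");
        let digits = line[semi + 1..].iter().filter(|b| b.is_ascii_digit());
        let v = digits.fold(0i32, |acc, &b| acc * 10 + (b - b'0') as i32);
        let t = if line[semi + 1] == b'-' { -v } else { v };
        let name = String::from_utf8_lossy(&line[..semi]).into_owned();
        let s = map.entry(name).or_insert(Stats { min: t, max: t, sum: 0, count: 0 });
        s.min = s.min.min(t);
        s.max = s.max.max(t);
        s.sum += t as i64;
        s.count += 1;
    }
    map
}

// Folds the per-chunk maps into one ordered map.
fn merge_maps(maps: Vec<BTreeMap<String, Stats>>) -> BTreeMap<String, Stats> {
    let mut out: BTreeMap<String, Stats> = BTreeMap::new();
    for (name, s) in maps.into_iter().flatten() {
        out.entry(name)
            .and_modify(|t| {
                t.min = t.min.min(s.min);
                t.max = t.max.max(s.max);
                t.sum += s.sum;
                t.count += s.count;
            })
            .or_insert(s);
    }
    out
}

// One scoped thread per chunk; all threads are spawned before any is joined.
fn lane_sketch(data: &[u8], workers: usize) -> BTreeMap<String, Stats> {
    let bounds = chunk_bounds(data, workers);
    let maps = thread::scope(|scope| {
        let handles: Vec<_> = bounds
            .iter()
            .map(|&(a, b)| scope.spawn(move || accumulate(&data[a..b])))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect::<Vec<_>>()
    });
    merge_maps(maps)
}
```

Scoped threads let each worker borrow its slice of `data` directly, which fits the std-only constraint: no `Arc` and no copying of the input.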
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/onebrc-probe/src/main.rs`:
- Line 135: The lane dispatch already handles "s" in the CLI, but the
user-facing usage/help text and unknown-lane error message are out of date.
Update the help/usage string in main and the unknown-lane branch that prints the
expected lanes so they include the newly supported lane "s" at minimum, and make
sure the listed options stay consistent with the dispatch arms in main.
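One way to keep the help text, the unknown-lane message, and the accepted lanes from drifting apart is to derive all three from a single list. The sketch below is hypothetical and is not the fix applied in this PR. The lane names are taken from the ladder listed later in this thread (`a c r f t8 t s`). The argument layout in the usage string and the function names `usage` and `check_lane` are assumptions.

```rust
// Single source of truth for the lanes the CLI accepts (assumed list).
const LANES: &[&str] = &["a", "c", "r", "f", "t8", "t", "s"];

// Usage string built from LANES, so a new lane shows up automatically.
// The argument layout here is an assumption, not the real CLI contract.
fn usage() -> String {
    format!("usage: onebrc-probe <file> <lane: {}> [workers]", LANES.join("|"))
}

// Validates a lane name; the error message is built from the same list.
fn check_lane(lane: &str) -> Result<(), String> {
    if LANES.contains(&lane) {
        Ok(())
    } else {
        Err(format!("unknown lane '{}'; expected one of: {}", lane, LANES.join(" ")))
    }
}
```

A `match` over lane names still needs one arm per lane, but calling `check_lane` first means the catch-all arm can only be reached if `LANES` and the dispatch arms disagree.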
📒 Files selected for processing (3)
crates/onebrc-probe/src/lane_s.rs
crates/onebrc-probe/src/lib.rs
crates/onebrc-probe/src/main.rs
Addresses CodeRabbit minor (#637): the dispatch handles "s" but the usage/help strings and unknown-lane message didn't list it. Scoped to this PR's lane.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM
#637 (lane S) merged to main; this branch is rebased on top, so lane_s and its parity test are present and all ladder lanes (a c r f t8 t s) run here. Restores `s` to the reproduce loop and rewrites the provenance note.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM
Idea
Add the two 1BRC-frontier compute levers on top of lane F's flat open-addressed table (its group-by was already right — only scan/parse varies):
- (b) SWAR delimiter find: `haszero` bit trick over a `u64` (`x = word ^ needle; (x - ONES) & !x & HIGH`), 8 bytes/step with a scalar tail, replacing the byte-by-byte `while data[i] != b';'` loop.
- (b) Branchless temp parse: `-?\d?\d.\d` → fixed-point tenths.
- (c) Name compare: kept as `&[u8] == &[u8]`, which LLVM lowers to a vectorized `memcmp`.

Reuses lane F's `SoaTable` / `fnv1a64` / `morton_slot` / `table_to_map` verbatim and the same `chunk_bounds` / `merge_maps` threading. std-only, zero external deps.
Measured
10M-row corpus, 4 workers, n=11 median, `throughput_mrows_s`:
- v3: lane S 103.9 vs lane F 84.6 (+23%)
- native: lane S 96.9 vs lane F 74.0 (+31%)
Correctness
- `lane_s_agrees_with_lane_a`: the SWAR lane produces aggregates identical to lane A on a generated corpus.
- `match_mask` uses the strict `(x - ONES) & !x & HIGH` form; `find` returns the earliest match, so the residual borrow-propagation false-positive mode cannot mislocate a delimiter (independently verified by a brutally-honest pre-merge review: LAND verdict, all findings P2).
Scope note
`throughput_mrows_s` is compute-only (`main.rs` reads the file before `Instant::now()`), so (a) mmap, a wall-clock / 13 GB-allocation lever, is not measurable in this harness and is deliberately not implemented here (it would break the std-only, zero-dep contract).
Generated by Claude Code
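To make the compute-only scope concrete, here is a minimal sketch of a harness that starts its timer only once the input is already in memory. It is an illustration under assumptions, not the code in `main.rs`. The helper names `timed_mrows_s` and `count_rows` are hypothetical, and row counting stands in for a real lane.

```rust
use std::time::Instant;

// Runs `work` over in-memory `data` and reports (rows, million rows per second).
// The caller loads `data` before calling, so I/O cost never enters the timing.
fn timed_mrows_s(data: &[u8], work: impl FnOnce(&[u8]) -> u64) -> (u64, f64) {
    let start = Instant::now(); // timer starts after the read
    let rows = work(data);
    let secs = start.elapsed().as_secs_f64();
    (rows, rows as f64 / secs / 1e6)
}

// Stand-in workload: counts newline-terminated rows.
fn count_rows(data: &[u8]) -> u64 {
    data.iter().filter(|&&b| b == b'\n').count() as u64
}
```

In a harness shaped like this, swapping `std::fs::read` for a memory map changes only what happens before `Instant::now()`, which is why the metric cannot show any mmap gain.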
Summary by CodeRabbit
New Features
`s` processing mode for the CLI.
Tests