
onebrc-probe: lane S — SWAR delimiter scan + branchless temp parse (+23% v3, +31% native)#637

Merged
AdaWorldAPI merged 2 commits into main from claude/onebrc-lane-s-swar on Jul 4, 2026
@AdaWorldAPI AdaWorldAPI commented Jul 4, 2026


Idea

Add the two 1BRC-frontier compute levers on top of lane F's flat open-addressed table (its group-by was already right — only scan/parse varies):

  • (b) SWAR delimiter find — the haszero bit trick over a u64 (x = word ^ needle; (x - ONES) & !x & HIGH), 8 bytes/step with a scalar tail, replacing the byte-by-byte while data[i] != b';' loop.
  • (b) branchless temp parse: -?\d?\d.\d → fixed-point tenths (for example, -12.3 → -123).
  • (c) name compare — kept as &[u8] == &[u8], which LLVM lowers to a vectorized memcmp.
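The branchless temperature parse in the list above can be sketched as follows. This is a minimal illustration, not the code from lane_s.rs: the name parse_tenths appears in the review walkthrough, but this body, its signature, and the assumption that the input slice holds exactly the temperature field (no trailing newline) are guesses. "Branchless" here means no data-dependent `if` on the field shape; slice indexing still carries bounds checks, and whether the compiler emits fully branch-free machine code is not guaranteed.

```rust
/// Parses a field shaped like `-?\d?\d.\d` into fixed-point tenths.
/// Assumes `s` is exactly the temperature bytes (3 to 5 bytes, well formed).
fn parse_tenths(s: &[u8]) -> i32 {
    // 1 if the field starts with '-', else 0 (comparison result used as an integer).
    let neg = (s[0] == b'-') as usize;
    // What remains is either "d.d" (3 bytes) or "dd.d" (4 bytes).
    let digits = &s[neg..];
    // 1 if there are two integer digits, else 0.
    let two = (digits.len() == 4) as usize;
    let d = |i: usize| digits[i].wrapping_sub(b'0') as i32;
    // The tens digit is multiplied by 0 when absent, so no branch is needed.
    let abs = d(0) * 100 * two as i32 + d(two) * 10 + d(two + 2);
    // Conditional negate without a branch: (abs ^ -neg) + neg.
    let sign_mask = -(neg as i32);
    (abs ^ sign_mask) + neg as i32
}
```

For instance, `parse_tenths(b"12.3")` yields 123 and `parse_tenths(b"-5.7")` yields -57; the sign is applied with the two's-complement identity `-v == (v ^ -1) + 1`.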

Reuses lane F's SoaTable / fnv1a64 / morton_slot / table_to_map verbatim and the same chunk_bounds/merge_maps threading. std-only, zero external deps.

Measured

10M-row corpus, 4 workers, n=11 median, throughput_mrows_s:

| build | flat F (best) | SWAR S (best) | Δ |
|---|---|---|---|
| v3 (x86-64-v3) | 84.6 | 103.9 | +23% |
| native | 74.0 | 96.9 | +31% |
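The Δ column follows directly from the two throughput columns. The sketch below shows the arithmetic; the helper names median and delta_pct are hypothetical, and how the harness actually aggregates its n=11 runs is not shown in this PR, so the median helper is only one plausible reading.

```rust
/// Median of an odd-length sample set (e.g. n = 11 runs); sorts in place.
fn median(samples: &mut [f64]) -> f64 {
    samples.sort_by(|a, b| a.total_cmp(b));
    samples[samples.len() / 2]
}

/// Relative change of `new` over `base`, in percent.
fn delta_pct(base: f64, new: f64) -> f64 {
    (new / base - 1.0) * 100.0
}
```

With the table's numbers, `delta_pct(84.6, 103.9)` is about 22.8 and `delta_pct(74.0, 96.9)` is about 30.9, which round to the reported +23% and +31%.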

Correctness

  • Parity: lane_s_agrees_with_lane_a — SWAR lane produces aggregates identical to lane A on a generated corpus.
  • match_mask uses the strict (x - ONES) & !x & HIGH form. A borrow out of a matching byte can still flag a higher byte spuriously, but only above a true match; find returns the earliest (lowest) match, so this residual borrow-propagation false-positive mode cannot mislocate a delimiter (independently verified by a brutally-honest pre-merge review: LAND verdict, all findings P2).
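To make the correctness argument concrete, here is a minimal sketch of the SWAR scan. The names match_mask, find, ONES and HIGH come from this PR description; the bodies, the little-endian load, and the use of trailing_zeros to pick the lowest match are assumptions about how lane_s.rs might look, not a copy of it.

```rust
const ONES: u64 = 0x0101_0101_0101_0101;
const HIGH: u64 = 0x8080_8080_8080_8080;

/// Sets the high bit of every byte lane of `word` that equals `needle`.
/// A lane above a true match may also be flagged (borrow propagation),
/// but the lowest set bit always marks a real match.
fn match_mask(word: u64, needle: u8) -> u64 {
    let x = word ^ (ONES * needle as u64); // matching bytes become 0x00
    x.wrapping_sub(ONES) & !x & HIGH
}

/// Returns the index of the first `needle` in `data`, 8 bytes per step
/// with a scalar tail for the last `len % 8` bytes.
fn find(data: &[u8], needle: u8) -> Option<usize> {
    let mut i = 0;
    while i + 8 <= data.len() {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&data[i..i + 8]);
        // Little-endian load: byte 0 of the slice lands in the lowest lane.
        let m = match_mask(u64::from_le_bytes(buf), needle);
        if m != 0 {
            // Lowest set bit = earliest byte; divide the bit index by 8.
            return Some(i + (m.trailing_zeros() / 8) as usize);
        }
        i += 8;
    }
    data[i..].iter().position(|&b| b == needle).map(|p| i + p)
}
```

The false-positive mode shows up on input such as `;:aaaaaa`: the `:` byte (0x3A, which is `;` XOR 0x01) sits directly above a real `;`, so the mask comes out as 0x8080 with both lanes flagged. Because a borrow can only originate at a true zero byte, the lowest flagged lane is still the real delimiter, and `find` returns index 0.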

Scope note

throughput_mrows_s is compute-only (main.rs reads the file before Instant::now()), so (a) mmap — a wall-clock / 13 GB-allocation lever — is not measurable in this harness and is deliberately not implemented here (it would break the std-only, zero-dep contract).
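A compute-only measurement of this kind can be sketched as below. Only the metric name throughput_mrows_s and the fact that the file is read before Instant::now() come from the PR; the function signature and the closure-based shape are hypothetical. The caller is assumed to have loaded the file already (for example with std::fs::read), so file I/O never enters the timed region.

```rust
use std::time::Instant;

/// Times `work` over an already-loaded buffer and returns
/// (rows processed, throughput in million rows per second).
fn throughput_mrows_s<F>(data: &[u8], work: F) -> (usize, f64)
where
    F: FnOnce(&[u8]) -> usize,
{
    // The clock starts after the data is in memory, so mmap vs. read
    // makes no difference to the reported number.
    let start = Instant::now();
    let rows = work(data);
    let secs = start.elapsed().as_secs_f64();
    (rows, rows as f64 / secs / 1e6)
}
```

Any lever that only changes how bytes reach memory is invisible to a timer placed this way, which is why mmap cannot be evaluated in this harness.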


Generated by Claude Code

Summary by CodeRabbit

  • New Features

    • Added a new s processing mode for the CLI.
    • Introduced a faster parsing path that improves record scanning and temperature handling.
    • Results are still merged consistently with existing output behavior.
  • Tests

    • Added coverage for delimiter detection, temperature parsing, and end-to-end output consistency.

Adds a SWAR group-by lane on top of lane F's flat open-addressed table
(its group-by was already right — only scan/parse varies):

- (b) SWAR delimiter find: haszero bit trick over u64
  (`x = word ^ needle; (x - ONES) & !x & HIGH`), 8 bytes/step, scalar tail
  — replaces the byte-by-byte `while data[i] != b';'` loop.
- (b) branchless temp parse: `-?\d?\d.\d` -> fixed-point tenths.
- (c) name compare: kept as `&[u8] == &[u8]` (LLVM lowers to memcmp).

Reuses lane F's SoaTable / fnv1a64 / morton_slot / table_to_map verbatim
and the same chunk_bounds/merge_maps threading; std-only, zero-dep.

Measured (10M rows, 4 workers, n=11 median): +23% at v3 (103.9 vs 84.6),
+31% native (96.9 vs 74.0) over the plain-scalar flat table F. Parity:
lane S produces aggregates identical to lane A (test lane_s_agrees_with_lane_a).
Compute-only metric; mmap (read is outside the timer) is not measurable here.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM

coderabbitai Bot commented Jul 4, 2026

Review details
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: b88a2993-1c0e-4aa1-bcad-bb2d29c60945

📥 Commits

Reviewing files that changed from the base of the PR and between 8a14994 and 3b62b78.

📒 Files selected for processing (1)
  • crates/onebrc-probe/src/main.rs
📝 Walkthrough


Adds a new SWAR-accelerated probe lane (lane_s) to the onebrc-probe crate, implementing branchless delimiter search and temperature parsing, an entrypoint that splits work across threads and merges results, crate-level exports, and a new CLI dispatch option ("s") to invoke it.

Changes

Lane S SWAR Implementation

| Layer | File(s) | Summary |
|---|---|---|
| SWAR delimiter search and temperature parsing | crates/onebrc-probe/src/lane_s.rs | Implements match_mask, a SWAR-based find delimiter scanner with scalar tail handling, and parse_tenths for parsing fixed-format temperatures into tenths. |
| Record accumulation and threaded entrypoint | crates/onebrc-probe/src/lane_s.rs | Adds accumulate_swar to parse records and populate SoaTable, and the public lane_s_swar function that chunks input, spawns worker threads, and merges per-chunk results. |
| Crate wiring, CLI dispatch, and tests | crates/onebrc-probe/src/lib.rs, crates/onebrc-probe/src/main.rs | Declares and re-exports the lane_s module, adds a "s" lane option in the CLI, and includes tests verifying find, parse_tenths, and equality against lane_a_scalar. |
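The chunk, spawn, merge shape described in the table can be sketched with std only. Everything below is a hypothetical stand-in: the real lane uses SoaTable, accumulate_swar, chunk_bounds and merge_maps from the crate, none of which are shown in this thread, so the Stats fields, the scalar accumulate, and the boundary rule (each chunk ends just after a newline) are assumptions.

```rust
use std::collections::BTreeMap;
use std::thread;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Stats { min: i32, max: i32, sum: i64, count: u64 }

/// Splits `data` into ranges that each end just after a '\n'.
fn chunk_bounds(data: &[u8], workers: usize) -> Vec<(usize, usize)> {
    let target = data.len() / workers.max(1) + 1;
    let (mut bounds, mut start) = (Vec::new(), 0);
    while start < data.len() {
        let mut end = (start + target).min(data.len());
        // Extend the chunk so it never cuts a record in half.
        while end < data.len() && data[end - 1] != b'\n' {
            end += 1;
        }
        bounds.push((start, end));
        start = end;
    }
    bounds
}

/// Scalar temperature parse into tenths (stand-in for the SWAR path).
fn tenths(s: &[u8]) -> i32 {
    let v = s.iter().filter(|b| b.is_ascii_digit())
        .fold(0i32, |a, &b| a * 10 + (b - b'0') as i32);
    if s[0] == b'-' { -v } else { v }
}

/// Aggregates one chunk of `name;temp\n` records.
fn accumulate(chunk: &[u8]) -> BTreeMap<String, Stats> {
    let mut map = BTreeMap::new();
    for line in chunk.split(|&b| b == b'\n').filter(|l| !l.is_empty()) {
        let semi = line.iter().position(|&b| b == b';').expect("missing ';'");
        let name = String::from_utf8_lossy(&line[..semi]).into_owned();
        let t = tenths(&line[semi + 1..]);
        map.entry(name)
            .and_modify(|s: &mut Stats| {
                s.min = s.min.min(t);
                s.max = s.max.max(t);
                s.sum += t as i64;
                s.count += 1;
            })
            .or_insert(Stats { min: t, max: t, sum: t as i64, count: 1 });
    }
    map
}

/// Folds per-chunk maps into one result.
fn merge_maps(maps: Vec<BTreeMap<String, Stats>>) -> BTreeMap<String, Stats> {
    let mut out: BTreeMap<String, Stats> = BTreeMap::new();
    for (k, s) in maps.into_iter().flatten() {
        out.entry(k)
            .and_modify(|o| {
                o.min = o.min.min(s.min);
                o.max = o.max.max(s.max);
                o.sum += s.sum;
                o.count += s.count;
            })
            .or_insert(s);
    }
    out
}

/// One scoped thread per chunk, then a single-threaded merge.
fn lane_sketch(data: &[u8], workers: usize) -> BTreeMap<String, Stats> {
    let bounds = chunk_bounds(data, workers);
    let maps = thread::scope(|scope| {
        let handles: Vec<_> = bounds.iter()
            .map(|&(a, b)| scope.spawn(move || accumulate(&data[a..b])))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect::<Vec<_>>()
    });
    merge_maps(maps)
}
```

Scoped threads let each worker borrow its slice of the input directly, so no chunk is copied; the parity check in this style is simply that `lane_sketch(data, n)` equals `accumulate(data)` for any worker count.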

Estimated code review effort: 3 (Moderate) | ~25 minutes

Sequence Diagram(s)

sequenceDiagram
  participant CLI
  participant lane_s_swar
  participant WorkerThread
  participant accumulate_swar
  participant SoaTable

  CLI->>lane_s_swar: lane_s_swar(data, workers)
  lane_s_swar->>lane_s_swar: compute chunk_bounds
  loop per chunk
    lane_s_swar->>WorkerThread: spawn scoped thread(chunk)
    WorkerThread->>accumulate_swar: scan & parse chunk
    accumulate_swar->>SoaTable: observe(name, temp)
    WorkerThread-->>lane_s_swar: chunk map
  end
  lane_s_swar->>lane_s_swar: merge chunk maps
  lane_s_swar-->>CLI: BTreeMap<String, Stats>

Poem

A rabbit hops through bytes so fast,
SWAR words scanned, no cycles wasted,
Semicolons found, tenths well-tasted,
Threads race parallel, merged at last,
lane_s joins the field — hop hop, contrast! 🐇⚡

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly describes the new lane S implementation and its main performance-oriented parsing changes. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/onebrc-probe/src/main.rs`:
- Line 135: The lane dispatch already handles "s" in the CLI, but the
user-facing usage/help text and unknown-lane error message are out of date.
Update the help/usage string in main and the unknown-lane branch that prints the
expected lanes so they include the newly supported lane "s" at minimum, and make
sure the listed options stay consistent with the dispatch arms in main.
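One way to keep the help text and the unknown-lane message from drifting again is to derive both from a single list. The sketch below is hypothetical: the constant LANES, the function names, and the exact message wording are not from main.rs; the lane names themselves (a c r f t8 t s) are taken from the follow-up commit note in this thread.

```rust
/// Single source of truth for the accepted lane names (assumed list).
const LANES: &[&str] = &["a", "c", "r", "f", "t8", "t", "s"];

/// Usage line built from LANES, so a new lane appears automatically.
fn usage() -> String {
    format!("usage: onebrc-probe <file> <lane> (lanes: {})", LANES.join(" "))
}

/// Validates a lane name; the error lists the same lanes as `usage`.
fn check_lane(lane: &str) -> Result<&str, String> {
    if LANES.iter().any(|&l| l == lane) {
        Ok(lane)
    } else {
        Err(format!("unknown lane '{}', expected one of: {}", lane, LANES.join(" ")))
    }
}
```

The dispatch `match` would still need one arm per lane, but a missing entry in either place then fails loudly instead of leaving stale help text.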
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: b9fb7888-3c1a-4295-97f7-4eb8e47fdfd3

📥 Commits

Reviewing files that changed from the base of the PR and between e4bea83 and 8a14994.

📒 Files selected for processing (3)
  • crates/onebrc-probe/src/lane_s.rs
  • crates/onebrc-probe/src/lib.rs
  • crates/onebrc-probe/src/main.rs

Comment thread crates/onebrc-probe/src/main.rs
Repository owner deleted a comment from cursor Bot Jul 4, 2026
Addresses CodeRabbit minor (#637): the dispatch handles "s" but the
usage/help strings and unknown-lane message didn't list it. Scoped to this
PR's lane.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM
@AdaWorldAPI AdaWorldAPI merged commit 427e63e into main Jul 4, 2026
6 checks passed
AdaWorldAPI pushed a commit that referenced this pull request Jul 4, 2026
#637 (lane S) merged to main; this branch is rebased on top, so lane_s and
its parity test are present and all ladder lanes (a c r f t8 t s) run here.
Restores `s` to the reproduce loop and rewrites the provenance note.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01MLBnPuScZy6w9di2QEjsXM