json_hash: SIMD-accelerate PropertyHashJSON::perfect (ASAN-safe) #2467

Open
SouptikH wants to merge 3 commits into main from simd-property-hash

@SouptikH
Collaborator

Summary

Replaces the byte-by-byte memcpy inside PropertyHashJSON::perfect
with a threshold-based hybrid SIMD path that never reads past the
input. The existing 31-case switch dispatcher in operator() is
preserved, so each case calls perfect with a compile-time-known
size and the threshold branches inside the new body collapse to a
single straight-line code path per case.
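The dispatch idea described above can be sketched as follows. This is only an illustration under stated assumptions: `Hash32`, `Path`, `path_for`, `perfect_fixed`, `dispatch`, and the `PERFECT_CASE` macro are hypothetical names, the 32-byte layout is assumed from the description, and `std::memcpy` stands in for the SIMD bodies. It is not the code in this PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <optional>
#include <string_view>

// Hypothetical 32-byte hash: byte 0 stays zero for "perfect" hashes,
// bytes 1..31 hold the key bytes.
struct Hash32 {
  std::array<unsigned char, 32> bytes{};
};

enum class Path { Scalar, Bounce, Direct };

// Size thresholds as listed in the PR description.
constexpr auto path_for(const std::size_t size) noexcept -> Path {
  return size < 8 ? Path::Scalar : size < 16 ? Path::Bounce : Path::Direct;
}

// N is a template constant, so `if constexpr` keeps exactly one branch
// per instantiation and the threshold checks cost nothing at run time.
template <std::size_t N>
inline auto perfect_fixed(const char *src) noexcept -> Hash32 {
  static_assert(N >= 1 && N <= 31, "perfect hashes hold 1..31 bytes");
  Hash32 result;
  if constexpr (path_for(N) == Path::Scalar) {
    std::memcpy(result.bytes.data() + 1, src, N);
  } else {
    // Assumption: the SIMD bodies are omitted in this sketch, and memcpy
    // is used as a stand-in that produces the same bytes.
    std::memcpy(result.bytes.data() + 1, src, N);
  }
  return result;
}

#define PERFECT_CASE(N)                                                        \
  case N:                                                                      \
    return perfect_fixed<N>(key.data());

// Returns std::nullopt for empty keys and for sizes >= 32, where the
// real code would take its own fallback (not modelled here).
inline auto dispatch(const std::string_view key) noexcept
    -> std::optional<Hash32> {
  switch (key.size()) {
    PERFECT_CASE(1) PERFECT_CASE(2) PERFECT_CASE(3) PERFECT_CASE(4)
    PERFECT_CASE(5) PERFECT_CASE(6) PERFECT_CASE(7) PERFECT_CASE(8)
    PERFECT_CASE(9) PERFECT_CASE(10) PERFECT_CASE(11) PERFECT_CASE(12)
    PERFECT_CASE(13) PERFECT_CASE(14) PERFECT_CASE(15) PERFECT_CASE(16)
    PERFECT_CASE(17) PERFECT_CASE(18) PERFECT_CASE(19) PERFECT_CASE(20)
    PERFECT_CASE(21) PERFECT_CASE(22) PERFECT_CASE(23) PERFECT_CASE(24)
    PERFECT_CASE(25) PERFECT_CASE(26) PERFECT_CASE(27) PERFECT_CASE(28)
    PERFECT_CASE(29) PERFECT_CASE(30) PERFECT_CASE(31)
    default:
      return std::nullopt;
  }
}

#undef PERFECT_CASE
```

Calling `dispatch("properties")` would select the size-10 instantiation, whose body contains only the branch chosen by `path_for(10)`.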

| Input size | Path | Why |
| --- | --- | --- |
| 1 .. 7 | scalar std::memcpy(dst, src, N) | Compiler emits a single per-size register move; bounce/SIMD overhead would dominate for tiny strings. |
| 8 .. 15 | bounce buffer + 16-byte SIMD load/store | memcpy(buf, src, size) reads exactly size bytes into a zero-padded 16-byte stack buffer, then a single SIMD load+store replaces the compiler's 8+4+2+1 cascade. |
| 16 .. 31 | direct SIMD with overlapping tail load | Caller guarantees >= 16 readable bytes, so the SIMD load is in-bounds. Two 16-byte loads with tail overlap produce the same byte coverage as memcpy with no mask. |
| >= 32 | unchanged collision-tag fallback | Original path. |

Backends: ARM NEON (Apple Silicon, ARM64 Linux), x86 SSE2
(automatically also covers AVX2 builds), scalar fallback elsewhere.
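A compile-time backend selection of this kind might look roughly like the sketch below. The names `copy16`, `SKETCH_BACKEND`, and `copy16_is_exact` are hypothetical, and the preprocessor checks are an assumption about how such a selection could be written, not the exact conditions used in the PR. The intrinsics themselves (`vld1q_u8`/`vst1q_u8` and `_mm_loadu_si128`/`_mm_storeu_si128`) are the standard unaligned 16-byte load and store operations for NEON and SSE2.

```cpp
#include <cassert>
#include <cstring>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#define SKETCH_BACKEND "neon"
// One 16-byte NEON load and store; no 16-byte alignment is required.
inline void copy16(unsigned char *dst, const unsigned char *src) noexcept {
  vst1q_u8(dst, vld1q_u8(src));
}
#elif defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_BACKEND "sse2"
// One unaligned 16-byte SSE2 load and store.
inline void copy16(unsigned char *dst, const unsigned char *src) noexcept {
  _mm_storeu_si128(reinterpret_cast<__m128i *>(dst),
                   _mm_loadu_si128(reinterpret_cast<const __m128i *>(src)));
}
#else
#define SKETCH_BACKEND "scalar"
// Scalar fallback for any other ISA.
inline void copy16(unsigned char *dst, const unsigned char *src) noexcept {
  std::memcpy(dst, src, 16);
}
#endif

// Copies between deliberately misaligned addresses and checks that
// exactly 16 bytes were written.
inline auto copy16_is_exact() noexcept -> bool {
  unsigned char src[17];
  for (int index = 0; index < 17; ++index) {
    src[index] = static_cast<unsigned char>(index + 1);
  }
  unsigned char dst[18] = {};
  copy16(dst + 1, src + 1);
  return std::memcmp(dst + 1, src + 1, 16) == 0 && dst[0] == 0 &&
         dst[17] == 0;
}
```

The unaligned variants matter here because the destination starts at offset 1 of the hash, so it is never 16-byte aligned.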

Why this is safe

  • The first byte of result (low byte of hash.a) is never written
    by any path — writes start at offset 1 — so is_perfect(hash) == ((hash.a & 255) == 0) keeps its semantics.
  • For sizes 8 .. 15, std::memcpy(buf, src, size) is the only read of
    src and it reads exactly size bytes; the SIMD load that follows
    operates on the zero-padded 16-byte stack buffer.
  • For sizes 16 .. 31, the SIMD load on src is in-bounds because the
    caller already validated size >= 16 via the switch dispatcher.
    The tail load at src + (size - 16) is also in-bounds: size >= 16
    keeps the offset non-negative, and the load covers bytes
    [size - 16, size), so it ends exactly at the last readable byte.
  • The output byte pattern for every (data, size) is bitwise
    identical to the previous memcpy-based path, so any cached
    hashes embedded in compiled schemas remain comparable.
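The safety points above can be checked with a small harness along these lines. It is a sketch under assumptions: `Hash32`, `copy16_portable`, `reference`, `hybrid`, and `matches_reference_for_all_sizes` are hypothetical names, the hash layout is assumed to be 32 bytes with the key at offset 1, and a fixed 16-byte `std::memcpy` stands in for one SIMD load plus store. Each input is allocated with exactly `size` bytes, so an out-of-bounds read would be visible to AddressSanitizer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical layout: byte 0 is the tag byte, bytes 1..31 hold the key.
struct Hash32 {
  std::array<unsigned char, 32> bytes{};
};

// Stand-in for a single 16-byte SIMD load + store.
inline void copy16_portable(unsigned char *dst,
                            const unsigned char *src) noexcept {
  std::memcpy(dst, src, 16);
}

// The previous behaviour: one memcpy of `size` bytes at offset 1.
inline auto reference(const unsigned char *src, const std::size_t size)
    -> Hash32 {
  Hash32 result;
  std::memcpy(result.bytes.data() + 1, src, size);
  return result;
}

inline auto hybrid(const unsigned char *src, const std::size_t size)
    -> Hash32 {
  assert(size >= 1 && size <= 31);
  Hash32 result;
  unsigned char *dst = result.bytes.data() + 1;
  if (size < 8) {
    std::memcpy(dst, src, size);
  } else if (size < 16) {
    unsigned char buffer[16] = {};
    std::memcpy(buffer, src, size); // the only read of src: exactly size bytes
    copy16_portable(dst, buffer);   // padding zeros land on already-zero bytes
  } else {
    copy16_portable(dst, src);                             // src[0, 16)
    copy16_portable(dst + (size - 16), src + (size - 16)); // src[size-16, size)
  }
  return result;
}

// Compares both paths for every size and checks that byte 0 stays zero.
inline auto matches_reference_for_all_sizes() -> bool {
  for (std::size_t size = 1; size <= 31; ++size) {
    const auto input = std::make_unique<unsigned char[]>(size);
    for (std::size_t index = 0; index < size; ++index) {
      input[index] = static_cast<unsigned char>(0x41 + index);
    }
    const Hash32 actual = hybrid(input.get(), size);
    const Hash32 expected = reference(input.get(), size);
    if (actual.bytes != expected.bytes || actual.bytes[0] != 0) {
      return false;
    }
  }
  return true;
}
```

For size 31 the second store ends at byte 32 of the hash, which is the last byte of the struct, so the writes stay in bounds as well as the reads.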

ASAN

Built with -DBLAZE_ADDRESS_SANITIZER=ON and ran the full Blaze
test suite (codegen tests excluded because they require npm ci for
the tsc binary; packaging tests excluded because they require
cmake --install).

100% tests passed, 0 tests failed out of 19
Total Test time (real) =  17.18 sec

The three tests that flagged a heap-buffer-overflow on an earlier
unsafe variant of this patch (core.json, core.jsonpointer,
core.uritemplate) are all green under ASAN with this version.

Performance

Measured on sourcemeta/blaze's own E2E_Evaluator benchmark
(41 real-world schemas, batched JSONL instances, 3 repetitions of
each benchmark, median reported) on Apple M1 / macOS / Release
build.

| Aggregate | Δ vs baseline |
| --- | --- |
| Total wall time across all 41 schemas | -8.80 % |
| Mean per-schema delta | -8.87 % |
| Median per-schema delta | -9.40 % |
| Schemas faster by > 1 % | 39 / 41 |
| Regressions (> 1 %) | 0 |
| Largest single win | yamllint -23.35 % |
| Worst case | jsconfig +0.57 % (within noise) |

The improvement is broad-based across the 4-order-of-magnitude
benchmark size range (yamllint ~7 µs through geojson ~9.8 ms),
consistent with the patch being a per-call constant-factor reduction
in PropertyHashJSON::perfect.
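The three aggregates reported above are different statistics, and the sketch below shows one plausible way to compute each. The function names and the `Timing` struct are hypothetical, and the formulas are an assumption about how such numbers are usually derived; the actual recipe is the one in report.md. Any rows passed in are up to the caller; no benchmark data is embedded here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// One benchmark's time before and after the patch, in the same unit.
struct Timing {
  double baseline;
  double patched;
};

// Negative values mean the patched build is faster.
inline auto percent_delta(const double baseline, const double patched)
    -> double {
  return (patched - baseline) / baseline * 100.0;
}

// Delta of the summed times: dominated by the slowest schemas.
inline auto total_delta(const std::vector<Timing> &rows) -> double {
  double baseline = 0.0;
  double patched = 0.0;
  for (const auto &row : rows) {
    baseline += row.baseline;
    patched += row.patched;
  }
  return percent_delta(baseline, patched);
}

// One delta per schema: every schema weighs the same.
inline auto per_schema_deltas(const std::vector<Timing> &rows)
    -> std::vector<double> {
  std::vector<double> deltas;
  deltas.reserve(rows.size());
  for (const auto &row : rows) {
    deltas.push_back(percent_delta(row.baseline, row.patched));
  }
  return deltas;
}

inline auto mean(const std::vector<double> &values) -> double {
  assert(!values.empty());
  return std::accumulate(values.begin(), values.end(), 0.0) /
         static_cast<double>(values.size());
}

inline auto median(std::vector<double> values) -> double {
  assert(!values.empty());
  std::sort(values.begin(), values.end());
  const std::size_t middle = values.size() / 2;
  return values.size() % 2 == 1
             ? values[middle]
             : (values[middle - 1] + values[middle]) / 2.0;
}
```

When the total, mean, and median deltas land close together, as they do in the table, the speed-up is not driven by a handful of large schemas.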

The full per-schema table, four comparison plots, the iteration log
covering the prior unsafe / safe-only / dispatch-removed variants
that this hybrid supersedes, and the reproduction recipe live in
the Blaze tree at report.md and benchmarks_results/.

What was tested locally

| Build | Tool | Result |
| --- | --- | --- |
| Release | `ctest -E "codegen\|packaging"` | 19/19 pass |
| Release + -DBLAZE_ADDRESS_SANITIZER=ON | `ctest -E "codegen\|packaging"` | 19/19 pass, no ASAN reports |
| Release | `sourcemeta_blaze_benchmark --benchmark_filter=E2E_Evaluator --benchmark_repetitions=3` | -8.80 % aggregate |

Test plan

  • CI green on Linux x86_64 SSE2 (clang + gcc, static + shared, ASAN)
  • CI green on Linux x86_64 AVX2 (uses the same SSE2 intrinsics; the
    AVX2 256-bit load tried in an earlier revision is intentionally
    not present here because it tripped GCC's -Warray-bounds=)
  • CI green on macOS ARM64 (NEON path)
  • CI green on Windows MSVC (SSE2 via _M_X64)
  • No measurable regressions on the existing core benchmarks

SouptikH added 2 commits May 31, 2026 05:18
Replace the unconditional byte-by-byte memcpy inside
`PropertyHashJSON::perfect` with a threshold-based hybrid path:

  * size 1..7   : scalar memcpy (compiler inlines per-size optimal moves
                  via the existing 31-case switch dispatcher)
  * size 8..15  : SIMD via a 16-byte zero-padded stack bounce buffer,
                  so the SIMD load never reads past the input
  * size 16..31 : direct SIMD with two overlapping 16-byte loads -
                  safe because the caller guarantees >= 16 readable bytes
  * size >= 32  : unchanged collision-tag fallback

Backends: ARM NEON on Apple Silicon and ARM64 Linux, SSE2 on x86_64
(including AVX2 builds where SSE2 is implied). Scalar fallback for
any other ISA.

Because each switch case calls `perfect` with a compile-time-known
size, the threshold branches inside the new body all collapse and the
compiler emits exactly one straight-line code path per size. Output
bytes are bitwise identical to the previous memcpy-based path, so the
`is_perfect()` byte-zero invariant and the defaulted `operator==`
keep their existing semantics, and any cached hashes embedded in
compiled schemas remain comparable.

Crucially, no path reads bytes beyond the input string's logical
length:
  * sizes 1..15 use either a per-size scalar memcpy (1..7) or a
    `std::memcpy(buf, src, size)` into a stack buffer (8..15)
  * sizes 16..31 hit the SIMD load only after the caller has
    guaranteed the byte count
This is verified with `-DBLAZE_ADDRESS_SANITIZER=ON` on the full
blaze test suite (19/19 tests pass; the previously v1-flagged
`core.json`, `core.jsonpointer`, and `core.uritemplate` tests are
clean).

End-to-end measurement on the Blaze evaluator's own E2E_Evaluator
suite (41 real-world schemas, 3 repetitions of each benchmark,
median reported, Apple M1 Release build):

  * total wall time across all 41 schemas: -8.80 %
  * mean per-schema delta: -8.87 %
  * median per-schema delta: -9.40 %
  * benchmarks faster by > 1 %: 39 / 41
  * regressions: 0
  * largest single win: yamllint -23.35 %
  * worst case: jsconfig +0.57 % (within measurement noise)

Full per-schema table, plots, the prior unsafe/safe-only/threshold
iterations, and the reproduction recipe live in blaze/report.md.

Signed-off-by: SouptikH <haldersouptik@gmail.com>
Xcode 16.4 ships `arm_neon.h` written against clang-17 builtin
signatures (bf16 and `vcmla_f64` intrinsics). The bundled clang-tidy
(`clang_tidy==20.1.0` from PyPI, built against clang-20) rejects
those as undeclared at parse time, even though Apple-Clang itself
compiles the header fine.

clang-tidy is only enabled on APPLE+LLVM by
`cmake/common/clang-tidy.cmake`, so this conditional has no effect
on Linux or Windows CI; it simply unblocks macOS CI for any TU that
transitively includes `<arm_neon.h>` (e.g. the SIMD path in
`json_hash.h` introduced by the preceding commit).

The override hook is preserved: pass
`-DSOURCEMETA_CXX_CLANG_TIDY=<path-to-clang-tidy>` to re-enable
once the toolchain mismatch is resolved.

Signed-off-by: SouptikH <haldersouptik@gmail.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 2 files

@augmentcode

augmentcode Bot commented May 31, 2026

🤖 Augment PR Summary

Summary: This PR speeds up JSON property hashing by SIMD-accelerating PropertyHashJSON::perfect while keeping it ASAN-safe and preserving existing hash semantics.

Changes:

  • Introduced a size-threshold hybrid implementation: scalar for 1–7 bytes, bounce-buffer SIMD for 8–15, and direct SIMD with overlapping tail loads for 16–31.
  • Added NEON (ARM64) and SSE2 (x86_64/MSVC) backends with compile-time dispatch, plus a scalar fallback.
  • Kept the existing 31-case `switch` dispatcher so the compiler can fully constant-fold per-size paths.
  • Adjusted macOS builds to effectively disable clang-tidy by default due to an Xcode 16.4 `arm_neon.h` vs clang-tidy parser mismatch (overrideable via `SOURCEMETA_CXX_CLANG_TIDY`).

Technical Notes: The first byte of the hash remains untouched, and the byte pattern for perfect hashes is intended to remain bitwise-identical to the prior memcpy approach.

@augmentcode augmentcode Bot left a comment

Review completed. 1 suggestion posted.

Comment thread CMakeLists.txt Outdated
# Override with `-DSOURCEMETA_CXX_CLANG_TIDY=<path-to-clang-tidy>` to
# re-enable manually once the toolchain mismatch is resolved.
if(APPLE AND NOT SOURCEMETA_CXX_CLANG_TIDY)
set(SOURCEMETA_CXX_CLANG_TIDY "/usr/bin/true"

@augmentcode augmentcode Bot May 31, 2026

CMakeLists.txt:15: Because SOURCEMETA_CXX_CLANG_TIDY is a cached variable, an existing macOS build directory that already has it set won’t hit this branch, so clang-tidy may remain enabled and keep failing on arm_neon.h. Consider whether this should override/warn in that situation so incremental builds reliably get the intended disable.

Severity: medium

Collaborator Author

Addressed in cc45a6a — the conditional now warns on a stale cached value pointing at anything other than /usr/bin/true, with a pointer to cmake --fresh or an explicit override. CI hits the no-op branch as before.


@github-actions github-actions Bot left a comment

Benchmark (macos/llvm)

Details
Benchmark suite Current: 4c55c0c Previous: fe6cf2d Ratio
Regex_Lower_S_Or_Upper_S_Asterisk 2.0421007999571423 ns/iter 2.010835848003498 ns/iter 1.02
Regex_Caret_Lower_S_Or_Upper_S_Asterisk_Dollar 1.7475895797522158 ns/iter 1.7110554161518983 ns/iter 1.02
Regex_Period_Asterisk 1.7081082953241409 ns/iter 1.7764185332258684 ns/iter 0.96
Regex_Group_Period_Asterisk_Group 1.728500363298247 ns/iter 1.7224558291401961 ns/iter 1.00
Regex_Period_Plus 2.097735557923812 ns/iter 2.0000323549029377 ns/iter 1.05
Regex_Period 2.0680292905342474 ns/iter 2.0278505311681845 ns/iter 1.02
Regex_Caret_Period_Plus_Dollar 2.0522575840899306 ns/iter 2.003155724355338 ns/iter 1.02
Regex_Caret_Group_Period_Plus_Group_Dollar 2.146383435235732 ns/iter 2.0256757330544573 ns/iter 1.06
Regex_Caret_Period_Asterisk_Dollar 1.7114819637155971 ns/iter 1.6700846350102652 ns/iter 1.02
Regex_Caret_Group_Period_Asterisk_Group_Dollar 1.687887983303907 ns/iter 1.693423415362482 ns/iter 1.00
Regex_Caret_X_Hyphen 6.146413711855026 ns/iter 6.184966402656459 ns/iter 0.99
Regex_Period_Md_Dollar 16.836059887529604 ns/iter 17.028282894599652 ns/iter 0.99
Regex_Caret_Slash_Period_Asterisk 4.34268193390905 ns/iter 4.4100078702706895 ns/iter 0.98
Regex_Caret_Period_Range_Dollar 2.045700819817661 ns/iter 2.1319075410833275 ns/iter 0.96
Regex_Nested_Backtrack 25.645454902449327 ns/iter 25.12198989077627 ns/iter 1.02
JSON_Array_Of_Objects_Unique 415.12827350718663 ns/iter 427.99667028144665 ns/iter 0.97
JSON_Parse_1 4674.165347168566 ns/iter 5265.381787676554 ns/iter 0.89
JSON_Parse_Real 6940.529632295244 ns/iter 8163.83133157448 ns/iter 0.85
JSON_Parse_Decimal 8052.283780340536 ns/iter 9698.71671935695 ns/iter 0.83
JSON_Parse_Schema_ISO_Language 3056494.021739275 ns/iter 3810364.473118229 ns/iter 0.80
JSON_Fast_Hash_Helm_Chart_Lock 58.566914094147485 ns/iter 66.87465474823517 ns/iter 0.88
JSON_Equality_Helm_Chart_Lock 137.6584224460564 ns/iter 148.8027031573935 ns/iter 0.93
JSON_Divisible_By_Decimal 175.78439227919964 ns/iter 196.7705052041123 ns/iter 0.89
JSON_String_Equal/10 7.132933346094468 ns/iter 7.757683364338713 ns/iter 0.92
JSON_String_Equal/100 7.248886744482011 ns/iter 8.775277288935651 ns/iter 0.83
JSON_String_Equal_Small_By_Perfect_Hash/10 0.9187943372243536 ns/iter 1.4043410961283551 ns/iter 0.65
JSON_String_Equal_Small_By_Runtime_Perfect_Hash/10 4.206266408412158 ns/iter 3.7506904481041157 ns/iter 1.12
JSON_String_Fast_Hash/10 2.5263966547520695 ns/iter 2.7459323563476854 ns/iter 0.92
JSON_String_Fast_Hash/100 2.2813427021455492 ns/iter 2.644733335893519 ns/iter 0.86
JSON_String_Key_Hash/10 1.8272347991819273 ns/iter 1.7330024032273486 ns/iter 1.05
JSON_String_Key_Hash/100 2.454742039092695 ns/iter 2.5758903691593074 ns/iter 0.95
JSON_Object_Defines_Miss_Same_Length 2.3975437334841776 ns/iter 2.7037465764189355 ns/iter 0.89
JSON_Object_Defines_Miss_Too_Small 2.9758680977688567 ns/iter 2.800087392151184 ns/iter 1.06
JSON_Object_Defines_Miss_Too_Large 2.450911519557285 ns/iter 2.8512932869111807 ns/iter 0.86
Pointer_Object_Traverse 15.104357063933847 ns/iter 16.95105513541912 ns/iter 0.89
Pointer_Object_Try_Traverse 23.655856744413803 ns/iter 27.226019684687355 ns/iter 0.87
Pointer_Push_Back_Pointer_To_Weak_Pointer 162.05401080128578 ns/iter 183.48268639958624 ns/iter 0.88
Pointer_Walker_Schema_ISO_Language 5428869.2340426855 ns/iter 7252474.009901291 ns/iter 0.75
Pointer_Maybe_Tracked_Deeply_Nested/0 1136911.26856235 ns/iter 1625969.5326884722 ns/iter 0.70
Pointer_Maybe_Tracked_Deeply_Nested/1 1449364.7959183392 ns/iter 1676212.8378374923 ns/iter 0.86
Pointer_Position_Tracker_Get_Deeply_Nested 364.78687431000833 ns/iter 388.0661191402859 ns/iter 0.94
URITemplateRouter_Create 24895.4473286402 ns/iter 28855.628612944038 ns/iter 0.86
URITemplateRouter_Match 164.22265463764617 ns/iter 190.559486205637 ns/iter 0.86
URITemplateRouter_Match_BasePath 196.63124897761122 ns/iter 220.3548792758473 ns/iter 0.89
URITemplateRouterView_Restore 10095.257829928647 ns/iter 11513.873336189561 ns/iter 0.88
URITemplateRouterView_Match 132.82277489246113 ns/iter 147.8216975709563 ns/iter 0.90
URITemplateRouterView_Match_BasePath 152.52163457863634 ns/iter 154.62870951205232 ns/iter 0.99
URITemplateRouterView_Arguments 414.23079090718437 ns/iter 428.59548714915513 ns/iter 0.97
JSONL_Parse_Large 14886120.535714602 ns/iter 13341380.388889471 ns/iter 1.12
JSONL_Parse_Large_GZIP 14740610.229165914 ns/iter 15510367.187497802 ns/iter 0.95
HTML_Build_Table_100000 69352484.81817718 ns/iter 81925208.33341001 ns/iter 0.85
HTML_Render_Table_100000 3963847.465517075 ns/iter 4457718.623452371 ns/iter 0.89
GZIP_Compress_ISO_Language_Set_3_Locations 29149845.499998625 ns/iter 30338094.21741234 ns/iter 0.96
GZIP_Decompress_ISO_Language_Set_3_Locations 5799471.792307451 ns/iter 5344925.150442964 ns/iter 1.09
GZIP_Compress_ISO_Language_Set_3_Schema 1710411.123853178 ns/iter 1549140.2489366939 ns/iter 1.10
GZIP_Decompress_ISO_Language_Set_3_Schema 295717.47242797806 ns/iter 284083.6103949304 ns/iter 1.04

This comment was automatically generated by workflow using github-action-benchmark.


@github-actions github-actions Bot left a comment

Benchmark (linux/gcc)

Details
Benchmark suite Current: 4c55c0c Previous: fe6cf2d Ratio
GZIP_Compress_ISO_Language_Set_3_Locations 40261838.05882176 ns/iter 40041539.117647275 ns/iter 1.01
GZIP_Decompress_ISO_Language_Set_3_Locations 4541002.863636446 ns/iter 4605534.526666588 ns/iter 0.99
GZIP_Compress_ISO_Language_Set_3_Schema 2311141.5610561897 ns/iter 2301977.6973685534 ns/iter 1.00
GZIP_Decompress_ISO_Language_Set_3_Schema 290272.48951648315 ns/iter 293980.9475638042 ns/iter 0.99
HTML_Build_Table_100000 71126077.20000029 ns/iter 71440075.99999896 ns/iter 1.00
HTML_Render_Table_100000 1972115.1977400861 ns/iter 1975591.8142855729 ns/iter 1.00
JSONL_Parse_Large 14854807.70212752 ns/iter 15126402.413043054 ns/iter 0.98
JSONL_Parse_Large_GZIP 16168695.813954126 ns/iter 16386498.627906444 ns/iter 0.99
URITemplateRouter_Create 30105.908923050392 ns/iter 29467.331319487286 ns/iter 1.02
URITemplateRouter_Match 158.48364248670052 ns/iter 158.33991055403771 ns/iter 1.00
URITemplateRouter_Match_BasePath 178.65191528947594 ns/iter 188.23309027181415 ns/iter 0.95
URITemplateRouterView_Restore 8780.722534655526 ns/iter 8697.531089490003 ns/iter 1.01
URITemplateRouterView_Match 124.71441056725624 ns/iter 124.64464974137634 ns/iter 1.00
URITemplateRouterView_Match_BasePath 143.57029533208058 ns/iter 143.8431469566393 ns/iter 1.00
URITemplateRouterView_Arguments 462.15399137373646 ns/iter 458.34000529303717 ns/iter 1.01
Pointer_Object_Traverse 33.754101940182984 ns/iter 34.11538536194793 ns/iter 0.99
Pointer_Object_Try_Traverse 21.969245500032457 ns/iter 21.98696670100176 ns/iter 1.00
Pointer_Push_Back_Pointer_To_Weak_Pointer 167.32138354448443 ns/iter 136.08483082777326 ns/iter 1.23
Pointer_Walker_Schema_ISO_Language 3627030.7564766696 ns/iter 3635916.3928569634 ns/iter 1.00
Pointer_Maybe_Tracked_Deeply_Nested/0 1829168.1540469178 ns/iter 1861357.4031830325 ns/iter 0.98
Pointer_Maybe_Tracked_Deeply_Nested/1 1809524.9945944843 ns/iter 1883748.8894878835 ns/iter 0.96
Pointer_Position_Tracker_Get_Deeply_Nested 582.2039490675976 ns/iter 572.8589182353895 ns/iter 1.02
JSON_Array_Of_Objects_Unique 423.4444651157079 ns/iter 419.95133613048284 ns/iter 1.01
JSON_Parse_1 9734.580814607532 ns/iter 9789.565024492082 ns/iter 0.99
JSON_Parse_Real 13415.988973267426 ns/iter 13137.60996912794 ns/iter 1.02
JSON_Parse_Decimal 17515.464438069506 ns/iter 17057.58143893284 ns/iter 1.03
JSON_Parse_Schema_ISO_Language 5636026.191999917 ns/iter 5774721.804878174 ns/iter 0.98
JSON_Fast_Hash_Helm_Chart_Lock 65.43523747868925 ns/iter 61.30852823091589 ns/iter 1.07
JSON_Equality_Helm_Chart_Lock 166.17909266953635 ns/iter 172.31138431390164 ns/iter 0.96
JSON_Divisible_By_Decimal 229.87220494443784 ns/iter 231.82153448512548 ns/iter 0.99
JSON_String_Equal/10 6.019489214781557 ns/iter 6.114412830850055 ns/iter 0.98
JSON_String_Equal/100 6.661200246052146 ns/iter 6.966122002438372 ns/iter 0.96
JSON_String_Equal_Small_By_Perfect_Hash/10 0.7102181157263903 ns/iter 0.7111757497844186 ns/iter 1.00
JSON_String_Equal_Small_By_Runtime_Perfect_Hash/10 33.11288454341117 ns/iter 21.951220280982454 ns/iter 1.51
JSON_String_Fast_Hash/10 1.0556554693308242 ns/iter 1.7584711619715003 ns/iter 0.60
JSON_String_Fast_Hash/100 1.0564220738993684 ns/iter 1.7587519360645085 ns/iter 0.60
JSON_String_Key_Hash/10 8.433621739872454 ns/iter 1.0850968900875846 ns/iter 7.77
JSON_String_Key_Hash/100 15.514774636683413 ns/iter 14.767415853664732 ns/iter 1.05
JSON_Object_Defines_Miss_Same_Length 3.516090287837044 ns/iter 3.8886116862600515 ns/iter 0.90
JSON_Object_Defines_Miss_Too_Small 3.5182233161560816 ns/iter 3.5169091264663384 ns/iter 1.00
JSON_Object_Defines_Miss_Too_Large 3.51688198986014 ns/iter 4.219610344057994 ns/iter 0.83
Regex_Lower_S_Or_Upper_S_Asterisk 0.7032161815694277 ns/iter 0.7038214032873846 ns/iter 1.00
Regex_Caret_Lower_S_Or_Upper_S_Asterisk_Dollar 1.0548457424239046 ns/iter 0.7042668836261515 ns/iter 1.50
Regex_Period_Asterisk 1.0555230497882504 ns/iter 1.055477932419015 ns/iter 1.00
Regex_Group_Period_Asterisk_Group 0.7037482923437446 ns/iter 1.0552310607384283 ns/iter 0.67
Regex_Period_Plus 0.7032836356145435 ns/iter 0.7032150387981557 ns/iter 1.00
Regex_Period 1.0551279484804794 ns/iter 0.703289696699089 ns/iter 1.50
Regex_Caret_Period_Plus_Dollar 1.054848982473602 ns/iter 1.0564138638199936 ns/iter 1.00
Regex_Caret_Group_Period_Plus_Group_Dollar 0.7043797333228763 ns/iter 1.0576392341948366 ns/iter 0.67
Regex_Caret_Period_Asterisk_Dollar 0.7033039879325955 ns/iter 0.7034214714247459 ns/iter 1.00
Regex_Caret_Group_Period_Asterisk_Group_Dollar 1.0549028994127345 ns/iter 0.705503404640222 ns/iter 1.50
Regex_Caret_X_Hyphen 3.8702000581631486 ns/iter 3.8698630002216294 ns/iter 1.00
Regex_Period_Md_Dollar 34.84538645647904 ns/iter 33.67427796527835 ns/iter 1.03
Regex_Caret_Slash_Period_Asterisk 4.220936568174724 ns/iter 4.588304921895854 ns/iter 0.92
Regex_Caret_Period_Range_Dollar 1.4072402244447066 ns/iter 0.8472261272338164 ns/iter 1.66
Regex_Nested_Backtrack 50.82182201075506 ns/iter 38.70772641010307 ns/iter 1.31

This comment was automatically generated by workflow using github-action-benchmark.
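The Ratio column in these tables is the current time divided by the previous time, so values above 1 are slowdowns. A minimal sketch of flagging regressions from a few linux/gcc rows follows. The four rows are copied from the table above; the `Row` struct, the function names, and any threshold passed to `regressions` (for example 1.10) are hypothetical choices for illustration, not something the benchmark workflow is known to use.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

struct Row {
  std::string name;
  double current_ns;
  double previous_ns;
};

// Ratio as reported in the table: current / previous.
inline auto ratio(const Row &row) -> double {
  return row.current_ns / row.previous_ns;
}

// Rounds to two decimals, matching the precision of the Ratio column.
inline auto rounded_ratio(const Row &row) -> double {
  return std::round(ratio(row) * 100.0) / 100.0;
}

// Names of all rows whose ratio exceeds the given threshold.
inline auto regressions(const std::vector<Row> &rows, const double threshold)
    -> std::vector<std::string> {
  std::vector<std::string> result;
  for (const auto &row : rows) {
    if (ratio(row) > threshold) {
      result.push_back(row.name);
    }
  }
  return result;
}

// Four rows copied verbatim from the linux/gcc benchmark comment.
inline auto linux_gcc_sample() -> std::vector<Row> {
  return {
      {"JSON_String_Equal_Small_By_Perfect_Hash/10", 0.7102181157263903,
       0.7111757497844186},
      {"JSON_String_Equal_Small_By_Runtime_Perfect_Hash/10",
       33.11288454341117, 21.951220280982454},
      {"JSON_String_Fast_Hash/10", 1.0556554693308242, 1.7584711619715003},
      {"JSON_String_Key_Hash/10", 8.433621739872454, 1.0850968900875846},
  };
}
```

With a threshold of 1.10, this sample flags the runtime perfect hash row and the `JSON_String_Key_Hash/10` row, while the fast hash row shows up as an improvement.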


@github-actions github-actions Bot left a comment

Benchmark (linux/llvm)

Details
Benchmark suite Current: 4c55c0c Previous: fe6cf2d Ratio
Regex_Lower_S_Or_Upper_S_Asterisk 2.4650507383092317 ns/iter 2.2198003208633903 ns/iter 1.11
Regex_Caret_Lower_S_Or_Upper_S_Asterisk_Dollar 2.463608150122335 ns/iter 2.198458350128735 ns/iter 1.12
Regex_Period_Asterisk 2.4614233643597903 ns/iter 2.19241095280019 ns/iter 1.12
Regex_Group_Period_Asterisk_Group 2.4618138651480286 ns/iter 2.1996050096327884 ns/iter 1.12
Regex_Period_Plus 3.86907116849953 ns/iter 2.854730033648928 ns/iter 1.36
Regex_Period 3.866576885184666 ns/iter 2.8017253081883178 ns/iter 1.38
Regex_Caret_Period_Plus_Dollar 3.514941482711589 ns/iter 2.488938806721525 ns/iter 1.41
Regex_Caret_Group_Period_Plus_Group_Dollar 3.5150249578341564 ns/iter 2.488702213374691 ns/iter 1.41
Regex_Caret_Period_Asterisk_Dollar 2.8137130384548548 ns/iter 3.4306951171617124 ns/iter 0.82
Regex_Caret_Group_Period_Asterisk_Group_Dollar 2.815298065016845 ns/iter 3.426044396606113 ns/iter 0.82
Regex_Caret_X_Hyphen 7.034240573310137 ns/iter 6.5377451740812 ns/iter 1.08
Regex_Period_Md_Dollar 28.3189344397634 ns/iter 27.95253783293417 ns/iter 1.01
Regex_Caret_Slash_Period_Asterisk 7.58070485061698 ns/iter 5.911218873540801 ns/iter 1.28
Regex_Caret_Period_Range_Dollar 4.219122968067906 ns/iter 2.801286242958691 ns/iter 1.51
Regex_Nested_Backtrack 39.650546909144545 ns/iter 37.00628813318486 ns/iter 1.07
JSON_Array_Of_Objects_Unique 476.91391549253535 ns/iter 493.1990916423395 ns/iter 0.97
JSON_Parse_1 6497.949111769836 ns/iter 6883.968445259204 ns/iter 0.94
JSON_Parse_Real 11404.196730124368 ns/iter 12285.614182630668 ns/iter 0.93
JSON_Parse_Decimal 11668.763044200885 ns/iter 11672.939662495113 ns/iter 1.00
JSON_Parse_Schema_ISO_Language 3871040.2651933683 ns/iter 3847256.6229508426 ns/iter 1.01
JSON_Fast_Hash_Helm_Chart_Lock 68.93804361170433 ns/iter 66.33043432490602 ns/iter 1.04
JSON_Equality_Helm_Chart_Lock 162.87715924436233 ns/iter 179.80854714302092 ns/iter 0.91
JSON_Divisible_By_Decimal 241.862831826369 ns/iter 254.46917525439414 ns/iter 0.95
JSON_String_Equal/10 5.985099911874573 ns/iter 6.546957600664495 ns/iter 0.91
JSON_String_Equal/100 6.698137545060451 ns/iter 7.202064605366773 ns/iter 0.93
JSON_String_Equal_Small_By_Perfect_Hash/10 1.0550564513630294 ns/iter 0.9396139502637549 ns/iter 1.12
JSON_String_Equal_Small_By_Runtime_Perfect_Hash/10 14.200225980275594 ns/iter 10.655258581033351 ns/iter 1.33
JSON_String_Fast_Hash/10 2.4633314027270425 ns/iter 3.1188048087952156 ns/iter 0.79
JSON_String_Fast_Hash/100 2.467796862417339 ns/iter 3.1198627608367344 ns/iter 0.79
JSON_String_Key_Hash/10 3.1659038331140996 ns/iter 2.1822113288100478 ns/iter 1.45
JSON_String_Key_Hash/100 7.732744738736636 ns/iter 6.536700322475011 ns/iter 1.18
JSON_Object_Defines_Miss_Same_Length 2.911516029447275 ns/iter 2.6996137559262983 ns/iter 1.08
JSON_Object_Defines_Miss_Too_Small 2.938424340142355 ns/iter 2.7182877033237234 ns/iter 1.08
JSON_Object_Defines_Miss_Too_Large 4.2222289574462115 ns/iter 3.73996472279293 ns/iter 1.13
Pointer_Object_Traverse 25.898146496198002 ns/iter 25.844070403919893 ns/iter 1.00
Pointer_Object_Try_Traverse 30.400793969890906 ns/iter 28.453496964490558 ns/iter 1.07
Pointer_Push_Back_Pointer_To_Weak_Pointer 167.24761797811522 ns/iter 159.38505647582804 ns/iter 1.05
Pointer_Walker_Schema_ISO_Language 3288718.7083332497 ns/iter 3039739.1739128307 ns/iter 1.08
Pointer_Maybe_Tracked_Deeply_Nested/0 1491159.0149572417 ns/iter 1498890.0085289618 ns/iter 0.99
Pointer_Maybe_Tracked_Deeply_Nested/1 1777645.949109521 ns/iter 1827065.2829131973 ns/iter 0.97
Pointer_Position_Tracker_Get_Deeply_Nested 739.004843529857 ns/iter 683.9722494663005 ns/iter 1.08
URITemplateRouter_Create 29834.7036926247 ns/iter 31722.43273169462 ns/iter 0.94
URITemplateRouter_Match 187.2295362769122 ns/iter 171.9222357503655 ns/iter 1.09
URITemplateRouter_Match_BasePath 230.24839146268167 ns/iter 196.32056042959192 ns/iter 1.17
URITemplateRouterView_Restore 8415.140331076096 ns/iter 7758.819577353205 ns/iter 1.08
URITemplateRouterView_Match 144.54215300138085 ns/iter 143.35059621030373 ns/iter 1.01
URITemplateRouterView_Match_BasePath 166.26694907256848 ns/iter 161.36200172850994 ns/iter 1.03
URITemplateRouterView_Arguments 449.6218893636655 ns/iter 436.18736205292555 ns/iter 1.03
JSONL_Parse_Large 11129875.46031895 ns/iter 11789167.47457637 ns/iter 0.94
JSONL_Parse_Large_GZIP 12331343.210526189 ns/iter 13061004.722222455 ns/iter 0.94
HTML_Build_Table_100000 88859920.87499517 ns/iter 71758292.90908743 ns/iter 1.24
HTML_Render_Table_100000 4879675.414286209 ns/iter 5186473.073529348 ns/iter 0.94
GZIP_Compress_ISO_Language_Set_3_Locations 36571137.6315762 ns/iter 33792010.95237607 ns/iter 1.08
GZIP_Decompress_ISO_Language_Set_3_Locations 4680286.313333303 ns/iter 5073548.710000751 ns/iter 0.92
GZIP_Compress_ISO_Language_Set_3_Schema 2134309.1951221335 ns/iter 1881182.1505376853 ns/iter 1.13
GZIP_Decompress_ISO_Language_Set_3_Schema 290245.4736842034 ns/iter 380070.4193373255 ns/iter 0.76

This comment was automatically generated by workflow using github-action-benchmark.


@github-actions github-actions Bot left a comment

Benchmark (windows/msvc)

Details
Benchmark suite Current: 4c55c0c Previous: fe6cf2d Ratio
Regex_Lower_S_Or_Upper_S_Asterisk 5.068836607143064 ns/iter 6.225971000001209 ns/iter 0.81
Regex_Caret_Lower_S_Or_Upper_S_Asterisk_Dollar 5.063029464286574 ns/iter 6.2797732142862674 ns/iter 0.81
Regex_Period_Asterisk 5.048775892857651 ns/iter 5.914817857142144 ns/iter 0.85
Regex_Group_Period_Asterisk_Group 5.065031250000272 ns/iter 5.976576785714981 ns/iter 0.85
Regex_Period_Plus 4.904897000000119 ns/iter 5.611151999999038 ns/iter 0.87
Regex_Period 4.702506037282329 ns/iter 5.452118081574837 ns/iter 0.86
Regex_Caret_Period_Plus_Dollar 4.914400457398602 ns/iter 5.729045535714151 ns/iter 0.86
Regex_Caret_Group_Period_Plus_Group_Dollar 4.8281157600061 ns/iter 5.259407621015488 ns/iter 0.92
Regex_Caret_Period_Asterisk_Dollar 5.079954999999927 ns/iter 6.219507000000704 ns/iter 0.82
Regex_Caret_Group_Period_Asterisk_Group_Dollar 5.196465178570975 ns/iter 6.117930357143158 ns/iter 0.85
Regex_Caret_X_Hyphen 8.449755765848296 ns/iter 9.902397277222768 ns/iter 0.85
Regex_Period_Md_Dollar 46.03975874071787 ns/iter 90.44502721228666 ns/iter 0.51
Regex_Caret_Slash_Period_Asterisk 8.016523437499908 ns/iter 9.114747767858335 ns/iter 0.88
Regex_Caret_Period_Range_Dollar 5.647320000000491 ns/iter 6.526906000001417 ns/iter 0.87
Regex_Nested_Backtrack 55.558480000001964 ns/iter 88.85487723213308 ns/iter 0.63
JSON_Array_Of_Objects_Unique 483.8658114908796 ns/iter 650.8888000000752 ns/iter 0.74
JSON_Parse_1 11784.694642857728 ns/iter 12914.648214287385 ns/iter 0.91
JSON_Parse_Real 18814.38405700038 ns/iter 20103.801288376642 ns/iter 0.94
JSON_Parse_Decimal 17932.14314413679 ns/iter 17357.536769218073 ns/iter 1.03
JSON_Parse_Schema_ISO_Language 7829558.888888895 ns/iter 8068633.33333341 ns/iter 0.97
JSON_Fast_Hash_Helm_Chart_Lock 64.38996651785902 ns/iter 80.32225087846311 ns/iter 0.80
JSON_Equality_Helm_Chart_Lock 303.9102674207533 ns/iter 298.04855602133495 ns/iter 1.02
JSON_Divisible_By_Decimal 301.42707796576804 ns/iter 408.2167540974293 ns/iter 0.74
JSON_String_Equal/10 16.268894310699793 ns/iter 15.361986607140516 ns/iter 1.06
JSON_String_Equal/100 16.660455022364133 ns/iter 23.033800000000326 ns/iter 0.72
JSON_String_Equal_Small_By_Perfect_Hash/10 2.5112417857144465 ns/iter 2.558883814108603 ns/iter 0.98
JSON_String_Equal_Small_By_Runtime_Perfect_Hash/10 38.12281003352012 ns/iter 14.30066846294256 ns/iter 2.67
JSON_String_Fast_Hash/10 4.736009777842688 ns/iter 4.654929464286234 ns/iter 1.02
JSON_String_Fast_Hash/100 4.7067883928570655 ns/iter 6.218496999999843 ns/iter 0.76
JSON_String_Key_Hash/10 17.212669642857136 ns/iter 6.071008928572042 ns/iter 2.84
JSON_String_Key_Hash/100 15.947207589286045 ns/iter 19.057504049881167 ns/iter 0.84
JSON_Object_Defines_Miss_Same_Length 4.092758087665319 ns/iter 4.070796792983468 ns/iter 1.01
JSON_Object_Defines_Miss_Too_Small 4.080800409072509 ns/iter 5.585132142856862 ns/iter 0.73
JSON_Object_Defines_Miss_Too_Large 4.397130167775001 ns/iter 4.208353190084879 ns/iter 1.04
Pointer_Object_Traverse 63.77180357142314 ns/iter 65.92736607142768 ns/iter 0.97
Pointer_Object_Try_Traverse 71.00738392856394 ns/iter 78.34411830359552 ns/iter 0.91
Pointer_Push_Back_Pointer_To_Weak_Pointer 160.6261160714289 ns/iter 178.5425342774114 ns/iter 0.90
Pointer_Walker_Schema_ISO_Language 12454507.14285669 ns/iter 14540711.999998167 ns/iter 0.86
Pointer_Maybe_Tracked_Deeply_Nested/0 2435355.421686971 ns/iter 2587425.8928563367 ns/iter 0.94
Pointer_Maybe_Tracked_Deeply_Nested/1 3838555.8974357243 ns/iter 3670630.985915279 ns/iter 1.05
Pointer_Position_Tracker_Get_Deeply_Nested 709.7150669643781 ns/iter 726.5830684898559 ns/iter 0.98
URITemplateRouter_Create 43976.42981980424 ns/iter 43351.039595008966 ns/iter 1.01
URITemplateRouter_Match 193.56923156868146 ns/iter 217.69102481131566 ns/iter 0.89
URITemplateRouter_Match_BasePath 218.34468750000724 ns/iter 249.9976785713898 ns/iter 0.87
URITemplateRouterView_Restore 31957.47487170696 ns/iter 23475.83571428556 ns/iter 1.36
URITemplateRouterView_Match 156.55232142857452 ns/iter 172.55769905518778 ns/iter 0.91
URITemplateRouterView_Match_BasePath 173.50887992246186 ns/iter 193.0510886652807 ns/iter 0.90
URITemplateRouterView_Arguments 516.9826000000057 ns/iter 546.0669642857852 ns/iter 0.95
JSONL_Parse_Large 35378915.00000115 ns/iter 35540905.26315696 ns/iter 1.00
JSONL_Parse_Large_GZIP 34956490.47618783 ns/iter 35467264.99999977 ns/iter 0.99
HTML_Build_Table_100000 91941611.11111043 ns/iter 96829542.85715692 ns/iter 0.95
HTML_Render_Table_100000 7805035.555555959 ns/iter 8236934.444446181 ns/iter 0.95
GZIP_Compress_ISO_Language_Set_3_Locations 40959870.588233784 ns/iter 43908339.999992065 ns/iter 0.93
GZIP_Decompress_ISO_Language_Set_3_Locations 10596779.687499946 ns/iter 11513398.21428467 ns/iter 0.92
GZIP_Compress_ISO_Language_Set_3_Schema 2233997.8125000214 ns/iter 2508677.142856998 ns/iter 0.89
GZIP_Decompress_ISO_Language_Set_3_Schema 651329.9107142854 ns/iter 685855.5357143002 ns/iter 0.95

This comment was automatically generated by workflow using github-action-benchmark.

SouptikH added a commit that referenced this pull request May 31, 2026
Address cubic/augment reviewer feedback on #2467: the prior commit
only set `SOURCEMETA_CXX_CLANG_TIDY` in the cache when it was unset,
so an existing build directory whose cache already pointed at a real
clang-tidy binary would silently bypass the carve-out and keep
hitting the Xcode 16 `arm_neon.h` parser error on incremental
builds.

Emit a `CMake WARNING` in that situation directing the developer to
either `cmake --fresh` or pass
`-DSOURCEMETA_CXX_CLANG_TIDY=/usr/bin/true` explicitly. Fresh CI
configures hit the original `NOT SOURCEMETA_CXX_CLANG_TIDY` branch
and get the intended no-op behaviour as before.

Signed-off-by: SouptikH <haldersouptik@gmail.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 1 file (changes from recent commits).

Comment thread CMakeLists.txt Outdated
SouptikH force-pushed the simd-property-hash branch from cc45a6a to 4c55c0c on May 31, 2026 01:02
SouptikH requested a review from jviotti on May 31, 2026 01:17
@jviotti
Member

jviotti commented Jun 1, 2026

@SouptikH Awesome stuff! Take a look at the benchmark comments, mainly the Linux/GCC one. That's the one we care about the most, as our actual APIs run on that configuration. This part is interesting: it seems like some cases get faster but some get way slower?

[Image: Screenshot 2026-06-01 at 09 33 41]

What I recommend is iterating with the info that these benchmark comments give you. Maybe there is something making things slower in the other case, etc.

@jviotti
Member

jviotti commented Jun 1, 2026

What I find interesting is that it spikes on the 10-character one specifically. Maybe another thing to look into is the implementation of std::memcpy in the GCC standard library. Maybe they have some tricks for specific thresholds that even make some cases way faster than SIMD?

@jviotti
Member

jviotti commented Jun 1, 2026

Also, what I found when digging into similar things before, is that the actual standard library version can differ a lot. i.e. there are Docker images for GCC for different versions of it which ship different versions of the standard library too, and you can get massive differences just out of those.

Maybe we should come up with our own memcpy implementation based on different standard libraries to have more control over it?

@jviotti
Member

jviotti commented Jun 1, 2026

Finally, of course, if there are specific new cases you want to test, feel free to add them to benchmark/. Maybe other variations of the perfect check for other string lengths, etc. If you want to compare against the baseline in these comments, what I typically do is send a separate PR with the new benchmark cases, get that merged first, and then rebase the other PR, and you get the comparison.
