json_hash: SIMD-accelerate PropertyHashJSON::perfect (ASAN-safe) #2467
SouptikH wants to merge 3 commits
Replace the unconditional byte-by-byte memcpy inside
`PropertyHashJSON::perfect` with a threshold-based hybrid path:
* size 1..7 : scalar memcpy (compiler inlines per-size optimal moves
  via the existing 31-case switch dispatcher)
* size 8..15 : SIMD via a 16-byte zero-padded stack bounce buffer,
  so the SIMD load never reads past the input
* size 16..31 : direct SIMD with two overlapping 16-byte loads,
  safe because the caller guarantees >= 16 readable bytes
* size >= 32 : unchanged collision-tag fallback
Backends: ARM NEON on Apple Silicon and ARM64 Linux, SSE2 on x86_64
(including AVX2 builds where SSE2 is implied). Scalar fallback for
any other ISA.
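The size thresholds above can be sketched roughly as follows. This is a minimal illustration, not the code in `json_hash.h`: the names `copy16` and `hybrid_copy` are hypothetical, the destination is assumed to be a zero-initialized buffer with at least 31 writable bytes, and the `>= 32` collision-tag fallback is only signalled by returning `false`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SKETCH_NEON 1
#elif defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_SSE2 1
#endif

// Hypothetical helper: move exactly 16 bytes with one SIMD load and one
// SIMD store, or with a plain memcpy on any other ISA.
inline void copy16(unsigned char *dst, const unsigned char *src) {
#if defined(SKETCH_NEON)
  vst1q_u8(dst, vld1q_u8(src));
#elif defined(SKETCH_SSE2)
  _mm_storeu_si128(reinterpret_cast<__m128i *>(dst),
                   _mm_loadu_si128(reinterpret_cast<const __m128i *>(src)));
#else
  std::memcpy(dst, src, 16);
#endif
}

// Hypothetical hybrid copy. Assumes `dst` is zero-initialized and has at
// least 31 writable bytes. Never reads more than `size` bytes from `src`.
inline bool hybrid_copy(unsigned char *dst, const unsigned char *src,
                        std::size_t size) {
  if (size == 0 || size >= 32) {
    return false; // outside the SIMD ranges; not handled by this sketch
  }
  if (size < 8) {
    std::memcpy(dst, src, size); // size 1..7: scalar path
    return true;
  }
  if (size < 16) {
    // size 8..15: bounce through a zero-padded 16-byte stack buffer so
    // the 16-byte load touches the buffer, never the input string
    unsigned char buffer[16] = {0};
    std::memcpy(buffer, src, size);
    copy16(dst, buffer);
    return true;
  }
  // size 16..31: two overlapping 16-byte moves. The tail starts at
  // size - 16, so it ends exactly at src + size.
  copy16(dst, src);
  copy16(dst + (size - 16), src + (size - 16));
  return true;
}
```

For every size from 1 to 31 the destination ends up with the same bytes a plain `std::memcpy(dst, src, size)` into a zeroed buffer would produce, which is the property the bitwise-identical claim depends on.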
Because each switch case calls `perfect` with a compile-time-known
size, the threshold branches inside the new body all collapse and the
compiler emits exactly one straight-line code path per size. Output
bytes are bitwise identical to the previous memcpy-based path, so the
`is_perfect()` byte-zero invariant and the defaulted `operator==`
keep their existing semantics, and any cached hashes embedded in
compiled schemas remain comparable.
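One way to picture the branch collapse is to make the size a template parameter, so each threshold test is resolved at compile time. This is only a sketch under assumptions: `copy_fixed` and `dispatch` are hypothetical names, plain `std::memcpy` stands in for the SIMD moves, and whether the real code uses a template or relies on inlining a constant argument is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed-size body. N is known at compile time, so only one
// of the three branches is instantiated for each size.
template <std::size_t N>
inline void copy_fixed(unsigned char *dst, const unsigned char *src) {
  static_assert(N >= 1 && N <= 31, "sizes outside 1..31 use the fallback");
  if constexpr (N < 8) {
    std::memcpy(dst, src, N);
  } else if constexpr (N < 16) {
    unsigned char buffer[16] = {0};
    std::memcpy(buffer, src, N);
    std::memcpy(dst, buffer, 16); // stand-in for one SIMD load+store
  } else {
    std::memcpy(dst, src, 16); // stand-in for the head SIMD move
    std::memcpy(dst + (N - 16), src + (N - 16), 16); // overlapping tail
  }
}

// Hypothetical 31-case dispatcher: every case passes its own constant.
// Assumes `dst` is zeroed and has at least 31 writable bytes.
inline bool dispatch(unsigned char *dst, const unsigned char *src,
                     std::size_t size) {
  switch (size) {
#define SKETCH_CASE(N)                                                         \
  case N:                                                                      \
    copy_fixed<N>(dst, src);                                                   \
    return true;
    SKETCH_CASE(1) SKETCH_CASE(2) SKETCH_CASE(3) SKETCH_CASE(4)
    SKETCH_CASE(5) SKETCH_CASE(6) SKETCH_CASE(7) SKETCH_CASE(8)
    SKETCH_CASE(9) SKETCH_CASE(10) SKETCH_CASE(11) SKETCH_CASE(12)
    SKETCH_CASE(13) SKETCH_CASE(14) SKETCH_CASE(15) SKETCH_CASE(16)
    SKETCH_CASE(17) SKETCH_CASE(18) SKETCH_CASE(19) SKETCH_CASE(20)
    SKETCH_CASE(21) SKETCH_CASE(22) SKETCH_CASE(23) SKETCH_CASE(24)
    SKETCH_CASE(25) SKETCH_CASE(26) SKETCH_CASE(27) SKETCH_CASE(28)
    SKETCH_CASE(29) SKETCH_CASE(30) SKETCH_CASE(31)
#undef SKETCH_CASE
  default:
    return false; // size 0 or >= 32: left to the existing fallback
  }
}
```

With this shape, the only runtime branch is the `switch` itself; the threshold comparisons never appear in the generated code for a given case.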
Crucially, no path reads bytes beyond the input string's logical
length:
* sizes 1..15 use either a per-size scalar memcpy (1..7) or a
  `std::memcpy(buf, src, size)` into a stack buffer (8..15)
* sizes 16..31 hit the SIMD load only after the caller has
  guaranteed the byte count
This is verified with `-DBLAZE_ADDRESS_SANITIZER=ON` on the full
blaze test suite (19/19 tests pass; the `core.json`,
`core.jsonpointer`, and `core.uritemplate` tests that ASAN flagged
on the earlier v1 variant are now clean).
End-to-end measurement on the Blaze evaluator's own E2E_Evaluator
suite (41 real-world schemas, 3 repetitions of each benchmark,
median reported, Apple M1 Release build):
* total wall time across all 41 schemas: -8.80 %
* mean per-schema delta: -8.87 %
* median per-schema delta: -9.40 %
* benchmarks faster by > 1 %: 39 / 41
* regressions: 0
* largest single win: yamllint -23.35 %
* worst case: jsconfig +0.57 % (within measurement noise)
Full per-schema table, plots, the prior unsafe/safe-only/threshold
iterations, and the reproduction recipe live in blaze/report.md.
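For readers comparing the three aggregate figures above, here is a small sketch of how such numbers are typically derived. The function names are hypothetical and the exact aggregation used in `report.md` is an assumption; the point is only that the total-wall-time delta and the mean per-schema delta weight schemas differently, which is why they need not agree.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of the three repetitions of one benchmark.
inline double median_of_three(std::array<double, 3> runs) {
  std::sort(runs.begin(), runs.end());
  return runs[1];
}

// Signed relative change in percent; negative means the new build is faster.
inline double percent_delta(double before, double after) {
  return (after - before) / before * 100.0;
}

// Delta of the summed wall time: slow schemas dominate this figure.
// Assumes both vectors are non-empty and have the same length.
inline double total_percent_delta(const std::vector<double> &before,
                                  const std::vector<double> &after) {
  double sum_before = 0.0;
  double sum_after = 0.0;
  for (std::size_t index = 0; index < before.size(); ++index) {
    sum_before += before[index];
    sum_after += after[index];
  }
  return percent_delta(sum_before, sum_after);
}

// Mean of the per-schema deltas: every schema carries the same weight.
inline double mean_percent_delta(const std::vector<double> &before,
                                 const std::vector<double> &after) {
  double sum = 0.0;
  for (std::size_t index = 0; index < before.size(); ++index) {
    sum += percent_delta(before[index], after[index]);
  }
  return sum / static_cast<double>(before.size());
}
```

With two made-up schemas timed at 10 and 1000 units before and 8 and 950 units after, the mean per-schema delta is -12.5 % while the total delta is only about -5.1 %, because the slower schema dominates the sum.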
Signed-off-by: SouptikH <haldersouptik@gmail.com>
Xcode 16.4 ships `arm_neon.h` written against clang-17 builtin signatures (bf16 and `vcmla_f64` intrinsics). The bundled clang-tidy (`clang_tidy==20.1.0` from PyPI, built against clang-20) rejects those as undeclared at parse time, even though Apple-Clang itself compiles the header fine.
clang-tidy is only enabled on APPLE+LLVM by `cmake/common/clang-tidy.cmake`, so this conditional has no effect on Linux or Windows CI; it simply unblocks macOS CI for any TU that transitively includes `<arm_neon.h>` (e.g. the SIMD path in `json_hash.h` introduced by the preceding commit).
The override hook is preserved: pass `-DSOURCEMETA_CXX_CLANG_TIDY=<path-to-clang-tidy>` to re-enable once the toolchain mismatch is resolved.
Signed-off-by: SouptikH <haldersouptik@gmail.com>
🤖 Augment PR Summary
Summary: This PR speeds up JSON property hashing by SIMD-accelerating
Changes:
Technical Notes: The first byte of the hash remains untouched, and the byte pattern for perfect hashes is intended to remain bitwise-identical to the prior
# Override with `-DSOURCEMETA_CXX_CLANG_TIDY=<path-to-clang-tidy>` to
# re-enable manually once the toolchain mismatch is resolved.
if(APPLE AND NOT SOURCEMETA_CXX_CLANG_TIDY)
  set(SOURCEMETA_CXX_CLANG_TIDY "/usr/bin/true"
CMakeLists.txt:15: Because SOURCEMETA_CXX_CLANG_TIDY is a cached variable, an existing macOS build directory that already has it set won’t hit this branch, so clang-tidy may remain enabled and keep failing on arm_neon.h. Consider whether this should override/warn in that situation so incremental builds reliably get the intended disable.
Severity: medium
Addressed in cc45a6a — the conditional now warns on a stale cached value pointing at anything other than /usr/bin/true, with a pointer to cmake --fresh or an explicit override. CI hits the no-op branch as before.
Benchmark (macos/llvm)
| Benchmark suite | Current: 4c55c0c | Previous: fe6cf2d | Ratio |
|---|---|---|---|
| Regex_Lower_S_Or_Upper_S_Asterisk | 2.0421007999571423 ns/iter | 2.010835848003498 ns/iter | 1.02 |
| Regex_Caret_Lower_S_Or_Upper_S_Asterisk_Dollar | 1.7475895797522158 ns/iter | 1.7110554161518983 ns/iter | 1.02 |
| Regex_Period_Asterisk | 1.7081082953241409 ns/iter | 1.7764185332258684 ns/iter | 0.96 |
| Regex_Group_Period_Asterisk_Group | 1.728500363298247 ns/iter | 1.7224558291401961 ns/iter | 1.00 |
| Regex_Period_Plus | 2.097735557923812 ns/iter | 2.0000323549029377 ns/iter | 1.05 |
| Regex_Period | 2.0680292905342474 ns/iter | 2.0278505311681845 ns/iter | 1.02 |
| Regex_Caret_Period_Plus_Dollar | 2.0522575840899306 ns/iter | 2.003155724355338 ns/iter | 1.02 |
| Regex_Caret_Group_Period_Plus_Group_Dollar | 2.146383435235732 ns/iter | 2.0256757330544573 ns/iter | 1.06 |
| Regex_Caret_Period_Asterisk_Dollar | 1.7114819637155971 ns/iter | 1.6700846350102652 ns/iter | 1.02 |
| Regex_Caret_Group_Period_Asterisk_Group_Dollar | 1.687887983303907 ns/iter | 1.693423415362482 ns/iter | 1.00 |
| Regex_Caret_X_Hyphen | 6.146413711855026 ns/iter | 6.184966402656459 ns/iter | 0.99 |
| Regex_Period_Md_Dollar | 16.836059887529604 ns/iter | 17.028282894599652 ns/iter | 0.99 |
| Regex_Caret_Slash_Period_Asterisk | 4.34268193390905 ns/iter | 4.4100078702706895 ns/iter | 0.98 |
| Regex_Caret_Period_Range_Dollar | 2.045700819817661 ns/iter | 2.1319075410833275 ns/iter | 0.96 |
| Regex_Nested_Backtrack | 25.645454902449327 ns/iter | 25.12198989077627 ns/iter | 1.02 |
| JSON_Array_Of_Objects_Unique | 415.12827350718663 ns/iter | 427.99667028144665 ns/iter | 0.97 |
| JSON_Parse_1 | 4674.165347168566 ns/iter | 5265.381787676554 ns/iter | 0.89 |
| JSON_Parse_Real | 6940.529632295244 ns/iter | 8163.83133157448 ns/iter | 0.85 |
| JSON_Parse_Decimal | 8052.283780340536 ns/iter | 9698.71671935695 ns/iter | 0.83 |
| JSON_Parse_Schema_ISO_Language | 3056494.021739275 ns/iter | 3810364.473118229 ns/iter | 0.80 |
| JSON_Fast_Hash_Helm_Chart_Lock | 58.566914094147485 ns/iter | 66.87465474823517 ns/iter | 0.88 |
| JSON_Equality_Helm_Chart_Lock | 137.6584224460564 ns/iter | 148.8027031573935 ns/iter | 0.93 |
| JSON_Divisible_By_Decimal | 175.78439227919964 ns/iter | 196.7705052041123 ns/iter | 0.89 |
| JSON_String_Equal/10 | 7.132933346094468 ns/iter | 7.757683364338713 ns/iter | 0.92 |
| JSON_String_Equal/100 | 7.248886744482011 ns/iter | 8.775277288935651 ns/iter | 0.83 |
| JSON_String_Equal_Small_By_Perfect_Hash/10 | 0.9187943372243536 ns/iter | 1.4043410961283551 ns/iter | 0.65 |
| JSON_String_Equal_Small_By_Runtime_Perfect_Hash/10 | 4.206266408412158 ns/iter | 3.7506904481041157 ns/iter | 1.12 |
| JSON_String_Fast_Hash/10 | 2.5263966547520695 ns/iter | 2.7459323563476854 ns/iter | 0.92 |
| JSON_String_Fast_Hash/100 | 2.2813427021455492 ns/iter | 2.644733335893519 ns/iter | 0.86 |
| JSON_String_Key_Hash/10 | 1.8272347991819273 ns/iter | 1.7330024032273486 ns/iter | 1.05 |
| JSON_String_Key_Hash/100 | 2.454742039092695 ns/iter | 2.5758903691593074 ns/iter | 0.95 |
| JSON_Object_Defines_Miss_Same_Length | 2.3975437334841776 ns/iter | 2.7037465764189355 ns/iter | 0.89 |
| JSON_Object_Defines_Miss_Too_Small | 2.9758680977688567 ns/iter | 2.800087392151184 ns/iter | 1.06 |
| JSON_Object_Defines_Miss_Too_Large | 2.450911519557285 ns/iter | 2.8512932869111807 ns/iter | 0.86 |
| Pointer_Object_Traverse | 15.104357063933847 ns/iter | 16.95105513541912 ns/iter | 0.89 |
| Pointer_Object_Try_Traverse | 23.655856744413803 ns/iter | 27.226019684687355 ns/iter | 0.87 |
| Pointer_Push_Back_Pointer_To_Weak_Pointer | 162.05401080128578 ns/iter | 183.48268639958624 ns/iter | 0.88 |
| Pointer_Walker_Schema_ISO_Language | 5428869.2340426855 ns/iter | 7252474.009901291 ns/iter | 0.75 |
| Pointer_Maybe_Tracked_Deeply_Nested/0 | 1136911.26856235 ns/iter | 1625969.5326884722 ns/iter | 0.70 |
| Pointer_Maybe_Tracked_Deeply_Nested/1 | 1449364.7959183392 ns/iter | 1676212.8378374923 ns/iter | 0.86 |
| Pointer_Position_Tracker_Get_Deeply_Nested | 364.78687431000833 ns/iter | 388.0661191402859 ns/iter | 0.94 |
| URITemplateRouter_Create | 24895.4473286402 ns/iter | 28855.628612944038 ns/iter | 0.86 |
| URITemplateRouter_Match | 164.22265463764617 ns/iter | 190.559486205637 ns/iter | 0.86 |
| URITemplateRouter_Match_BasePath | 196.63124897761122 ns/iter | 220.3548792758473 ns/iter | 0.89 |
| URITemplateRouterView_Restore | 10095.257829928647 ns/iter | 11513.873336189561 ns/iter | 0.88 |
| URITemplateRouterView_Match | 132.82277489246113 ns/iter | 147.8216975709563 ns/iter | 0.90 |
| URITemplateRouterView_Match_BasePath | 152.52163457863634 ns/iter | 154.62870951205232 ns/iter | 0.99 |
| URITemplateRouterView_Arguments | 414.23079090718437 ns/iter | 428.59548714915513 ns/iter | 0.97 |
| JSONL_Parse_Large | 14886120.535714602 ns/iter | 13341380.388889471 ns/iter | 1.12 |
| JSONL_Parse_Large_GZIP | 14740610.229165914 ns/iter | 15510367.187497802 ns/iter | 0.95 |
| HTML_Build_Table_100000 | 69352484.81817718 ns/iter | 81925208.33341001 ns/iter | 0.85 |
| HTML_Render_Table_100000 | 3963847.465517075 ns/iter | 4457718.623452371 ns/iter | 0.89 |
| GZIP_Compress_ISO_Language_Set_3_Locations | 29149845.499998625 ns/iter | 30338094.21741234 ns/iter | 0.96 |
| GZIP_Decompress_ISO_Language_Set_3_Locations | 5799471.792307451 ns/iter | 5344925.150442964 ns/iter | 1.09 |
| GZIP_Compress_ISO_Language_Set_3_Schema | 1710411.123853178 ns/iter | 1549140.2489366939 ns/iter | 1.10 |
| GZIP_Decompress_ISO_Language_Set_3_Schema | 295717.47242797806 ns/iter | 284083.6103949304 ns/iter | 1.04 |
This comment was automatically generated by workflow using github-action-benchmark.
Benchmark (linux/gcc)
| Benchmark suite | Current: 4c55c0c | Previous: fe6cf2d | Ratio |
|---|---|---|---|
| GZIP_Compress_ISO_Language_Set_3_Locations | 40261838.05882176 ns/iter | 40041539.117647275 ns/iter | 1.01 |
| GZIP_Decompress_ISO_Language_Set_3_Locations | 4541002.863636446 ns/iter | 4605534.526666588 ns/iter | 0.99 |
| GZIP_Compress_ISO_Language_Set_3_Schema | 2311141.5610561897 ns/iter | 2301977.6973685534 ns/iter | 1.00 |
| GZIP_Decompress_ISO_Language_Set_3_Schema | 290272.48951648315 ns/iter | 293980.9475638042 ns/iter | 0.99 |
| HTML_Build_Table_100000 | 71126077.20000029 ns/iter | 71440075.99999896 ns/iter | 1.00 |
| HTML_Render_Table_100000 | 1972115.1977400861 ns/iter | 1975591.8142855729 ns/iter | 1.00 |
| JSONL_Parse_Large | 14854807.70212752 ns/iter | 15126402.413043054 ns/iter | 0.98 |
| JSONL_Parse_Large_GZIP | 16168695.813954126 ns/iter | 16386498.627906444 ns/iter | 0.99 |
| URITemplateRouter_Create | 30105.908923050392 ns/iter | 29467.331319487286 ns/iter | 1.02 |
| URITemplateRouter_Match | 158.48364248670052 ns/iter | 158.33991055403771 ns/iter | 1.00 |
| URITemplateRouter_Match_BasePath | 178.65191528947594 ns/iter | 188.23309027181415 ns/iter | 0.95 |
| URITemplateRouterView_Restore | 8780.722534655526 ns/iter | 8697.531089490003 ns/iter | 1.01 |
| URITemplateRouterView_Match | 124.71441056725624 ns/iter | 124.64464974137634 ns/iter | 1.00 |
| URITemplateRouterView_Match_BasePath | 143.57029533208058 ns/iter | 143.8431469566393 ns/iter | 1.00 |
| URITemplateRouterView_Arguments | 462.15399137373646 ns/iter | 458.34000529303717 ns/iter | 1.01 |
| Pointer_Object_Traverse | 33.754101940182984 ns/iter | 34.11538536194793 ns/iter | 0.99 |
| Pointer_Object_Try_Traverse | 21.969245500032457 ns/iter | 21.98696670100176 ns/iter | 1.00 |
| Pointer_Push_Back_Pointer_To_Weak_Pointer | 167.32138354448443 ns/iter | 136.08483082777326 ns/iter | 1.23 |
| Pointer_Walker_Schema_ISO_Language | 3627030.7564766696 ns/iter | 3635916.3928569634 ns/iter | 1.00 |
| Pointer_Maybe_Tracked_Deeply_Nested/0 | 1829168.1540469178 ns/iter | 1861357.4031830325 ns/iter | 0.98 |
| Pointer_Maybe_Tracked_Deeply_Nested/1 | 1809524.9945944843 ns/iter | 1883748.8894878835 ns/iter | 0.96 |
| Pointer_Position_Tracker_Get_Deeply_Nested | 582.2039490675976 ns/iter | 572.8589182353895 ns/iter | 1.02 |
| JSON_Array_Of_Objects_Unique | 423.4444651157079 ns/iter | 419.95133613048284 ns/iter | 1.01 |
| JSON_Parse_1 | 9734.580814607532 ns/iter | 9789.565024492082 ns/iter | 0.99 |
| JSON_Parse_Real | 13415.988973267426 ns/iter | 13137.60996912794 ns/iter | 1.02 |
| JSON_Parse_Decimal | 17515.464438069506 ns/iter | 17057.58143893284 ns/iter | 1.03 |
| JSON_Parse_Schema_ISO_Language | 5636026.191999917 ns/iter | 5774721.804878174 ns/iter | 0.98 |
| JSON_Fast_Hash_Helm_Chart_Lock | 65.43523747868925 ns/iter | 61.30852823091589 ns/iter | 1.07 |
| JSON_Equality_Helm_Chart_Lock | 166.17909266953635 ns/iter | 172.31138431390164 ns/iter | 0.96 |
| JSON_Divisible_By_Decimal | 229.87220494443784 ns/iter | 231.82153448512548 ns/iter | 0.99 |
| JSON_String_Equal/10 | 6.019489214781557 ns/iter | 6.114412830850055 ns/iter | 0.98 |
| JSON_String_Equal/100 | 6.661200246052146 ns/iter | 6.966122002438372 ns/iter | 0.96 |
| JSON_String_Equal_Small_By_Perfect_Hash/10 | 0.7102181157263903 ns/iter | 0.7111757497844186 ns/iter | 1.00 |
| JSON_String_Equal_Small_By_Runtime_Perfect_Hash/10 | 33.11288454341117 ns/iter | 21.951220280982454 ns/iter | 1.51 |
| JSON_String_Fast_Hash/10 | 1.0556554693308242 ns/iter | 1.7584711619715003 ns/iter | 0.60 |
| JSON_String_Fast_Hash/100 | 1.0564220738993684 ns/iter | 1.7587519360645085 ns/iter | 0.60 |
| JSON_String_Key_Hash/10 | 8.433621739872454 ns/iter | 1.0850968900875846 ns/iter | 7.77 |
| JSON_String_Key_Hash/100 | 15.514774636683413 ns/iter | 14.767415853664732 ns/iter | 1.05 |
| JSON_Object_Defines_Miss_Same_Length | 3.516090287837044 ns/iter | 3.8886116862600515 ns/iter | 0.90 |
| JSON_Object_Defines_Miss_Too_Small | 3.5182233161560816 ns/iter | 3.5169091264663384 ns/iter | 1.00 |
| JSON_Object_Defines_Miss_Too_Large | 3.51688198986014 ns/iter | 4.219610344057994 ns/iter | 0.83 |
| Regex_Lower_S_Or_Upper_S_Asterisk | 0.7032161815694277 ns/iter | 0.7038214032873846 ns/iter | 1.00 |
| Regex_Caret_Lower_S_Or_Upper_S_Asterisk_Dollar | 1.0548457424239046 ns/iter | 0.7042668836261515 ns/iter | 1.50 |
| Regex_Period_Asterisk | 1.0555230497882504 ns/iter | 1.055477932419015 ns/iter | 1.00 |
| Regex_Group_Period_Asterisk_Group | 0.7037482923437446 ns/iter | 1.0552310607384283 ns/iter | 0.67 |
| Regex_Period_Plus | 0.7032836356145435 ns/iter | 0.7032150387981557 ns/iter | 1.00 |
| Regex_Period | 1.0551279484804794 ns/iter | 0.703289696699089 ns/iter | 1.50 |
| Regex_Caret_Period_Plus_Dollar | 1.054848982473602 ns/iter | 1.0564138638199936 ns/iter | 1.00 |
| Regex_Caret_Group_Period_Plus_Group_Dollar | 0.7043797333228763 ns/iter | 1.0576392341948366 ns/iter | 0.67 |
| Regex_Caret_Period_Asterisk_Dollar | 0.7033039879325955 ns/iter | 0.7034214714247459 ns/iter | 1.00 |
| Regex_Caret_Group_Period_Asterisk_Group_Dollar | 1.0549028994127345 ns/iter | 0.705503404640222 ns/iter | 1.50 |
| Regex_Caret_X_Hyphen | 3.8702000581631486 ns/iter | 3.8698630002216294 ns/iter | 1.00 |
| Regex_Period_Md_Dollar | 34.84538645647904 ns/iter | 33.67427796527835 ns/iter | 1.03 |
| Regex_Caret_Slash_Period_Asterisk | 4.220936568174724 ns/iter | 4.588304921895854 ns/iter | 0.92 |
| Regex_Caret_Period_Range_Dollar | 1.4072402244447066 ns/iter | 0.8472261272338164 ns/iter | 1.66 |
| Regex_Nested_Backtrack | 50.82182201075506 ns/iter | 38.70772641010307 ns/iter | 1.31 |
Benchmark (linux/llvm)
| Benchmark suite | Current: 4c55c0c | Previous: fe6cf2d | Ratio |
|---|---|---|---|
| Regex_Lower_S_Or_Upper_S_Asterisk | 2.4650507383092317 ns/iter | 2.2198003208633903 ns/iter | 1.11 |
| Regex_Caret_Lower_S_Or_Upper_S_Asterisk_Dollar | 2.463608150122335 ns/iter | 2.198458350128735 ns/iter | 1.12 |
| Regex_Period_Asterisk | 2.4614233643597903 ns/iter | 2.19241095280019 ns/iter | 1.12 |
| Regex_Group_Period_Asterisk_Group | 2.4618138651480286 ns/iter | 2.1996050096327884 ns/iter | 1.12 |
| Regex_Period_Plus | 3.86907116849953 ns/iter | 2.854730033648928 ns/iter | 1.36 |
| Regex_Period | 3.866576885184666 ns/iter | 2.8017253081883178 ns/iter | 1.38 |
| Regex_Caret_Period_Plus_Dollar | 3.514941482711589 ns/iter | 2.488938806721525 ns/iter | 1.41 |
| Regex_Caret_Group_Period_Plus_Group_Dollar | 3.5150249578341564 ns/iter | 2.488702213374691 ns/iter | 1.41 |
| Regex_Caret_Period_Asterisk_Dollar | 2.8137130384548548 ns/iter | 3.4306951171617124 ns/iter | 0.82 |
| Regex_Caret_Group_Period_Asterisk_Group_Dollar | 2.815298065016845 ns/iter | 3.426044396606113 ns/iter | 0.82 |
| Regex_Caret_X_Hyphen | 7.034240573310137 ns/iter | 6.5377451740812 ns/iter | 1.08 |
| Regex_Period_Md_Dollar | 28.3189344397634 ns/iter | 27.95253783293417 ns/iter | 1.01 |
| Regex_Caret_Slash_Period_Asterisk | 7.58070485061698 ns/iter | 5.911218873540801 ns/iter | 1.28 |
| Regex_Caret_Period_Range_Dollar | 4.219122968067906 ns/iter | 2.801286242958691 ns/iter | 1.51 |
| Regex_Nested_Backtrack | 39.650546909144545 ns/iter | 37.00628813318486 ns/iter | 1.07 |
| JSON_Array_Of_Objects_Unique | 476.91391549253535 ns/iter | 493.1990916423395 ns/iter | 0.97 |
| JSON_Parse_1 | 6497.949111769836 ns/iter | 6883.968445259204 ns/iter | 0.94 |
| JSON_Parse_Real | 11404.196730124368 ns/iter | 12285.614182630668 ns/iter | 0.93 |
| JSON_Parse_Decimal | 11668.763044200885 ns/iter | 11672.939662495113 ns/iter | 1.00 |
| JSON_Parse_Schema_ISO_Language | 3871040.2651933683 ns/iter | 3847256.6229508426 ns/iter | 1.01 |
| JSON_Fast_Hash_Helm_Chart_Lock | 68.93804361170433 ns/iter | 66.33043432490602 ns/iter | 1.04 |
| JSON_Equality_Helm_Chart_Lock | 162.87715924436233 ns/iter | 179.80854714302092 ns/iter | 0.91 |
| JSON_Divisible_By_Decimal | 241.862831826369 ns/iter | 254.46917525439414 ns/iter | 0.95 |
| JSON_String_Equal/10 | 5.985099911874573 ns/iter | 6.546957600664495 ns/iter | 0.91 |
| JSON_String_Equal/100 | 6.698137545060451 ns/iter | 7.202064605366773 ns/iter | 0.93 |
| JSON_String_Equal_Small_By_Perfect_Hash/10 | 1.0550564513630294 ns/iter | 0.9396139502637549 ns/iter | 1.12 |
| JSON_String_Equal_Small_By_Runtime_Perfect_Hash/10 | 14.200225980275594 ns/iter | 10.655258581033351 ns/iter | 1.33 |
| JSON_String_Fast_Hash/10 | 2.4633314027270425 ns/iter | 3.1188048087952156 ns/iter | 0.79 |
| JSON_String_Fast_Hash/100 | 2.467796862417339 ns/iter | 3.1198627608367344 ns/iter | 0.79 |
| JSON_String_Key_Hash/10 | 3.1659038331140996 ns/iter | 2.1822113288100478 ns/iter | 1.45 |
| JSON_String_Key_Hash/100 | 7.732744738736636 ns/iter | 6.536700322475011 ns/iter | 1.18 |
| JSON_Object_Defines_Miss_Same_Length | 2.911516029447275 ns/iter | 2.6996137559262983 ns/iter | 1.08 |
| JSON_Object_Defines_Miss_Too_Small | 2.938424340142355 ns/iter | 2.7182877033237234 ns/iter | 1.08 |
| JSON_Object_Defines_Miss_Too_Large | 4.2222289574462115 ns/iter | 3.73996472279293 ns/iter | 1.13 |
| Pointer_Object_Traverse | 25.898146496198002 ns/iter | 25.844070403919893 ns/iter | 1.00 |
| Pointer_Object_Try_Traverse | 30.400793969890906 ns/iter | 28.453496964490558 ns/iter | 1.07 |
| Pointer_Push_Back_Pointer_To_Weak_Pointer | 167.24761797811522 ns/iter | 159.38505647582804 ns/iter | 1.05 |
| Pointer_Walker_Schema_ISO_Language | 3288718.7083332497 ns/iter | 3039739.1739128307 ns/iter | 1.08 |
| Pointer_Maybe_Tracked_Deeply_Nested/0 | 1491159.0149572417 ns/iter | 1498890.0085289618 ns/iter | 0.99 |
| Pointer_Maybe_Tracked_Deeply_Nested/1 | 1777645.949109521 ns/iter | 1827065.2829131973 ns/iter | 0.97 |
| Pointer_Position_Tracker_Get_Deeply_Nested | 739.004843529857 ns/iter | 683.9722494663005 ns/iter | 1.08 |
| URITemplateRouter_Create | 29834.7036926247 ns/iter | 31722.43273169462 ns/iter | 0.94 |
| URITemplateRouter_Match | 187.2295362769122 ns/iter | 171.9222357503655 ns/iter | 1.09 |
| URITemplateRouter_Match_BasePath | 230.24839146268167 ns/iter | 196.32056042959192 ns/iter | 1.17 |
| URITemplateRouterView_Restore | 8415.140331076096 ns/iter | 7758.819577353205 ns/iter | 1.08 |
| URITemplateRouterView_Match | 144.54215300138085 ns/iter | 143.35059621030373 ns/iter | 1.01 |
| URITemplateRouterView_Match_BasePath | 166.26694907256848 ns/iter | 161.36200172850994 ns/iter | 1.03 |
| URITemplateRouterView_Arguments | 449.6218893636655 ns/iter | 436.18736205292555 ns/iter | 1.03 |
| JSONL_Parse_Large | 11129875.46031895 ns/iter | 11789167.47457637 ns/iter | 0.94 |
| JSONL_Parse_Large_GZIP | 12331343.210526189 ns/iter | 13061004.722222455 ns/iter | 0.94 |
| HTML_Build_Table_100000 | 88859920.87499517 ns/iter | 71758292.90908743 ns/iter | 1.24 |
| HTML_Render_Table_100000 | 4879675.414286209 ns/iter | 5186473.073529348 ns/iter | 0.94 |
| GZIP_Compress_ISO_Language_Set_3_Locations | 36571137.6315762 ns/iter | 33792010.95237607 ns/iter | 1.08 |
| GZIP_Decompress_ISO_Language_Set_3_Locations | 4680286.313333303 ns/iter | 5073548.710000751 ns/iter | 0.92 |
| GZIP_Compress_ISO_Language_Set_3_Schema | 2134309.1951221335 ns/iter | 1881182.1505376853 ns/iter | 1.13 |
| GZIP_Decompress_ISO_Language_Set_3_Schema | 290245.4736842034 ns/iter | 380070.4193373255 ns/iter | 0.76 |
Benchmark (windows/msvc)
| Benchmark suite | Current: 4c55c0c | Previous: fe6cf2d | Ratio |
|---|---|---|---|
| Regex_Lower_S_Or_Upper_S_Asterisk | 5.068836607143064 ns/iter | 6.225971000001209 ns/iter | 0.81 |
| Regex_Caret_Lower_S_Or_Upper_S_Asterisk_Dollar | 5.063029464286574 ns/iter | 6.2797732142862674 ns/iter | 0.81 |
| Regex_Period_Asterisk | 5.048775892857651 ns/iter | 5.914817857142144 ns/iter | 0.85 |
| Regex_Group_Period_Asterisk_Group | 5.065031250000272 ns/iter | 5.976576785714981 ns/iter | 0.85 |
| Regex_Period_Plus | 4.904897000000119 ns/iter | 5.611151999999038 ns/iter | 0.87 |
| Regex_Period | 4.702506037282329 ns/iter | 5.452118081574837 ns/iter | 0.86 |
| Regex_Caret_Period_Plus_Dollar | 4.914400457398602 ns/iter | 5.729045535714151 ns/iter | 0.86 |
| Regex_Caret_Group_Period_Plus_Group_Dollar | 4.8281157600061 ns/iter | 5.259407621015488 ns/iter | 0.92 |
| Regex_Caret_Period_Asterisk_Dollar | 5.079954999999927 ns/iter | 6.219507000000704 ns/iter | 0.82 |
| Regex_Caret_Group_Period_Asterisk_Group_Dollar | 5.196465178570975 ns/iter | 6.117930357143158 ns/iter | 0.85 |
| Regex_Caret_X_Hyphen | 8.449755765848296 ns/iter | 9.902397277222768 ns/iter | 0.85 |
| Regex_Period_Md_Dollar | 46.03975874071787 ns/iter | 90.44502721228666 ns/iter | 0.51 |
| Regex_Caret_Slash_Period_Asterisk | 8.016523437499908 ns/iter | 9.114747767858335 ns/iter | 0.88 |
| Regex_Caret_Period_Range_Dollar | 5.647320000000491 ns/iter | 6.526906000001417 ns/iter | 0.87 |
| Regex_Nested_Backtrack | 55.558480000001964 ns/iter | 88.85487723213308 ns/iter | 0.63 |
| JSON_Array_Of_Objects_Unique | 483.8658114908796 ns/iter | 650.8888000000752 ns/iter | 0.74 |
| JSON_Parse_1 | 11784.694642857728 ns/iter | 12914.648214287385 ns/iter | 0.91 |
| JSON_Parse_Real | 18814.38405700038 ns/iter | 20103.801288376642 ns/iter | 0.94 |
| JSON_Parse_Decimal | 17932.14314413679 ns/iter | 17357.536769218073 ns/iter | 1.03 |
| JSON_Parse_Schema_ISO_Language | 7829558.888888895 ns/iter | 8068633.33333341 ns/iter | 0.97 |
| JSON_Fast_Hash_Helm_Chart_Lock | 64.38996651785902 ns/iter | 80.32225087846311 ns/iter | 0.80 |
| JSON_Equality_Helm_Chart_Lock | 303.9102674207533 ns/iter | 298.04855602133495 ns/iter | 1.02 |
| JSON_Divisible_By_Decimal | 301.42707796576804 ns/iter | 408.2167540974293 ns/iter | 0.74 |
| JSON_String_Equal/10 | 16.268894310699793 ns/iter | 15.361986607140516 ns/iter | 1.06 |
| JSON_String_Equal/100 | 16.660455022364133 ns/iter | 23.033800000000326 ns/iter | 0.72 |
| JSON_String_Equal_Small_By_Perfect_Hash/10 | 2.5112417857144465 ns/iter | 2.558883814108603 ns/iter | 0.98 |
| JSON_String_Equal_Small_By_Runtime_Perfect_Hash/10 | 38.12281003352012 ns/iter | 14.30066846294256 ns/iter | 2.67 |
| JSON_String_Fast_Hash/10 | 4.736009777842688 ns/iter | 4.654929464286234 ns/iter | 1.02 |
| JSON_String_Fast_Hash/100 | 4.7067883928570655 ns/iter | 6.218496999999843 ns/iter | 0.76 |
| JSON_String_Key_Hash/10 | 17.212669642857136 ns/iter | 6.071008928572042 ns/iter | 2.84 |
| JSON_String_Key_Hash/100 | 15.947207589286045 ns/iter | 19.057504049881167 ns/iter | 0.84 |
| JSON_Object_Defines_Miss_Same_Length | 4.092758087665319 ns/iter | 4.070796792983468 ns/iter | 1.01 |
| JSON_Object_Defines_Miss_Too_Small | 4.080800409072509 ns/iter | 5.585132142856862 ns/iter | 0.73 |
| JSON_Object_Defines_Miss_Too_Large | 4.397130167775001 ns/iter | 4.208353190084879 ns/iter | 1.04 |
| Pointer_Object_Traverse | 63.77180357142314 ns/iter | 65.92736607142768 ns/iter | 0.97 |
| Pointer_Object_Try_Traverse | 71.00738392856394 ns/iter | 78.34411830359552 ns/iter | 0.91 |
| Pointer_Push_Back_Pointer_To_Weak_Pointer | 160.6261160714289 ns/iter | 178.5425342774114 ns/iter | 0.90 |
| Pointer_Walker_Schema_ISO_Language | 12454507.14285669 ns/iter | 14540711.999998167 ns/iter | 0.86 |
| Pointer_Maybe_Tracked_Deeply_Nested/0 | 2435355.421686971 ns/iter | 2587425.8928563367 ns/iter | 0.94 |
| Pointer_Maybe_Tracked_Deeply_Nested/1 | 3838555.8974357243 ns/iter | 3670630.985915279 ns/iter | 1.05 |
| Pointer_Position_Tracker_Get_Deeply_Nested | 709.7150669643781 ns/iter | 726.5830684898559 ns/iter | 0.98 |
| URITemplateRouter_Create | 43976.42981980424 ns/iter | 43351.039595008966 ns/iter | 1.01 |
| URITemplateRouter_Match | 193.56923156868146 ns/iter | 217.69102481131566 ns/iter | 0.89 |
| URITemplateRouter_Match_BasePath | 218.34468750000724 ns/iter | 249.9976785713898 ns/iter | 0.87 |
| URITemplateRouterView_Restore | 31957.47487170696 ns/iter | 23475.83571428556 ns/iter | 1.36 |
| URITemplateRouterView_Match | 156.55232142857452 ns/iter | 172.55769905518778 ns/iter | 0.91 |
| URITemplateRouterView_Match_BasePath | 173.50887992246186 ns/iter | 193.0510886652807 ns/iter | 0.90 |
| URITemplateRouterView_Arguments | 516.9826000000057 ns/iter | 546.0669642857852 ns/iter | 0.95 |
| JSONL_Parse_Large | 35378915.00000115 ns/iter | 35540905.26315696 ns/iter | 1.00 |
| JSONL_Parse_Large_GZIP | 34956490.47618783 ns/iter | 35467264.99999977 ns/iter | 0.99 |
| HTML_Build_Table_100000 | 91941611.11111043 ns/iter | 96829542.85715692 ns/iter | 0.95 |
| HTML_Render_Table_100000 | 7805035.555555959 ns/iter | 8236934.444446181 ns/iter | 0.95 |
| GZIP_Compress_ISO_Language_Set_3_Locations | 40959870.588233784 ns/iter | 43908339.999992065 ns/iter | 0.93 |
| GZIP_Decompress_ISO_Language_Set_3_Locations | 10596779.687499946 ns/iter | 11513398.21428467 ns/iter | 0.92 |
| GZIP_Compress_ISO_Language_Set_3_Schema | 2233997.8125000214 ns/iter | 2508677.142856998 ns/iter | 0.89 |
| GZIP_Decompress_ISO_Language_Set_3_Schema | 651329.9107142854 ns/iter | 685855.5357143002 ns/iter | 0.95 |
Address cubic/augment reviewer feedback on #2467: the prior commit only set `SOURCEMETA_CXX_CLANG_TIDY` in the cache when it was unset, so an existing build directory whose cache already pointed at a real clang-tidy binary would silently bypass the carve-out and keep hitting the Xcode 16 `arm_neon.h` parser error on incremental builds.
Emit a `CMake WARNING` in that situation directing the developer to either run `cmake --fresh` or pass `-DSOURCEMETA_CXX_CLANG_TIDY=/usr/bin/true` explicitly. Fresh CI configures hit the original `NOT SOURCEMETA_CXX_CLANG_TIDY` branch and get the intended no-op behaviour as before.
Signed-off-by: SouptikH <haldersouptik@gmail.com>
1 issue found across 1 file (changes from recent commits).
cc45a6a to 4c55c0c
@SouptikH Awesome stuff! Take a look at the benchmark comments. Mainly the Linux/GCC. That's the one we care about the most as our actual APIs run on that configuration. This part is interesting. Seems like some cases get faster but some get way slower?
What I recommend is iterating with the info that these benchmark comments give you. Maybe there is something making things slower in the other case, etc.
What I find interesting is that it spikes on the 10 character one specifically. Maybe another thing to look into is the implementation of
Also, what I found when digging into similar things before, is that the actual standard library version can differ a lot. i.e. there are Docker images for GCC for different versions of it which ship different versions of the standard library too, and you can get massive differences just out of those. Maybe we should come up with our own
Finally, of course, if there are specific new cases you want to test, feel free to add them to the
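When iterating on the benchmark comments, it may help to read the Ratio column mechanically: it is the current time divided by the previous time, so values above 1 are slowdowns. The sketch below uses hypothetical helper names and an arbitrary 10 % tolerance that is an assumption, not a project threshold; the sample values come from the Linux/GCC comment.

```cpp
#include <cassert>
#include <cmath>

// Ratio as reported in the benchmark comments: current / previous.
inline double ratio(double current_ns, double previous_ns) {
  return current_ns / previous_ns;
}

// Hypothetical regression check: flags a case when it is slower than the
// previous commit by more than `tolerance` (0.10 means 10 %).
inline bool is_regression(double current_ns, double previous_ns,
                          double tolerance) {
  return ratio(current_ns, previous_ns) > 1.0 + tolerance;
}
```

For example, `JSON_String_Key_Hash/10` on Linux/GCC went from about 1.085 ns to about 8.434 ns, a ratio of 7.77, while `JSON_String_Fast_Hash/10` dropped from about 1.758 ns to about 1.056 ns, a ratio of 0.60.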

Summary
Replaces the byte-by-byte `memcpy` inside `PropertyHashJSON::perfect` with a threshold-based hybrid SIMD path that never reads past the input. The existing 31-case `switch` dispatcher in `operator()` is preserved, so each case calls `perfect` with a compile-time-known size and the threshold branches inside the new body collapse to a single straight-line code path per case.
* `std::memcpy(dst, src, N)`
* `memcpy(buf, src, size)` reads exactly `size` bytes into a zero-padded 16-byte stack buffer, then a single SIMD load+store replaces the compiler's 8+4+2+1 cascade.
* `memcpy` with no mask.
Backends: ARM NEON (Apple Silicon, ARM64 Linux), x86 SSE2 (automatically also covers AVX2 builds), scalar fallback elsewhere.
Why this is safe
* The first byte of `result` (the low byte of `hash.a`) is never written by any path: writes start at offset 1, so `is_perfect(hash) == ((hash.a & 255) == 0)` keeps its semantics.
* For sizes 8..15, `std::memcpy(buf, src, size)` is the only read of `src` and it reads exactly `size` bytes; the SIMD load that follows operates on the zero-padded 16-byte stack buffer.
* For sizes 16..31, the first load at `src` is in-bounds because the caller already validated `size >= 16` via the switch dispatcher. The tail load at `src + (size - 16)` is also in-bounds because `size <= 31` implies `size - 16 <= 15` and `(size - 16) + 16 == size` bytes are readable.
* The output for any `(data, size)` is bitwise identical to the previous `memcpy`-based path, so any cached hashes embedded in compiled schemas remain comparable.
ASAN
Built with `-DBLAZE_ADDRESS_SANITIZER=ON` and ran the full Blaze test suite (codegen tests excluded, as they require `npm ci` for the `tsc` binary; packaging tests excluded, as they require `cmake --install`).
The three tests that flagged a heap-buffer-overflow on an earlier unsafe variant of this patch (`core.json`, `core.jsonpointer`, `core.uritemplate`) are all green under ASAN with this version.
Performance
Measured on `sourcemeta/blaze`'s own `E2E_Evaluator` benchmark (41 real-world schemas, batched JSONL instances, 3 repetitions of each benchmark, median reported) on Apple M1 / macOS / Release build.
* largest single win: `yamllint` -23.35 %
* worst case: `jsconfig` +0.57 % (within noise)
The improvement is broad-based across a benchmark size range of more than three orders of magnitude (`yamllint` ~7 µs through `geojson` ~9.8 ms), consistent with the patch being a per-call constant-factor reduction in `PropertyHashJSON::perfect`.
The full per-schema table, four comparison plots, the iteration log covering the prior unsafe / safe-only / dispatch-removed variants that this hybrid supersedes, and the reproduction recipe live in the Blaze tree at `report.md` and `benchmarks_results/`.
What was tested locally
* `ctest -E "codegen|packaging"`
* `-DBLAZE_ADDRESS_SANITIZER=ON`, `ctest -E "codegen|packaging"`
* `sourcemeta_blaze_benchmark --benchmark_filter=E2E_Evaluator --benchmark_repetitions=3`
Test plan
* AVX2 256-bit load tried in an earlier revision is intentionally not present here because it tripped GCC's `-Warray-bounds=`)
* `_M_X64`)
* `core` benchmarks