Fair managed multi-seg: reuse a buffer instead of the ToArray API by MDA2AV · Pull Request #41 · dotnet-web-stack/Glyph11

MDA2AV · 2026-06-04T20:24:58Z

The managed multi-seg bench called `TryExtractFullHeaderValidated`, which calls `input.ToArray()` on every call (a heap allocation), while the FFI multi-seg bench reused a buffer (`seq.CopyTo` into an array allocated once). That made the bindings look ~2x faster on multi-seg when the difference was allocation strategy, not parse speed. The managed multi-seg bench now also linearizes into the reused buffer (`seq.CopyTo` + ROM parse), so every multi-seg path is reused-buffer linearize + parse. Result: for every implementation, multi-seg costs the contiguous parse plus one memcpy, and the native-vs-managed gap matches the contiguous case. Managed multi-seg at 32KB drops from 9262 ns to 5606 ns.
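The two linearization strategies being compared can be sketched roughly as follows. This is only an illustration using `System.Buffers`: `BufferSegment` is a hypothetical helper for building a multi-segment sequence, the 32 KB scratch size is an assumption, and the Glyph11 parse call is left out because its exact signature is not shown here.

```csharp
using System;
using System.Buffers;
using System.Text;

// Build a two-segment sequence, standing in for a header split across reads.
byte[] raw = Encoding.ASCII.GetBytes("GET / HTTP/1.1\r\nHost: example.test\r\n\r\n");
var first = new BufferSegment(raw.AsMemory(0, 10));
BufferSegment last = first.Append(raw.AsMemory(10));
var seq = new ReadOnlySequence<byte>(first, 0, last, last.Memory.Length);

// Strategy A (one-shot): ToArray allocates a fresh array on every call.
ReadOnlyMemory<byte> oneShot = seq.ToArray();

// Strategy B (reused): copy into a buffer that was allocated once up front.
byte[] scratch = new byte[32 * 1024]; // assumed upper bound on header size
seq.CopyTo(scratch.AsSpan());
ReadOnlyMemory<byte> reused = scratch.AsMemory(0, (int)seq.Length);

// Both strategies hand the parser the same contiguous bytes.
Console.WriteLine(oneShot.Span.SequenceEqual(reused.Span)); // prints True

// Hypothetical helper (not part of Glyph11): one link in a segment chain.
sealed class BufferSegment : ReadOnlySequenceSegment<byte>
{
    public BufferSegment(ReadOnlyMemory<byte> memory) => Memory = memory;

    public BufferSegment Append(ReadOnlyMemory<byte> memory)
    {
        var next = new BufferSegment(memory) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}
```

Either `ReadOnlyMemory<byte>` could then be passed to a contiguous (ROM) parse; the only difference between the two paths is where the destination array comes from.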

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions

Benchmark


| Benchmark suite | Current: `62d6f9f` | Previous: `d3a06f0` | Ratio |
| --- | --- | --- | --- |
| `Benchmarks.FlexibleParserBenchmark.Small_ROM` | `139.72749559084573` ns (`± 0.6902930401856936`) | `139.20921897888184` ns (`± 0.3414138670448296`) | `1.00` |
| `Benchmarks.FlexibleParserBenchmark.Small_MultiSegment` | `354.3225266138713` ns (`± 0.8942941863343116`) | `349.4604838689168` ns (`± 2.860363450836272`) | `1.01` |
| `Benchmarks.FlexibleParserBenchmark.Header4K_ROM` | `694.9425118764242` ns (`± 2.3547129820667396`) | `708.5051829020182` ns (`± 2.487052061932313`) | `0.98` |
| `Benchmarks.FlexibleParserBenchmark.Header4K_MultiSegment` | `1778.321886698405` ns (`± 17.50093798247063`) | `1826.123291015625` ns (`± 16.460001723138515`) | `0.97` |
| `Benchmarks.FlexibleParserBenchmark.Header32K_ROM` | `4831.17500559489` ns (`± 26.374917042370054`) | `4949.9634958903` ns (`± 10.64757006352856`) | `0.98` |
| `Benchmarks.FlexibleParserBenchmark.Header32K_MultiSegment` | `12079.127970377604` ns (`± 74.47769506531404`) | `12010.515991210938` ns (`± 66.48393176840845`) | `1.01` |
| `Benchmarks.UltraHardenedParserBenchmark.Small_ROM` | `252.49250237147012` ns (`± 0.4901765172340976`) | `252.88829962412515` ns (`± 1.7230360417772392`) | `1.00` |
| `Benchmarks.UltraHardenedParserBenchmark.Small_MultiSegment` | `536.6848204930624` ns (`± 0.5783545803477097`) | `559.2204907735189` ns (`± 4.167496342803848`) | `0.96` |
| `Benchmarks.UltraHardenedParserBenchmark.Header4K_ROM` | `1135.0538514455159` ns (`± 2.1511206126767997`) | `1118.6354840596516` ns (`± 0.7807947528732518`) | `1.01` |
| `Benchmarks.UltraHardenedParserBenchmark.Header4K_MultiSegment` | `2202.86643854777` ns (`± 20.347785007036975`) | `2225.3782081604004` ns (`± 17.533121532742204`) | `0.99` |
| `Benchmarks.UltraHardenedParserBenchmark.Header32K_ROM` | `7217.457476298015` ns (`± 7.737945397352009`) | `7139.710075378418` ns (`± 18.12358659190381`) | `1.01` |
| `Benchmarks.UltraHardenedParserBenchmark.Header32K_MultiSegment` | `15316.31938680013` ns (`± 67.76101236052526`) | `15398.131754557291` ns (`± 158.60331507605568`) | `0.99` |
| `Benchmarks.FlexibleParserBenchmark.Small_ROM.Allocated` | `0` B (`± 0`) | `0` B (`± 0`) | `1` |
| `Benchmarks.FlexibleParserBenchmark.Small_MultiSegment.Allocated` | `112` B (`± 0`) | `112` B (`± 0`) | `1` |
| `Benchmarks.FlexibleParserBenchmark.Header4K_ROM.Allocated` | `0` B (`± 0`) | `0` B (`± 0`) | `1` |
| `Benchmarks.FlexibleParserBenchmark.Header4K_MultiSegment.Allocated` | `4128` B (`± 0`) | `4128` B (`± 0`) | `1` |
| `Benchmarks.FlexibleParserBenchmark.Header32K_ROM.Allocated` | `0` B (`± 0`) | `0` B (`± 0`) | `1` |
| `Benchmarks.FlexibleParserBenchmark.Header32K_MultiSegment.Allocated` | `32800` B (`± 0`) | `32800` B (`± 0`) | `1` |
| `Benchmarks.UltraHardenedParserBenchmark.Small_ROM.Allocated` | `0` B (`± 0`) | `0` B (`± 0`) | `1` |
| `Benchmarks.UltraHardenedParserBenchmark.Small_MultiSegment.Allocated` | `128` B (`± 0`) | `128` B (`± 0`) | `1` |
| `Benchmarks.UltraHardenedParserBenchmark.Header4K_ROM.Allocated` | `0` B (`± 0`) | `0` B (`± 0`) | `1` |
| `Benchmarks.UltraHardenedParserBenchmark.Header4K_MultiSegment.Allocated` | `4128` B (`± 0`) | `4128` B (`± 0`) | `1` |
| `Benchmarks.UltraHardenedParserBenchmark.Header32K_ROM.Allocated` | `0` B (`± 0`) | `0` B (`± 0`) | `1` |
| `Benchmarks.UltraHardenedParserBenchmark.Header32K_MultiSegment.Allocated` | `32800` B (`± 0`) | `32800` B (`± 0`) | `1` |

This comment was automatically generated by workflow using github-action-benchmark.

Reverts the earlier 'fair' swap. Since the C core is single-slab, the binding must linearize, so that copy is the binding's real cost (counted as a reused-buffer CopyTo). By the same logic, the managed column must show its own real linearization: TryExtractFullHeaderValidated, which calls input.ToArray() on every request, not a hand-rolled reused buffer. Multi-seg now reflects what each path actually does: managed allocates the linearization buffer per request (~9200 ns @ 32KB), while the bindings reuse one (~4500 ns). The copy is in both; the ~2x gap is the per-request allocation that the single-slab binding avoids (a managed caller can match it by hand-rolling CopyTo + ROM).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
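The per-request allocation described in this commit can be observed directly with a rough sketch like the one below. It assumes a 32 KB input split into two segments and uses `GC.GetAllocatedBytesForCurrentThread` rather than a benchmark harness; `BufferSegment` is a hypothetical helper, and the figures it prints will vary by runtime, so none are quoted here.

```csharp
using System;
using System.Buffers;

const int Size = 32 * 1024;   // assumed header size, matching the 32KB case
const int Iterations = 1_000;

// Two-segment sequence over a zero-filled 32 KB payload.
byte[] raw = new byte[Size];
var first = new BufferSegment(raw.AsMemory(0, Size / 2));
BufferSegment last = first.Append(raw.AsMemory(Size / 2));
var seq = new ReadOnlySequence<byte>(first, 0, last, last.Memory.Length);

byte[] scratch = new byte[Size]; // allocated once, outside the measured loops

// Per-request linearization: a new array on every iteration.
long before = GC.GetAllocatedBytesForCurrentThread();
for (int i = 0; i < Iterations; i++)
{
    byte[] copy = seq.ToArray();
    GC.KeepAlive(copy);
}
long perCallToArray = (GC.GetAllocatedBytesForCurrentThread() - before) / Iterations;

// Reused-buffer linearization: only a copy, no new array.
before = GC.GetAllocatedBytesForCurrentThread();
for (int i = 0; i < Iterations; i++)
{
    seq.CopyTo(scratch.AsSpan());
}
long perCallReuse = (GC.GetAllocatedBytesForCurrentThread() - before) / Iterations;

Console.WriteLine($"ToArray per call: {perCallToArray} B");
Console.WriteLine($"CopyTo  per call: {perCallReuse} B");

// Hypothetical helper (not part of Glyph11): one link in a segment chain.
sealed class BufferSegment : ReadOnlySequenceSegment<byte>
{
    public BufferSegment(ReadOnlyMemory<byte> memory) => Memory = memory;

    public BufferSegment Append(ReadOnlyMemory<byte> memory)
    {
        var next = new BufferSegment(memory) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}
```

On a typical runtime the first figure should be at least the payload size and the second should be far smaller, which is the allocation difference the commit message attributes the ~2x gap to.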

… buffer strategy) The C# FFI does linearize on multi-segment input; both paths do. The bug was that the managed column used the one-shot API (ToArray) while the FFI used a reused buffer, smuggling the linearization strategy into a parser comparison. Now every multi-seg path does CopyTo/memcpy into a reused buffer + parse, so the copy is counted identically and the column reflects the parser: for every implementation, multi-seg costs the contiguous parse plus one memcpy, with native ~1.2x ahead in both modes. The TryExtractFullHeaderValidated ToArray-per-request cost (~9.2us vs ~5.4us @ 32KB) is now a footnote, not a confound.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replaces the prose README with verified, copy-pasteable request-header-parsing examples for the C# library (UltraHardenedParser), the .NET binding (Glyph11Parser, zero-alloc caller storage), and the Kotlin binding (Glyph11.parse). To make the Kotlin example real, the binding now surfaces parsed headers and query parameters as List<Glyph11Field> (name/value spans) instead of only a count. All three examples were compiled and run against the real libraries; the Kotlin smoke test now also asserts a header name/value.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot reviewed Jun 4, 2026


MDA2AV and others added 3 commits June 4, 2026 21:51

MDA2AV merged commit a9f941a into main Jun 4, 2026
4 checks passed

MDA2AV merged 4 commits into main from feat/cross-bench
