perf: v0.12.20 — premultiplied StateIDs, break-at-match#154
Merged
…ination

DFA Core - Premultiplied + Tagged StateIDs:
- StateID stores byte offset into flatTrans, eliminating multiply from hot loop
- Match/dead/invalid flags encoded in StateID high bits (single IsTagged branch)
- 4x loop unrolling in searchFirstAt, searchAt, searchEarliestMatch
- safeOffset eliminated from all DFA search paths

DFA Core - Rust-aligned Determinize:
- 1-byte match delay (Rust determinize mod.rs:254-286)
- Break-at-match: stop NFA iteration at Match state, drop prefix restarts
- Epsilon closure rewrite: add-on-pop DFS with reverse Split push order, matching Rust sparse set insertion order (verified via cargo run)
- Incremental per-target epsilon closure in moveWithWordContext
- filterStatesAfterMatch removed (replaced by break-at-match)
- BreakAtMatch config: true for forward DFA, false for reverse DFA
- Phase 3 (SearchAtAnchored re-scan) eliminated - 2-pass bidirectional DFA
- Fix: meta dfaConfig uses DefaultConfig() to inherit BreakAtMatch=true

Meta Engine:
- DFA direct FindAll path - skip meta prefilter layer, call DFA directly
- Fast path for start-anchored FindAll - skip pool overhead
- Inline first-byte rejection for anchored patterns
- Prefilter candidate pass-through to bidirectional DFA
- Skip reverse DFA for always-anchored patterns

NFA/PikeVM:
- Lazy SlotTable init - reduce cold start overhead
- Fix anchored BoundedBacktracker on large input - truncate to MaxInputSize

Prefilter:
- Memmem: Memchr(rareByte) + verify (Rust approach) - replaces MemchrPair

Benchmarks (EPYC CI, 6MB input, vs stdlib / vs Rust):
- ip: 675x faster than stdlib, 18.5x faster than Rust
- multiline_php: 288x faster than stdlib, 2.0x faster than Rust
- char_class: 11x faster than stdlib, 1.3x faster than Rust
- inner_literal: 668x faster than stdlib, at Rust parity
- email: 506x faster than stdlib
- LangArena total: 30x faster than stdlib, 3.9x gap vs Rust

27 files changed, +734 -583 lines. All tests pass.
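For reviewers unfamiliar with the determinize changes, here is a minimal sketch of the two ideas named in the commit message: an add-on-pop DFS epsilon closure that pushes Split branches in reverse order, followed by a break-at-match truncation. The `State`, `Kind`, `epsilonClosure` and `breakAtMatch` names are hypothetical and are not taken from this PR; the real code also handles look-around, word context and the 1-byte match delay, all of which are omitted here.

```go
package main

import "fmt"

// Kind and State form a hypothetical, heavily simplified NFA model.
type Kind int

const (
	KindByte  Kind = iota // consumes an input byte; no epsilon edges
	KindSplit             // epsilon fork; Out1 has priority over Out2
	KindMatch             // accepting state
)

type State struct {
	Kind       Kind
	Out1, Out2 int // only meaningful for KindSplit
}

// epsilonClosure returns every state reachable from start through epsilon
// edges, in priority order. A state is added when it is popped, not when it
// is pushed, so its position reflects the highest-priority path reaching it.
func epsilonClosure(nfa []State, start int) []int {
	seen := make([]bool, len(nfa))
	var order []int
	stack := []int{start}
	for len(stack) > 0 {
		id := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[id] {
			continue
		}
		seen[id] = true
		order = append(order, id)
		if s := nfa[id]; s.Kind == KindSplit {
			// Reverse push order: Out1 ends up on top and is explored first.
			stack = append(stack, s.Out2, s.Out1)
		}
	}
	return order
}

// breakAtMatch keeps the ordered set up to and including the first Match
// state and drops everything with lower priority after it.
func breakAtMatch(nfa []State, set []int) []int {
	for i, id := range set {
		if nfa[id].Kind == KindMatch {
			return set[:i+1]
		}
	}
	return set
}

func main() {
	nfa := []State{
		{Kind: KindSplit, Out1: 1, Out2: 2}, // 0
		{Kind: KindSplit, Out1: 3, Out2: 4}, // 1
		{Kind: KindByte},                    // 2
		{Kind: KindMatch},                   // 3
		{Kind: KindByte},                    // 4
	}
	closure := epsilonClosure(nfa, 0)
	fmt.Println(closure)                    // [0 1 3 4 2]
	fmt.Println(breakAtMatch(nfa, closure)) // [0 1 3]
}
```

In this toy NFA, states 4 and 2 sit behind the Match state in priority order, so a leftmost-first forward search never needs them. Presumably a reverse DFA must keep them to find the leftmost start, which would explain `BreakAtMatch=false` there, but that reasoning is an inference rather than something the commit message states.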
a8e8632 to d22c05c
Summary
Premultiplied/tagged StateIDs + Rust-aligned DFA determinize with break-at-match + Phase 3 elimination. 27 files, +734 -583 lines.
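A minimal sketch of what premultiplied and tagged StateIDs mean in a search loop, assuming a 256-entry stride with no byte classes and two flag bits. The type layout, constant names, `searchEarliest` and `buildAB` are illustrative assumptions, not the code in this PR, and the sketch leaves out loop unrolling and the 1-byte match delay.

```go
package main

import "fmt"

// StateID is a hypothetical premultiplied, tagged state identifier.
// Low 30 bits: offset of the state's row in flatTrans (state index * stride).
// High 2 bits: flags, so a single mask test covers every special state.
type StateID uint32

const (
	stride           = 256 // one transition per input byte (assumption)
	tagMatch StateID = 1 << 31
	tagDead  StateID = 1 << 30
	tagMask  StateID = tagMatch | tagDead
)

func (s StateID) IsTagged() bool { return s&tagMask != 0 }
func (s StateID) IsMatch() bool  { return s&tagMatch != 0 }

type DFA struct {
	flatTrans []StateID
	start     StateID
}

// searchEarliest returns the end offset of the earliest match, or -1.
func (d *DFA) searchEarliest(input []byte) int {
	sid := d.start
	if sid.IsTagged() {
		if sid.IsMatch() {
			return 0
		}
		return -1
	}
	for i, b := range input {
		// sid is untagged here, so it already is a row offset: no sid*stride.
		sid = d.flatTrans[int(sid)+int(b)]
		if sid.IsTagged() { // the only branch for match and dead states
			if sid.IsMatch() {
				return i + 1
			}
			return -1 // dead state: no match is possible from here
		}
	}
	return -1
}

// buildAB builds a toy DFA that finds the literal "ab" anywhere in the input.
func buildAB() *DFA {
	const s0, s1 StateID = 0 * stride, 1 * stride
	match := StateID(2*stride) | tagMatch
	t := make([]StateID, 3*stride) // zero value is s0, the restart state
	t[int(s0)+'a'] = s1
	t[int(s1)+'a'] = s1
	t[int(s1)+'b'] = match
	return &DFA{flatTrans: t, start: s0}
}

func main() {
	d := buildAB()
	fmt.Println(d.searchEarliest([]byte("xxabyy"))) // 4
	fmt.Println(d.searchEarliest([]byte("ba")))     // -1
}
```

Storing the row offset directly trades a few bits of StateID range for one less multiply per input byte, and folding the flags into the same word keeps the common case to a single predictable branch.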
DFA Core
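- Premultiplied StateIDs: eliminates the `sid * stride` multiply from hot loop (Rust `LazyStateID`)
- Tagged StateIDs: match/dead/invalid flags in high bits, single `IsTagged()` branch
- Break-at-match: Rust `determinize::next` break semantics, replaces `filterStatesAfterMatch`
- Epsilon closure: insertion order matches Rust sparse set (verified via `cargo run`)

Meta Engine
- `BreakAtMatch` config: true for forward DFA, false for reverse DFA
- `dfaConfig` now uses `DefaultConfig()` to inherit `BreakAtMatch=true`

NFA/Prefilter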
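The commit message describes the new Memmem prefilter as Memchr(rareByte) + verify. A rough sketch of that approach, using `bytes.IndexByte` as the memchr step: the `byteRank` heuristic, `rareIndex` and `find` are hypothetical stand-ins, and the PR's actual rare-byte selection is not shown on this page.

```go
package main

import (
	"bytes"
	"fmt"
)

// byteRank is a hypothetical frequency heuristic: higher means more common.
func byteRank(b byte) int {
	switch {
	case b == ' ' || (b >= 'a' && b <= 'z'):
		return 3
	case (b >= 'A' && b <= 'Z') || (b >= '0' && b <= '9'):
		return 2
	default:
		return 1
	}
}

// rareIndex returns the index of the needle byte with the lowest rank
// (the first such index on ties). The needle must be non-empty.
func rareIndex(needle []byte) int {
	best := 0
	for i, b := range needle {
		if byteRank(b) < byteRank(needle[best]) {
			best = i
		}
	}
	return best
}

// find returns the index of the first occurrence of needle in haystack,
// or -1. It scans for the rare byte only, then verifies the full needle
// around each candidate position.
func find(haystack, needle []byte) int {
	if len(needle) == 0 {
		return 0
	}
	ri := rareIndex(needle)
	rb := needle[ri]
	last := len(haystack) - len(needle) + ri // last valid rare-byte position
	for pos := ri; pos <= last; pos++ {
		j := bytes.IndexByte(haystack[pos:], rb) // memchr step
		if j < 0 {
			return -1
		}
		pos += j
		if pos > last {
			return -1 // candidate too close to the end for a full match
		}
		start := pos - ri
		if bytes.Equal(haystack[start:start+len(needle)], needle) { // verify step
			return start
		}
	}
	return -1
}

func main() {
	fmt.Println(find([]byte("hello, world"), []byte("o, w"))) // 4
	fmt.Println(find([]byte("abcabd"), []byte("abd")))        // 3
}
```

Scanning for a single rare byte produces fewer false candidates than scanning for a common one, so the verify step runs less often; how much that helps depends on the input distribution.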
Benchmarks (EPYC CI, 6MB input)
No regressions vs v0.12.19 on any pattern.
Verification
- `go test ./...`: all 9 packages pass
- `gofmt -l`: clean
- `golangci-lint run`: clean (only `dupl` on intentional DFS duplication)
- `SearchAt` verified against Rust `regex-automata` `find_fwd`: identical on 7 patterns