perf: v0.12.20 — premultiplied StateIDs, break-at-match by kolkov · Pull Request #154 · coregx/coregex

kolkov · 2026-03-25T18:50:20Z

Summary

Premultiplied/tagged StateIDs, a Rust-aligned DFA determinize with break-at-match, and elimination of Phase 3. 27 files, +734 -583 lines.

DFA Core

Premultiplied StateIDs: eliminate the sid * stride multiply from the hot loop (as in Rust's LazyStateID)
Tagged StateIDs: match/dead/invalid flags in the high bits, checked with a single IsTagged() branch
4x loop unrolling in all DFA search functions
1-byte match delay: Rust determinize approach (mod.rs:254-286)
Break-at-match: Rust determinize::next break semantics; replaces filterStatesAfterMatch
Epsilon closure rewrite: add-on-pop DFS, reverse Split push order, incremental per-target closure (verified against Rust via cargo run)
Phase 3 eliminated: bidirectional DFA reduced from 3 passes to 2

Meta Engine

DFA direct FindAll path (skip meta prefilter layer)
Anchored FindAll fast paths (skip pool overhead, first-byte rejection)
BreakAtMatch config: true for forward DFA, false for reverse DFA
Fix: dfaConfig now uses DefaultConfig() to inherit BreakAtMatch=true

NFA/Prefilter

Lazy SlotTable init; fix anchored BoundedBacktracker on large input (truncate to MaxInputSize)
Memmem: Memchr(rareByte) + verify (Rust approach)

Benchmarks (EPYC CI, 6MB input)

Pattern	vs stdlib	vs Rust
ip	675x faster	18.5x faster
multiline_php	288x faster	2.0x faster
char_class	11x faster	1.3x faster
inner_literal	668x faster	~parity
email	506x faster	1.8x slower
LangArena total (13 patterns)	30x faster	3.9x slower

No regressions vs v0.12.19 on any pattern.

Verification

go test ./... passes (all 9 packages)
gofmt -l reports no unformatted files
golangci-lint run is clean apart from dupl warnings on intentionally duplicated DFS code
DFA SearchAt verified against Rust regex-automata find_fwd: identical results on 7 patterns
regex-bench CI green (EPYC 9V74 + EPYC 7763)
Regression check: tokens +0.3%, peak_hours +5.2% (within noise)

codecov · 2026-03-25T18:52:41Z

Codecov Report

❌ Patch coverage is 78.26087% with 60 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
dfa/lazy/lazy.go	71.57%	20 Missing and 7 partials ⚠️
dfa/lazy/cache.go	69.56%	4 Missing and 3 partials ⚠️
meta/find_indices.go	14.28%	4 Missing and 2 partials ⚠️
meta/engine.go	0.00%	5 Missing ⚠️
meta/findall.go	78.26%	2 Missing and 3 partials ⚠️
dfa/lazy/builder.go	91.83%	2 Missing and 2 partials ⚠️
dfa/lazy/state.go	93.75%	2 Missing ⚠️
nfa/pikevm.go	83.33%	1 Missing and 1 partial ⚠️
nfa/slot_table.go	66.66%	1 Missing and 1 partial ⚠️


github-actions · 2026-03-25T18:57:16Z

Benchmark Comparison

Comparing main → PR #154

Summary: geomean 84.33n (main) → 81.26n (PR #154), -3.64%

⚠️ Potential regressions detected:

Benchmark	main	PR #154	delta	p (n=5)
Accelerate/memchr1-4	103.2n ± ∞	140.5n ± ∞	+36.14%	0.008
Accelerate/memchr3-4	101.8n ± ∞	112.8n ± ∞	+10.81%	0.016
geomean	36.19n	36.22n	+0.08%
AnchoredLiteralVsStdlib/stdlib_short-4	258.0n ± ∞	259.1n ± ∞	+0.43%	0.024
AnchoredLiteralVsStdlib/stdlib_medium-4	369.5n ± ∞	382.2n ± ∞	+3.44%	0.016
AnchoredLiteralVsStdlib/coregex_no_match-4	5.645n ± ∞	6.346n ± ∞	+12.42%	0.008

Full results available in workflow artifacts. CI runners have ~10-20% variance.
For accurate benchmarks, run locally: ./scripts/bench.sh --compare

…ination

DFA Core (Premultiplied + Tagged StateIDs):
- StateID stores byte offset into flatTrans, eliminating multiply from hot loop
- Match/dead/invalid flags encoded in StateID high bits (single IsTagged branch)
- 4x loop unrolling in searchFirstAt, searchAt, searchEarliestMatch
- safeOffset eliminated from all DFA search paths

DFA Core (Rust-aligned Determinize):
- 1-byte match delay (Rust determinize mod.rs:254-286)
- Break-at-match: stop NFA iteration at Match state, drop prefix restarts
- Epsilon closure rewrite: add-on-pop DFS with reverse Split push order, matching Rust sparse set insertion order (verified via cargo run)
- Incremental per-target epsilon closure in moveWithWordContext
- filterStatesAfterMatch removed (replaced by break-at-match)
- BreakAtMatch config: true for forward DFA, false for reverse DFA
- Phase 3 (SearchAtAnchored re-scan) eliminated: 2-pass bidirectional DFA
- Fix: meta dfaConfig uses DefaultConfig() to inherit BreakAtMatch=true

Meta Engine:
- DFA direct FindAll path: skip meta prefilter layer, call DFA directly
- Fast path for start-anchored FindAll: skip pool overhead
- Inline first-byte rejection for anchored patterns
- Prefilter candidate pass-through to bidirectional DFA
- Skip reverse DFA for always-anchored patterns

NFA/PikeVM:
- Lazy SlotTable init: reduce cold start overhead
- Fix anchored BoundedBacktracker on large input: truncate to MaxInputSize

Prefilter:
- Memmem: Memchr(rareByte) + verify (Rust approach); replaces MemchrPair

Benchmarks (EPYC CI, 6MB input, vs stdlib / vs Rust):
- ip: 675x faster than stdlib, 18.5x faster than Rust
- multiline_php: 288x faster than stdlib, 2.0x faster than Rust
- char_class: 11x faster than stdlib, 1.3x faster than Rust
- inner_literal: 668x faster than stdlib, at Rust parity
- email: 506x faster than stdlib
- LangArena total: 30x faster than stdlib, 3.9x gap vs Rust

27 files changed, +734 -583 lines. All tests pass.

kolkov force-pushed the release/v0.12.20 branch from a8e8632 to d22c05c on March 25, 2026 19:06

kolkov merged commit 90d77fd into main Mar 25, 2026
9 checks passed

kolkov merged 1 commit into main from release/v0.12.20
