perf: v0.12.19 - zero-alloc captures, 95% less memory #152
Merged
Remove transitions []StateID and transitionCount from State struct. Transitions are now stored exclusively in the DFACache.flatTrans flat table.

- Remove State.AddTransition(), Transition(), Stride(), TransitionCount()
- Remove Builder.move() (unused after DetectAcceleration simplification)
- Simplify DetectAcceleration/DetectAccelerationFromCached to return nil
- Add DetectAccelerationFromFlat() reading from flat table
- Simplify tryDetectAccelerationWithCache (flatTrans-only path)
- Remove 3 redundant AddTransition calls from determinize
- Update tests: add TestDetectAccelerationFromFlat, remove State transition tests

Memory: ~222MB -> ~150MB (eliminates redundant per-state transition slices)
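For readers unfamiliar with the layout, a flat transition table of the kind described above might look like the following. This is a minimal sketch, not the project's code: `flatTable`, `invalidState`, and the method names are hypothetical stand-ins for whatever `DFACache.flatTrans` actually uses, and the stride is assumed to be the number of byte classes per state.

```go
package main

import "fmt"

// StateID identifies a DFA state (name taken from the commit message).
type StateID uint32

// invalidState is a hypothetical sentinel for "transition not computed yet".
const invalidState StateID = ^StateID(0)

// flatTable is a hypothetical stand-in for DFACache.flatTrans: one shared
// slice holds every state's row, so no per-state transition slice exists.
type flatTable struct {
	stride int       // assumed: number of byte classes per state
	trans  []StateID // len(trans) == numStates * stride
}

func newFlatTable(stride int) *flatTable {
	return &flatTable{stride: stride}
}

// addState appends one row filled with invalidState and returns the new ID.
func (t *flatTable) addState() StateID {
	id := StateID(len(t.trans) / t.stride)
	for i := 0; i < t.stride; i++ {
		t.trans = append(t.trans, invalidState)
	}
	return id
}

// set records the transition from -> to on the given byte class.
func (t *flatTable) set(from StateID, class int, to StateID) {
	t.trans[int(from)*t.stride+class] = to
}

// next looks up the transition with a single index computation.
func (t *flatTable) next(from StateID, class int) StateID {
	return t.trans[int(from)*t.stride+class]
}

func main() {
	t := newFlatTable(4)
	s0, s1 := t.addState(), t.addState()
	t.set(s0, 2, s1)
	fmt.Println(t.next(s0, 2) == s1, t.next(s0, 0) == invalidState)
}
```

Keeping a second copy of each row on the state itself would double the storage, which is presumably the redundancy this commit removes.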
Add NewBoundedBacktrackerSmall() with 128K entries (256KB) visited capacity, matching Rust regex's default visited_capacity. The UseNFA path now creates the BT with the small limit. When the haystack exceeds BT capacity, it falls back to PikeVM (correct for leftmost-first). The UseBoundedBacktracker strategy retains the 32M limit for POSIX longest-match.

LangArena LogParser (7MB log, 13 patterns, 10 iterations):
- Total alloc: 89MB -> 25MB (-72%)
- RSS (Sys): 353MB -> 41MB (-88%)
- errors pattern: 66MB -> 2.4MB (-96%)
- Speed: no regression (113-126ms per iter)
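The fallback decision can be illustrated roughly as follows. This is a sketch under stated assumptions: it assumes the visited set needs one entry per (NFA state, haystack position) pair, and `canHandle`, `pickEngine`, and the `engine` type are hypothetical names, not the library's API. Only the 128K figure comes from the commit message.

```go
package main

import "fmt"

// smallVisitedEntries is the 128K-entry limit quoted in the commit message.
const smallVisitedEntries = 128 * 1024

// engine is a hypothetical tag for the engine chosen for one search.
type engine int

const (
	engineBacktracker engine = iota
	enginePikeVM
)

// canHandle assumes one visited entry per (state, position) pair,
// including the position just past the end of the haystack.
func canHandle(numStates, haystackLen, maxEntries int) bool {
	return numStates*(haystackLen+1) <= maxEntries
}

// pickEngine prefers the bounded backtracker and falls back to the
// PikeVM when the haystack would exceed the visited capacity.
func pickEngine(numStates, haystackLen int) engine {
	if canHandle(numStates, haystackLen, smallVisitedEntries) {
		return engineBacktracker
	}
	return enginePikeVM
}

func main() {
	fmt.Println(pickEngine(10, 1000) == engineBacktracker)
	fmt.Println(pickEngine(10, 7<<20) == enginePikeVM)
}
```

Under this model a small limit caps the backtracker's memory per search, while large inputs such as the 7MB log are routed to the PikeVM instead.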
- Remove dual transition storage (State.transitions eliminated)
- Rust-aligned BT visited limit for UseNFA (128K entries = 256KB)
- Memory: 89MB -> 25MB alloc (-72%), RSS 353MB -> 41MB (-88%)
Replace MaxStates (count) with CacheCapacityBytes (bytes). Default: 2MB, matching Rust regex's hybrid_cache_capacity.

- Add DFACache.MemoryUsage() (mirrors Rust Cache::memory_usage)
- Insert checks MemoryUsage() >= capacityBytes instead of state count
- Config: CacheCapacityBytes (new), MaxStates (deprecated, backward compat)
- Self-adjusting: fewer states for large stride, more for small
- effectiveCapacityBytes() bridges legacy MaxStates to bytes (~100B/state)
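A byte-based capacity check along these lines could be sketched as below. The 2MB default and the ~100B/state bridge come from the commit message; everything else is assumed for illustration, including the 4-byte state ID, the rule that `CacheCapacityBytes` wins when both fields are set, and the lowercase `cache` type, which only counts transition-table bytes.

```go
package main

import "fmt"

const (
	defaultCacheCapacityBytes = 2 * 1024 * 1024 // 2MB default (from the commit)
	legacyBytesPerState       = 100             // "~100B/state" (from the commit)
	stateIDSize               = 4               // assumption: 4-byte state IDs
)

// Config mirrors the two fields named in the commit; precedence is assumed.
type Config struct {
	CacheCapacityBytes int
	MaxStates          int // deprecated
}

// effectiveCapacityBytes bridges the legacy state count to a byte budget.
func effectiveCapacityBytes(c Config) int {
	switch {
	case c.CacheCapacityBytes > 0:
		return c.CacheCapacityBytes
	case c.MaxStates > 0:
		return c.MaxStates * legacyBytesPerState
	default:
		return defaultCacheCapacityBytes
	}
}

// cache is a simplified stand-in that tracks only the flat transition table.
type cache struct {
	stride        int
	flatTrans     []uint32
	capacityBytes int
}

func (c *cache) memoryUsage() int { return len(c.flatTrans) * stateIDSize }

// insert adds one state row unless the byte budget is already reached.
func (c *cache) insert() bool {
	if c.memoryUsage() >= c.capacityBytes {
		return false
	}
	c.flatTrans = append(c.flatTrans, make([]uint32, c.stride)...)
	return true
}

// fill inserts states until the cache reports it is full.
func fill(c *cache) int {
	n := 0
	for c.insert() {
		n++
	}
	return n
}

func main() {
	wide := &cache{stride: 256, capacityBytes: 4096}
	narrow := &cache{stride: 16, capacityBytes: 4096}
	fmt.Println(fill(wide), fill(narrow))
	fmt.Println(effectiveCapacityBytes(Config{MaxStates: 10000}))
}
```

The same byte budget admits fewer states when the stride is large and more when it is small, which is the self-adjusting behaviour the commit describes.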
SearchWithSlotTableCapturesAt now uses SlotTable instead of legacy COW. Works for simple patterns like (foo)(bar), but greedy repetitions (a+)(b+) lose group start positions during loop iterations.

Root cause: addSearchThread CopySlots overwrites capture slots on each loop iteration. Need stack-based epsilon closure with RestoreCapture frames (Rust approach) to preserve capture context through loops.

TODO: Convert recursive addSearchThread to stack-based with save/restore

Status: 2 NFA unit test failures, all meta tests pass (meta still on COW)
Converted addSearchThread and addSearchThreadToNext from recursive to stack-based with captureFrame (Explore + RestoreCapture frames). Mirrors Rust pikevm.rs FollowEpsilon::RestoreCapture pattern.

Still failing: greedy loop captures (a+)(b+). Per-state SlotTable overwrites group start on each loop iteration (State visited again in next generation). Per-thread COW preserves all variants.

Root issue: per-state storage loses capture history across byte transitions in greedy loops. Need either per-thread indexing or generation-aware slot preservation.

Status: 2 NFA unit tests fail, all meta tests pass
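The Explore/RestoreCapture idea can be shown on a toy NFA. The sketch below is an illustration of the general technique only: the `state`, `frame`, and `closure` definitions are hypothetical and much simpler than the real addSearchThread, and capture slots are plain ints with -1 meaning "unset".

```go
package main

import "fmt"

type kind int

const (
	kByte kind = iota
	kSplit
	kCapture
	kMatch
)

// state is a toy NFA state; out1 is the lower-priority branch of a split.
type state struct {
	kind      kind
	b         byte
	out, out1 int
	slot      int
}

// frame is either an Explore frame (visit a state) or a RestoreCapture
// frame (put a slot back to the value it had before a capture state).
type frame struct {
	restore   bool
	state     int
	slot, old int
}

// closure follows epsilon transitions from start with an explicit stack and
// copies the current slots into table[id] for every byte or match state.
func closure(nfa []state, start, pos int, slots []int, visited []bool, table [][]int) {
	stack := []frame{{state: start}}
	for len(stack) > 0 {
		f := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if f.restore {
			slots[f.slot] = f.old
			continue
		}
		if visited[f.state] {
			continue
		}
		visited[f.state] = true
		s := nfa[f.state]
		switch s.kind {
		case kByte, kMatch:
			copy(table[f.state], slots)
		case kSplit:
			// Push the low-priority branch first so the preferred one pops first.
			stack = append(stack, frame{state: s.out1}, frame{state: s.out})
		case kCapture:
			// The restore frame sits below the explore frame, so it runs only
			// after everything reachable through the capture has been visited.
			stack = append(stack, frame{restore: true, slot: s.slot, old: slots[s.slot]})
			slots[s.slot] = pos
			stack = append(stack, frame{state: s.out})
		}
	}
}

func main() {
	// Toy program: split -> (capture slot 0 -> 'a') | 'b', then match.
	nfa := []state{
		{kind: kSplit, out: 1, out1: 3},
		{kind: kCapture, out: 2, slot: 0},
		{kind: kByte, b: 'a', out: 4},
		{kind: kByte, b: 'b', out: 4},
		{kind: kMatch},
	}
	slots := []int{-1, -1}
	table := make([][]int, len(nfa))
	for i := range table {
		table[i] = []int{-1, -1}
	}
	closure(nfa, 0, 5, slots, make([]bool, len(nfa)), table)
	fmt.Println(table[2], table[3], slots)
}
```

The restore frame is what keeps the `'b'` branch from inheriting the capture written on the `'a'` branch, without recursion and without copying the slot slice per branch.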
Implement Rust-style dual SlotTable (curr/next) for capture propagation across byte transitions. Stack-based epsilon closure with RestoreCapture frames preserves capture context through greedy loops.

Key changes:
- Add NextSlotTable + captureStack + currSlots to PikeVMState
- addSearchThread: stack-based with captureFrame (Explore + RestoreCapture)
- addSearchThreadToNext: loads from curr SlotTable, writes to next
- Swap SlotTable/NextSlotTable after each byte (Rust mem::swap pattern)
- Don't clear Visited before seed (prevents SlotTable row overwrite)
- Wire meta FindSubmatch to use SlotTable path
- Fix empty match capture groups (buildCapturesFromSlots)

FindAllSubmatch (5 patterns, 50K matches, 800KB input):
- Alloc: 554MB -> 26MB (-95%)
- Mallocs: 12.5M -> 440K (-96%)
- Time: 1.48s -> 0.45s (3.3x faster)
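The curr/next arrangement can be pictured with a small sketch. The types below (`slotTable`, `vmState`, `step`, `swap`) are hypothetical and assume both tables are allocated once up front with one row of capture slots per NFA state; the real PikeVMState carries more than this.

```go
package main

import "fmt"

// slotTable stores one row of capture slots per NFA state in a single slice.
type slotTable struct {
	slotsPerState int
	data          []int
}

func newSlotTable(numStates, slotsPerState int) *slotTable {
	t := &slotTable{slotsPerState: slotsPerState, data: make([]int, numStates*slotsPerState)}
	for i := range t.data {
		t.data[i] = -1 // -1 means "slot not set"
	}
	return t
}

// row returns the slots belonging to one state.
func (t *slotTable) row(state int) []int {
	return t.data[state*t.slotsPerState : (state+1)*t.slotsPerState]
}

// vmState holds the two tables: curr is read while stepping over a byte,
// next receives the captures of the states reached after that byte.
type vmState struct {
	curr, next *slotTable
}

// step propagates captures across one byte transition from -> to.
func (v *vmState) step(from, to int) {
	copy(v.next.row(to), v.curr.row(from))
}

// swap exchanges the tables after each byte; no allocation is involved.
func (v *vmState) swap() {
	v.curr, v.next = v.next, v.curr
}

func main() {
	v := &vmState{curr: newSlotTable(3, 2), next: newSlotTable(3, 2)}
	v.curr.row(1)[0] = 3 // group start recorded at position 3
	v.step(1, 1)         // greedy loop: state 1 is reached again
	v.next.row(1)[1] = 4 // extend the group end in the next generation
	fmt.Println(v.curr.row(1), v.next.row(1))
	v.swap()
	fmt.Println(v.curr.row(1))
}
```

Because a state revisited by a greedy loop writes into `next` while `curr` stays readable, the group start recorded earlier is carried forward rather than overwritten, which appears to be the failure mode the two preceding commits describe.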
- CHANGELOG: add SlotTable capture tracking entry
- OPTIMIZATIONS: add #10 Dual SlotTable (95% less memory), update version
- ARCHITECTURE.md: new file documenting engine architecture, memory model, thread safety, and Rust alignment
… SlotTable

- Dual SlotTable (curr/next) capture tracking (Rust approach)
- Stack-based epsilon closure with RestoreCapture frames
- FindAllSubmatch: 554MB -> 26MB (-95%), 3.3x faster
- Updated ARCHITECTURE.md, OPTIMIZATIONS.md, README.md
Summary
Major memory optimization release — Rust-aligned architecture for DFA cache and PikeVM capture tracking.
Performance Changes
- CacheCapacityBytes (2MB default) replaces MaxStates. Matches Rust hybrid_cache_capacity

Memory Impact (Kostya LangArena, 13 patterns, 7MB log)
Documentation
- docs/ARCHITECTURE.md: engine architecture, memory model, thread safety
- docs/OPTIMIZATIONS.md: added #10 Dual SlotTable
- README.md: 17 strategies diagram, architecture links

Test plan
- golangci-lint run: 0 issues
- gofmt -l .: clean