High-performance character detection using SIMD instructions. Finds ASCII and multi-byte UTF-8 characters in byte sequences at ~2 GB/s per core. A constraint solver (Z3) finds optimal shuffle mask configurations at build time; the runtime engine executes detection at full vector width.
Use cases: tokenizers, parsers, CSV/JSON/TOML pre-processing, or any workload involving fast character classification on large byte buffers.
The specification (73 requirements) defines the observable contract. Two conforming implementations:
| Java | Rust | |
|---|---|---|
| Location | java/ |
rust/ |
| Runtime | JDK 25 Vector API | std::arch intrinsics |
| Backends | AVX-512, AVX2, NEON | AVX-512, AVX2, NEON |
| Maturity | Production | In progress |
- How It Works
- Quick Start - Java | Rust
- Installation - Java | Rust
- Benchmark: Character Detection
- Benchmark: CSV Parser
- VPA Chunk Filter Framework
- CSV Parser
- Solver Limits
- Building
-
Z3 nibble-matrix solver. Each byte splits into low and high nibble (4 bits each), giving a 16x16 lookup grid. Z3 finds two 16-entry shuffle vectors whose AND produces a unique literal byte for every target character and zero for everything else. A single group solves ~12 ASCII literals; auto-split doubles that to ~20-24.
-
UTF-8 multi-byte detection via per-round shuffle mask solving across continuation bytes. 2-byte, 3-byte, and 4-byte codepoints are detected by gating lead byte classification with per-round literal matching.
-
Vector range operations for contiguous byte ranges (e.g.
0-9,<=>) without consuming nibble-matrix capacity -- each range costs 1 literal ID. -
Bytecode-compiled engine (Java) / monomorphized generics (Rust) for zero-overhead inlined SIMD.
-
VPA chunk filters for parsing beyond regular languages. Vectorized Hillis-Steele parallel prefix operations (
prefixXorfor quote toggle,prefixSumfor nesting depth) enable SIMD-accelerated CSV, JSON, and TOML parsing.
var result = Utf8EngineBuilder.builder()
.codepoints("whitespace", '\r', '\n', '\t', ' ')
.codepoints("punct", ':', ';', '{', '}')
.range("digits", (byte) 0x30, (byte) 0x39)
.build();
var engine = result.engine();
var literals = result.literals();
var storage = new MatchStorage(data.length / 4, 32);
var match = engine.find(mappedFile, storage);
for (int i = 0; i < match.size(); i++) {
byte lit = match.getLiteralAt(storage, i);
long pos = match.getPositionAt(storage, i);
}let result = findchars::EngineBuilder::new()
.codepoints("whitespace", &[b'\r', b'\n', b'\t', b' '])
.codepoints("punct", &[b':', b';', b'{', b'}'])
.range("digits", b'0', b'9')
.build()?;
let mut storage = MatchStorage::new(data.len() / 4);
let view = result.engine.find(data, &mut storage);
for i in 0..view.len() {
let lit = view.literal(i);
let pos = view.position(i);
}Requires JDK 25. Bytecode is compiled with --enable-preview and depends on jdk.incubator.vector.
Maven:
<dependency>
<groupId>org.knownhosts</groupId>
<artifactId>libfindchars-compiler</artifactId>
<version>0.4.0-jdk25-preview</version>
</dependency>Gradle:
implementation("org.knownhosts:libfindchars-compiler:0.4.0-jdk25-preview")Runtime JVM arguments:
--enable-preview --add-modules=jdk.incubator.vector
Requires Rust 1.94+. The solver crate depends on Z3 (built from source on first compile, ~5 min, cached thereafter).
# Cargo.toml
[dependencies]
findchars = "0.1"
# For CSV parsing
findchars-csv = "0.1"
# For build.rs codegen (build-dependency only, not shipped)
[build-dependencies]
findchars-solver = "0.1"SIMD backend is auto-detected at engine construction time (AVX-512 > AVX2 > NEON > scalar).
Environment: AVX-512, single core, 10 MB data. Both implementations use the same SIMD algorithm.
| Java (JMH) | Rust (Criterion) | |
|---|---|---|
| ASCII scan (4-8 targets) | ~2.0 GB/s | ~2.0 GB/s |
| vs regex | 7-16x faster | 3-9x faster |
| Mixed UTF-8 (2/3/4-byte) | ~1.5 GB/s | ~1.5 GB/s |
Parameter sweep: ASCII literal count, match density, multi-byte codepoint count, group count.
Cost model (hardware counters)
Hardware counters (-prof perfnorm) show cost is driven by match density and shuffle rounds. ASCII count and group count have minimal effect.
| Engine | a (base) | b (density) | c (rounds) | R^2 |
|---|---|---|---|---|
| SIMD Compiled | 1.70 | 5.68 | 1.75 | 0.90 |
| Regex | 7.42 | 127.84 | 38.46 | 0.87 |
The compiled engine's density cost is 22x smaller than regex (5.7 vs 128 insn/byte).
Parameter sweep: ASCII target count (2-12) and match density (5-50%).
ASCII target count (15% density, 10 MB):
Match density (8 targets, 10 MB):
| Targets | findchars | regex crate | Speedup |
|---|---|---|---|
| 2 | 1.65 GiB/s | 467 MiB/s | 3.6x |
| 4 | 1.68 GiB/s | 253 MiB/s | 6.8x |
| 8 | 1.64 GiB/s | 251 MiB/s | 6.7x |
| 12 | 1.47 GiB/s | 168 MiB/s | 9.0x |
Throughput stays flat regardless of target count (sublinear scaling). Regex degrades linearly.
# Java sweep
scripts/run-sweep.sh # Full (~6 min)
scripts/run-sweep.sh --quick # Smoke test (~1 min)
scripts/run-sweep.sh --perfnorm # With hardware counters
# Rust sweep
scripts/run-sweep-rust.sh # Full
scripts/run-sweep-rust.sh --quick # Smoke testEnvironment: AVX-512, single core, 10 MB generated CSV data. Both implementations use two-phase architecture: SIMD scan + quote filter + linear match walk.
| Backend | Java (JMH) | Rust (Criterion) |
|---|---|---|
| AVX-512 | ~1.3 GB/s | 1.24 GiB/s |
| AVX2 | -- | 1.23 GiB/s |
| Scalar | -- | 157 MiB/s |
Baseline: 10 columns, 5% quotes, avg field length 16. Compared against FastCSV.
SIMD Parse reaches ~0.86-1.3 GB/s, 2.5x faster than FastCSV (~0.35 GB/s).
Backend comparison (10 cols, 5% quotes, field_len 16):
Column count sweep:
Quote percentage sweep:
Field length sweep:
| Field Length | Throughput |
|---|---|
| 4 bytes | 479 MiB/s |
| 16 bytes | 1.24 GiB/s |
| 32 bytes | 1.68 GiB/s |
| 64 bytes | 1.92 GiB/s |
The Rust CSV parser uses inline AVX-512 Hillis-Steele prefix XOR for the quote filter -- no scalar callback, no store/reload. The same vectorized prefix XOR is implemented for AVX2 (cross-lane via permute2x128 + alignr) and NEON (vextq_u8).
# Java CSV sweep
scripts/run-csv-sweep.sh
scripts/run-csv-sweep.sh --quick
# Rust CSV sweep
scripts/run-csv-sweep-rust.sh
scripts/run-csv-sweep-rust.sh --quickThe base engine recognizes regular languages. Real-world formats need more: CSV requires knowing whether a comma is inside quotes, JSON requires tracking brace depth.
Regular (Type 3) -- finite automaton, no stack
^ libfindchars base engine: character detection, range ops
Visibly Pushdown (VPL) -- stack ops determined by input symbol
^ VPA chunk filters: prefixXor (toggle), prefixSum (depth)
Context-Free (Type 2) -- full PDA, stack ops depend on state
A Visibly Pushdown Automaton partitions the alphabet into call, return, and internal symbols. Stack action is determined by the input alone, making prefix computation vectorizable.
| Primitive | Stack model | Operation | Use cases |
|---|---|---|---|
prefixXor |
Binary toggle | result[i] = v[0] ^ ... ^ v[i] |
CSV quotes, JSON strings |
prefixSum |
Bounded depth | result[i] = v[0] + ... + v[i] |
JSON {}[] nesting, XML depth |
Both use the Hillis-Steele parallel prefix pattern in O(log2 V) vector steps. On AVX-512 that's 6 steps; AVX2 uses 5 steps with cross-lane shifts.
CSV -- Stack-1 only. Detect , " \n \r. prefixXor on quote positions toggles inside/outside. Zero structural chars inside quotes.
JSON -- Stack-1 + Stack-n. prefixXor on " masks inside strings. prefixSum on {[ (+1) and }] (-1) computes nesting depth.
var parser = CsvParser.builder()
.delimiter(',').quote('"').hasHeader(true)
.build();
try (var arena = Arena.ofConfined()) {
var result = parser.parse(Path.of("data.csv"), arena);
for (int i = 0; i < result.rowCount(); i++) {
System.out.println(result.row(i).get(0));
}
}Note:
libfindchars-csvis not yet published to Maven Central. Build from source:cd java && ./mvnw install -pl libfindchars-csv -am.
let parser = findchars_csv::CsvParser::builder()
.delimiter(b',').quote(b'"').has_header(true)
.build()?;
let mut storage = findchars::MatchStorage::new(data.len() / 4);
let result = parser.parse(data, &mut storage)?;
for i in 0..result.row_count() {
let row = result.row(i);
println!("{}", row.get(0, data));
}Both parsers support configurable delimiters (\t, |, ;) and quote characters (', ").
Nibble matrix (per group): A single shuffle group solves ~12 ASCII literals. Auto-split doubles to ~20-24 by solving two halves.
Literal namespace (per engine): Every literal gets a unique byte in [1, vectorByteSize-1]. AVX-512 (64B): 63 max. AVX2 (32B): 31 max. NEON (16B): 15 max.
Ranges cost 1 literal each and don't consume nibble-matrix capacity.
Typical engine: 20+ ASCII groups, several multi-byte codepoints, multiple ranges.
Requires JDK 25 with --enable-preview and --add-modules=jdk.incubator.vector.
cd java && ./mvnw clean installRequires Rust 1.94+. First build compiles Z3 from source (~5 min, cached).
cd rust && cargo build
cd rust && cargo test -p findchars -p findchars-csvThe spec/ directory contains the language-agnostic specification: 73 requirements across 7 documents (ENGINE, SOLVE, UTF8, VPA, COMP, CSV, PERF). Both implementations are tested against these requirements.




