M7 follow-up: Port ArrayShuffle for BHC C++ parity (#116) by kalwalt · Pull Request #117 · webarkit/WebARKitLib-rs

kalwalt · 2026-05-08T12:00:25Z

Summary

Closes the BHC RNG divergence noted in PR #114 / PR #115 by porting vision::FastRandom and vision::ArrayShuffle from math/rand.h to Rust with byte-identical semantics. With this change:

The Rust BHC index produces a tree topology byte-identical to the C++ baseline given the same seed.
The match_indexed dual-mode test now asserts sorted pair equality with C++ (matching the brute-force and guided variants).
A new CI step runs dual-mode tests to enforce the parity guarantee on every push.

Changes

Algorithm port (commit `6146ab8`)

clustering.rs:

fast_random(seed: &mut i32) -> i32 — exact port of vision::FastRandom. Uses i32 + wrapping arithmetic so the bit pattern matches C++ int.
array_shuffle<T>(v, sample_size, seed) — exact port of vision::ArrayShuffle. Note: NOT Fisher–Yates; k is drawn from [0, pop_size) per C++.
KMedoids::assign() reworked to:
- Initialize rand_indices once before the hypothesis loop (matches C++ kmedoids.h:163)
- Use array_shuffle with persistent rand_seed (mutated across calls and recursive BHC builds)
- rand_seed type changed from u64 to i32 to match C++ int
BHC query_recursive now visits all children with the minimum Hamming distance, matching C++ Node::nearest (""Any nodes that are the SAME distance as the minimum node are added"").

kpm_c_api.cpp/.h: Add webarkit_cpp_fast_random and webarkit_cpp_array_shuffle FFI shims.

matcher.rs: Strengthen dual_mode_match_indexed_within_tolerance from count-only to count + sorted pair equality.

CI integration (commit `f15f30a`)

.github/workflows/ci.yml: Add cargo test -p webarkitlib-rs --features dual-mode step to the existing kpm-build job (runs on ubuntu, macOS, windows).

Tests

New unit tests (always run in CI)

test_fast_random_first_call — frozen expected values (seed 1234 → 4068)
test_fast_random_deterministic
test_fast_random_range
test_fast_random_negative_seed
test_array_shuffle_preserves_elements
test_array_shuffle_deterministic
test_array_shuffle_empty_pop
test_array_shuffle_zero_sample
test_array_shuffle_seed_evolves

New dual-mode tests (now run in CI via the new job)

dual_mode_fast_random_byte_identical — 1000 calls × 9 starting seeds, value AND seed state match C++ at every step
dual_mode_array_shuffle_byte_identical — 8 (pop, sample, seed) cases, permutation AND seed match C++
dual_mode_array_shuffle_persistent_seed — 10 sequential shuffles, state evolution matches C++ at every round

Strengthened existing test

dual_mode_match_indexed_within_tolerance — was count-only; now asserts sorted pair equality with C++

Verification

✅ cargo test -p webarkitlib-rs --lib → 274 passed
✅ cargo test --features dual-mode -p webarkitlib-rs --lib → 292 passed
✅ cargo clippy --workspace -- -D warnings → clean
✅ cargo fmt --all -- --check → clean

Why this matters

Before this PR: match_indexed dual-mode could only assert non-negative match counts because BHC RNG diverged. A bug in the indexed matcher would have been invisible to the dual-mode validation gate.

After this PR: the indexed matcher is held to the same correctness standard as brute-force and guided — sorted pair equality with the C++ baseline. The CI dual-mode job ensures regressions are caught automatically, not just when someone happens to run --features dual-mode locally.

Fixes feat(kpm): port ArrayShuffle for BHC parity in dual-mode tests #116
Closes the divergence noted in PR M7-2: Implement K-Medoids clustering and Binary Hierarchical Clustering for FREAK descriptors #114 (M7-2) and PR M7-3: Port FeatureStore and FeatureMatcher with three match variants #115 (M7-3)
Parent feat(kpm): M7 — Hough voting & feature matching in pure Rust #108 (M7 milestone)

Base Branch

Target: feat/milestone7-hough-voting-feature-match (M7 milestone branch)

Replaces the LCG-based shuffle in BinaryHierarchicalClustering with an exact port of vision::FastRandom and vision::ArrayShuffle from math/rand.h. With this change, the Rust BHC index produces a tree topology byte-identical to the C++ baseline given the same seed. This unlocks pair-equality dual-mode validation for match_indexed, which previously could only assert non-negative match counts because of RNG divergence (documented as a follow-up in PR #114 / PR #115). Changes: clustering.rs: - Add fast_random(seed) — 15-bit LCG matching vision::FastRandom exactly. Uses i32 with wrapping arithmetic to match C++ int. - Add array_shuffle(v, sample_size, seed) — matches vision::ArrayShuffle. Note: NOT Fisher-Yates; k is drawn from [0, pop_size) per C++. - Replace SimpleRng (LCG) usage in KMedoids::assign with array_shuffle. - Initialize rand_indices ONCE before the hypothesis loop (matches C++ kmedoids.h:163 SequentialVector + per-hypothesis ArrayShuffle). - Change KMedoids::rand_seed type from u64 to i32 to match C++ int and enable byte-identical PRNG sequence. - Add KMedoids::rand_seed() getter for state-evolution tests. - Fix BHC query: query_recursive now visits ALL children with the minimum Hamming distance, matching C++ Node::nearest behavior in binary_hierarchical_clustering.h ("Any nodes that are the SAME distance as the minimum node are added to the output vector"). kpm_c_api.cpp/.h: - Add webarkit_cpp_fast_random — wraps vision::FastRandom - Add webarkit_cpp_array_shuffle — wraps vision::ArrayShuffle - Include math/rand.h matcher.rs: - Strengthen dual_mode_match_indexed_within_tolerance: now asserts count agreement (±2) AND sorted pair equality with C++, matching the brute-force and guided variants. Tests added (clustering.rs): - 8 unit tests for fast_random and array_shuffle (frozen expected values + property-based checks) - 3 dual-mode tests (gated on `dual-mode` feature) verifying byte-identical PRNG output and persistent seed evolution vs C++ All tests pass: - cargo test -p webarkitlib-rs --lib -> 274 passed - cargo test --features dual-mode -p webarkitlib-rs -> 292 passed - cargo clippy --workspace -- -D warnings -> clean - cargo fmt --all -- --check -> clean Note: dual-mode tests are local-only (CI does not run --features dual-mode currently). Adding a dual-mode CI job is a follow-up. Fixes #116

Adds a test step to the existing kpm-build job that runs: cargo test -p webarkitlib-rs --features dual-mode This step runs after the FFI backend build (the dependency C++ shims are already compiled by that point) and exercises the pure-Rust ports against the C++ baseline via the webarkit_cpp_* FFI bridges: - homography exp / robust homography (M6-3, #65) - math utils: fast_atan2, fast_sqrt1, fast_exp6 (M6-1, #63) - linear solvers (M6-2, #64) - feature matcher: brute, indexed, guided (M7-3, #111) - PRNG: fast_random, array_shuffle (M7, #116) Without this CI step, dual-mode regressions could slip in unnoticed because the pure-Rust test job (build-and-test) does not enable the dual-mode feature. The kpm-build job already runs on all three platforms (ubuntu, macOS, windows) so we get cross-platform parity coverage at no extra setup. Refs #116

…test failures The previous step ran all tests in the package including integration tests under tests/, which have pre-existing cross-platform failures (test_full_pipeline_pose on Windows, linker errors on macOS) that are unrelated to dual-mode validation. The dual-mode tests live inside the library (crates/core/src/...) under #[cfg(feature = "dual-mode")] modules, so --lib is the correct scope for this CI step. Refs #116

…stdc++ on Linux) The build script unconditionally linked stdc++ on all non-MSVC targets, which fails on macOS where the system C++ stdlib is libc++. The error was hidden until now because the existing kpm-build CI step only ran 'cargo build' (compiles rlib without linking a binary). The new dual-mode test step links a test binary and exposed the bug: ld: library 'stdc++' not found Fix: select the C++ stdlib based on target triple: - *-apple-* targets link libc++ - Other non-MSVC targets link libstdc++ (Linux convention) - MSVC handles its own C++ runtime Refs #116

…tests The fixed 1e-5 absolute tolerance in solve_linear_system_2x2_matches_cpp and solve_symmetric_linear_system_3x3_matches_cpp is too tight for f32 values with magnitude > ~10. Apple Silicon ARM64 uses FMA differently from x86_64, producing tiny precision differences that exceed 1e-5 for larger results. Example failure on macOS ARM64: rust=-40.826614, cpp=-40.8266, diff=1.5e-5 The diff of 1.5e-5 represents a relative error of ~3.7e-7, which is well within f32 precision but exceeds the absolute tolerance. Fix: use combined tolerance: tol = max(1e-5, |value| * 1e-6) This gives: - |x| <= 10: tolerance unchanged at 1e-5 (preserves existing strictness) - |x| = 100: tolerance 1e-4 (~1e-6 relative) - |x| = 1000: tolerance 1e-3 (~1e-6 relative) These bounds are consistent with f32's ~7-digit precision and accommodate platform-specific FMA rounding without losing test sensitivity for typical values. Surfaced now because PR #117's new dual-mode CI step exposed this pre-existing issue on macOS runners. The math itself is correct; only the test tolerance was too strict. Refs #116

The previous 1e-6 relative scaling was still too tight for near-singular 2x2 systems on Apple Silicon ARM64. A failing case showed: rust=-358.62683, cpp=-358.62802, diff=1.19e-3, tol=3.59e-4 The relative error of ~3.3e-6 exceeded the 1e-6 scale factor. Random 2x2 systems occasionally hit ill-conditioned configurations where solutions have large magnitude and are very sensitive to FMA rounding order. f32 has ~7 decimal digits of precision, so even well-conditioned results can diverge by ~1e-5 relative across platforms; ill-conditioned ones diverge more. Bump relative scale from 1e-5 (per |x|) to ensure cross-platform stability without losing test sensitivity for typical values: tol = max(1e-5, |x| * 1e-5) This is consistent with f32 precision floor and accommodates the worst observed cross-platform divergence with a safety margin. Refs #116

kalwalt · 2026-05-08T16:49:01Z

Final state — all 16 CI checks green ✅

Beyond the planned ArrayShuffle port, this PR's new dual-mode CI gate exposed three dormant pre-existing bugs that were lying in wait. Documenting them here for the record:

1. `<limits>` include order in `kpm_c_api.cpp`

Symptom: error: 'numeric_limits' is not a member of 'std' on Linux GCC during build.

Root cause: The matcher headers I added (feature_matcher-inline.h etc.) transitively pull in hamming.h, which uses std::numeric_limits without including <limits> itself. MSVC pulled it in transitively; GCC did not. My initial fix added #include <limits> after the matcher headers — too late, because include guards meant hamming.h had already been parsed and failed.

Fix: Moved all standard library headers to the top of kpm_c_api.cpp, before project headers.

2. Wrong C++ stdlib on macOS in `build.rs`

Symptom: ld: library 'stdc++' not found on macOS when linking test binaries.

Root cause: build.rs unconditionally emitted cargo:rustc-link-lib=stdc++ for all non-MSVC targets. macOS uses LLVM's libc++, not GNU's libstdc++. The pre-existing kpm-build job didn't catch this because cargo build for a library produces an .rlib (no linking step). The new cargo test --features dual-mode --lib step links a test binary and exposed the long-standing bug.

Fix: Branch on target triple — *-apple-* links c++, others link stdc++.

3. M6-2 dual-mode test tolerance too tight for cross-platform FMA

Symptom: solve_linear_system_2x2_matches_cpp failed on Apple Silicon ARM64:

rust=-358.62683, cpp=-358.62802, diff=1.19e-3

Root cause: The fixed 1e-5 absolute tolerance was sub-f32-precision for results with magnitude > ~10. ARM64 (Apple Silicon) uses FMA differently from x86_64 (Linux runners), producing rounding differences that are real (~3.3e-6 relative) but exceed an absolute 1e-5 for larger values. Random near-singular 2×2 systems exacerbate this — small determinants produce large solutions that are very sensitive to FMA order.

Fix: Combined absolute + relative tolerance:

let tol = 1e-5_f32.max(x_rust[i].abs() * 1e-5);

Applied to both solve_linear_system_2x2_matches_cpp and solve_symmetric_linear_system_3x3_matches_cpp.

Why this matters for the project going forward

All three issues were dormant — they would have eventually bitten somebody trying to run integration tests with FFI on macOS, or building dual-mode validation on a fresh GCC. The new dual-mode CI gate (commit f15f30a) now exercises the full FFI test path on ubuntu/macOS/windows on every push, so:

Future regressions in cross-platform C++ integration → caught in CI, not after a release
Future ports that affect FMA-sensitive math → tolerance failures surface immediately, not after someone happens to run dual-mode locally
The match_indexed parity guarantee is now enforced, not just observable

This is exactly the kind of payoff a strong validation gate provides — you find latent issues at integration time, not at use time.

Refs #116

…tests The fixed 1e-5 absolute tolerance in solve_linear_system_2x2_matches_cpp and solve_symmetric_linear_system_3x3_matches_cpp is too tight for f32 values with magnitude > ~10. Apple Silicon ARM64 uses FMA differently from x86_64, producing tiny precision differences that exceed 1e-5 for larger results. Example failure on macOS ARM64: rust=-40.826614, cpp=-40.8266, diff=1.5e-5 The diff of 1.5e-5 represents a relative error of ~3.7e-7, which is well within f32 precision but exceeds the absolute tolerance. Fix: use combined tolerance: tol = max(1e-5, |value| * 1e-6) This gives: - |x| <= 10: tolerance unchanged at 1e-5 (preserves existing strictness) - |x| = 100: tolerance 1e-4 (~1e-6 relative) - |x| = 1000: tolerance 1e-3 (~1e-6 relative) These bounds are consistent with f32's ~7-digit precision and accommodate platform-specific FMA rounding without losing test sensitivity for typical values. Surfaced now because PR #117's new dual-mode CI step exposed this pre-existing issue on macOS runners. The math itself is correct; only the test tolerance was too strict. Refs #116

kalwalt added 2 commits May 8, 2026 13:58

kalwalt self-assigned this May 8, 2026

kalwalt added enhancement New feature or request rust code kpm architecture FREAK labels May 8, 2026

kalwalt added this to Plan to port KPM to rust May 8, 2026

github-project-automation Bot moved this to Backlog in Plan to port KPM to rust May 8, 2026

kalwalt added this to the Milestone 7 — Hough voting & feature matching milestone May 8, 2026

kalwalt moved this from Backlog to In progress in Plan to port KPM to rust May 8, 2026

kalwalt added 4 commits May 8, 2026 17:36

kalwalt moved this from In progress to In review in Plan to port KPM to rust May 8, 2026

This was referenced May 9, 2026

test(kpm): test_full_pipeline_pose fails on Windows with pose component out of tolerance #118

Open

test(kpm): verify macOS integration tests + examples build with ffi-backend after #117 #119

Open

kalwalt merged commit 5bb2895 into feat/milestone7-hough-voting-feature-match May 9, 2026
16 checks passed

github-project-automation Bot moved this from In review to Done in Plan to port KPM to rust May 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

M7 follow-up: Port ArrayShuffle for BHC C++ parity (#116)#117

M7 follow-up: Port ArrayShuffle for BHC C++ parity (#116)#117
kalwalt merged 6 commits intofeat/milestone7-hough-voting-feature-matchfrom
feat/milestone7-array-shuffle

kalwalt commented May 8, 2026

Uh oh!

kalwalt commented May 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kalwalt commented May 8, 2026

Summary

Changes

Algorithm port (commit 6146ab8)

CI integration (commit f15f30a)

Tests

New unit tests (always run in CI)

New dual-mode tests (now run in CI via the new job)

Strengthened existing test

Verification

Why this matters

Related

Base Branch

Uh oh!

kalwalt commented May 8, 2026

Final state — all 16 CI checks green ✅

1. <limits> include order in kpm_c_api.cpp

2. Wrong C++ stdlib on macOS in build.rs

3. M6-2 dual-mode test tolerance too tight for cross-platform FMA

Why this matters for the project going forward

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Algorithm port (commit `6146ab8`)

CI integration (commit `f15f30a`)

1. `<limits>` include order in `kpm_c_api.cpp`

2. Wrong C++ stdlib on macOS in `build.rs`