Skip to content

M7 follow-up: Port ArrayShuffle for BHC C++ parity (#116)#117

Merged
kalwalt merged 6 commits intofeat/milestone7-hough-voting-feature-matchfrom
feat/milestone7-array-shuffle
May 9, 2026
Merged

M7 follow-up: Port ArrayShuffle for BHC C++ parity (#116)#117
kalwalt merged 6 commits intofeat/milestone7-hough-voting-feature-matchfrom
feat/milestone7-array-shuffle

Conversation

@kalwalt
Copy link
Copy Markdown
Member

@kalwalt kalwalt commented May 8, 2026

Summary

Closes the BHC RNG divergence noted in PR #114 / PR #115 by porting vision::FastRandom and vision::ArrayShuffle from math/rand.h to Rust with byte-identical semantics. With this change:

  • The Rust BHC index produces a tree topology byte-identical to the C++ baseline given the same seed.
  • The match_indexed dual-mode test now asserts sorted pair equality with C++ (matching the brute-force and guided variants).
  • A new CI step runs dual-mode tests to enforce the parity guarantee on every push.

Changes

Algorithm port (commit 6146ab8)

clustering.rs:

  • fast_random(seed: &mut i32) -> i32 — exact port of vision::FastRandom. Uses i32 + wrapping arithmetic so the bit pattern matches C++ int.
  • array_shuffle<T>(v, sample_size, seed) — exact port of vision::ArrayShuffle. Note: NOT Fisher–Yates; k is drawn from [0, pop_size) per C++.
  • KMedoids::assign() reworked to:
    • Initialize rand_indices once before the hypothesis loop (matches C++ kmedoids.h:163)
    • Use array_shuffle with persistent rand_seed (mutated across calls and recursive BHC builds)
    • rand_seed type changed from u64 to i32 to match C++ int
  • BHC query_recursive now visits all children with the minimum Hamming distance, matching C++ Node::nearest (""Any nodes that are the SAME distance as the minimum node are added"").

kpm_c_api.cpp/.h: Add webarkit_cpp_fast_random and webarkit_cpp_array_shuffle FFI shims.

matcher.rs: Strengthen dual_mode_match_indexed_within_tolerance from count-only to count + sorted pair equality.

CI integration (commit f15f30a)

.github/workflows/ci.yml: Add cargo test -p webarkitlib-rs --features dual-mode step to the existing kpm-build job (runs on ubuntu, macOS, windows).

Tests

New unit tests (always run in CI)

  • test_fast_random_first_call — frozen expected values (seed 1234 → 4068)
  • test_fast_random_deterministic
  • test_fast_random_range
  • test_fast_random_negative_seed
  • test_array_shuffle_preserves_elements
  • test_array_shuffle_deterministic
  • test_array_shuffle_empty_pop
  • test_array_shuffle_zero_sample
  • test_array_shuffle_seed_evolves

New dual-mode tests (now run in CI via the new job)

  • dual_mode_fast_random_byte_identical — 1000 calls × 9 starting seeds, value AND seed state match C++ at every step
  • dual_mode_array_shuffle_byte_identical — 8 (pop, sample, seed) cases, permutation AND seed match C++
  • dual_mode_array_shuffle_persistent_seed — 10 sequential shuffles, state evolution matches C++ at every round

Strengthened existing test

  • dual_mode_match_indexed_within_tolerance — was count-only; now asserts sorted pair equality with C++

Verification

  • cargo test -p webarkitlib-rs --lib → 274 passed
  • cargo test --features dual-mode -p webarkitlib-rs --lib → 292 passed
  • cargo clippy --workspace -- -D warnings → clean
  • cargo fmt --all -- --check → clean

Why this matters

Before this PR: match_indexed dual-mode could only assert non-negative match counts because BHC RNG diverged. A bug in the indexed matcher would have been invisible to the dual-mode validation gate.

After this PR: the indexed matcher is held to the same correctness standard as brute-force and guided — sorted pair equality with the C++ baseline. The CI dual-mode job ensures regressions are caught automatically, not just when someone happens to run --features dual-mode locally.

Related

Base Branch

Target: feat/milestone7-hough-voting-feature-match (M7 milestone branch)

kalwalt added 2 commits May 8, 2026 13:58
Replaces the LCG-based shuffle in BinaryHierarchicalClustering with an
exact port of vision::FastRandom and vision::ArrayShuffle from
math/rand.h. With this change, the Rust BHC index produces a tree
topology byte-identical to the C++ baseline given the same seed.

This unlocks pair-equality dual-mode validation for match_indexed,
which previously could only assert non-negative match counts because
of RNG divergence (documented as a follow-up in PR #114 / PR #115).

Changes:

clustering.rs:
- Add fast_random(seed) — 15-bit LCG matching vision::FastRandom
  exactly. Uses i32 with wrapping arithmetic to match C++ int.
- Add array_shuffle(v, sample_size, seed) — matches vision::ArrayShuffle.
  Note: NOT Fisher-Yates; k is drawn from [0, pop_size) per C++.
- Replace SimpleRng (LCG) usage in KMedoids::assign with array_shuffle.
- Initialize rand_indices ONCE before the hypothesis loop (matches C++
  kmedoids.h:163 SequentialVector + per-hypothesis ArrayShuffle).
- Change KMedoids::rand_seed type from u64 to i32 to match C++ int and
  enable byte-identical PRNG sequence.
- Add KMedoids::rand_seed() getter for state-evolution tests.
- Fix BHC query: query_recursive now visits ALL children with the
  minimum Hamming distance, matching C++ Node::nearest behavior in
  binary_hierarchical_clustering.h ("Any nodes that are the SAME
  distance as the minimum node are added to the output vector").

kpm_c_api.cpp/.h:
- Add webarkit_cpp_fast_random — wraps vision::FastRandom
- Add webarkit_cpp_array_shuffle — wraps vision::ArrayShuffle
- Include math/rand.h

matcher.rs:
- Strengthen dual_mode_match_indexed_within_tolerance: now asserts
  count agreement (±2) AND sorted pair equality with C++, matching
  the brute-force and guided variants.

Tests added (clustering.rs):
- 8 unit tests for fast_random and array_shuffle (frozen expected
  values + property-based checks)
- 3 dual-mode tests (gated on `dual-mode` feature) verifying
  byte-identical PRNG output and persistent seed evolution vs C++

All tests pass:
- cargo test -p webarkitlib-rs --lib                  -> 274 passed
- cargo test --features dual-mode -p webarkitlib-rs   -> 292 passed
- cargo clippy --workspace -- -D warnings             -> clean
- cargo fmt --all -- --check                          -> clean

Note: dual-mode tests are local-only (CI does not run --features
dual-mode currently). Adding a dual-mode CI job is a follow-up.

Fixes #116
Adds a test step to the existing kpm-build job that runs:
  cargo test -p webarkitlib-rs --features dual-mode

This step runs after the FFI backend build (the dependency C++ shims
are already compiled by that point) and exercises the pure-Rust ports
against the C++ baseline via the webarkit_cpp_* FFI bridges:

  - homography exp / robust homography (M6-3, #65)
  - math utils: fast_atan2, fast_sqrt1, fast_exp6 (M6-1, #63)
  - linear solvers (M6-2, #64)
  - feature matcher: brute, indexed, guided (M7-3, #111)
  - PRNG: fast_random, array_shuffle (M7, #116)

Without this CI step, dual-mode regressions could slip in unnoticed
because the pure-Rust test job (build-and-test) does not enable the
dual-mode feature.

The kpm-build job already runs on all three platforms (ubuntu, macOS,
windows) so we get cross-platform parity coverage at no extra setup.

Refs #116
kalwalt added 4 commits May 8, 2026 17:36
…test failures

The previous step ran all tests in the package including integration
tests under tests/, which have pre-existing cross-platform failures
(test_full_pipeline_pose on Windows, linker errors on macOS) that
are unrelated to dual-mode validation.

The dual-mode tests live inside the library (crates/core/src/...)
under #[cfg(feature = "dual-mode")] modules, so --lib is the
correct scope for this CI step.

Refs #116
…stdc++ on Linux)

The build script unconditionally linked stdc++ on all non-MSVC targets,
which fails on macOS where the system C++ stdlib is libc++. The error
was hidden until now because the existing kpm-build CI step only ran
'cargo build' (compiles rlib without linking a binary). The new
dual-mode test step links a test binary and exposed the bug:

    ld: library 'stdc++' not found

Fix: select the C++ stdlib based on target triple:
- *-apple-* targets link libc++
- Other non-MSVC targets link libstdc++ (Linux convention)
- MSVC handles its own C++ runtime

Refs #116
…tests

The fixed 1e-5 absolute tolerance in solve_linear_system_2x2_matches_cpp
and solve_symmetric_linear_system_3x3_matches_cpp is too tight for f32
values with magnitude > ~10. Apple Silicon ARM64 uses FMA differently
from x86_64, producing tiny precision differences that exceed 1e-5 for
larger results.

Example failure on macOS ARM64:
    rust=-40.826614, cpp=-40.8266, diff=1.5e-5

The diff of 1.5e-5 represents a relative error of ~3.7e-7, which is
well within f32 precision but exceeds the absolute tolerance.

Fix: use combined tolerance:
    tol = max(1e-5, |value| * 1e-6)

This gives:
- |x| <= 10: tolerance unchanged at 1e-5 (preserves existing strictness)
- |x|  = 100: tolerance 1e-4 (~1e-6 relative)
- |x|  = 1000: tolerance 1e-3 (~1e-6 relative)

These bounds are consistent with f32's ~7-digit precision and
accommodate platform-specific FMA rounding without losing test
sensitivity for typical values.

Surfaced now because PR #117's new dual-mode CI step exposed this
pre-existing issue on macOS runners. The math itself is correct;
only the test tolerance was too strict.

Refs #116
The previous 1e-6 relative scaling was still too tight for near-singular
2x2 systems on Apple Silicon ARM64. A failing case showed:

    rust=-358.62683, cpp=-358.62802, diff=1.19e-3, tol=3.59e-4

The relative error of ~3.3e-6 exceeded the 1e-6 scale factor.

Random 2x2 systems occasionally hit ill-conditioned configurations where
solutions have large magnitude and are very sensitive to FMA rounding
order. f32 has ~7 decimal digits of precision, so even well-conditioned
results can diverge by ~1e-5 relative across platforms; ill-conditioned
ones diverge more.

Bump relative scale from 1e-5 (per |x|) to ensure cross-platform
stability without losing test sensitivity for typical values:

    tol = max(1e-5, |x| * 1e-5)

This is consistent with f32 precision floor and accommodates the worst
observed cross-platform divergence with a safety margin.

Refs #116
@kalwalt
Copy link
Copy Markdown
Member Author

kalwalt commented May 8, 2026

Final state — all 16 CI checks green ✅

Beyond the planned ArrayShuffle port, this PR's new dual-mode CI gate exposed three dormant pre-existing bugs that were lying in wait. Documenting them here for the record:

1. <limits> include order in kpm_c_api.cpp

Symptom: error: 'numeric_limits' is not a member of 'std' on Linux GCC during build.

Root cause: The matcher headers I added (feature_matcher-inline.h etc.) transitively pull in hamming.h, which uses std::numeric_limits without including <limits> itself. MSVC pulled it in transitively; GCC did not. My initial fix added #include <limits> after the matcher headers — too late, because include guards meant hamming.h had already been parsed and failed.

Fix: Moved all standard library headers to the top of kpm_c_api.cpp, before project headers.

2. Wrong C++ stdlib on macOS in build.rs

Symptom: ld: library 'stdc++' not found on macOS when linking test binaries.

Root cause: build.rs unconditionally emitted cargo:rustc-link-lib=stdc++ for all non-MSVC targets. macOS uses LLVM's libc++, not GNU's libstdc++. The pre-existing kpm-build job didn't catch this because cargo build for a library produces an .rlib (no linking step). The new cargo test --features dual-mode --lib step links a test binary and exposed the long-standing bug.

Fix: Branch on target triple — *-apple-* links c++, others link stdc++.

3. M6-2 dual-mode test tolerance too tight for cross-platform FMA

Symptom: solve_linear_system_2x2_matches_cpp failed on Apple Silicon ARM64:

rust=-358.62683, cpp=-358.62802, diff=1.19e-3

Root cause: The fixed 1e-5 absolute tolerance was sub-f32-precision for results with magnitude > ~10. ARM64 (Apple Silicon) uses FMA differently from x86_64 (Linux runners), producing rounding differences that are real (~3.3e-6 relative) but exceed an absolute 1e-5 for larger values. Random near-singular 2×2 systems exacerbate this — small determinants produce large solutions that are very sensitive to FMA order.

Fix: Combined absolute + relative tolerance:

let tol = 1e-5_f32.max(x_rust[i].abs() * 1e-5);

Applied to both solve_linear_system_2x2_matches_cpp and solve_symmetric_linear_system_3x3_matches_cpp.


Why this matters for the project going forward

All three issues were dormant — they would have eventually bitten somebody trying to run integration tests with FFI on macOS, or building dual-mode validation on a fresh GCC. The new dual-mode CI gate (commit f15f30a) now exercises the full FFI test path on ubuntu/macOS/windows on every push, so:

  • Future regressions in cross-platform C++ integration → caught in CI, not after a release
  • Future ports that affect FMA-sensitive math → tolerance failures surface immediately, not after someone happens to run dual-mode locally
  • The match_indexed parity guarantee is now enforced, not just observable

This is exactly the kind of payoff a strong validation gate provides — you find latent issues at integration time, not at use time.

Refs #116

@kalwalt kalwalt moved this from In progress to In review in Plan to port KPM to rust May 8, 2026
@kalwalt kalwalt merged commit 5bb2895 into feat/milestone7-hough-voting-feature-match May 9, 2026
16 checks passed
kalwalt added a commit that referenced this pull request May 9, 2026
…tests

The fixed 1e-5 absolute tolerance in solve_linear_system_2x2_matches_cpp
and solve_symmetric_linear_system_3x3_matches_cpp is too tight for f32
values with magnitude > ~10. Apple Silicon ARM64 uses FMA differently
from x86_64, producing tiny precision differences that exceed 1e-5 for
larger results.

Example failure on macOS ARM64:
    rust=-40.826614, cpp=-40.8266, diff=1.5e-5

The diff of 1.5e-5 represents a relative error of ~3.7e-7, which is
well within f32 precision but exceeds the absolute tolerance.

Fix: use combined tolerance:
    tol = max(1e-5, |value| * 1e-6)

This gives:
- |x| <= 10: tolerance unchanged at 1e-5 (preserves existing strictness)
- |x|  = 100: tolerance 1e-4 (~1e-6 relative)
- |x|  = 1000: tolerance 1e-3 (~1e-6 relative)

These bounds are consistent with f32's ~7-digit precision and
accommodate platform-specific FMA rounding without losing test
sensitivity for typical values.

Surfaced now because PR #117's new dual-mode CI step exposed this
pre-existing issue on macOS runners. The math itself is correct;
only the test tolerance was too strict.

Refs #116
@github-project-automation github-project-automation Bot moved this from In review to Done in Plan to port KPM to rust May 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

1 participant