From ad337a37771a053ef5028f08352a3a6f4766b853 Mon Sep 17 00:00:00 2001 From: Vittorio Distefano <101409677+blackopsrepl@users.noreply.github.com> Date: Mon, 20 Apr 2026 14:42:40 +0200 Subject: [PATCH] Add full-repo repetition audit and LOC-saving refactor plan --- docs/repetition-audit-2026-04-20.md | 195 ++++++++++++++++++++++++++++ 1 file changed, 195 insertions(+) create mode 100644 docs/repetition-audit-2026-04-20.md diff --git a/docs/repetition-audit-2026-04-20.md b/docs/repetition-audit-2026-04-20.md new file mode 100644 index 00000000..8e46f942 --- /dev/null +++ b/docs/repetition-audit-2026-04-20.md @@ -0,0 +1,195 @@ +# Repetition Audit (2026-04-20) + +This document records a full-repo repetition audit focused on Rust sources and a refactor plan that minimizes repeated logic while preserving SolverForge architecture constraints (zero-erasure, ownership transfer, no `Arc`/`Rc`/`dyn` in hot paths). + +## Methodology + +1. Scanned all Rust files in `crates/**`. +2. Used sliding-window duplicate detection (normalized, comment-stripped) to find high-confidence repeated blocks. +3. Manually reviewed top clone clusters and separated: + - intentional repetition (acceptable tradeoff), + - accidental/legacy repetition (high-priority refactor), + - structural repetition suitable for macro/helper extraction. +4. Estimated net LOC savings conservatively for each proposal. + +## Priority 1 — Remove legacy duplicate test suites in solver selectors/moves + +### Evidence + +The move and selector modules include direct `#[path = "*_tests.rs"]` module wiring in production files, while equivalent suites also exist under `tests/` module trees. + +Examples: +- `k_opt.rs` includes `k_opt_tests.rs`. (`crates/solverforge-solver/src/heuristic/move/k_opt.rs`) +- `list_change.rs` includes `list_change_tests.rs`. (`crates/solverforge-solver/src/heuristic/move/list_change.rs`) +- `sublist_change.rs` includes `sublist_change_tests.rs`. (`crates/solverforge-solver/src/heuristic/move/sublist_change.rs`) +- `move_selector.rs`, `pillar.rs`, and `mimic.rs` mirror the same pattern in selector code. + +Parallel suites already exist under: +- `crates/solverforge-solver/src/heuristic/move/tests/*.rs` +- `crates/solverforge-solver/src/heuristic/selector/tests/*.rs` + +### Refactor + +- Keep a single canonical test layout (`tests/` submodule tree). +- Remove `*_tests.rs` legacy files and inline `#[path = ...]` hooks. +- Preserve any unique assertions by migrating them into the canonical files before deletion. + +### LOC savings estimate + +- Deleting duplicate legacy files: **~1,652 LOC** (sum of seven confirmed duplicated `*_tests.rs` files). +- Removing legacy `#[path = ...]` hooks + cleanup: **~10 LOC**. +- **Estimated net savings: ~1,660 LOC**. + +## Priority 2 — Unify repeated weighting DSL builders across stream types + +### Evidence + +`penalize*`/`reward*` families are repeated with near-identical control flow and `ImpactType` handling across multiple stream types: + +- `cross_bi_stream/weighting.rs` +- `flattened_bi_stream/weighting.rs` +- `uni_stream/weighting.rs` +- `existence_stream.rs` +- `complemented_stream.rs` + +All repeat the same pattern: +- compute `is_hard` from weight, +- build typed builder struct, +- duplicate hard/soft convenience wrappers. + +### Refactor + +- Introduce an internal macro/helper layer for weighting family generation. +- Keep concrete builder types and monomorphized closures (no type erasure). +- Consolidate hard/soft convenience methods via shared expansion points. + +### LOC savings estimate + +- Current repeated weighting-heavy files total roughly **~1,330 LOC**. +- Conservative 30–35% reduction from shared generation layer. +- **Estimated net savings: ~400–470 LOC**. + +## Priority 3 — Collapse repeated n-ary arity stream macro definitions into one meta-macro + +### Evidence + +`arity_stream_macros/nary_stream/{bi,tri,quad,penta}.rs` are structurally near-identical and mostly differ by arity-specific filter/closure signatures. + +### Refactor + +- Replace four top-level arity macro files with one meta-macro file parameterized by: + - arity token set, + - filter trait family, + - closure signature tuple. +- Retain explicit generated APIs for ergonomics and docs. + +### LOC savings estimate + +- Current four files: **1,325 LOC**. +- Keep one meta-macro + concise arity invocations. +- **Estimated net savings: ~420–520 LOC**. + +## Priority 4 — Consolidate higher-arity incremental constraints (tri/quad/penta) + +### Evidence + +`constraint/nary_incremental/higher_arity/{tri,quad,penta}.rs` repeat lifecycle and delta-application flow with arity-specific tuple plumbing. + +### Refactor + +- Introduce shared macro-driven skeleton for: + - index maintenance, + - score impact update path, + - event handling branches. +- Keep explicit type signatures generated per arity to preserve compile-time clarity. + +### LOC savings estimate + +- Current three files: **1,101 LOC**. +- Conservative reduction with shared generation layer: 25–35%. +- **Estimated net savings: ~275–385 LOC**. + +## Priority 5 — Extract reusable list-neighborhood enumeration kernels for selector families + +### Evidence + +Move selector implementations for list neighborhoods contain duplicated iteration skeletons: +- `list_swap.rs` +- `sublist_change.rs` +- `sublist_swap.rs` + +Repeated mechanics include: +- entity materialization, +- precomputed lengths, +- nested start/size/destination loops, +- symmetric pair filtering. + +### Refactor + +- Introduce internal zero-erasure iterator kernels (generic over callback emitters), e.g.: + - `for_each_entity_pair(...)` + - `for_each_segment(...)` + - `for_each_non_overlapping_segment_pair(...)` +- Keep selector public APIs unchanged. + +### LOC savings estimate + +- Current candidate files total about **~830 LOC**. +- Conservative extraction savings: 20–30%. +- **Estimated net savings: ~165–250 LOC**. + +## Priority 6 — Deduplicate reusable test fixture builders across solver/scoring tests + +### Evidence + +Multiple suites repeat near-identical mini-domain setup (solution structs, descriptor creation, extractors, director wiring), notably in move/selector tests and several scoring constraint tests. + +### Refactor + +- Add internal test support modules per crate (`#[cfg(test)] mod test_support`) with: + - canonical descriptor builders, + - common toy domain structs, + - helper constructors for directors. +- Migrate suites to shared fixtures; keep scenario-specific assertions local. + +### LOC savings estimate + +- Conservative first-pass reduction: +- **Estimated net savings: ~180–280 LOC**. + +## Aggregate savings + +### Conservative low estimate +- Priority 1: 1,660 +- Priority 2: 400 +- Priority 3: 420 +- Priority 4: 275 +- Priority 5: 165 +- Priority 6: 180 + +**Total conservative estimate: ~3,100 LOC saved**. + +### Higher-confidence upper estimate +- Priority 1: 1,660 +- Priority 2: 470 +- Priority 3: 520 +- Priority 4: 385 +- Priority 5: 250 +- Priority 6: 280 + +**Total upper estimate: ~3,560 LOC saved**. + +## Recommended execution order + +1. **P1 first** (duplicate tests): immediate, low-risk, largest return. +2. **P2 + P3 together**: unify stream DSL repetition in one architectural pass. +3. **P4 next**: apply same generation strategy to higher-arity incremental constraints. +4. **P5**: selector kernels once test duplication is removed and benchmark baselines are stable. +5. **P6 continuously**: migrate fixtures opportunistically during touch-ups. + +## Risk controls + +- Preserve zero-erasure boundaries and current ownership model. +- Validate with `make test` after each priority slice. +- For P2–P5, benchmark hot paths before/after each refactor slice to ensure no runtime regression. +- Avoid compatibility shims; migrate all call sites directly per repository policy.