Skip to content

merge queue: embarking main (bd84315) and #135 together#136

Closed
mergify[bot] wants to merge 23 commits intomainfrom
mergify/merge-queue/ae1547d429
Closed

merge queue: embarking main (bd84315) and #135 together#136
mergify[bot] wants to merge 23 commits intomainfrom
mergify/merge-queue/ae1547d429

Conversation

@mergify
Copy link
Contributor

@mergify mergify bot commented Feb 18, 2026

🎉 This pull request has been checked successfully and will be merged soon. 🎉

Branch main (bd84315) and #135 are embarked together for merge.

This pull request has been created by Mergify to speculatively check the mergeability of #135.
You don't need to do anything. Mergify will close this pull request automatically when it is complete.

Required conditions of queue default for merge:

  • check-success = coverage
  • check-success = msrv (stable minus 1 releases)
  • check-success = msrv (stable minus 2 releases)
  • check-success = msrv (stable minus 3 releases)
  • check-success = msrv (stable minus 4 releases)
  • check-success = msrv (stable)
  • check-success = quality
  • check-success = test
  • check-success = test-cross-platform (macos-latest, macOS)
  • check-success = test-cross-platform (ubuntu-latest, Linux)
  • check-success = test-cross-platform (windows-latest, Windows)
  • all of [🛡 Merge Protections rule CI must pass]:
    • check-success = coverage
    • check-success = msrv (stable minus 1 releases)
    • check-success = msrv (stable minus 2 releases)
    • check-success = msrv (stable minus 3 releases)
    • check-success = msrv (stable minus 4 releases)
    • check-success = msrv (stable)
    • check-success = quality
    • check-success = test
    • check-success = test-cross-platform (macos-latest, macOS)
    • check-success = test-cross-platform (ubuntu-latest, Linux)
    • check-success = test-cross-platform (windows-latest, Windows)
  • all of [🛡 Merge Protections rule Do not merge outdated PRs]:
  • all of [🛡 Merge Protections rule Enforce conventional commit]:
  • any of [🛡 GitHub repository ruleset rule main]:
    • check-neutral = Mergify Merge Protections
    • check-skipped = Mergify Merge Protections
    • check-success = Mergify Merge Protections

Required conditions to stay in the queue:

---
checking_base_sha: bd84315a1a7b3fb453f3681b7dfae20db1847599
previous_failed_batches: []
pull_requests:
  - number: 135
...

unclesp1d3r and others added 23 commits February 16, 2026 20:39
Centralize CI commands in justfile as single source of truth:
- Replace raw cargo commands with just recipes in test,
  test-cross-platform, coverage, security, and docs workflows
- Fix cargo-dist issue in security workflow by using `dist plan`
  via mise exec instead of `cargo dist check` (subcommand not found)
- Add --all-features to test-ci, build-release, and coverage recipes
- Add outdated recipe for cargo-outdated checks
- Fix dist-check recipe to use `dist plan` (check subcommand
  does not exist in cargo-dist 0.30.3)

Quality and MSRV jobs unchanged (use dtolnay Rust, not mise).

Signed-off-by: Kevin Melton <kevin@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Move the ~400-line `impl StringExtractor for BasicExtractor` block from
mod.rs into its own file, reducing mod.rs from 1038 to 630 lines. The
new basic_extractor.rs (421 lines) contains the core extraction engine
that orchestrates ASCII, UTF-8, and UTF-16 string extraction across
binary sections.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Convert the 702-line filters.rs into a module directory with three
focused files:
- mod.rs (193 lines): FilterContext, NoiseFilter trait, CompositeNoiseFilter
- implementations.rs (394 lines): all 6 filter structs and NoiseFilter impls
- tests.rs (131 lines): all test code

All public re-exports remain unchanged so the API is not affected.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Convert src/extraction/ascii.rs (832 lines) into a module directory
with three focused files:

- mod.rs (147 lines): AsciiExtractionConfig, is_printable_ascii(), re-exports
- extraction.rs (316 lines): extract_ascii_strings(), extract_from_section()
- tests.rs (378 lines): all unit tests

All files are under the 500-line limit. All 473 tests pass.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Convert the 1,273-line src/extraction/utf16.rs into a module directory
with six focused files, each under 500 lines:

- mod.rs (122 lines): ByteOrder enum, extract_utf16_strings orchestrator, re-exports
- config.rs (97 lines): Utf16ExtractionConfig struct and constructors
- validation.rs (281 lines): UTF-16 sequence validation, printability checks, null patterns
- confidence.rs (201 lines): Confidence scoring heuristics (ASCII ratio, printable ratio, etc.)
- extraction.rs (380 lines): Core extraction functions and section-aware extraction
- tests.rs (238 lines): All unit tests

All public exports remain identical. No functional changes.

Signed-off-by: Kyle Melton <kyle@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Convert the 838-line dedup.rs into a module directory with three files:
- mod.rs (237 lines): CanonicalString, StringOccurrence structs, deduplicate(),
  merge_tags(), found_string_to_occurrence(), CanonicalString::to_found_string()
- scoring.rs (56 lines): calculate_combined_score()
- tests.rs (555 lines): All test code

Public API and re-exports remain unchanged.

Signed-off-by: Kevin Melton <kmelton@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Convert the largest file (1,449 lines) into a module directory with
six focused files, each under 500 lines:

- mod.rs (230): constants, entry points, orchestrator, decode_utf16le
- detection.rs (208): detect_version_info, detect_string_tables, detect_manifests
- version_info.rs (76): extract_version_info_strings
- string_tables.rs (200): parse_string_table_block, extract_string_table_strings
- manifests.rs (201): detect_manifest_encoding, decode_manifest, extract_manifest_strings
- tests.rs (594): all test code

All 473 tests pass. Public API unchanged.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Document the CI workflow patterns (dtolnay vs mise, just recipes as
single source of truth) and Mergify behavior in AGENTS.md. Add the
extraction refactor and CLI implementation plan for project history.

Signed-off-by: Keith Melton <kmelton@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Fix panic in CanonicalString::to_found_string() on empty occurrences
  (now returns Option<FoundString>)
- Add #[allow] justification comments per project rules (4 files)
- Remove redundant empty checks in confidence.rs and validation.rs
- Strengthen test assertions: replace conditional passes with asserts,
  fix unused result, fix tautological assertion, implement empty test
- Add max_length validation to ExtractionConfig::validate()
- Improve CLI tests: check stderr on error, verify min_length filtering
- Clarify AGENTS.md CI Architecture wording

Signed-off-by: Keith Melton <kmelton@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Fix UTF-16 length calculation in version_info.rs and string_tables.rs:
  use encode_utf16().count() * 2 instead of text.len() for source bytes
- Fix missing trailing newline in CLI output for table/JSON formatters
- Remove unnecessary clone in basic_extractor.rs canonical string path
- Use encoding from map key in dedup instead of re-indexing found_strings
- Implement empty test bodies for string_tables and manifests detection
- Assert unused results in PE resource tests instead of discarding
- Add max_length vs min_ascii_length/min_wide_length validation

Signed-off-by: Keith Melton <kmelton@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Validate min_length at CLI level (clap range 1..) and call
  config.validate() before extraction
- Set min_wide_length from CLI --min-length flag for consistent
  filtering across all encodings
- Assert default CLI run succeeds before comparing output in tests

Signed-off-by: Keith Melton <kmelton@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Use usize directly for min_length CLI field instead of u64 with
cast. config.validate() handles the >= 1 constraint.

Signed-off-by: Keith Melton <kmelton@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Fix STRINGTABLE truncated entry bug: break instead of continue when
  byte_length exceeds remaining buffer, preventing misaligned reads
- Fix stale doc paths after refactoring (ascii/mod.rs, pe_resources/mod.rs)
- Rename test module internal_tests -> tests (project convention)
- Rename misleading test (no_translation -> fallback_language_handling)
- Replace no-op match with actual type assertion in metadata validation
- Split pe_resources/tests.rs (632 lines) into tests/ directory module
- Split dedup/tests.rs (560 lines) into tests/ directory module
- Add test for STRINGTABLE truncated entry parsing behavior

Signed-off-by: Keith Melton <kmelton@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Address 8 pre-existing design issues surfaced during bot review of PR #135:

1. Remove dual encoding fields: delete `encodings` from ExtractionConfig,
   keep only `enabled_encodings`. Eliminates confusing OR-checks throughout
   basic_extractor.rs.

2. Extract shared logic from extract/extract_canonical: factor out the
   duplicated section-sorting, filtering, symbol-inclusion, and semantic
   enrichment pipeline into a private `collect_all_strings()` helper.

3. Fix O(n^2) dedup in Auto byte order mode: replace
   `strings.iter_mut().find()` with HashMap-based O(1) lookup in
   utf16/mod.rs.

4. Remove unused `_preserve_all_occurrences` parameter from
   `deduplicate()` signature, ExtractionConfig struct, and all callsites.

5. Remove broken byte order consistency heuristic: the
   `check_byte_order_consistency()` function operated on decoded u16
   values which are identical regardless of original byte order, making
   LE always score 1.0 and BE always 0.0. Redistribute the 0.1 weight
   to valid_unicode (0.35) and printable (0.45) heuristics.

6. Fix char_count semantics for surrogate pairs: change `char_count += 1`
   to `char_count += consumed_units` in utf16/extraction.rs so min/max
   length checks count UTF-16 code units, not scalar values.

7. Replace `Result<_, ()>` with `StringyError` in UTF-16 decode functions:
   use `StringyError::ParseError` for the 3 decode functions in
   utf16/extraction.rs, removing `#[allow(clippy::result_unit_err)]`.

8. Add noncharacter validation in surrogate-pair path: reject U+xFFFE
   and U+xFFFF in supplementary planes (planes 1-16) in
   utf16/validation.rs, matching the BMP noncharacter check.

Signed-off-by: Kevin Melton <kmelton@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Add .github/actionlint.yml config to suppress lint errors from
release.yml, which is auto-generated by cargo-dist and cannot be
hand-edited.

Signed-off-by: Kevin Melton <kmelton@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
The cspell tool was configured but never installed in mise, so
lint-spell always failed. The project's domain-heavy jargon
(binary analysis terms, hex patterns) required 70+ exception words,
making the maintenance cost outweigh the marginal value. Typos in
identifiers are caught at compile time, and documentation quality
is covered by CodeRabbit reviews.

Signed-off-by: Kevin Melton <kmelton@evilbitlabs.io>
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
@mergify
Copy link
Contributor Author

mergify bot commented Feb 18, 2026

🧪 CI Insights

Here's what we observed from your CI run for fd99943.

🟢 All jobs passed!

But CI Insights is watching 👀

@mergify mergify bot closed this Feb 18, 2026
@mergify mergify bot deleted the mergify/merge-queue/ae1547d429 branch February 18, 2026 02:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments