feat(types): extend FoundString data model with non_exhaustive and builder pattern (#123)

unclesp1d3r merged 10 commits into main
Conversation
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Introduced `original_text` field to preserve pre-demangled text.
- Updated related extraction and classification modules to initialize this field.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Added unit tests to verify serialization and deserialization of the FoundString struct, focusing on the handling of optional fields (original_text, section_weight, semantic_boost, noise_penalty). Updated the related project plan ticket to mark acceptance criteria as complete. Also added .serena project configuration files.
Add `#[non_exhaustive]` attribute to the FoundString struct to comply with AGENTS.md guidelines for public API structs. This prevents external code from using struct literal syntax, ensuring forward compatibility when new fields are added.

Add constructor and builder methods:
- `new()` constructor with required fields
- `with_rva()`, `with_section()`, `with_tags()`, `with_score()`
- `with_confidence()`, `with_original_text()`
- `with_section_weight()`, `with_semantic_boost()`, `with_noise_penalty()`

Update test files to use the new constructor/builder pattern.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
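The pattern this commit describes can be sketched as follows. The field set is abbreviated and the types are illustrative assumptions, not the crate's actual definitions; only the method names come from the commit message:

```rust
// Sketch of the #[non_exhaustive] + consuming-builder pattern described above.
// Field names come from this PR; types and the reduced field set are assumptions.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq)]
pub struct FoundString {
    pub text: String,
    pub offset: u64,
    pub rva: Option<u64>,
    pub score: i32,
}

impl FoundString {
    /// Constructor covering required fields; optional fields start unset.
    pub fn new(text: String, offset: u64) -> Self {
        FoundString { text, offset, rva: None, score: 0 }
    }

    /// Consuming builder: takes `self` by value so calls can be chained.
    pub fn with_rva(mut self, rva: u64) -> Self {
        self.rva = Some(rva);
        self
    }

    pub fn with_score(mut self, score: i32) -> Self {
        self.score = score;
        self
    }
}

fn main() {
    // External crates cannot use `FoundString { .. }` literals because of
    // #[non_exhaustive]; they must go through new() and the builders.
    let s = FoundString::new("hello".to_string(), 0x1000)
        .with_rva(0x2000)
        .with_score(100);
    println!("{:?}", s);
}
```

Because the struct is `#[non_exhaustive]`, adding a field later is not a source-breaking change for downstream crates: they never spell out the full field list.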
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Pull Request Summary: Extend FoundString Data Model with Non-Exhaustive and Builder Pattern

Architectural & API Changes
Binary Format & Extraction Impact
Semantic Classification & Debugging Accuracy
Backwards Compatibility & Serialization
Tests, CI & Surface Area
Walkthrough

This PR expands the public data model (FoundString) with four optional metadata fields and builder APIs, updates all extraction and test sites to use the new constructors/fields, adds ContainerInfo constructor fields, introduces CI/prompts and a Serena project config, and converts many documentation code fences from `rust` to `text`.

Changes
Sequence Diagram(s): omitted — changes are data-model and initialization updates; no new multi-component sequential control flow introduced.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 4

❌ Failed checks (4 warnings)
✅ Passed checks (4 passed)
Pull request overview
This PR extends the FoundString data model to support future demangling functionality and debug-mode score breakdowns. It adds #[non_exhaustive] to the struct and implements a constructor/builder pattern for external API consumers.
Changes:
- Added `#[non_exhaustive]` attribute to `FoundString` with constructor and builder methods for forward compatibility
- Added four new optional fields: `original_text`, `section_weight`, `semantic_boost`, and `noise_penalty`
- Updated all struct instantiations throughout the codebase to include the new fields (initialized to `None`)
- Changed documentation code blocks from ```` ```rust ```` to ```` ```text ```` (removing syntax highlighting)
Reviewed changes
Copilot reviewed 26 out of 27 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/types.rs | Core changes: added #[non_exhaustive], new fields, constructor, builder methods, and comprehensive serde tests |
| src/extraction/ascii.rs | Updated struct literals to include new optional fields |
| src/extraction/utf16.rs | Updated struct literals to include new optional fields |
| src/extraction/pe_resources.rs | Updated struct literals to include new optional fields |
| src/extraction/mod.rs | Updated struct literals to include new optional fields |
| src/extraction/macho_load_commands.rs | Updated struct literals to include new optional fields |
| src/extraction/dedup.rs | Updated struct literals and to_found_string() conversion |
| src/classification/semantic.rs | Updated documentation examples with new fields |
| src/classification/mod.rs | Updated documentation examples with new fields |
| tests/test_deduplication.rs | Updated tests to use new builder pattern |
| tests/classification_integration.rs | Updated helper to use new constructor |
| docs/src/*.md | Changed code block markers from rust to text |
| .github/copilot-instructions.md | Added complete example of #[non_exhaustive] pattern |
| Cargo.toml | Minor formatting cleanup |
| project_plan/tickets/*.md | Marked acceptance criteria as complete |
| .serena/* | New tool configuration files (unrelated to ticket) |
| .github/prompts/* | New AI agent prompt files (unrelated to ticket) |
````diff
 ### Example Output

-```json
+```text
````
Throughout the documentation files, Rust code blocks have been changed from ```` ```rust ```` to ```` ```text ````, removing syntax highlighting. This significantly degrades the developer experience by removing color coding and syntax support in documentation viewers. Unless there's a specific technical reason for this change (e.g., incompatibility with a documentation tool), these should remain ```` ```rust ```` blocks. This affects all documentation in docs/src/ including api.md, architecture.md, binary-formats.md, classification.md, performance.md, and string-extraction.md.
````diff
-```text
+```json
````
```yaml
# list of languages for which language servers are started; choose from:
# al bash clojure cpp csharp
# csharp_omnisharp dart elixir elm erlang
# fortran fsharp go groovy haskell
# java julia kotlin lua markdown
# matlab nix pascal perl php
# powershell python python_jedi r rego
# ruby ruby_solargraph rust scala swift
# terraform toml typescript typescript_vts vue
# yaml zig
# (This list may be outdated. For the current list, see values of Language enum here:
# https://github.com/oraios/serena/blob/main/src/solidlsp/ls_config.py
# For some languages, there are alternative language servers, e.g. csharp_omnisharp, ruby_solargraph.)
# Note:
# - For C, use cpp
# - For JavaScript, use typescript
# - For Free Pascal/Lazarus, use pascal
# Special requirements:
# - csharp: Requires the presence of a .sln file in the project folder.
# - pascal: Requires Free Pascal Compiler (fpc) and optionally Lazarus.
# When using multiple languages, the first language server that supports a given file will be used for that file.
# The first language is the default language and the respective language server will be used as a fallback.
# Note that when using the JetBrains backend, language servers are not used and this list is correspondingly ignored.
languages:
  - rust

# the encoding used by text files in the project
# For a list of possible encodings, see https://docs.python.org/3.11/library/codecs.html#standard-encodings
encoding: "utf-8"

# whether to use project's .gitignore files to ignore files
ignore_all_files_in_gitignore: true

# list of additional paths to ignore in all projects
# same syntax as gitignore, so you can use * and **
ignored_paths: []

# whether the project is in read-only mode
# If set to true, all editing tools will be disabled and attempts to use them will result in an error
# Added on 2025-04-18
read_only: false

# list of tool names to exclude. We recommend not excluding any tools, see the readme for more details.
# Below is the complete list of tools for convenience.
# To make sure you have the latest list of tools, and to view their descriptions,
# execute `uv run scripts/print_tool_overview.py`.
#
# * `activate_project`: Activates a project by name.
# * `check_onboarding_performed`: Checks whether project onboarding was already performed.
# * `create_text_file`: Creates/overwrites a file in the project directory.
# * `delete_lines`: Deletes a range of lines within a file.
# * `delete_memory`: Deletes a memory from Serena's project-specific memory store.
# * `execute_shell_command`: Executes a shell command.
# * `find_referencing_code_snippets`: Finds code snippets in which the symbol at the given location is referenced.
# * `find_referencing_symbols`: Finds symbols that reference the symbol at the given location (optionally filtered by type).
# * `find_symbol`: Performs a global (or local) search for symbols with/containing a given name/substring (optionally filtered by type).
# * `get_current_config`: Prints the current configuration of the agent, including the active and available projects, tools, contexts, and modes.
# * `get_symbols_overview`: Gets an overview of the top-level symbols defined in a given file.
# * `initial_instructions`: Gets the initial instructions for the current project.
#    Should only be used in settings where the system prompt cannot be set,
#    e.g. in clients you have no control over, like Claude Desktop.
# * `insert_after_symbol`: Inserts content after the end of the definition of a given symbol.
# * `insert_at_line`: Inserts content at a given line in a file.
# * `insert_before_symbol`: Inserts content before the beginning of the definition of a given symbol.
# * `list_dir`: Lists files and directories in the given directory (optionally with recursion).
# * `list_memories`: Lists memories in Serena's project-specific memory store.
# * `onboarding`: Performs onboarding (identifying the project structure and essential tasks, e.g. for testing or building).
# * `prepare_for_new_conversation`: Provides instructions for preparing for a new conversation (in order to continue with the necessary context).
# * `read_file`: Reads a file within the project directory.
# * `read_memory`: Reads the memory with the given name from Serena's project-specific memory store.
# * `remove_project`: Removes a project from the Serena configuration.
# * `replace_lines`: Replaces a range of lines within a file with new content.
# * `replace_symbol_body`: Replaces the full definition of a symbol.
# * `restart_language_server`: Restarts the language server, may be necessary when edits not through Serena happen.
# * `search_for_pattern`: Performs a search for a pattern in the project.
# * `summarize_changes`: Provides instructions for summarizing the changes made to the codebase.
# * `switch_modes`: Activates modes by providing a list of their names
# * `think_about_collected_information`: Thinking tool for pondering the completeness of collected information.
# * `think_about_task_adherence`: Thinking tool for determining whether the agent is still on track with the current task.
# * `think_about_whether_you_are_done`: Thinking tool for determining whether the task is truly completed.
# * `write_memory`: Writes a named memory (for future reference) to Serena's project-specific memory store.
excluded_tools: []

# initial prompt for the project. It will always be given to the LLM upon activating the project
# (contrary to the memories, which are loaded on demand).
initial_prompt: ""

project_name: "Stringy"
included_optional_tools: []
```
This PR includes unrelated infrastructure files (.serena/project.yml, .serena/.gitignore, .github/prompts/cicheck.prompt.md, .github/prompts/simplicity-review.prompt.md) that are not part of the ticket's scope to extend the FoundString data model. These should be in a separate PR focused on development tooling setup.
```rust
#[cfg(test)]
mod tests {
    use super::*;

    /// Creates a test FoundString with all optional fields set to None
    fn create_test_found_string() -> FoundString {
        FoundString {
            text: "test_string".to_string(),
            original_text: None,
            encoding: Encoding::Ascii,
            offset: 0x1000,
            rva: Some(0x2000),
            section: Some(".rodata".to_string()),
            length: 11,
            tags: vec![Tag::Url],
            score: 100,
            section_weight: None,
            semantic_boost: None,
            noise_penalty: None,
            source: StringSource::SectionData,
            confidence: 0.85,
        }
    }

    #[test]
    fn test_found_string_serde_optional_fields_none() {
        // Test that optional fields are skipped when None
        let found_string = create_test_found_string();
        let json = serde_json::to_string(&found_string).expect("Serialization failed");

        // Verify optional fields are not present in JSON
        assert!(!json.contains("original_text"));
        assert!(!json.contains("section_weight"));
        assert!(!json.contains("semantic_boost"));
        assert!(!json.contains("noise_penalty"));

        // Verify required fields are present
        assert!(json.contains("text"));
        assert!(json.contains("encoding"));
        assert!(json.contains("offset"));
    }

    #[test]
    fn test_found_string_serde_optional_fields_some() {
        // Test that optional fields are included when Some
        let mut found_string = create_test_found_string();
        found_string.original_text = Some("_ZN4test6mangled".to_string());
        found_string.section_weight = Some(50);
        found_string.semantic_boost = Some(25);
        found_string.noise_penalty = Some(-10);

        let json = serde_json::to_string(&found_string).expect("Serialization failed");

        // Verify optional fields are present in JSON
        assert!(json.contains("original_text"));
        assert!(json.contains("_ZN4test6mangled"));
        assert!(json.contains("section_weight"));
        assert!(json.contains("semantic_boost"));
        assert!(json.contains("noise_penalty"));
    }

    #[test]
    fn test_found_string_serde_roundtrip() {
        // Test serialization/deserialization roundtrip with all fields
        let mut found_string = create_test_found_string();
        found_string.original_text = Some("mangled_name".to_string());
        found_string.section_weight = Some(75);
        found_string.semantic_boost = Some(30);
        found_string.noise_penalty = Some(-5);

        let json = serde_json::to_string(&found_string).expect("Serialization failed");
        let deserialized: FoundString =
            serde_json::from_str(&json).expect("Deserialization failed");

        assert_eq!(found_string.text, deserialized.text);
        assert_eq!(found_string.original_text, deserialized.original_text);
        assert_eq!(found_string.section_weight, deserialized.section_weight);
        assert_eq!(found_string.semantic_boost, deserialized.semantic_boost);
        assert_eq!(found_string.noise_penalty, deserialized.noise_penalty);
    }

    #[test]
    fn test_found_string_deserialize_missing_optional_fields() {
        // Test that missing optional fields default to None during deserialization
        let json = r#"{
            "text": "test",
            "encoding": "Ascii",
            "offset": 0,
            "rva": null,
            "section": null,
            "length": 4,
            "tags": [],
            "score": 0,
            "source": "SectionData",
            "confidence": 1.0
        }"#;

        let deserialized: FoundString = serde_json::from_str(json).expect("Deserialization failed");

        assert_eq!(deserialized.text, "test");
        assert_eq!(deserialized.original_text, None);
        assert_eq!(deserialized.section_weight, None);
        assert_eq!(deserialized.semantic_boost, None);
        assert_eq!(deserialized.noise_penalty, None);
    }
}
```
The new tests validate serialization/deserialization of the new fields but don't test the new constructor FoundString::new() or the builder methods (with_rva(), with_section(), etc.). Consider adding a test that demonstrates the intended API usage pattern, such as creating a FoundString with the constructor and then chaining builder methods. This would serve as both a test and documentation of the API.
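A test in that direction might look like the sketch below. The type here is a pared-down stand-in, and the exact `new()` signature in the real crate may differ; only the method names are taken from this review:

```rust
// Hypothetical builder-API test against a pared-down stand-in type.
// In the real crate this would target stringy's FoundString and its
// full constructor signature.
#[derive(Debug, PartialEq)]
pub struct FoundString {
    pub text: String,
    pub section: Option<String>,
    pub confidence: f32,
}

impl FoundString {
    pub fn new(text: String) -> Self {
        FoundString { text, section: None, confidence: 1.0 }
    }
    pub fn with_section(mut self, section: String) -> Self {
        self.section = Some(section);
        self
    }
    pub fn with_confidence(mut self, confidence: f32) -> Self {
        self.confidence = confidence;
        self
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn constructor_and_builder_chain() {
        // new() leaves optional fields unset...
        let plain = FoundString::new("test".to_string());
        assert_eq!(plain.section, None);

        // ...and builder calls chain to fill them in.
        let built = FoundString::new("test".to_string())
            .with_section(".rodata".to_string())
            .with_confidence(0.85);
        assert_eq!(built.section.as_deref(), Some(".rodata"));
        assert!((built.confidence - 0.85).abs() < f32::EPSILON);
    }
}

fn main() {
    let s = FoundString::new("test".to_string()).with_section(".rodata".to_string());
    println!("{:?}", s);
}
```

Such a test doubles as executable documentation of the intended external API, which a struct-literal helper cannot.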
```rust
FoundString {
    text: "test_string".to_string(),
    original_text: None,
    encoding: Encoding::Ascii,
    offset: 0x1000,
    rva: Some(0x2000),
    section: Some(".rodata".to_string()),
    length: 11,
    tags: vec![Tag::Url],
    score: 100,
    section_weight: None,
    semantic_boost: None,
    noise_penalty: None,
    source: StringSource::SectionData,
    confidence: 0.85,
}
```
The test helper function create_test_found_string() uses struct literal construction with #[non_exhaustive]. This will cause compilation errors for external crates that cannot construct FoundString using struct literals. Since this is in the same crate module (under #[cfg(test)]), it compiles, but it sets a poor example. Consider using the new FoundString::new() constructor followed by builder methods to demonstrate the intended usage pattern.
```diff
-FoundString {
-    text: "test_string".to_string(),
-    original_text: None,
-    encoding: Encoding::Ascii,
-    offset: 0x1000,
-    rva: Some(0x2000),
-    section: Some(".rodata".to_string()),
-    length: 11,
-    tags: vec![Tag::Url],
-    score: 100,
-    section_weight: None,
-    semantic_boost: None,
-    noise_penalty: None,
-    source: StringSource::SectionData,
-    confidence: 0.85,
-}
+FoundString::new(
+    "test_string".to_string(),
+    Encoding::Ascii,
+    0x1000,
+    11,
+    StringSource::SectionData,
+)
+.with_rva(0x2000)
+.with_section(".rodata".to_string())
+.with_tags(vec![Tag::Url])
+.with_score(100)
+.with_confidence(0.85)
```
Add a gate job that always runs and aggregates the status of all CI jobs. This enables branch protection rules to require "CI Gate" as a status check, which will pass if:
- All jobs succeeded, OR
- Conditional jobs were skipped (due to path filters) and other jobs passed

The gate job fails only if any job actually failed or was cancelled.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (11)
docs/src/api.md (1)

11-33: Sync FoundString documentation with the new public API. The struct example omits the new optional metadata fields, the confidence field, and the `#[non_exhaustive]` attribute, so external users will not see the current API. Please update this block to match `src/types.rs` and consider adding a short constructor example for `FoundString::new` to avoid drift.

docs/src/classification.md (1)

173-189: Update the public example to use `FoundString::new`. Because `FoundString` is now non-exhaustive, the struct literal in the docs will not compile for external users. Use the constructor instead.

Proposed doc update:

```diff
-let found_string = FoundString {
-    text: "C:\\Windows\\System32\\cmd.exe".to_string(),
-    encoding: Encoding::Ascii,
-    offset: 0,
-    rva: None,
-    section: None,
-    length: 27,
-    tags: Vec::new(),
-    score: 0,
-    source: StringSource::SectionData,
-    confidence: 1.0,
-};
+let found_string = FoundString::new(
+    "C:\\Windows\\System32\\cmd.exe".to_string(),
+    Encoding::Ascii,
+    0,
+    27,
+    StringSource::SectionData,
+);
```

src/extraction/utf16.rs (2)
823-842: Use the `FoundString::new()` constructor instead of a struct literal. The PR introduces `FoundString::new()` and builder methods, yet this code still uses struct literal initialization. With `#[non_exhaustive]` on `FoundString`, external crates cannot construct via literals. While internal code can, using the constructor is more maintainable.

♻️ Suggested refactor:

```diff
-    found_strings.push(FoundString {
-        text,
-        original_text: None,
-        encoding: match byte_order {
-            ByteOrder::LE => Encoding::Utf16Le,
-            ByteOrder::BE => Encoding::Utf16Be,
-            ByteOrder::Auto => unreachable!(),
-        },
-        offset: start as u64,
-        rva: None,
-        section: None,
-        length: bytes_for_decoding.len() as u32,
-        tags: Vec::new(),
-        score: 0,
-        section_weight: None,
-        semantic_boost: None,
-        noise_penalty: None,
-        source: StringSource::SectionData,
-        confidence: utf16_confidence,
-    });
+    let encoding = match byte_order {
+        ByteOrder::LE => Encoding::Utf16Le,
+        ByteOrder::BE => Encoding::Utf16Be,
+        ByteOrder::Auto => unreachable!(),
+    };
+    found_strings.push(
+        FoundString::new(
+            text,
+            encoding,
+            start as u64,
+            bytes_for_decoding.len() as u32,
+            StringSource::SectionData,
+        )
+        .with_confidence(utf16_confidence),
+    );
```
1-1036: File exceeds the 500 line limit at 1273 lines. Per coding guidelines, files should stay under 500 lines. Consider splitting into submodules:

- Core extraction logic (`extract.rs`)
- Confidence scoring (`scoring.rs`)
- Printability checks (`printable.rs`)
- Section-aware extraction (`section.rs`)

src/extraction/mod.rs (2)
480-522: DRY violation: symbol handling duplicated between `extract()` and `extract_canonical()`. Lines 480-522 and 583-626 contain nearly identical code for adding import/export symbols. Extract this into a helper method.

♻️ Suggested refactor:

```rust
fn collect_symbol_strings(container_info: &ContainerInfo) -> Vec<FoundString> {
    let mut symbols = Vec::new();
    for import in &container_info.imports {
        symbols.push(FoundString::new(
            import.name.clone(),
            Encoding::Utf8,
            0,
            import.name.len() as u32,
            StringSource::ImportName,
        ));
    }
    for export in &container_info.exports {
        symbols.push(FoundString::new(
            export.name.clone(),
            Encoding::Utf8,
            0,
            export.name.len() as u32,
            StringSource::ExportName,
        ));
    }
    symbols
}
```

Also applies to: 583-626
1-948: File exceeds the 500 line limit at 1517 lines. This file contains the trait, config, extractor implementation, helper functions, and extensive tests. Consider splitting:

- `trait.rs` - StringExtractor trait
- `config.rs` - ExtractionConfig (already exists but could absorb more)
- `basic.rs` - BasicExtractor implementation
- Move tests to `tests/` or `mod/tests.rs`

src/extraction/macho_load_commands.rs (1)
101-116: Use `FoundString::new()` with builder methods for dylib and rpath strings. Both `extract_dylib_strings` and `extract_rpath_strings` use struct literals. Refactor to use the constructor and the `.with_tags()` builder.

♻️ Suggested refactor for extract_dylib_strings:

```diff
-    strings.push(FoundString {
-        text: lib.to_string(),
-        original_text: None,
-        encoding: Encoding::Utf8,
-        source: StringSource::LoadCommand,
-        tags,
-        section: None,
-        offset: 0,
-        rva: None,
-        length,
-        score: 0,
-        section_weight: None,
-        semantic_boost: None,
-        noise_penalty: None,
-        confidence: 1.0,
-    });
+    strings.push(
+        FoundString::new(
+            lib.to_string(),
+            Encoding::Utf8,
+            0,
+            length,
+            StringSource::LoadCommand,
+        )
+        .with_tags(tags),
+    );
```

Also applies to: 134-149
src/extraction/pe_resources.rs (2)
427-443: Use `FoundString::new()` with builder methods. Same pattern as other files: refactor to use the constructor and builders like `.with_section()` and `.with_tags()`.

♻️ Suggested refactor:

```diff
-    let found_string = FoundString {
-        text,
-        original_text: None,
-        encoding: Encoding::Utf16Le,
-        offset: 0, // pelite doesn't provide offsets easily
-        rva: None,
-        section: Some(".rsrc".to_string()),
-        length,
-        tags: vec![Tag::Version, Tag::Resource],
-        score: 0,
-        section_weight: None,
-        semantic_boost: None,
-        noise_penalty: None,
-        source: StringSource::ResourceString,
-        confidence: 1.0,
-    };
+    let found_string = FoundString::new(
+        text,
+        Encoding::Utf16Le,
+        0,
+        length,
+        StringSource::ResourceString,
+    )
+    .with_section(".rsrc".to_string())
+    .with_tags(vec![Tag::Version, Tag::Resource]);
```
1-828: File exceeds the 500 line limit at 1442 lines. Consider splitting:

- `pe_resources/mod.rs` - public API and orchestration
- `pe_resources/version_info.rs` - VERSIONINFO extraction
- `pe_resources/string_table.rs` - STRINGTABLE extraction
- `pe_resources/manifest.rs` - MANIFEST extraction
- `pe_resources/encoding.rs` - decode_utf16le, detect_manifest_encoding

src/classification/semantic.rs (2)
1-1640: File exceeds the 500-600 line limit (1640 lines). Per coding guidelines, files should stay under 500-600 lines. Consider splitting this module: for example, extract the path classification (`classify_posix_path`, `classify_windows_path`, `classify_unc_path`, `classify_registry_path`) and their validation helpers into a separate `paths.rs` submodule, and keep URL/domain/IP classification here.
1018-1035: Consider using the constructor in the test helper for consistency. While same-crate tests can use struct literals, using `FoundString::new()` would exercise the public API and automatically track any signature changes.

Suggested fix:

```diff
 fn create_test_string(text: &str) -> FoundString {
-    FoundString {
-        text: text.to_string(),
-        original_text: None,
-        encoding: Encoding::Ascii,
-        offset: 0,
-        rva: None,
-        section: None,
-        length: text.len() as u32,
-        tags: Vec::new(),
-        score: 0,
-        section_weight: None,
-        semantic_boost: None,
-        noise_penalty: None,
-        source: StringSource::SectionData,
-        confidence: 1.0,
-    }
+    FoundString::new(
+        text.to_string(),
+        Encoding::Ascii,
+        0,
+        text.len() as u32,
+        StringSource::SectionData,
+    )
 }
```
🤖 Fix all issues with AI agents
In
`@project_plan/tickets/Extend_FoundString_data_model_for_demangling_and_debug_support.md`:
- Around line 25-30: The acceptance checklist item claiming "No breaking changes
to existing code that creates FoundString instances" is incorrect because adding
#[non_exhaustive] makes struct literal construction a breaking change for
external users; update the wording to acknowledge this and instruct downstream
users to migrate to the provided constructor and builders by naming the symbols
FoundString::new and the builder methods (e.g., FoundString::builder or similar)
so callers switch from struct literals to the new API; also add a short
migration note explaining that existing direct field initialization must be
replaced with FoundString::new(...) or the builder methods to set optional
fields.
In `@src/classification/mod.rs`:
- Around line 31-46: The doc example currently constructs FoundString via struct
literal; update it to use the new constructor and builder pattern by calling
FoundString::new() and chaining the builder methods (e.g., .text(...),
.encoding(...), .offset(...), .length(...), .source(...), .confidence(...),
etc.) to set the same fields as the example so the docs show the preferred API
(refer to FoundString::new and the associated builder method names in the
implementation).
In `@src/classification/semantic.rs`:
- Around line 21-36: The doc example is constructing FoundString via a struct
literal but FoundString is now #[non_exhaustive] and must be created with the
public constructor/builder API; update the example to call FoundString::new(...)
and chain the available builder methods (e.g., .original_text(...),
.encoding(...), .offset(...), .length(...), .tags(...), .source(...),
.confidence(...), etc.) to set the same fields shown in the literal so the
example uses the public FoundString::new and builder methods instead of a struct
literal.
- Around line 353-368: The doc example constructs a FoundString via direct
struct literal; update it to use the public constructor/builder API (the same
pattern used in the module-level example) so the example matches the external
API surface. Locate the example that instantiates FoundString and replace the
literal initialization with calls to FoundString::new(...) or the
FoundStringBuilder (whichever the module-level docs use), setting text,
encoding, offset, length, source, confidence and any optional fields via the
builder methods to mirror the existing public constructor/builder usage.
In `@src/extraction/ascii.rs`:
- Around line 231-246: Replace the repeated FoundString struct literals in
src/extraction/ascii.rs with calls to FoundString::new so you don't duplicate
default values; for each creation site (the three places constructing
FoundString with many None/Vec::new/0/1.0 literals) call FoundString::new and
pass only the varying fields (e.g., text, offset, length, Encoding::Ascii and
StringSource::SectionData or whatever differs) and let the constructor provide
defaults for fields like original_text, rva, section, tags, score,
section_weight, semantic_boost, noise_penalty, and confidence; update all three
creation sites (the three struct literal blocks) to use FoundString::new
consistently.
In `@src/extraction/dedup.rs`:
- Around line 330-345: Replace direct struct construction of FoundString in test
helpers with the public constructor FoundString::new(...) to follow the
preferred API; locate the literal instance (the block that sets text,
original_text, encoding, offset, rva, section, length, tags, score,
section_weight, semantic_boost, noise_penalty, source, confidence) and call
FoundString::new with the appropriate arguments (text, encoding, offset,
section, length, tags, score, source, confidence) letting the constructor
compute rva and default optional fields instead of manually setting them; apply
the same replacement for the other occurrences referenced (the similar literals
around the noted ranges) so tests use FoundString::new consistently.
- Around line 283-298: Replace the manual struct literal in to_found_string()
with the public constructor FoundString::new(...) so the conversion uses the
canonical API; update the call to pass the same values currently set in the
literal (text, original_text, encoding, offset, rva, section, length, tags,
score, section_weight, semantic_boost, noise_penalty, source, confidence)
adapting to FoundString::new's parameter types (wrap options or pass bare values
as required) and ensure to_found_string() returns the value produced by
FoundString::new rather than constructing the struct directly.
In `@src/extraction/mod.rs`:
- Around line 765-780: The UTF-8 extraction path is still creating FoundString
with a struct literal; change it to use the refactored FoundString constructor
used in the other extraction path (e.g., FoundString::new or the project's
equivalent) instead of the literal. Construct the FoundString with the same
values: text, original_text = None, encoding, offset = absolute_offset, rva,
section = section.name.clone(), length = length as u32, tags = Vec::new(), score
= 0, section_weight = None, semantic_boost = None, noise_penalty = None, source
= StringSource::SectionData, and confidence; ensure the call matches the
constructor signature used elsewhere so both UTF-8 and other extraction paths
share the same creation code for consistency.
- Around line 484-499: Replace the inline struct literal used when pushing
import names into all_strings with the new constructor FoundString::new(...):
locate the push that currently builds FoundString { text: import.name.clone(),
original_text: None, encoding: Encoding::Utf8, offset: 0, rva: None, section:
None, length, tags: Vec::new(), score: 0, section_weight: None, semantic_boost:
None, noise_penalty: None, source: StringSource::ImportName, confidence: 1.0 }
and call FoundString::new passing the equivalent values (text =
import.name.clone(), original_text = None, encoding = Encoding::Utf8, offset =
0, rva = None, section = None, length = length, tags = Vec::new(), score = 0,
section_weight = None, semantic_boost = None, noise_penalty = None, source =
StringSource::ImportName, confidence = 1.0) so all FoundString constructions in
this file use the new constructor.
In `@src/types.rs`:
- Around line 257-280: Rename the field `section_weight` in the struct (in
src/types.rs) to a clearer name like `section_weight_contribution` or
`section_score_component` to avoid confusion with `SectionInfo.weight` (which is
an f32); update all references/usages of `section_weight` across the codebase
(serialization/deserialization, docs, tests) and keep the same serde attributes
and Option<i32> type and doc comment content (adjust field name inside the docs
to match), leaving `semantic_boost` and `noise_penalty` unchanged.
- Around line 455-473: The test helper create_test_found_string should construct
the FoundString via its public constructor/builder instead of a struct literal;
replace the literal in create_test_found_string() with FoundString::new(...) and
chain the appropriate builder methods (e.g., .original_text(None),
.encoding(Encoding::Ascii), .offset(0x1000), .rva(Some(0x2000)),
.section(Some(".rodata".to_string())), .length(11), .tags(vec![Tag::Url]),
.score(100), .section_weight(None), .semantic_boost(None), .noise_penalty(None),
.source(StringSource::SectionData), .confidence(0.85)) to produce the same test
values so the test exercises the constructor/builder API (adjust method names to
match the actual builder methods on FoundString).
- Around line 362-367: The with_confidence setter currently accepts any f32 but
the confidence field is documented as 0.0–1.0 and methods
is_high_confidence/is_low_confidence assume that range; update with_confidence
to enforce or validate the range, e.g. add a debug_assert!((0.0..=1.0).contains(&confidence)) to catch invalid values in debug builds, or clamp the value into 0.0..=1.0 before assigning; modify the pub fn with_confidence(mut self,
confidence: f32) -> Self to perform this validation/clamping and then set
self.confidence so other methods relying on the range remain correct.
| - [x] FoundString struct includes original_text field | ||
| - [x] FoundString struct includes optional breakdown fields (section_weight, semantic_boost, noise_penalty) | ||
| - [x] All fields properly serialize/deserialize with serde | ||
| - [x] Existing tests updated and passing | ||
| - [x] Documentation updated with field descriptions and usage examples | ||
| - [x] No breaking changes to existing code that creates FoundString instances |
Clarify the "no breaking changes" acceptance item.
#[non_exhaustive] prevents downstream struct literals, so this is a breaking change for external callers. Please reword to reflect the required migration to FoundString::new and builder methods.
Proposed wording
-- [x] No breaking changes to existing code that creates FoundString instances
+- [x] Downstream code uses FoundString::new and builder methods; struct literal construction is no longer supported due to #[non_exhaustive]
🤖 Prompt for AI Agents
In
`@project_plan/tickets/Extend_FoundString_data_model_for_demangling_and_debug_support.md`
around lines 25 - 30, The acceptance checklist item claiming "No breaking
changes to existing code that creates FoundString instances" is incorrect
because adding #[non_exhaustive] makes struct literal construction a breaking
change for external users; update the wording to acknowledge this and instruct
downstream users to migrate to the provided constructor and builders by naming
the symbols FoundString::new and the builder methods (e.g., FoundString::builder
or similar) so callers switch from struct literals to the new API; also add a
short migration note explaining that existing direct field initialization must
be replaced with FoundString::new(...) or the builder methods to set optional
fields.
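The migration described above can be sketched in miniature. This is a self-contained illustration, not the crate's actual API: the struct is reduced to a few fields, and the constructor/builder names mirror the PR description but are assumptions.

```rust
// Minimal sketch of the #[non_exhaustive] + constructor/builder pattern.
// Field and method names are modeled on the PR, not the real crate.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq)]
pub struct FoundString {
    pub text: String,
    pub offset: u64,
    pub score: i32,
    pub confidence: f32,
}

impl FoundString {
    /// Constructor takes the required fields; optional data is set via builders.
    pub fn new(text: String, offset: u64) -> Self {
        Self { text, offset, score: 0, confidence: 1.0 }
    }

    /// Builder-style setter; consumes and returns self so calls can chain.
    #[must_use]
    pub fn with_score(mut self, score: i32) -> Self {
        self.score = score;
        self
    }
}

fn main() {
    // Outside the defining crate, `FoundString { .. }` literals no longer
    // compile because of #[non_exhaustive]; callers migrate to this form:
    let s = FoundString::new("cmd.exe".to_string(), 0x1000).with_score(100);
    println!("{} @ {:#x} -> {}", s.text, s.offset, s.score);
}
```

Note that within the defining crate itself, struct literals still compile; `#[non_exhaustive]` only restricts external callers, which is exactly why the in-repo refactors below are nitpicks rather than compile errors.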
| //! let found_string = FoundString { | ||
| //! text: "C:\\Windows\\System32\\cmd.exe".to_string(), | ||
| //! original_text: None, | ||
| //! encoding: Encoding::Ascii, | ||
| //! offset: 0, | ||
| //! rva: None, | ||
| //! section: None, | ||
| //! length: 27, | ||
| //! tags: Vec::new(), | ||
| //! score: 0, | ||
| //! section_weight: None, | ||
| //! semantic_boost: None, | ||
| //! noise_penalty: None, | ||
| //! source: StringSource::SectionData, | ||
| //! confidence: 1.0, | ||
| //! }; |
🧹 Nitpick | 🔵 Trivial
Update doc example to use FoundString::new() builder pattern.
The PR introduces the constructor and builder methods as the preferred API. Documentation should demonstrate this pattern.
♻️ Suggested refactor
-//! let found_string = FoundString {
-//! text: "C:\\Windows\\System32\\cmd.exe".to_string(),
-//! original_text: None,
-//! encoding: Encoding::Ascii,
-//! offset: 0,
-//! rva: None,
-//! section: None,
-//! length: 27,
-//! tags: Vec::new(),
-//! score: 0,
-//! section_weight: None,
-//! semantic_boost: None,
-//! noise_penalty: None,
-//! source: StringSource::SectionData,
-//! confidence: 1.0,
-//! };
+//! let found_string = FoundString::new(
+//! "C:\\Windows\\System32\\cmd.exe".to_string(),
+//! Encoding::Ascii,
+//! 0,
+//! 27,
+//! StringSource::SectionData,
+//! );
🤖 Prompt for AI Agents
In `@src/classification/mod.rs` around lines 31 - 46, The doc example currently
constructs FoundString via struct literal; update it to use the new constructor
and builder pattern by calling FoundString::new() and chaining the builder
methods (e.g., .text(...), .encoding(...), .offset(...), .length(...),
.source(...), .confidence(...), etc.) to set the same fields as the example so
the docs show the preferred API (refer to FoundString::new and the associated
builder method names in the implementation).
| //! let found_string = FoundString { | ||
| //! text: "https://example.com/api".to_string(), | ||
| //! original_text: None, | ||
| //! encoding: Encoding::Ascii, | ||
| //! offset: 0, | ||
| //! rva: None, | ||
| //! section: None, | ||
| //! length: 24, | ||
| //! tags: Vec::new(), | ||
| //! score: 0, | ||
| //! section_weight: None, | ||
| //! semantic_boost: None, | ||
| //! noise_penalty: None, | ||
| //! source: StringSource::SectionData, | ||
| //! confidence: 1.0, | ||
| //! }; |
Doc example uses struct literal instead of the new constructor/builder pattern.
Since FoundString is now #[non_exhaustive], external crate consumers cannot construct it via struct literals - they must use FoundString::new() and the builder methods. Doc examples should demonstrate the public API users will actually use.
Suggested fix
//! let classifier = SemanticClassifier::new();
-//! let found_string = FoundString {
-//! text: "https://example.com/api".to_string(),
-//! original_text: None,
-//! encoding: Encoding::Ascii,
-//! offset: 0,
-//! rva: None,
-//! section: None,
-//! length: 24,
-//! tags: Vec::new(),
-//! score: 0,
-//! section_weight: None,
-//! semantic_boost: None,
-//! noise_penalty: None,
-//! source: StringSource::SectionData,
-//! confidence: 1.0,
-//! };
+//! let found_string = FoundString::new(
+//! "https://example.com/api".to_string(),
+//! Encoding::Ascii,
+//! 0,
+//! 24,
+//! StringSource::SectionData,
+//! );
🤖 Prompt for AI Agents
In `@src/classification/semantic.rs` around lines 21 - 36, The doc example is
constructing FoundString via a struct literal but FoundString is now
#[non_exhaustive] and must be created with the public constructor/builder API;
update the example to call FoundString::new(...) and chain the available builder
methods (e.g., .original_text(...), .encoding(...), .offset(...), .length(...),
.tags(...), .source(...), .confidence(...), etc.) to set the same fields shown
in the literal so the example uses the public FoundString::new and builder
methods instead of a struct literal.
| /// let found_string = FoundString { | ||
| /// text: "https://example.com".to_string(), | ||
| /// original_text: None, | ||
| /// encoding: Encoding::Ascii, | ||
| /// offset: 0, | ||
| /// rva: None, | ||
| /// section: None, | ||
| /// length: 19, | ||
| /// tags: Vec::new(), | ||
| /// score: 0, | ||
| /// section_weight: None, | ||
| /// semantic_boost: None, | ||
| /// noise_penalty: None, | ||
| /// source: StringSource::SectionData, | ||
| /// confidence: 1.0, | ||
| /// }; |
Same issue: doc example should use constructor/builder pattern.
Apply the same fix as the module-level doc example for consistency and to accurately reflect the external API.
Suggested fix
//! let classifier = SemanticClassifier::new();
-//! let found_string = FoundString {
-//! text: "https://example.com".to_string(),
-//! original_text: None,
-//! encoding: Encoding::Ascii,
-//! offset: 0,
-//! rva: None,
-//! section: None,
-//! length: 19,
-//! tags: Vec::new(),
-//! score: 0,
-//! section_weight: None,
-//! semantic_boost: None,
-//! noise_penalty: None,
-//! source: StringSource::SectionData,
-//! confidence: 1.0,
-//! };
+//! let found_string = FoundString::new(
+//! "https://example.com".to_string(),
+//! Encoding::Ascii,
+//! 0,
+//! 19,
+//! StringSource::SectionData,
+//! );
🤖 Prompt for AI Agents
In `@src/classification/semantic.rs` around lines 353 - 368, The doc example
constructs a FoundString via direct struct literal; update it to use the public
constructor/builder API (the same pattern used in the module-level example) so
the example matches the external API surface. Locate the example that
instantiates FoundString and replace the literal initialization with calls to
FoundString::new(...) or the FoundStringBuilder (whichever the module-level docs
use), setting text, encoding, offset, length, source, confidence and any
optional fields via the builder methods to mirror the existing public
constructor/builder usage.
| strings.push(FoundString { | ||
| text, | ||
| original_text: None, | ||
| encoding: Encoding::Ascii, | ||
| offset: start as u64, | ||
| rva: None, | ||
| section: None, | ||
| length: len as u32, | ||
| tags: Vec::new(), | ||
| score: 0, | ||
| section_weight: None, | ||
| semantic_boost: None, | ||
| noise_penalty: None, | ||
| source: StringSource::SectionData, | ||
| confidence: 1.0, | ||
| }); |
🧹 Nitpick | 🔵 Trivial
Prefer FoundString::new to avoid repeating defaults.
These three literals duplicate the default field values and will need updates every time FoundString evolves. Using the constructor keeps defaults in one place. Apply the same pattern at all three creation sites.
Proposed refactor
- strings.push(FoundString {
- text,
- original_text: None,
- encoding: Encoding::Ascii,
- offset: start as u64,
- rva: None,
- section: None,
- length: len as u32,
- tags: Vec::new(),
- score: 0,
- section_weight: None,
- semantic_boost: None,
- noise_penalty: None,
- source: StringSource::SectionData,
- confidence: 1.0,
- });
+ strings.push(FoundString::new(
+ text,
+ Encoding::Ascii,
+ start as u64,
+ len as u32,
+ StringSource::SectionData,
+ ));
Also applies to: 265-280, 285-300
🤖 Prompt for AI Agents
In `@src/extraction/ascii.rs` around lines 231 - 246, Replace the repeated
FoundString struct literals in src/extraction/ascii.rs with calls to
FoundString::new so you don't duplicate default values; for each creation site
(the three places constructing FoundString with many None/Vec::new/0/1.0
literals) call FoundString::new and pass only the varying fields (e.g., text,
offset, length, Encoding::Ascii and StringSource::SectionData or whatever
differs) and let the constructor provide defaults for fields like original_text,
rva, section, tags, score, section_weight, semantic_boost, noise_penalty, and
confidence; update all three creation sites (the three struct literal blocks) to
use FoundString::new consistently.
| all_strings.push(FoundString { | ||
| text: import.name.clone(), | ||
| original_text: None, | ||
| encoding: Encoding::Utf8, | ||
| offset: 0, | ||
| rva: None, | ||
| section: None, | ||
| length, | ||
| tags: Vec::new(), | ||
| score: 0, | ||
| section_weight: None, | ||
| semantic_boost: None, | ||
| noise_penalty: None, | ||
| source: StringSource::ImportName, | ||
| confidence: 1.0, | ||
| }); |
🧹 Nitpick | 🔵 Trivial
Use FoundString::new() for import name construction.
Struct literal used instead of the constructor introduced in this PR. This applies to all FoundString constructions in this file.
♻️ Suggested refactor
- all_strings.push(FoundString {
- text: import.name.clone(),
- original_text: None,
- encoding: Encoding::Utf8,
- offset: 0,
- rva: None,
- section: None,
- length,
- tags: Vec::new(),
- score: 0,
- section_weight: None,
- semantic_boost: None,
- noise_penalty: None,
- source: StringSource::ImportName,
- confidence: 1.0,
- });
+ all_strings.push(FoundString::new(
+ import.name.clone(),
+ Encoding::Utf8,
+ 0,
+ length,
+ StringSource::ImportName,
+ ));
🤖 Prompt for AI Agents
In `@src/extraction/mod.rs` around lines 484 - 499, Replace the inline struct
literal used when pushing import names into all_strings with the new constructor
FoundString::new(...): locate the push that currently builds FoundString { text:
import.name.clone(), original_text: None, encoding: Encoding::Utf8, offset: 0,
rva: None, section: None, length, tags: Vec::new(), score: 0, section_weight:
None, semantic_boost: None, noise_penalty: None, source:
StringSource::ImportName, confidence: 1.0 } and call FoundString::new passing
the equivalent values (text = import.name.clone(), original_text = None,
encoding = Encoding::Utf8, offset = 0, rva = None, section = None, length =
length, tags = Vec::new(), score = 0, section_weight = None, semantic_boost =
None, noise_penalty = None, source = StringSource::ImportName, confidence = 1.0)
so all FoundString constructions in this file use the new constructor.
| let found_string = FoundString { | ||
| text, | ||
| original_text: None, | ||
| encoding, | ||
| offset: absolute_offset, | ||
| rva, | ||
| section: Some(section.name.clone()), | ||
| length: length as u32, | ||
| tags: Vec::new(), | ||
| score: 0, | ||
| section_weight: None, | ||
| semantic_boost: None, | ||
| noise_penalty: None, | ||
| source: StringSource::SectionData, | ||
| confidence, | ||
| }; |
🧹 Nitpick | 🔵 Trivial
Consistent: UTF-8 extraction also needs constructor refactor.
Same struct literal pattern in the UTF-8 extraction path.
🤖 Prompt for AI Agents
In `@src/extraction/mod.rs` around lines 765 - 780, The UTF-8 extraction path is
still creating FoundString with a struct literal; change it to use the
refactored FoundString constructor used in the other extraction path (e.g.,
FoundString::new or the project's equivalent) instead of the literal. Construct
the FoundString with the same values: text, original_text = None, encoding,
offset = absolute_offset, rva, section = section.name.clone(), length = length
as u32, tags = Vec::new(), score = 0, section_weight = None, semantic_boost =
None, noise_penalty = None, source = StringSource::SectionData, and confidence;
ensure the call matches the constructor signature used elsewhere so both UTF-8
and other extraction paths share the same creation code for consistency.
| /// Section weight contribution to the final score (debug only) | ||
| /// | ||
| /// When debug mode is enabled, this field contains the weight assigned based on | ||
| /// the section where the string was found. Higher weights indicate sections more | ||
| /// likely to contain meaningful strings (e.g., .rodata vs .text). This is `None` | ||
| /// unless explicitly populated by the ranking system in debug mode. | ||
| #[serde(skip_serializing_if = "Option::is_none", default)] | ||
| pub section_weight: Option<i32>, | ||
| /// Semantic classification boost to the final score (debug only) | ||
| /// | ||
| /// When debug mode is enabled, this field contains the score boost applied based on | ||
| /// semantic tags (URLs, file paths, GUIDs, etc.). Strings with valuable semantic | ||
| /// meaning receive positive boosts. This is `None` unless explicitly populated by | ||
| /// the ranking system in debug mode. | ||
| #[serde(skip_serializing_if = "Option::is_none", default)] | ||
| pub semantic_boost: Option<i32>, | ||
| /// Noise penalty applied to the final score (debug only) | ||
| /// | ||
| /// When debug mode is enabled, this field contains the penalty applied for noise | ||
| /// characteristics (low confidence, repetitive patterns, etc.). Higher penalties | ||
| /// indicate strings more likely to be noise. This is `None` unless explicitly | ||
| /// populated by the ranking system in debug mode. | ||
| #[serde(skip_serializing_if = "Option::is_none", default)] | ||
| pub noise_penalty: Option<i32>, |
🧹 Nitpick | 🔵 Trivial
Field naming could be more precise to avoid confusion with SectionInfo.weight.
section_weight as Option<i32> stores the score contribution derived from the section's weight, while SectionInfo.weight is f32. The doc comments clarify this, but consider names like section_weight_contribution or section_score_component to make the distinction explicit. Low priority - current naming works with the documentation.
🤖 Prompt for AI Agents
In `@src/types.rs` around lines 257 - 280, Rename the field `section_weight` in
the struct (in src/types.rs) to a clearer name like
`section_weight_contribution` or `section_score_component` to avoid confusion
with `SectionInfo.weight` (which is an f32); update all references/usages of
`section_weight` across the codebase (serialization/deserialization, docs,
tests) and keep the same serde attributes and Option<i32> type and doc comment
content (adjust field name inside the docs to match), leaving `semantic_boost`
and `noise_penalty` unchanged.
| /// Sets the confidence | ||
| #[must_use] | ||
| pub fn with_confidence(mut self, confidence: f32) -> Self { | ||
| self.confidence = confidence; | ||
| self | ||
| } |
with_confidence does not validate the documented 0.0-1.0 range.
The confidence field is documented as "(0.0-1.0)" and is_high_confidence/is_low_confidence methods assume this range. Consider adding a debug assertion or clamping to prevent invalid values.
Suggested fix with debug assertion
/// Sets the confidence
#[must_use]
pub fn with_confidence(mut self, confidence: f32) -> Self {
+ debug_assert!(
+ (0.0..=1.0).contains(&confidence),
+ "confidence must be in range 0.0-1.0, got {confidence}"
+ );
self.confidence = confidence;
self
}
🤖 Prompt for AI Agents
In `@src/types.rs` around lines 362 - 367, The with_confidence setter currently
accepts any f32 but the confidence field is documented as 0.0–1.0 and methods
is_high_confidence/is_low_confidence assume that range; update with_confidence
to enforce or validate the range, e.g. add a debug_assert!((0.0..=1.0).contains(&confidence)) to catch invalid values in debug builds, or clamp the value into 0.0..=1.0 before assigning; modify the pub fn with_confidence(mut self,
confidence: f32) -> Self to perform this validation/clamping and then set
self.confidence so other methods relying on the range remain correct.
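The clamping option mentioned in the prompt can be sketched as a standalone function. This is an illustration under the review's assumptions, not the crate's actual method:

```rust
// Sketch of the clamping alternative to a debug_assert: keep the stored
// value inside the documented 0.0-1.0 range so that checks in the style
// of is_high_confidence / is_low_confidence stay meaningful.
fn clamp_confidence(confidence: f32) -> f32 {
    confidence.clamp(0.0, 1.0)
}

fn main() {
    // Out-of-range inputs are pulled back into the documented range.
    println!("{}", clamp_confidence(1.5)); // prints 1
}
```

A debug_assert catches bad callers loudly during development, while clamping silently repairs them in release builds; the two can also be combined, asserting in debug and clamping as the fallback.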
| /// Creates a test FoundString with all optional fields set to None | ||
| fn create_test_found_string() -> FoundString { | ||
| FoundString { | ||
| text: "test_string".to_string(), | ||
| original_text: None, | ||
| encoding: Encoding::Ascii, | ||
| offset: 0x1000, | ||
| rva: Some(0x2000), | ||
| section: Some(".rodata".to_string()), | ||
| length: 11, | ||
| tags: vec![Tag::Url], | ||
| score: 100, | ||
| section_weight: None, | ||
| semantic_boost: None, | ||
| noise_penalty: None, | ||
| source: StringSource::SectionData, | ||
| confidence: 0.85, | ||
| } | ||
| } |
🧹 Nitpick | 🔵 Trivial
Test helper uses struct literal instead of constructor.
For consistency with the PR's goal of promoting the constructor/builder pattern, consider using FoundString::new() with builder methods. This would also verify the API works correctly.
Suggested fix
fn create_test_found_string() -> FoundString {
- FoundString {
- text: "test_string".to_string(),
- original_text: None,
- encoding: Encoding::Ascii,
- offset: 0x1000,
- rva: Some(0x2000),
- section: Some(".rodata".to_string()),
- length: 11,
- tags: vec![Tag::Url],
- score: 100,
- section_weight: None,
- semantic_boost: None,
- noise_penalty: None,
- source: StringSource::SectionData,
- confidence: 0.85,
- }
+ FoundString::new(
+ "test_string".to_string(),
+ Encoding::Ascii,
+ 0x1000,
+ 11,
+ StringSource::SectionData,
+ )
+ .with_rva(0x2000)
+ .with_section(".rodata".to_string())
+ .with_tags(vec![Tag::Url])
+ .with_score(100)
+ .with_confidence(0.85)
}🤖 Prompt for AI Agents
In `@src/types.rs` around lines 455 - 473, The test helper
create_test_found_string should construct the FoundString via its public
constructor/builder instead of a struct literal; replace the literal in
create_test_found_string() with FoundString::new(...) and chain the appropriate
builder methods (e.g., .original_text(None), .encoding(Encoding::Ascii),
.offset(0x1000), .rva(Some(0x2000)), .section(Some(".rodata".to_string())),
.length(11), .tags(vec![Tag::Url]), .score(100), .section_weight(None),
.semantic_boost(None), .noise_penalty(None), .source(StringSource::SectionData),
.confidence(0.85)) to produce the same test values so the test exercises the
constructor/builder API (adjust method names to match the actual builder methods
on FoundString).
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @.github/workflows/ci.yml:
- Around line 185-206: The CI gate script currently checks results for quality,
msrv, test, test-cross-platform, and coverage but omits the changes job,
allowing failures there to be masked; update the Check job results step to also
evaluate needs.changes.result and fail the job if it equals "failure" or
"cancelled" (mirror the existing pattern used for quality and msrv), and emit a
clear error message like "::error::Changes job failed" when that condition is
met so the overall gate correctly fails on changes job errors.
- Around line 190-215: The workflow uses invalid dot-notation for a hyphenated
job id (needs.test-cross-platform.result); update all accesses of that job to
bracket notation (needs['test-cross-platform'].result), including the initial
echo and the subsequent if condition that checks its value; also scan for any
other hyphenated job IDs (e.g., any occurrence of needs.<hyphenated-id>.result)
and convert them to needs['<hyphenated-id>'].result to avoid runtime evaluation
errors.
| needs: [quality, msrv, test, test-cross-platform, coverage] | ||
| if: always() | ||
| steps: | ||
| - name: Check job results | ||
| run: | | ||
| echo "Quality: ${{ needs.quality.result }}" | ||
| echo "MSRV: ${{ needs.msrv.result }}" | ||
| echo "Test: ${{ needs.test.result }}" | ||
| echo "Test Cross-Platform: ${{ needs.test-cross-platform.result }}" | ||
| echo "Coverage: ${{ needs.coverage.result }}" | ||
| # Fail if any required job failed | ||
| # Jobs that were skipped (due to path filters) are OK | ||
| if [[ "${{ needs.quality.result }}" == "failure" || "${{ needs.quality.result }}" == "cancelled" ]]; then | ||
| echo "::error::Quality checks failed" | ||
| exit 1 | ||
| fi | ||
| if [[ "${{ needs.msrv.result }}" == "failure" || "${{ needs.msrv.result }}" == "cancelled" ]]; then | ||
| echo "::error::MSRV check failed" | ||
| exit 1 | ||
| fi |
Include changes in the gate to avoid masked failures.
If the changes job fails, dependent jobs are skipped and the gate still passes because it never checks changes. This can hide CI breakage.
✅ Suggested fix
- needs: [quality, msrv, test, test-cross-platform, coverage]
+ needs: [changes, quality, msrv, test, test-cross-platform, coverage]
...
- echo "Quality: ${{ needs.quality.result }}"
+ echo "Changes: ${{ needs.changes.result }}"
+ echo "Quality: ${{ needs.quality.result }}"
...
+ if [[ "${{ needs.changes.result }}" == "failure" || "${{ needs.changes.result }}" == "cancelled" ]]; then
+ echo "::error::Changes filter failed"
+ exit 1
+ fi
🤖 Prompt for AI Agents
In @.github/workflows/ci.yml around lines 185 - 206, The CI gate script
currently checks results for quality, msrv, test, test-cross-platform, and
coverage but omits the changes job, allowing failures there to be masked; update
the Check job results step to also evaluate needs.changes.result and fail the
job if it equals "failure" or "cancelled" (mirror the existing pattern used for
quality and msrv), and emit a clear error message like "::error::Changes job
failed" when that condition is met so the overall gate correctly fails on
changes job errors.
| echo "Quality: ${{ needs.quality.result }}" | ||
| echo "MSRV: ${{ needs.msrv.result }}" | ||
| echo "Test: ${{ needs.test.result }}" | ||
| echo "Test Cross-Platform: ${{ needs.test-cross-platform.result }}" | ||
| echo "Coverage: ${{ needs.coverage.result }}" | ||
| # Fail if any required job failed | ||
| # Jobs that were skipped (due to path filters) are OK | ||
| if [[ "${{ needs.quality.result }}" == "failure" || "${{ needs.quality.result }}" == "cancelled" ]]; then | ||
| echo "::error::Quality checks failed" | ||
| exit 1 | ||
| fi | ||
| if [[ "${{ needs.msrv.result }}" == "failure" || "${{ needs.msrv.result }}" == "cancelled" ]]; then | ||
| echo "::error::MSRV check failed" | ||
| exit 1 | ||
| fi | ||
| if [[ "${{ needs.test.result }}" == "failure" || "${{ needs.test.result }}" == "cancelled" ]]; then | ||
| echo "::error::Tests failed" | ||
| exit 1 | ||
| fi | ||
| if [[ "${{ needs.test-cross-platform.result }}" == "failure" || "${{ needs.test-cross-platform.result }}" == "cancelled" ]]; then | ||
| echo "::error::Cross-platform tests failed" | ||
| exit 1 |
🧩 Analysis chain
🌐 Web query:
GitHub Actions needs context job id hyphen dot notation bracket notation
💡 Result:
Short answer: use bracket notation when the identifier contains characters that can't be used in dot-notation (or when it begins with a number). Job IDs themselves must start with a letter or underscore and may contain letters, digits, hyphens and underscores; but when you reference a context/property whose name contains a hyphen/dot/other special char, use bracket/string indexing.
Key facts and examples:
- Job-id rules (what you may name a job): a job id must start with a letter or `_` and contain only alphanumeric characters, `-`, or `_`. [1]
- When an expression needs to access a property whose key contains a character not valid for dot notation (such as a hyphen), use bracket notation; example usages:
- needs['my-job-with-hyphen'].outputs.some_output
- job.services['my-service'].id
- github['some-context-with-hyphen'] (general form)
These follow GitHub Actions expression/object-indexing behavior described in the contexts/docs. [1][2]
Example:
- jobs:
    build-and-test:
      runs-on: ubuntu-latest
      needs: []
- Another job referencing output:
    deploy:
      needs: ['build-and-test']
      if: needs['build-and-test'].outputs.artifact_ready == 'true'
Sources:
- Workflow syntax / jobs and job_id rules. [1]
- Contexts and expression syntax (job/steps/services access and example of using object keys). [2]
🏁 Script executed:
head -n 220 .github/workflows/ci.yml | tail -n 35
Repository: EvilBit-Labs/Stringy
Length of output: 1569
Fix context access syntax for hyphenated job IDs.
needs.test-cross-platform.result uses invalid dot notation for a job ID containing a hyphen. GitHub Actions expressions require bracket notation to access properties with hyphens in their names. The workflow will fail at runtime when evaluating these expressions.
Suggested fix
- echo "Test Cross-Platform: ${{ needs.test-cross-platform.result }}"
+ echo "Test Cross-Platform: ${{ needs['test-cross-platform'].result }}"
# Fail if any required job failed
# Jobs that were skipped (due to path filters) are OK
- if [[ "${{ needs.test-cross-platform.result }}" == "failure" || "${{ needs.test-cross-platform.result }}" == "cancelled" ]]; then
+ if [[ "${{ needs['test-cross-platform'].result }}" == "failure" || "${{ needs['test-cross-platform'].result }}" == "cancelled" ]]; then
echo "::error::Cross-platform tests failed"
exit 1
fi
🤖 Prompt for AI Agents
In `.github/workflows/ci.yml` around lines 190-215, the workflow uses invalid dot notation for a hyphenated job id (`needs.test-cross-platform.result`). Update all accesses of that job to bracket notation (`needs['test-cross-platform'].result`), including the initial echo and the subsequent if condition that checks its value. Also scan for any other hyphenated job IDs (i.e., any occurrence of `needs.<hyphenated-id>.result`) and convert them to `needs['<hyphenated-id>'].result` to avoid runtime evaluation errors.
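The rule is easy to see in a minimal, hypothetical workflow: GitHub Actions expressions only allow dot access for property names that are valid identifiers, so a hyphenated job id must be indexed with brackets.

```yaml
jobs:
  test-cross-platform:
    runs-on: ubuntu-latest
    steps:
      - run: echo "ok"
  ci-success:
    needs: [test-cross-platform]
    if: always()
    runs-on: ubuntu-latest
    steps:
      # Dot notation (needs.test-cross-platform.result) is rejected by the
      # expression parser; hyphenated job IDs require bracket indexing:
      - run: echo "Cross-platform: ${{ needs['test-cross-platform'].result }}"
```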
Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on pull request comments.

Thanks for integrating Codecov - We've got you covered ☂️
Summary
This PR extends the `FoundString` data model to support symbol demangling preservation and debug score breakdowns, completing the foundational ticket for the enhanced string analysis pipeline.

Impact: 27 files changed (+592 additions, -126 deletions)
Risk Level: Low - additive changes with backwards compatibility
Review Time: ~15-20 minutes
What Changed
Source Changes
- `src/types.rs` - `#[non_exhaustive]`, new fields, constructor, and builder pattern
- `src/extraction/*.rs` - initialize the new `original_text` field
- `src/classification/*.rs` - initialize the new `original_text` field

Test Changes
- `tests/test_deduplication.rs` - uses the builder pattern
- `tests/classification_integration.rs` - uses `FoundString::new()`

Documentation Changes
- `docs/src/*.md`
- `project_plan/tickets/*.md`

Configuration Changes
- `.github/prompts/cicheck.prompt.md`
- `.serena/project.yml`

Why These Changes
1. API Stability (`#[non_exhaustive]`)

The `FoundString` struct is part of the public API. Adding `#[non_exhaustive]` prevents external code from using struct literal syntax, ensuring forward compatibility when new fields are added.

2. Demangling Support (`original_text`)

The `original_text: Option<String>` field enables preserving the original, pre-demangled text alongside the demangled result.

3. Debug Score Breakdown
Three new optional fields provide score transparency:
- `section_weight: Option<i32>` - Weight from section location
- `semantic_boost: Option<i32>` - Boost from semantic classification
- `noise_penalty: Option<i32>` - Penalty for noise characteristics

4. Builder Pattern
The new constructor and builder methods provide a single supported construction path: `new()` for the required fields, plus chainable `with_*` setters for everything optional.
Type of Change
Technical Implementation
New FoundString API
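The commit log lists the added methods: `new()`, `with_rva()`, `with_section()`, `with_tags()`, `with_score()`, `with_confidence()`, `with_original_text()`, `with_section_weight()`, `with_semantic_boost()`, and `with_noise_penalty()`. A minimal sketch of the pattern follows; the field names, types, and required arguments are assumptions for illustration, not the crate's real definition in `src/types.rs`:

```rust
// Sketch only: field names/types are assumed, not the actual struct.
#[non_exhaustive]
#[derive(Debug, Clone, Default)]
pub struct FoundString {
    pub text: String,
    pub rva: Option<u64>,
    pub section: Option<String>,
    pub tags: Vec<String>,
    pub score: i32,
    pub original_text: Option<String>,
    pub section_weight: Option<i32>,
    pub semantic_boost: Option<i32>,
    pub noise_penalty: Option<i32>,
}

impl FoundString {
    /// Constructor with the required field(s); optional fields start empty.
    pub fn new(text: impl Into<String>) -> Self {
        Self { text: text.into(), ..Self::default() }
    }

    /// Builder setters consume and return `self` so calls chain;
    /// `#[must_use]` warns when the returned value is accidentally dropped.
    #[must_use]
    pub fn with_rva(mut self, rva: u64) -> Self {
        self.rva = Some(rva);
        self
    }

    #[must_use]
    pub fn with_section(mut self, section: impl Into<String>) -> Self {
        self.section = Some(section.into());
        self
    }

    #[must_use]
    pub fn with_original_text(mut self, original: impl Into<String>) -> Self {
        self.original_text = Some(original.into());
        self
    }

    #[must_use]
    pub fn with_score(mut self, score: i32) -> Self {
        self.score = score;
        self
    }
}

fn main() {
    let s = FoundString::new("core::fmt::Debug::fmt")
        .with_section(".rodata")
        .with_original_text("_ZN4core3fmt5Debug3fmtE")
        .with_score(10);
    assert_eq!(s.section.as_deref(), Some(".rodata"));
    assert!(s.noise_penalty.is_none());
}
```

Because the struct is `#[non_exhaustive]`, code outside the defining crate must go through `new()`; the literal `FoundString { .. }` form only compiles inside the crate itself.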
Serde Behavior
Optional fields use `#[serde(skip_serializing_if = "Option::is_none", default)]`:

- fields that are `None` are omitted during serialization
- fields missing from the input deserialize to `None`
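Concretely, a `FoundString` with all breakdown fields populated might serialize like this (field names and values are illustrative, not taken from the crate):

```json
{
  "text": "core::fmt::Debug::fmt",
  "score": 12,
  "original_text": "_ZN4core3fmt5Debug3fmtE",
  "section_weight": 5,
  "semantic_boost": 10,
  "noise_penalty": 3
}
```

When those `Option` fields are `None`, the keys are omitted entirely, and deserializing older JSON that lacks them yields `None` for each — which is what keeps previously serialized data loadable.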
Unit Tests
New tests in `src/types.rs`:

- `test_found_string_serde_optional_fields_none` - Verifies None fields are omitted
- `test_found_string_serde_optional_fields_some` - Verifies Some fields are included
- `test_found_string_serde_roundtrip` - Full serialization roundtrip
- `test_found_string_deserialize_missing_optional_fields` - Backwards compatibility
Updated existing tests to use the new API:

- `tests/classification_integration.rs` - Uses `FoundString::new()`
- `tests/test_deduplication.rs` - Uses builder pattern

CI Validation
Breaking Changes
None - This PR maintains full backwards compatibility:
- existing `FoundString` fields continue to work
- optional fields default to `None` when not provided

Note: External code using struct literal syntax will need to migrate to the constructor/builder pattern. This is intentional to ensure future field additions don't break downstream code.
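For downstream callers, the migration described in the note looks roughly like this (a sketch with assumed field names, standing in for the real struct in `src/types.rs`):

```rust
// Minimal stand-in for the real FoundString (field names assumed).
#[non_exhaustive]
#[derive(Debug, Default)]
pub struct FoundString {
    pub text: String,
    pub rva: Option<u64>,
}

impl FoundString {
    pub fn new(text: impl Into<String>) -> Self {
        Self { text: text.into(), ..Self::default() }
    }

    #[must_use]
    pub fn with_rva(mut self, rva: u64) -> Self {
        self.rva = Some(rva);
        self
    }
}

fn main() {
    // Before: struct literal syntax. From another crate this stops compiling
    // once the struct is #[non_exhaustive]:
    // let s = FoundString { text: "x".into(), rva: Some(0x1000) };

    // After: constructor plus builder methods.
    let s = FoundString::new("x").with_rva(0x1000);
    assert_eq!(s.rva, Some(0x1000));
}
```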
Risk Assessment
Overall Risk: Low (1.4/10)
Review Checklist
General
Code Quality
`#[must_use]` on builder methods

Testing
Documentation
Rust-Specific
`unsafe` code

Dependencies
This PR is the foundational ticket that other work depends on:
(`original_text`)

Files Changed Summary
Generated with Claude Code