feat(types): extend FoundString data model with non_exhaustive and builder pattern (#123)

unclesp1d3r merged 10 commits into main
Conversation
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
- Introduced `original_text` field to preserve pre-demangled text.
- Updated related extraction and classification modules to initialize this field.

Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Added unit tests to verify serialization and deserialization of the FoundString struct, focusing on the handling of optional fields (original_text, section_weight, semantic_boost, noise_penalty). Updated the related project plan ticket to mark acceptance criteria as complete. Also added .serena project configuration files.
Add `#[non_exhaustive]` attribute to the FoundString struct to comply with AGENTS.md guidelines for public API structs. This prevents external code from using struct literal syntax, ensuring forward compatibility when new fields are added.

Add constructor and builder methods:
- `new()` constructor with required fields
- `with_rva()`, `with_section()`, `with_tags()`, `with_score()`
- `with_confidence()`, `with_original_text()`
- `with_section_weight()`, `with_semantic_boost()`, `with_noise_penalty()`

Update test files to use the new constructor/builder pattern.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
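The pattern this commit describes can be sketched as follows. The field set is abbreviated and the types are illustrative assumptions, not the crate's actual definitions; only the method names come from the commit message:

```rust
// Sketch of the #[non_exhaustive] + consuming-builder pattern described above.
// Field names come from this PR; types and the reduced field set are assumptions.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq)]
pub struct FoundString {
    pub text: String,
    pub offset: u64,
    pub rva: Option<u64>,
    pub score: i32,
}

impl FoundString {
    /// Constructor covering required fields; optional fields start unset.
    pub fn new(text: String, offset: u64) -> Self {
        FoundString { text, offset, rva: None, score: 0 }
    }

    /// Consuming builder: takes `self` by value so calls can be chained.
    pub fn with_rva(mut self, rva: u64) -> Self {
        self.rva = Some(rva);
        self
    }

    pub fn with_score(mut self, score: i32) -> Self {
        self.score = score;
        self
    }
}

fn main() {
    // External crates cannot use `FoundString { .. }` literals because of
    // #[non_exhaustive]; they must go through new() and the builders.
    let s = FoundString::new("hello".to_string(), 0x1000)
        .with_rva(0x2000)
        .with_score(100);
    println!("{:?}", s);
}
```

Because the struct is `#[non_exhaustive]`, adding a field later is not a source-breaking change for downstream crates: they never spell out the full field list.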
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Pull Request Summary: Extend FoundString Data Model with Non-Exhaustive and Builder Pattern

Architectural & API Changes
Binary Format & Extraction Impact
Semantic Classification & Debugging Accuracy
Backwards Compatibility & Serialization
Tests, CI & Surface Area
Walkthrough

This PR expands the public data model (FoundString) with four optional metadata fields and builder APIs, updates all extraction and test sites to use the new constructors/fields, adds ContainerInfo constructor fields, introduces CI/prompts and a Serena project config, and converts many documentation code fences from `rust` to `text`.

Changes
Sequence Diagram(s): omitted — changes are data-model and initialization updates; no new multi-component sequential control flow introduced.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 4

❌ Failed checks (4 warnings)
✅ Passed checks (4 passed)
Pull request overview
This PR extends the FoundString data model to support future demangling functionality and debug-mode score breakdowns. It adds #[non_exhaustive] to the struct and implements a constructor/builder pattern for external API consumers.
Changes:
- Added `#[non_exhaustive]` attribute to `FoundString` with constructor and builder methods for forward compatibility
- Added four new optional fields: `original_text`, `section_weight`, `semantic_boost`, and `noise_penalty`
- Updated all struct instantiations throughout the codebase to include the new fields (initialized to `None`)
- Changed documentation code blocks from ```` ```rust ```` to ```` ```text ```` (removing syntax highlighting)
Reviewed changes
Copilot reviewed 26 out of 27 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/types.rs | Core changes: added #[non_exhaustive], new fields, constructor, builder methods, and comprehensive serde tests |
| src/extraction/ascii.rs | Updated struct literals to include new optional fields |
| src/extraction/utf16.rs | Updated struct literals to include new optional fields |
| src/extraction/pe_resources.rs | Updated struct literals to include new optional fields |
| src/extraction/mod.rs | Updated struct literals to include new optional fields |
| src/extraction/macho_load_commands.rs | Updated struct literals to include new optional fields |
| src/extraction/dedup.rs | Updated struct literals and to_found_string() conversion |
| src/classification/semantic.rs | Updated documentation examples with new fields |
| src/classification/mod.rs | Updated documentation examples with new fields |
| tests/test_deduplication.rs | Updated tests to use new builder pattern |
| tests/classification_integration.rs | Updated helper to use new constructor |
| docs/src/*.md | Changed code block markers from rust to text |
| .github/copilot-instructions.md | Added complete example of #[non_exhaustive] pattern |
| Cargo.toml | Minor formatting cleanup |
| project_plan/tickets/*.md | Marked acceptance criteria as complete |
| .serena/* | New tool configuration files (unrelated to ticket) |
| .github/prompts/* | New AI agent prompt files (unrelated to ticket) |
````diff
 ### Example Output

-```json
+```text
````
Throughout the documentation files, Rust code blocks have been changed from ```` ```rust ```` to ```` ```text ````, removing syntax highlighting. This significantly degrades the developer experience by removing color coding and syntax support in documentation viewers. Unless there's a specific technical reason for this change (e.g., incompatibility with a documentation tool), these should remain ```` ```rust ```` blocks. This affects all documentation in docs/src/ including api.md, architecture.md, binary-formats.md, classification.md, performance.md, and string-extraction.md.
````diff
-```text
+```json
````
```yaml
# list of languages for which language servers are started; choose from:
# al bash clojure cpp csharp
# csharp_omnisharp dart elixir elm erlang
# fortran fsharp go groovy haskell
# java julia kotlin lua markdown
# matlab nix pascal perl php
# powershell python python_jedi r rego
# ruby ruby_solargraph rust scala swift
# terraform toml typescript typescript_vts vue
# yaml zig
# (This list may be outdated. For the current list, see values of Language enum here:
# https://github.com/oraios/serena/blob/main/src/solidlsp/ls_config.py
# For some languages, there are alternative language servers, e.g. csharp_omnisharp, ruby_solargraph.)
# Note:
# - For C, use cpp
# - For JavaScript, use typescript
# - For Free Pascal/Lazarus, use pascal
# Special requirements:
# - csharp: Requires the presence of a .sln file in the project folder.
# - pascal: Requires Free Pascal Compiler (fpc) and optionally Lazarus.
# When using multiple languages, the first language server that supports a given file will be used for that file.
# The first language is the default language and the respective language server will be used as a fallback.
# Note that when using the JetBrains backend, language servers are not used and this list is correspondingly ignored.
languages:
  - rust

# the encoding used by text files in the project
# For a list of possible encodings, see https://docs.python.org/3.11/library/codecs.html#standard-encodings
encoding: "utf-8"

# whether to use project's .gitignore files to ignore files
ignore_all_files_in_gitignore: true

# list of additional paths to ignore in all projects
# same syntax as gitignore, so you can use * and **
ignored_paths: []

# whether the project is in read-only mode
# If set to true, all editing tools will be disabled and attempts to use them will result in an error
# Added on 2025-04-18
read_only: false

# list of tool names to exclude. We recommend not excluding any tools, see the readme for more details.
# Below is the complete list of tools for convenience.
# To make sure you have the latest list of tools, and to view their descriptions,
# execute `uv run scripts/print_tool_overview.py`.
#
# * `activate_project`: Activates a project by name.
# * `check_onboarding_performed`: Checks whether project onboarding was already performed.
# * `create_text_file`: Creates/overwrites a file in the project directory.
# * `delete_lines`: Deletes a range of lines within a file.
# * `delete_memory`: Deletes a memory from Serena's project-specific memory store.
# * `execute_shell_command`: Executes a shell command.
# * `find_referencing_code_snippets`: Finds code snippets in which the symbol at the given location is referenced.
# * `find_referencing_symbols`: Finds symbols that reference the symbol at the given location (optionally filtered by type).
# * `find_symbol`: Performs a global (or local) search for symbols with/containing a given name/substring (optionally filtered by type).
# * `get_current_config`: Prints the current configuration of the agent, including the active and available projects, tools, contexts, and modes.
# * `get_symbols_overview`: Gets an overview of the top-level symbols defined in a given file.
# * `initial_instructions`: Gets the initial instructions for the current project.
#    Should only be used in settings where the system prompt cannot be set,
#    e.g. in clients you have no control over, like Claude Desktop.
# * `insert_after_symbol`: Inserts content after the end of the definition of a given symbol.
# * `insert_at_line`: Inserts content at a given line in a file.
# * `insert_before_symbol`: Inserts content before the beginning of the definition of a given symbol.
# * `list_dir`: Lists files and directories in the given directory (optionally with recursion).
# * `list_memories`: Lists memories in Serena's project-specific memory store.
# * `onboarding`: Performs onboarding (identifying the project structure and essential tasks, e.g. for testing or building).
# * `prepare_for_new_conversation`: Provides instructions for preparing for a new conversation (in order to continue with the necessary context).
# * `read_file`: Reads a file within the project directory.
# * `read_memory`: Reads the memory with the given name from Serena's project-specific memory store.
# * `remove_project`: Removes a project from the Serena configuration.
# * `replace_lines`: Replaces a range of lines within a file with new content.
# * `replace_symbol_body`: Replaces the full definition of a symbol.
# * `restart_language_server`: Restarts the language server, may be necessary when edits not through Serena happen.
# * `search_for_pattern`: Performs a search for a pattern in the project.
# * `summarize_changes`: Provides instructions for summarizing the changes made to the codebase.
# * `switch_modes`: Activates modes by providing a list of their names
# * `think_about_collected_information`: Thinking tool for pondering the completeness of collected information.
# * `think_about_task_adherence`: Thinking tool for determining whether the agent is still on track with the current task.
# * `think_about_whether_you_are_done`: Thinking tool for determining whether the task is truly completed.
# * `write_memory`: Writes a named memory (for future reference) to Serena's project-specific memory store.
excluded_tools: []

# initial prompt for the project. It will always be given to the LLM upon activating the project
# (contrary to the memories, which are loaded on demand).
initial_prompt: ""

project_name: "Stringy"
included_optional_tools: []
```
This PR includes unrelated infrastructure files (.serena/project.yml, .serena/.gitignore, .github/prompts/cicheck.prompt.md, .github/prompts/simplicity-review.prompt.md) that are not part of the ticket's scope to extend the FoundString data model. These should be in a separate PR focused on development tooling setup.
```rust
#[cfg(test)]
mod tests {
    use super::*;

    /// Creates a test FoundString with all optional fields set to None
    fn create_test_found_string() -> FoundString {
        FoundString {
            text: "test_string".to_string(),
            original_text: None,
            encoding: Encoding::Ascii,
            offset: 0x1000,
            rva: Some(0x2000),
            section: Some(".rodata".to_string()),
            length: 11,
            tags: vec![Tag::Url],
            score: 100,
            section_weight: None,
            semantic_boost: None,
            noise_penalty: None,
            source: StringSource::SectionData,
            confidence: 0.85,
        }
    }

    #[test]
    fn test_found_string_serde_optional_fields_none() {
        // Test that optional fields are skipped when None
        let found_string = create_test_found_string();
        let json = serde_json::to_string(&found_string).expect("Serialization failed");

        // Verify optional fields are not present in JSON
        assert!(!json.contains("original_text"));
        assert!(!json.contains("section_weight"));
        assert!(!json.contains("semantic_boost"));
        assert!(!json.contains("noise_penalty"));

        // Verify required fields are present
        assert!(json.contains("text"));
        assert!(json.contains("encoding"));
        assert!(json.contains("offset"));
    }

    #[test]
    fn test_found_string_serde_optional_fields_some() {
        // Test that optional fields are included when Some
        let mut found_string = create_test_found_string();
        found_string.original_text = Some("_ZN4test6mangled".to_string());
        found_string.section_weight = Some(50);
        found_string.semantic_boost = Some(25);
        found_string.noise_penalty = Some(-10);

        let json = serde_json::to_string(&found_string).expect("Serialization failed");

        // Verify optional fields are present in JSON
        assert!(json.contains("original_text"));
        assert!(json.contains("_ZN4test6mangled"));
        assert!(json.contains("section_weight"));
        assert!(json.contains("semantic_boost"));
        assert!(json.contains("noise_penalty"));
    }

    #[test]
    fn test_found_string_serde_roundtrip() {
        // Test serialization/deserialization roundtrip with all fields
        let mut found_string = create_test_found_string();
        found_string.original_text = Some("mangled_name".to_string());
        found_string.section_weight = Some(75);
        found_string.semantic_boost = Some(30);
        found_string.noise_penalty = Some(-5);

        let json = serde_json::to_string(&found_string).expect("Serialization failed");
        let deserialized: FoundString =
            serde_json::from_str(&json).expect("Deserialization failed");

        assert_eq!(found_string.text, deserialized.text);
        assert_eq!(found_string.original_text, deserialized.original_text);
        assert_eq!(found_string.section_weight, deserialized.section_weight);
        assert_eq!(found_string.semantic_boost, deserialized.semantic_boost);
        assert_eq!(found_string.noise_penalty, deserialized.noise_penalty);
    }

    #[test]
    fn test_found_string_deserialize_missing_optional_fields() {
        // Test that missing optional fields default to None during deserialization
        let json = r#"{
            "text": "test",
            "encoding": "Ascii",
            "offset": 0,
            "rva": null,
            "section": null,
            "length": 4,
            "tags": [],
            "score": 0,
            "source": "SectionData",
            "confidence": 1.0
        }"#;

        let deserialized: FoundString = serde_json::from_str(json).expect("Deserialization failed");

        assert_eq!(deserialized.text, "test");
        assert_eq!(deserialized.original_text, None);
        assert_eq!(deserialized.section_weight, None);
        assert_eq!(deserialized.semantic_boost, None);
        assert_eq!(deserialized.noise_penalty, None);
    }
}
```
The new tests validate serialization/deserialization of the new fields but don't test the new constructor FoundString::new() or the builder methods (with_rva(), with_section(), etc.). Consider adding a test that demonstrates the intended API usage pattern, such as creating a FoundString with the constructor and then chaining builder methods. This would serve as both a test and documentation of the API.
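A test in that direction might look like the sketch below. The type here is a pared-down stand-in, and the exact `new()` signature in the real crate may differ; only the method names are taken from this review:

```rust
// Hypothetical builder-API test against a pared-down stand-in type.
// In the real crate this would target stringy's FoundString and its
// full constructor signature.
#[derive(Debug, PartialEq)]
pub struct FoundString {
    pub text: String,
    pub section: Option<String>,
    pub confidence: f32,
}

impl FoundString {
    pub fn new(text: String) -> Self {
        FoundString { text, section: None, confidence: 1.0 }
    }
    pub fn with_section(mut self, section: String) -> Self {
        self.section = Some(section);
        self
    }
    pub fn with_confidence(mut self, confidence: f32) -> Self {
        self.confidence = confidence;
        self
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn constructor_and_builder_chain() {
        // new() leaves optional fields unset...
        let plain = FoundString::new("test".to_string());
        assert_eq!(plain.section, None);

        // ...and builder calls chain to fill them in.
        let built = FoundString::new("test".to_string())
            .with_section(".rodata".to_string())
            .with_confidence(0.85);
        assert_eq!(built.section.as_deref(), Some(".rodata"));
        assert!((built.confidence - 0.85).abs() < f32::EPSILON);
    }
}

fn main() {
    let s = FoundString::new("test".to_string()).with_section(".rodata".to_string());
    println!("{:?}", s);
}
```

Such a test doubles as executable documentation of the intended external API, which a struct-literal helper cannot.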
```rust
FoundString {
    text: "test_string".to_string(),
    original_text: None,
    encoding: Encoding::Ascii,
    offset: 0x1000,
    rva: Some(0x2000),
    section: Some(".rodata".to_string()),
    length: 11,
    tags: vec![Tag::Url],
    score: 100,
    section_weight: None,
    semantic_boost: None,
    noise_penalty: None,
    source: StringSource::SectionData,
    confidence: 0.85,
}
```
The test helper function create_test_found_string() uses struct literal construction with #[non_exhaustive]. This will cause compilation errors for external crates that cannot construct FoundString using struct literals. Since this is in the same crate module (under #[cfg(test)]), it compiles, but it sets a poor example. Consider using the new FoundString::new() constructor followed by builder methods to demonstrate the intended usage pattern.
```diff
-FoundString {
-    text: "test_string".to_string(),
-    original_text: None,
-    encoding: Encoding::Ascii,
-    offset: 0x1000,
-    rva: Some(0x2000),
-    section: Some(".rodata".to_string()),
-    length: 11,
-    tags: vec![Tag::Url],
-    score: 100,
-    section_weight: None,
-    semantic_boost: None,
-    noise_penalty: None,
-    source: StringSource::SectionData,
-    confidence: 0.85,
-}
+FoundString::new(
+    "test_string".to_string(),
+    Encoding::Ascii,
+    0x1000,
+    11,
+    StringSource::SectionData,
+)
+.with_rva(0x2000)
+.with_section(".rodata".to_string())
+.with_tags(vec![Tag::Url])
+.with_score(100)
+.with_confidence(0.85)
```
Add a gate job that always runs and aggregates the status of all CI jobs. This enables branch protection rules to require "CI Gate" as a status check, which will pass if:
- All jobs succeeded, OR
- Conditional jobs were skipped (due to path filters) and other jobs passed

The gate job fails only if any job actually failed or was cancelled.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (11)
docs/src/api.md (1)

11-33: Sync FoundString documentation with the new public API. The struct example omits the new optional metadata fields, the confidence field, and the `#[non_exhaustive]` attribute, so external users will not see the current API. Please update this block to match `src/types.rs` and consider adding a short constructor example for `FoundString::new` to avoid drift.

docs/src/classification.md (1)

173-189: Update the public example to use `FoundString::new`. Because `FoundString` is now non-exhaustive, the struct literal in the docs will not compile for external users. Use the constructor instead.

Proposed doc update:

```diff
-let found_string = FoundString {
-    text: "C:\\Windows\\System32\\cmd.exe".to_string(),
-    encoding: Encoding::Ascii,
-    offset: 0,
-    rva: None,
-    section: None,
-    length: 27,
-    tags: Vec::new(),
-    score: 0,
-    source: StringSource::SectionData,
-    confidence: 1.0,
-};
+let found_string = FoundString::new(
+    "C:\\Windows\\System32\\cmd.exe".to_string(),
+    Encoding::Ascii,
+    0,
+    27,
+    StringSource::SectionData,
+);
```

src/extraction/utf16.rs (2)
823-842: Use the `FoundString::new()` constructor instead of a struct literal. The PR introduces `FoundString::new()` and builder methods, yet this code still uses struct literal initialization. With `#[non_exhaustive]` on `FoundString`, external crates cannot construct via literals. While internal code can, using the constructor is more maintainable.

♻️ Suggested refactor:

```diff
-    found_strings.push(FoundString {
-        text,
-        original_text: None,
-        encoding: match byte_order {
-            ByteOrder::LE => Encoding::Utf16Le,
-            ByteOrder::BE => Encoding::Utf16Be,
-            ByteOrder::Auto => unreachable!(),
-        },
-        offset: start as u64,
-        rva: None,
-        section: None,
-        length: bytes_for_decoding.len() as u32,
-        tags: Vec::new(),
-        score: 0,
-        section_weight: None,
-        semantic_boost: None,
-        noise_penalty: None,
-        source: StringSource::SectionData,
-        confidence: utf16_confidence,
-    });
+    let encoding = match byte_order {
+        ByteOrder::LE => Encoding::Utf16Le,
+        ByteOrder::BE => Encoding::Utf16Be,
+        ByteOrder::Auto => unreachable!(),
+    };
+    found_strings.push(
+        FoundString::new(
+            text,
+            encoding,
+            start as u64,
+            bytes_for_decoding.len() as u32,
+            StringSource::SectionData,
+        )
+        .with_confidence(utf16_confidence),
+    );
```
1-1036: File exceeds the 500 line limit at 1273 lines. Per coding guidelines, files should stay under 500 lines. Consider splitting into submodules:

- Core extraction logic (`extract.rs`)
- Confidence scoring (`scoring.rs`)
- Printability checks (`printable.rs`)
- Section-aware extraction (`section.rs`)

src/extraction/mod.rs (2)
480-522: DRY violation: symbol handling duplicated between `extract()` and `extract_canonical()`. Lines 480-522 and 583-626 contain nearly identical code for adding import/export symbols. Extract this into a helper method.

♻️ Suggested refactor:

```rust
fn collect_symbol_strings(container_info: &ContainerInfo) -> Vec<FoundString> {
    let mut symbols = Vec::new();
    for import in &container_info.imports {
        symbols.push(FoundString::new(
            import.name.clone(),
            Encoding::Utf8,
            0,
            import.name.len() as u32,
            StringSource::ImportName,
        ));
    }
    for export in &container_info.exports {
        symbols.push(FoundString::new(
            export.name.clone(),
            Encoding::Utf8,
            0,
            export.name.len() as u32,
            StringSource::ExportName,
        ));
    }
    symbols
}
```

Also applies to: 583-626
1-948: File exceeds the 500 line limit at 1517 lines. This file contains the trait, config, extractor implementation, helper functions, and extensive tests. Consider splitting:

- `trait.rs` - StringExtractor trait
- `config.rs` - ExtractionConfig (already exists but could absorb more)
- `basic.rs` - BasicExtractor implementation
- Move tests to `tests/` or `mod/tests.rs`

src/extraction/macho_load_commands.rs (1)
101-116: Use `FoundString::new()` with builder methods for dylib and rpath strings. Both `extract_dylib_strings` and `extract_rpath_strings` use struct literals. Refactor to use the constructor and the `.with_tags()` builder.

♻️ Suggested refactor for extract_dylib_strings:

```diff
-    strings.push(FoundString {
-        text: lib.to_string(),
-        original_text: None,
-        encoding: Encoding::Utf8,
-        source: StringSource::LoadCommand,
-        tags,
-        section: None,
-        offset: 0,
-        rva: None,
-        length,
-        score: 0,
-        section_weight: None,
-        semantic_boost: None,
-        noise_penalty: None,
-        confidence: 1.0,
-    });
+    strings.push(
+        FoundString::new(
+            lib.to_string(),
+            Encoding::Utf8,
+            0,
+            length,
+            StringSource::LoadCommand,
+        )
+        .with_tags(tags),
+    );
```

Also applies to: 134-149
src/extraction/pe_resources.rs (2)
427-443: Use `FoundString::new()` with builder methods. Same pattern as other files: refactor to use the constructor and builders like `.with_section()` and `.with_tags()`.

♻️ Suggested refactor:

```diff
-    let found_string = FoundString {
-        text,
-        original_text: None,
-        encoding: Encoding::Utf16Le,
-        offset: 0, // pelite doesn't provide offsets easily
-        rva: None,
-        section: Some(".rsrc".to_string()),
-        length,
-        tags: vec![Tag::Version, Tag::Resource],
-        score: 0,
-        section_weight: None,
-        semantic_boost: None,
-        noise_penalty: None,
-        source: StringSource::ResourceString,
-        confidence: 1.0,
-    };
+    let found_string = FoundString::new(
+        text,
+        Encoding::Utf16Le,
+        0,
+        length,
+        StringSource::ResourceString,
+    )
+    .with_section(".rsrc".to_string())
+    .with_tags(vec![Tag::Version, Tag::Resource]);
```
1-828: File exceeds the 500 line limit at 1442 lines. Consider splitting:

- `pe_resources/mod.rs` - public API and orchestration
- `pe_resources/version_info.rs` - VERSIONINFO extraction
- `pe_resources/string_table.rs` - STRINGTABLE extraction
- `pe_resources/manifest.rs` - MANIFEST extraction
- `pe_resources/encoding.rs` - decode_utf16le, detect_manifest_encoding

src/classification/semantic.rs (2)
1-1640: File exceeds the 500-600 line limit (1640 lines). Per coding guidelines, files should stay under 500-600 lines. Consider splitting this module: for example, extract the path classification (`classify_posix_path`, `classify_windows_path`, `classify_unc_path`, `classify_registry_path`) and their validation helpers into a separate `paths.rs` submodule, and keep URL/domain/IP classification here.
1018-1035: Consider using the constructor in the test helper for consistency. While same-crate tests can use struct literals, using `FoundString::new()` would exercise the public API and automatically track any signature changes.

Suggested fix:

```diff
 fn create_test_string(text: &str) -> FoundString {
-    FoundString {
-        text: text.to_string(),
-        original_text: None,
-        encoding: Encoding::Ascii,
-        offset: 0,
-        rva: None,
-        section: None,
-        length: text.len() as u32,
-        tags: Vec::new(),
-        score: 0,
-        section_weight: None,
-        semantic_boost: None,
-        noise_penalty: None,
-        source: StringSource::SectionData,
-        confidence: 1.0,
-    }
+    FoundString::new(
+        text.to_string(),
+        Encoding::Ascii,
+        0,
+        text.len() as u32,
+        StringSource::SectionData,
+    )
 }
```
🤖 Fix all issues with AI agents
In
`@project_plan/tickets/Extend_FoundString_data_model_for_demangling_and_debug_support.md`:
- Around line 25-30: The acceptance checklist item claiming "No breaking changes
to existing code that creates FoundString instances" is incorrect because adding
#[non_exhaustive] makes struct literal construction a breaking change for
external users; update the wording to acknowledge this and instruct downstream
users to migrate to the provided constructor and builders by naming the symbols
FoundString::new and the builder methods (e.g., FoundString::builder or similar)
so callers switch from struct literals to the new API; also add a short
migration note explaining that existing direct field initialization must be
replaced with FoundString::new(...) or the builder methods to set optional
fields.
In `@src/classification/mod.rs`:
- Around line 31-46: The doc example currently constructs FoundString via struct
literal; update it to use the new constructor and builder pattern by calling
FoundString::new() and chaining the builder methods (e.g., .text(...),
.encoding(...), .offset(...), .length(...), .source(...), .confidence(...),
etc.) to set the same fields as the example so the docs show the preferred API
(refer to FoundString::new and the associated builder method names in the
implementation).
In `@src/classification/semantic.rs`:
- Around line 21-36: The doc example is constructing FoundString via a struct
literal but FoundString is now #[non_exhaustive] and must be created with the
public constructor/builder API; update the example to call FoundString::new(...)
and chain the available builder methods (e.g., .original_text(...),
.encoding(...), .offset(...), .length(...), .tags(...), .source(...),
.confidence(...), etc.) to set the same fields shown in the literal so the
example uses the public FoundString::new and builder methods instead of a struct
literal.
- Around line 353-368: The doc example constructs a FoundString via direct
struct literal; update it to use the public constructor/builder API (the same
pattern used in the module-level example) so the example matches the external
API surface. Locate the example that instantiates FoundString and replace the
literal initialization with calls to FoundString::new(...) or the
FoundStringBuilder (whichever the module-level docs use), setting text,
encoding, offset, length, source, confidence and any optional fields via the
builder methods to mirror the existing public constructor/builder usage.
In `@src/extraction/ascii.rs`:
- Around line 231-246: Replace the repeated FoundString struct literals in
src/extraction/ascii.rs with calls to FoundString::new so you don't duplicate
default values; for each creation site (the three places constructing
FoundString with many None/Vec::new/0/1.0 literals) call FoundString::new and
pass only the varying fields (e.g., text, offset, length, Encoding::Ascii and
StringSource::SectionData or whatever differs) and let the constructor provide
defaults for fields like original_text, rva, section, tags, score,
section_weight, semantic_boost, noise_penalty, and confidence; update all three
creation sites (the three struct literal blocks) to use FoundString::new
consistently.
In `@src/extraction/dedup.rs`:
- Around line 330-345: Replace direct struct construction of FoundString in test
helpers with the public constructor FoundString::new(...) to follow the
preferred API; locate the literal instance (the block that sets text,
original_text, encoding, offset, rva, section, length, tags, score,
section_weight, semantic_boost, noise_penalty, source, confidence) and call
FoundString::new with the appropriate arguments (text, encoding, offset,
section, length, tags, score, source, confidence) letting the constructor
compute rva and default optional fields instead of manually setting them; apply
the same replacement for the other occurrences referenced (the similar literals
around the noted ranges) so tests use FoundString::new consistently.
- Around line 283-298: Replace the manual struct literal in to_found_string()
with the public constructor FoundString::new(...) so the conversion uses the
canonical API; update the call to pass the same values currently set in the
literal (text, original_text, encoding, offset, rva, section, length, tags,
score, section_weight, semantic_boost, noise_penalty, source, confidence)
adapting to FoundString::new's parameter types (wrap options or pass bare values
as required) and ensure to_found_string() returns the value produced by
FoundString::new rather than constructing the struct directly.
In `@src/extraction/mod.rs`:
- Around line 765-780: The UTF-8 extraction path is still creating FoundString
with a struct literal; change it to use the refactored FoundString constructor
used in the other extraction path (e.g., FoundString::new or the project's
equivalent) instead of the literal. Construct the FoundString with the same
values: text, original_text = None, encoding, offset = absolute_offset, rva,
section = section.name.clone(), length = length as u32, tags = Vec::new(), score
= 0, section_weight = None, semantic_boost = None, noise_penalty = None, source
= StringSource::SectionData, and confidence; ensure the call matches the
constructor signature used elsewhere so both UTF-8 and other extraction paths
share the same creation code for consistency.
- Around line 484-499: Replace the inline struct literal used when pushing
import names into all_strings with the new constructor FoundString::new(...):
locate the push that currently builds FoundString { text: import.name.clone(),
original_text: None, encoding: Encoding::Utf8, offset: 0, rva: None, section:
None, length, tags: Vec::new(), score: 0, section_weight: None, semantic_boost:
None, noise_penalty: None, source: StringSource::ImportName, confidence: 1.0 }
and call FoundString::new passing the equivalent values (text =
import.name.clone(), original_text = None, encoding = Encoding::Utf8, offset =
0, rva = None, section = None, length = length, tags = Vec::new(), score = 0,
section_weight = None, semantic_boost = None, noise_penalty = None, source =
StringSource::ImportName, confidence = 1.0) so all FoundString constructions in
this file use the new constructor.
In `@src/types.rs`:
- Around line 257-280: Rename the field `section_weight` in the struct (in
src/types.rs) to a clearer name like `section_weight_contribution` or
`section_score_component` to avoid confusion with `SectionInfo.weight` (which is
an f32); update all references/usages of `section_weight` across the codebase
(serialization/deserialization, docs, tests) and keep the same serde attributes
and Option<i32> type and doc comment content (adjust field name inside the docs
to match), leaving `semantic_boost` and `noise_penalty` unchanged.
- Around line 455-473: The test helper create_test_found_string should construct
the FoundString via its public constructor/builder instead of a struct literal;
replace the literal in create_test_found_string() with FoundString::new(...) and
chain the appropriate builder methods (e.g., .original_text(None),
.encoding(Encoding::Ascii), .offset(0x1000), .rva(Some(0x2000)),
.section(Some(".rodata".to_string())), .length(11), .tags(vec![Tag::Url]),
.score(100), .section_weight(None), .semantic_boost(None), .noise_penalty(None),
.source(StringSource::SectionData), .confidence(0.85)) to produce the same test
values so the test exercises the constructor/builder API (adjust method names to
match the actual builder methods on FoundString).
- Around line 362-367: The with_confidence setter currently accepts any f32 but
the confidence field is documented as 0.0–1.0 and methods
is_high_confidence/is_low_confidence assume that range; update with_confidence
to enforce or validate the range, e.g. add a debug_assert!((0.0..=1.0).contains(&confidence)) to catch invalid values in debug builds, or clamp the value into 0.0..=1.0 before assigning; modify the pub fn with_confidence(mut self,
confidence: f32) -> Self to perform this validation/clamping and then set
self.confidence so other methods relying on the range remain correct.
| - [x] FoundString struct includes original_text field | ||
| - [x] FoundString struct includes optional breakdown fields (section_weight, semantic_boost, noise_penalty) | ||
| - [x] All fields properly serialize/deserialize with serde | ||
| - [x] Existing tests updated and passing | ||
| - [x] Documentation updated with field descriptions and usage examples | ||
| - [x] No breaking changes to existing code that creates FoundString instances |
Clarify the "no breaking changes" acceptance item.
#[non_exhaustive] prevents downstream struct literals, so this is a breaking change for external callers. Please reword to reflect the required migration to FoundString::new and builder methods.
Proposed wording
-- [x] No breaking changes to existing code that creates FoundString instances
+- [x] Downstream code uses FoundString::new and builder methods; struct literal construction is no longer supported due to #[non_exhaustive]
🤖 Prompt for AI Agents
In
`@project_plan/tickets/Extend_FoundString_data_model_for_demangling_and_debug_support.md`
around lines 25 - 30, The acceptance checklist item claiming "No breaking
changes to existing code that creates FoundString instances" is incorrect
because adding #[non_exhaustive] makes struct literal construction a breaking
change for external users; update the wording to acknowledge this and instruct
downstream users to migrate to the provided constructor and builders by naming
the symbols FoundString::new and the builder methods (e.g., FoundString::builder
or similar) so callers switch from struct literals to the new API; also add a
short migration note explaining that existing direct field initialization must
be replaced with FoundString::new(...) or the builder methods to set optional
fields.
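The migration described above can be sketched in miniature. This is a self-contained illustration, not the crate's actual API: the struct is reduced to a few fields, and the constructor/builder names mirror the PR description but are assumptions.

```rust
// Minimal sketch of the #[non_exhaustive] + constructor/builder pattern.
// Field and method names are modeled on the PR, not the real crate.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq)]
pub struct FoundString {
    pub text: String,
    pub offset: u64,
    pub score: i32,
    pub confidence: f32,
}

impl FoundString {
    /// Constructor takes the required fields; optional data is set via builders.
    pub fn new(text: String, offset: u64) -> Self {
        Self { text, offset, score: 0, confidence: 1.0 }
    }

    /// Builder-style setter; consumes and returns self so calls can chain.
    #[must_use]
    pub fn with_score(mut self, score: i32) -> Self {
        self.score = score;
        self
    }
}

fn main() {
    // Outside the defining crate, `FoundString { .. }` literals no longer
    // compile because of #[non_exhaustive]; callers migrate to this form:
    let s = FoundString::new("cmd.exe".to_string(), 0x1000).with_score(100);
    println!("{} @ {:#x} -> {}", s.text, s.offset, s.score);
}
```

Note that within the defining crate itself, struct literals still compile; `#[non_exhaustive]` only restricts external callers, which is exactly why the in-repo refactors below are nitpicks rather than compile errors.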
| //! let found_string = FoundString { | ||
| //! text: "C:\\Windows\\System32\\cmd.exe".to_string(), | ||
| //! original_text: None, | ||
| //! encoding: Encoding::Ascii, | ||
| //! offset: 0, | ||
| //! rva: None, | ||
| //! section: None, | ||
| //! length: 27, | ||
| //! tags: Vec::new(), | ||
| //! score: 0, | ||
| //! section_weight: None, | ||
| //! semantic_boost: None, | ||
| //! noise_penalty: None, | ||
| //! source: StringSource::SectionData, | ||
| //! confidence: 1.0, | ||
| //! }; |
🧹 Nitpick | 🔵 Trivial
Update doc example to use FoundString::new() builder pattern.
The PR introduces the constructor and builder methods as the preferred API. Documentation should demonstrate this pattern.
♻️ Suggested refactor
-//! let found_string = FoundString {
-//! text: "C:\\Windows\\System32\\cmd.exe".to_string(),
-//! original_text: None,
-//! encoding: Encoding::Ascii,
-//! offset: 0,
-//! rva: None,
-//! section: None,
-//! length: 27,
-//! tags: Vec::new(),
-//! score: 0,
-//! section_weight: None,
-//! semantic_boost: None,
-//! noise_penalty: None,
-//! source: StringSource::SectionData,
-//! confidence: 1.0,
-//! };
+//! let found_string = FoundString::new(
+//! "C:\\Windows\\System32\\cmd.exe".to_string(),
+//! Encoding::Ascii,
+//! 0,
+//! 27,
+//! StringSource::SectionData,
+//! );
🤖 Prompt for AI Agents
In `@src/classification/mod.rs` around lines 31 - 46, The doc example currently
constructs FoundString via struct literal; update it to use the new constructor
and builder pattern by calling FoundString::new() and chaining the builder
methods (e.g., .text(...), .encoding(...), .offset(...), .length(...),
.source(...), .confidence(...), etc.) to set the same fields as the example so
the docs show the preferred API (refer to FoundString::new and the associated
builder method names in the implementation).
| //! let found_string = FoundString { | ||
| //! text: "https://example.com/api".to_string(), | ||
| //! original_text: None, | ||
| //! encoding: Encoding::Ascii, | ||
| //! offset: 0, | ||
| //! rva: None, | ||
| //! section: None, | ||
| //! length: 24, | ||
| //! tags: Vec::new(), | ||
| //! score: 0, | ||
| //! section_weight: None, | ||
| //! semantic_boost: None, | ||
| //! noise_penalty: None, | ||
| //! source: StringSource::SectionData, | ||
| //! confidence: 1.0, | ||
| //! }; |
Doc example uses struct literal instead of the new constructor/builder pattern.
Since FoundString is now #[non_exhaustive], external crate consumers cannot construct it via struct literals - they must use FoundString::new() and the builder methods. Doc examples should demonstrate the public API users will actually use.
Suggested fix
//! let classifier = SemanticClassifier::new();
-//! let found_string = FoundString {
-//! text: "https://example.com/api".to_string(),
-//! original_text: None,
-//! encoding: Encoding::Ascii,
-//! offset: 0,
-//! rva: None,
-//! section: None,
-//! length: 24,
-//! tags: Vec::new(),
-//! score: 0,
-//! section_weight: None,
-//! semantic_boost: None,
-//! noise_penalty: None,
-//! source: StringSource::SectionData,
-//! confidence: 1.0,
-//! };
+//! let found_string = FoundString::new(
+//! "https://example.com/api".to_string(),
+//! Encoding::Ascii,
+//! 0,
+//! 24,
+//! StringSource::SectionData,
+//! );
🤖 Prompt for AI Agents
In `@src/classification/semantic.rs` around lines 21 - 36, The doc example is
constructing FoundString via a struct literal but FoundString is now
#[non_exhaustive] and must be created with the public constructor/builder API;
update the example to call FoundString::new(...) and chain the available builder
methods (e.g., .original_text(...), .encoding(...), .offset(...), .length(...),
.tags(...), .source(...), .confidence(...), etc.) to set the same fields shown
in the literal so the example uses the public FoundString::new and builder
methods instead of a struct literal.
| /// let found_string = FoundString { | ||
| /// text: "https://example.com".to_string(), | ||
| /// original_text: None, | ||
| /// encoding: Encoding::Ascii, | ||
| /// offset: 0, | ||
| /// rva: None, | ||
| /// section: None, | ||
| /// length: 19, | ||
| /// tags: Vec::new(), | ||
| /// score: 0, | ||
| /// section_weight: None, | ||
| /// semantic_boost: None, | ||
| /// noise_penalty: None, | ||
| /// source: StringSource::SectionData, | ||
| /// confidence: 1.0, | ||
| /// }; |
Same issue: doc example should use constructor/builder pattern.
Apply the same fix as the module-level doc example for consistency and to accurately reflect the external API.
Suggested fix
//! let classifier = SemanticClassifier::new();
-//! let found_string = FoundString {
-//! text: "https://example.com".to_string(),
-//! original_text: None,
-//! encoding: Encoding::Ascii,
-//! offset: 0,
-//! rva: None,
-//! section: None,
-//! length: 19,
-//! tags: Vec::new(),
-//! score: 0,
-//! section_weight: None,
-//! semantic_boost: None,
-//! noise_penalty: None,
-//! source: StringSource::SectionData,
-//! confidence: 1.0,
-//! };
+//! let found_string = FoundString::new(
+//! "https://example.com".to_string(),
+//! Encoding::Ascii,
+//! 0,
+//! 19,
+//! StringSource::SectionData,
+//! );
🤖 Prompt for AI Agents
In `@src/classification/semantic.rs` around lines 353 - 368, The doc example
constructs a FoundString via direct struct literal; update it to use the public
constructor/builder API (the same pattern used in the module-level example) so
the example matches the external API surface. Locate the example that
instantiates FoundString and replace the literal initialization with calls to
FoundString::new(...) or the FoundStringBuilder (whichever the module-level docs
use), setting text, encoding, offset, length, source, confidence and any
optional fields via the builder methods to mirror the existing public
constructor/builder usage.
| strings.push(FoundString { | ||
| text, | ||
| original_text: None, | ||
| encoding: Encoding::Ascii, | ||
| offset: start as u64, | ||
| rva: None, | ||
| section: None, | ||
| length: len as u32, | ||
| tags: Vec::new(), | ||
| score: 0, | ||
| section_weight: None, | ||
| semantic_boost: None, | ||
| noise_penalty: None, | ||
| source: StringSource::SectionData, | ||
| confidence: 1.0, | ||
| }); |
🧹 Nitpick | 🔵 Trivial
Prefer FoundString::new to avoid repeating defaults.
These three literals duplicate the default field values and will need updates every time FoundString evolves. Using the constructor keeps defaults in one place. Apply the same pattern at all three creation sites.
Proposed refactor
- strings.push(FoundString {
- text,
- original_text: None,
- encoding: Encoding::Ascii,
- offset: start as u64,
- rva: None,
- section: None,
- length: len as u32,
- tags: Vec::new(),
- score: 0,
- section_weight: None,
- semantic_boost: None,
- noise_penalty: None,
- source: StringSource::SectionData,
- confidence: 1.0,
- });
+ strings.push(FoundString::new(
+ text,
+ Encoding::Ascii,
+ start as u64,
+ len as u32,
+ StringSource::SectionData,
+ ));
Also applies to: 265-280, 285-300
🤖 Prompt for AI Agents
In `@src/extraction/ascii.rs` around lines 231 - 246, Replace the repeated
FoundString struct literals in src/extraction/ascii.rs with calls to
FoundString::new so you don't duplicate default values; for each creation site
(the three places constructing FoundString with many None/Vec::new/0/1.0
literals) call FoundString::new and pass only the varying fields (e.g., text,
offset, length, Encoding::Ascii and StringSource::SectionData or whatever
differs) and let the constructor provide defaults for fields like original_text,
rva, section, tags, score, section_weight, semantic_boost, noise_penalty, and
confidence; update all three creation sites (the three struct literal blocks) to
use FoundString::new consistently.
| all_strings.push(FoundString { | ||
| text: import.name.clone(), | ||
| original_text: None, | ||
| encoding: Encoding::Utf8, | ||
| offset: 0, | ||
| rva: None, | ||
| section: None, | ||
| length, | ||
| tags: Vec::new(), | ||
| score: 0, | ||
| section_weight: None, | ||
| semantic_boost: None, | ||
| noise_penalty: None, | ||
| source: StringSource::ImportName, | ||
| confidence: 1.0, | ||
| }); |
🧹 Nitpick | 🔵 Trivial
Use FoundString::new() for import name construction.
Struct literal used instead of the constructor introduced in this PR. This applies to all FoundString constructions in this file.
♻️ Suggested refactor
- all_strings.push(FoundString {
- text: import.name.clone(),
- original_text: None,
- encoding: Encoding::Utf8,
- offset: 0,
- rva: None,
- section: None,
- length,
- tags: Vec::new(),
- score: 0,
- section_weight: None,
- semantic_boost: None,
- noise_penalty: None,
- source: StringSource::ImportName,
- confidence: 1.0,
- });
+ all_strings.push(FoundString::new(
+ import.name.clone(),
+ Encoding::Utf8,
+ 0,
+ length,
+ StringSource::ImportName,
+ ));
🤖 Prompt for AI Agents
In `@src/extraction/mod.rs` around lines 484 - 499, Replace the inline struct
literal used when pushing import names into all_strings with the new constructor
FoundString::new(...): locate the push that currently builds FoundString { text:
import.name.clone(), original_text: None, encoding: Encoding::Utf8, offset: 0,
rva: None, section: None, length, tags: Vec::new(), score: 0, section_weight:
None, semantic_boost: None, noise_penalty: None, source:
StringSource::ImportName, confidence: 1.0 } and call FoundString::new passing
the equivalent values (text = import.name.clone(), original_text = None,
encoding = Encoding::Utf8, offset = 0, rva = None, section = None, length =
length, tags = Vec::new(), score = 0, section_weight = None, semantic_boost =
None, noise_penalty = None, source = StringSource::ImportName, confidence = 1.0)
so all FoundString constructions in this file use the new constructor.
| let found_string = FoundString { | ||
| text, | ||
| original_text: None, | ||
| encoding, | ||
| offset: absolute_offset, | ||
| rva, | ||
| section: Some(section.name.clone()), | ||
| length: length as u32, | ||
| tags: Vec::new(), | ||
| score: 0, | ||
| section_weight: None, | ||
| semantic_boost: None, | ||
| noise_penalty: None, | ||
| source: StringSource::SectionData, | ||
| confidence, | ||
| }; |
🧹 Nitpick | 🔵 Trivial
Consistent: UTF-8 extraction also needs constructor refactor.
Same struct literal pattern in the UTF-8 extraction path.
🤖 Prompt for AI Agents
In `@src/extraction/mod.rs` around lines 765 - 780, The UTF-8 extraction path is
still creating FoundString with a struct literal; change it to use the
refactored FoundString constructor used in the other extraction path (e.g.,
FoundString::new or the project's equivalent) instead of the literal. Construct
the FoundString with the same values: text, original_text = None, encoding,
offset = absolute_offset, rva, section = section.name.clone(), length = length
as u32, tags = Vec::new(), score = 0, section_weight = None, semantic_boost =
None, noise_penalty = None, source = StringSource::SectionData, and confidence;
ensure the call matches the constructor signature used elsewhere so both UTF-8
and other extraction paths share the same creation code for consistency.
| /// Section weight contribution to the final score (debug only) | ||
| /// | ||
| /// When debug mode is enabled, this field contains the weight assigned based on | ||
| /// the section where the string was found. Higher weights indicate sections more | ||
| /// likely to contain meaningful strings (e.g., .rodata vs .text). This is `None` | ||
| /// unless explicitly populated by the ranking system in debug mode. | ||
| #[serde(skip_serializing_if = "Option::is_none", default)] | ||
| pub section_weight: Option<i32>, | ||
| /// Semantic classification boost to the final score (debug only) | ||
| /// | ||
| /// When debug mode is enabled, this field contains the score boost applied based on | ||
| /// semantic tags (URLs, file paths, GUIDs, etc.). Strings with valuable semantic | ||
| /// meaning receive positive boosts. This is `None` unless explicitly populated by | ||
| /// the ranking system in debug mode. | ||
| #[serde(skip_serializing_if = "Option::is_none", default)] | ||
| pub semantic_boost: Option<i32>, | ||
| /// Noise penalty applied to the final score (debug only) | ||
| /// | ||
| /// When debug mode is enabled, this field contains the penalty applied for noise | ||
| /// characteristics (low confidence, repetitive patterns, etc.). Higher penalties | ||
| /// indicate strings more likely to be noise. This is `None` unless explicitly | ||
| /// populated by the ranking system in debug mode. | ||
| #[serde(skip_serializing_if = "Option::is_none", default)] | ||
| pub noise_penalty: Option<i32>, |
🧹 Nitpick | 🔵 Trivial
Field naming could be more precise to avoid confusion with SectionInfo.weight.
section_weight as Option<i32> stores the score contribution derived from the section's weight, while SectionInfo.weight is f32. The doc comments clarify this, but consider names like section_weight_contribution or section_score_component to make the distinction explicit. Low priority - current naming works with the documentation.
🤖 Prompt for AI Agents
In `@src/types.rs` around lines 257 - 280, Rename the field `section_weight` in
the struct (in src/types.rs) to a clearer name like
`section_weight_contribution` or `section_score_component` to avoid confusion
with `SectionInfo.weight` (which is an f32); update all references/usages of
`section_weight` across the codebase (serialization/deserialization, docs,
tests) and keep the same serde attributes and Option<i32> type and doc comment
content (adjust field name inside the docs to match), leaving `semantic_boost`
and `noise_penalty` unchanged.
| /// Sets the confidence | ||
| #[must_use] | ||
| pub fn with_confidence(mut self, confidence: f32) -> Self { | ||
| self.confidence = confidence; | ||
| self | ||
| } |
with_confidence does not validate the documented 0.0-1.0 range.
The confidence field is documented as "(0.0-1.0)" and is_high_confidence/is_low_confidence methods assume this range. Consider adding a debug assertion or clamping to prevent invalid values.
Suggested fix with debug assertion
/// Sets the confidence
#[must_use]
pub fn with_confidence(mut self, confidence: f32) -> Self {
+ debug_assert!(
+ (0.0..=1.0).contains(&confidence),
+ "confidence must be in range 0.0-1.0, got {confidence}"
+ );
self.confidence = confidence;
self
}
🤖 Prompt for AI Agents
In `@src/types.rs` around lines 362 - 367, The with_confidence setter currently
accepts any f32 but the confidence field is documented as 0.0–1.0 and methods
is_high_confidence/is_low_confidence assume that range; update with_confidence
to enforce or validate the range, e.g. add a debug_assert!((0.0..=1.0).contains(&confidence)) to catch invalid values in debug builds, or clamp the value into 0.0..=1.0 before assigning; modify the pub fn with_confidence(mut self,
confidence: f32) -> Self to perform this validation/clamping and then set
self.confidence so other methods relying on the range remain correct.
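The clamping option mentioned in the prompt can be sketched as a standalone function. This is an illustration under the review's assumptions, not the crate's actual method:

```rust
// Sketch of the clamping alternative to a debug_assert: keep the stored
// value inside the documented 0.0-1.0 range so that checks in the style
// of is_high_confidence / is_low_confidence stay meaningful.
fn clamp_confidence(confidence: f32) -> f32 {
    confidence.clamp(0.0, 1.0)
}

fn main() {
    // Out-of-range inputs are pulled back into the documented range.
    println!("{}", clamp_confidence(1.5)); // prints 1
}
```

A debug_assert catches bad callers loudly during development, while clamping silently repairs them in release builds; the two can also be combined, asserting in debug and clamping as the fallback.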
| /// Creates a test FoundString with all optional fields set to None | ||
| fn create_test_found_string() -> FoundString { | ||
| FoundString { | ||
| text: "test_string".to_string(), | ||
| original_text: None, | ||
| encoding: Encoding::Ascii, | ||
| offset: 0x1000, | ||
| rva: Some(0x2000), | ||
| section: Some(".rodata".to_string()), | ||
| length: 11, | ||
| tags: vec![Tag::Url], | ||
| score: 100, | ||
| section_weight: None, | ||
| semantic_boost: None, | ||
| noise_penalty: None, | ||
| source: StringSource::SectionData, | ||
| confidence: 0.85, | ||
| } | ||
| } |
🧹 Nitpick | 🔵 Trivial
Test helper uses struct literal instead of constructor.
For consistency with the PR's goal of promoting the constructor/builder pattern, consider using FoundString::new() with builder methods. This would also verify the API works correctly.
Suggested fix
fn create_test_found_string() -> FoundString {
- FoundString {
- text: "test_string".to_string(),
- original_text: None,
- encoding: Encoding::Ascii,
- offset: 0x1000,
- rva: Some(0x2000),
- section: Some(".rodata".to_string()),
- length: 11,
- tags: vec![Tag::Url],
- score: 100,
- section_weight: None,
- semantic_boost: None,
- noise_penalty: None,
- source: StringSource::SectionData,
- confidence: 0.85,
- }
+ FoundString::new(
+ "test_string".to_string(),
+ Encoding::Ascii,
+ 0x1000,
+ 11,
+ StringSource::SectionData,
+ )
+ .with_rva(0x2000)
+ .with_section(".rodata".to_string())
+ .with_tags(vec![Tag::Url])
+ .with_score(100)
+ .with_confidence(0.85)
}🤖 Prompt for AI Agents
In `@src/types.rs` around lines 455 - 473, The test helper
create_test_found_string should construct the FoundString via its public
constructor/builder instead of a struct literal; replace the literal in
create_test_found_string() with FoundString::new(...) and chain the appropriate
builder methods (e.g., .original_text(None), .encoding(Encoding::Ascii),
.offset(0x1000), .rva(Some(0x2000)), .section(Some(".rodata".to_string())),
.length(11), .tags(vec![Tag::Url]), .score(100), .section_weight(None),
.semantic_boost(None), .noise_penalty(None), .source(StringSource::SectionData),
.confidence(0.85)) to produce the same test values so the test exercises the
constructor/builder API (adjust method names to match the actual builder methods
on FoundString).
Signed-off-by: UncleSp1d3r <unclesp1d3r@evilbitlabs.io>
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @.github/workflows/ci.yml:
- Around line 185-206: The CI gate script currently checks results for quality,
msrv, test, test-cross-platform, and coverage but omits the changes job,
allowing failures there to be masked; update the Check job results step to also
evaluate needs.changes.result and fail the job if it equals "failure" or
"cancelled" (mirror the existing pattern used for quality and msrv), and emit a
clear error message like "::error::Changes job failed" when that condition is
met so the overall gate correctly fails on changes job errors.
- Around line 190-215: The workflow uses invalid dot-notation for a hyphenated
job id (needs.test-cross-platform.result); update all accesses of that job to
bracket notation (needs['test-cross-platform'].result), including the initial
echo and the subsequent if condition that checks its value; also scan for any
other hyphenated job IDs (e.g., any occurrence of needs.<hyphenated-id>.result)
and convert them to needs['<hyphenated-id>'].result to avoid runtime evaluation
errors.
| needs: [quality, msrv, test, test-cross-platform, coverage] | ||
| if: always() | ||
| steps: | ||
| - name: Check job results | ||
| run: | | ||
| echo "Quality: ${{ needs.quality.result }}" | ||
| echo "MSRV: ${{ needs.msrv.result }}" | ||
| echo "Test: ${{ needs.test.result }}" | ||
| echo "Test Cross-Platform: ${{ needs.test-cross-platform.result }}" | ||
| echo "Coverage: ${{ needs.coverage.result }}" | ||
| # Fail if any required job failed | ||
| # Jobs that were skipped (due to path filters) are OK | ||
| if [[ "${{ needs.quality.result }}" == "failure" || "${{ needs.quality.result }}" == "cancelled" ]]; then | ||
| echo "::error::Quality checks failed" | ||
| exit 1 | ||
| fi | ||
| if [[ "${{ needs.msrv.result }}" == "failure" || "${{ needs.msrv.result }}" == "cancelled" ]]; then | ||
| echo "::error::MSRV check failed" | ||
| exit 1 | ||
| fi |
Include changes in the gate to avoid masked failures.
If the changes job fails, dependent jobs are skipped and the gate still passes because it never checks changes. This can hide CI breakage.
✅ Suggested fix
- needs: [quality, msrv, test, test-cross-platform, coverage]
+ needs: [changes, quality, msrv, test, test-cross-platform, coverage]
...
- echo "Quality: ${{ needs.quality.result }}"
+ echo "Changes: ${{ needs.changes.result }}"
+ echo "Quality: ${{ needs.quality.result }}"
...
+ if [[ "${{ needs.changes.result }}" == "failure" || "${{ needs.changes.result }}" == "cancelled" ]]; then
+ echo "::error::Changes filter failed"
+ exit 1
+ fi
🤖 Prompt for AI Agents
In @.github/workflows/ci.yml around lines 185 - 206, The CI gate script
currently checks results for quality, msrv, test, test-cross-platform, and
coverage but omits the changes job, allowing failures there to be masked; update
the Check job results step to also evaluate needs.changes.result and fail the
job if it equals "failure" or "cancelled" (mirror the existing pattern used for
quality and msrv), and emit a clear error message like "::error::Changes job
failed" when that condition is met so the overall gate correctly fails on
changes job errors.
| echo "Quality: ${{ needs.quality.result }}" | ||
| echo "MSRV: ${{ needs.msrv.result }}" | ||
| echo "Test: ${{ needs.test.result }}" | ||
| echo "Test Cross-Platform: ${{ needs.test-cross-platform.result }}" | ||
| echo "Coverage: ${{ needs.coverage.result }}" | ||
| # Fail if any required job failed | ||
| # Jobs that were skipped (due to path filters) are OK | ||
| if [[ "${{ needs.quality.result }}" == "failure" || "${{ needs.quality.result }}" == "cancelled" ]]; then | ||
| echo "::error::Quality checks failed" | ||
| exit 1 | ||
| fi | ||
| if [[ "${{ needs.msrv.result }}" == "failure" || "${{ needs.msrv.result }}" == "cancelled" ]]; then | ||
| echo "::error::MSRV check failed" | ||
| exit 1 | ||
| fi | ||
| if [[ "${{ needs.test.result }}" == "failure" || "${{ needs.test.result }}" == "cancelled" ]]; then | ||
| echo "::error::Tests failed" | ||
| exit 1 | ||
| fi | ||
| if [[ "${{ needs.test-cross-platform.result }}" == "failure" || "${{ needs.test-cross-platform.result }}" == "cancelled" ]]; then | ||
| echo "::error::Cross-platform tests failed" | ||
| exit 1 |
🧩 Analysis chain
🌐 Web query:
GitHub Actions needs context job id hyphen dot notation bracket notation
💡 Result:
Short answer: use bracket notation when the identifier contains characters that can't be used in dot-notation (or when it begins with a number). Job IDs themselves must start with a letter or underscore and may contain letters, digits, hyphens and underscores; but when you reference a context/property whose name contains a hyphen/dot/other special char, use bracket/string indexing.
Key facts and examples:
- Job-id rules (what you may name a job): a job id must start with a letter or `_` and contain only alphanumeric characters, `-`, or `_`. [1]
- When an expression needs to access a property whose key contains a character not valid for dot notation (such as a hyphen), use bracket notation; example usages:
- needs['my-job-with-hyphen'].outputs.some_output
- job.services['my-service'].id
- github['some-context-with-hyphen'] (general form)
These follow GitHub Actions expression/object-indexing behavior described in the contexts/docs. [1][2]
Example:
- jobs:
    build-and-test:
      runs-on: ubuntu-latest
      needs: []
- Another job referencing output:
    deploy:
      needs: ['build-and-test']
      if: needs['build-and-test'].outputs.artifact_ready == 'true'
Sources:
- Workflow syntax / jobs and job_id rules. [1]
- Contexts and expression syntax (job/steps/services access and example of using object keys). [2]
🏁 Script executed:
head -n 220 .github/workflows/ci.yml | tail -n 35
Repository: EvilBit-Labs/Stringy
Length of output: 1569
Fix context access syntax for hyphenated job IDs.
needs.test-cross-platform.result uses invalid dot notation for a job ID containing a hyphen. GitHub Actions expressions require bracket notation to access properties with hyphens in their names. The workflow will fail at runtime when evaluating these expressions.
Suggested fix
- echo "Test Cross-Platform: ${{ needs.test-cross-platform.result }}"
+ echo "Test Cross-Platform: ${{ needs['test-cross-platform'].result }}"
# Fail if any required job failed
# Jobs that were skipped (due to path filters) are OK
- if [[ "${{ needs.test-cross-platform.result }}" == "failure" || "${{ needs.test-cross-platform.result }}" == "cancelled" ]]; then
+ if [[ "${{ needs['test-cross-platform'].result }}" == "failure" || "${{ needs['test-cross-platform'].result }}" == "cancelled" ]]; then
echo "::error::Cross-platform tests failed"
exit 1
fi
🤖 Prompt for AI Agents
In `.github/workflows/ci.yml` around lines 190-215, the workflow uses invalid dot notation for a hyphenated job id (`needs.test-cross-platform.result`). Update all accesses of that job to bracket notation (`needs['test-cross-platform'].result`), including the initial echo and the subsequent if condition that checks its value. Also scan for any other hyphenated job IDs (i.e., any occurrence of `needs.<hyphenated-id>.result`) and convert them to `needs['<hyphenated-id>'].result` to avoid runtime evaluation errors.
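The rule is easy to see in a minimal, hypothetical workflow: GitHub Actions expressions only allow dot access for property names that are valid identifiers, so a hyphenated job id must be indexed with brackets.

```yaml
jobs:
  test-cross-platform:
    runs-on: ubuntu-latest
    steps:
      - run: echo "ok"
  ci-success:
    needs: [test-cross-platform]
    if: always()
    runs-on: ubuntu-latest
    steps:
      # Dot notation (needs.test-cross-platform.result) is rejected by the
      # expression parser; hyphenated job IDs require bracket indexing:
      - run: echo "Cross-platform: ${{ needs['test-cross-platform'].result }}"
```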
Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on pull request comments.

Thanks for integrating Codecov - We've got you covered ☂️
Summary
This PR extends the `FoundString` data model to support symbol demangling preservation and debug score breakdowns, completing the foundational ticket for the enhanced string analysis pipeline.

Impact: 27 files changed (+592 additions, -126 deletions)
Risk Level: Low - additive changes with backwards compatibility
Review Time: ~15-20 minutes
What Changed
Source Changes
- `src/types.rs` - `#[non_exhaustive]`, new fields, constructor, and builder pattern
- `src/extraction/*.rs` - initialize the new `original_text` field
- `src/classification/*.rs` - initialize the new `original_text` field

Test Changes
- `tests/test_deduplication.rs` - uses the builder pattern
- `tests/classification_integration.rs` - uses `FoundString::new()`

Documentation Changes
- `docs/src/*.md`
- `project_plan/tickets/*.md`

Configuration Changes
- `.github/prompts/cicheck.prompt.md`
- `.serena/project.yml`

Why These Changes
1. API Stability (`#[non_exhaustive]`)

The `FoundString` struct is part of the public API. Adding `#[non_exhaustive]` prevents external code from using struct literal syntax, ensuring forward compatibility when new fields are added.

2. Demangling Support (`original_text`)

The `original_text: Option<String>` field enables preserving the original, pre-demangled text alongside the demangled result.

3. Debug Score Breakdown
Three new optional fields provide score transparency:
- `section_weight: Option<i32>` - Weight from section location
- `semantic_boost: Option<i32>` - Boost from semantic classification
- `noise_penalty: Option<i32>` - Penalty for noise characteristics

4. Builder Pattern
The new constructor and builder methods provide a single supported construction path: `new()` for the required fields, plus chainable `with_*` setters for everything optional.
Type of Change
Technical Implementation
New FoundString API
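The commit log lists the added methods: `new()`, `with_rva()`, `with_section()`, `with_tags()`, `with_score()`, `with_confidence()`, `with_original_text()`, `with_section_weight()`, `with_semantic_boost()`, and `with_noise_penalty()`. A minimal sketch of the pattern follows; the field names, types, and required arguments are assumptions for illustration, not the crate's real definition in `src/types.rs`:

```rust
// Sketch only: field names/types are assumed, not the actual struct.
#[non_exhaustive]
#[derive(Debug, Clone, Default)]
pub struct FoundString {
    pub text: String,
    pub rva: Option<u64>,
    pub section: Option<String>,
    pub tags: Vec<String>,
    pub score: i32,
    pub original_text: Option<String>,
    pub section_weight: Option<i32>,
    pub semantic_boost: Option<i32>,
    pub noise_penalty: Option<i32>,
}

impl FoundString {
    /// Constructor with the required field(s); optional fields start empty.
    pub fn new(text: impl Into<String>) -> Self {
        Self { text: text.into(), ..Self::default() }
    }

    /// Builder setters consume and return `self` so calls chain;
    /// `#[must_use]` warns when the returned value is accidentally dropped.
    #[must_use]
    pub fn with_rva(mut self, rva: u64) -> Self {
        self.rva = Some(rva);
        self
    }

    #[must_use]
    pub fn with_section(mut self, section: impl Into<String>) -> Self {
        self.section = Some(section.into());
        self
    }

    #[must_use]
    pub fn with_original_text(mut self, original: impl Into<String>) -> Self {
        self.original_text = Some(original.into());
        self
    }

    #[must_use]
    pub fn with_score(mut self, score: i32) -> Self {
        self.score = score;
        self
    }
}

fn main() {
    let s = FoundString::new("core::fmt::Debug::fmt")
        .with_section(".rodata")
        .with_original_text("_ZN4core3fmt5Debug3fmtE")
        .with_score(10);
    assert_eq!(s.section.as_deref(), Some(".rodata"));
    assert!(s.noise_penalty.is_none());
}
```

Because the struct is `#[non_exhaustive]`, code outside the defining crate must go through `new()`; the literal `FoundString { .. }` form only compiles inside the crate itself.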
Serde Behavior
Optional fields use `#[serde(skip_serializing_if = "Option::is_none", default)]`:

- fields that are `None` are omitted during serialization
- fields missing from the input deserialize to `None`
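Concretely, a `FoundString` with all breakdown fields populated might serialize like this (field names and values are illustrative, not taken from the crate):

```json
{
  "text": "core::fmt::Debug::fmt",
  "score": 12,
  "original_text": "_ZN4core3fmt5Debug3fmtE",
  "section_weight": 5,
  "semantic_boost": 10,
  "noise_penalty": 3
}
```

When those `Option` fields are `None`, the keys are omitted entirely, and deserializing older JSON that lacks them yields `None` for each — which is what keeps previously serialized data loadable.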
Unit Tests
New tests in `src/types.rs`:

- `test_found_string_serde_optional_fields_none` - Verifies None fields are omitted
- `test_found_string_serde_optional_fields_some` - Verifies Some fields are included
- `test_found_string_serde_roundtrip` - Full serialization roundtrip
- `test_found_string_deserialize_missing_optional_fields` - Backwards compatibility
Updated existing tests to use the new API:

- `tests/classification_integration.rs` - Uses `FoundString::new()`
- `tests/test_deduplication.rs` - Uses builder pattern

CI Validation
Breaking Changes
None - This PR maintains full backwards compatibility:
- existing `FoundString` fields continue to work
- optional fields default to `None` when not provided

Note: External code using struct literal syntax will need to migrate to the constructor/builder pattern. This is intentional to ensure future field additions don't break downstream code.
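For downstream callers, the migration described in the note looks roughly like this (a sketch with assumed field names, standing in for the real struct in `src/types.rs`):

```rust
// Minimal stand-in for the real FoundString (field names assumed).
#[non_exhaustive]
#[derive(Debug, Default)]
pub struct FoundString {
    pub text: String,
    pub rva: Option<u64>,
}

impl FoundString {
    pub fn new(text: impl Into<String>) -> Self {
        Self { text: text.into(), ..Self::default() }
    }

    #[must_use]
    pub fn with_rva(mut self, rva: u64) -> Self {
        self.rva = Some(rva);
        self
    }
}

fn main() {
    // Before: struct literal syntax. From another crate this stops compiling
    // once the struct is #[non_exhaustive]:
    // let s = FoundString { text: "x".into(), rva: Some(0x1000) };

    // After: constructor plus builder methods.
    let s = FoundString::new("x").with_rva(0x1000);
    assert_eq!(s.rva, Some(0x1000));
}
```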
Risk Assessment
Overall Risk: Low (1.4/10)
Review Checklist
General
Code Quality
`#[must_use]` on builder methods

Testing
Documentation
Rust-Specific
`unsafe` code

Dependencies
This PR is the foundational ticket that other work depends on:
(`original_text`)

Files Changed Summary
Generated with Claude Code