diff --git a/CHANGELOG.md b/CHANGELOG.md index 56ef2d8..0739354 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,52 @@ +# v0.7.3 — Structural Correctness & Deterministic Heuristics +**Released: 2026‑05‑11** + +## Added +- Comprehensive structural validation across all PE subsystems +- New checks for entrypoint mapping, section flags, RVA graph consistency, TLS callbacks, and certificate bounds +- Region‑specific entropy validation +- Deterministic structural anomaly surfacing in heuristics layer +- Extensive new structural and heuristic tests +- Snapshot tests ensuring deterministic output + +## Changed +- Reworked entrypoint validator with correct RVA→file offset mapping +- Expanded section validator with overlap, ordering, and flag‑consistency checks +- Strengthened optional header validation (alignment, size fields, directory count) +- Hardened RVA graph validator (bounds, mapping, overlap) +- Improved TLS validator (range, callbacks, mapping) +- Improved signature validator (symmetry, bounds, type/revision checks) +- Refined entropy validator (low entropy, region entropy, uniformity) + +## Fixed +- Conceptual inconsistencies around RVA vs file offsets +- Redundant or contradictory structural checks +- Missing structural anomalies in several validators +- Inconsistent or unclear ReasonCodes +- Edge‑case crashes on malformed or truncated binaries + +## Removed +- No removals in this release + +## Notes +- v0.7.3 remains strictly static‑only +- No dynamic analysis, unpacking, emulation, or new dependencies introduced + +--- + +# v0.7.2 — Dependency Fix +**Released: 2026‑05‑01** + +## Added +- Required `idna` dependency for punycode and Unicode domain handling + +## Notes +- No behavioural changes to extractors +- No schema changes +- Fully compatible with v0.7.1 + +--- + # **v0.7.1 — Heuristics Engine Expansion & Structural Analysis Improvements** -**Released: 2026‑05‑??** +**Released: 2026‑05‑01** v0.7.1 delivers a major upgrade to IOCX’s **PE heuristics engine**, 
**extractor correctness**, and **adversarial‑input resilience**. This release introduces six new structural heuristics, broad extractor hardening, and a significantly expanded adversarial test suite — including **full adversarial coverage for every IOC category**. diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index d25e570..eaf5d56 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -6,20 +6,24 @@ IOCX is part of the MalX Labs ecosystem — a family of modern, deterministic, d We welcome contributions of all kinds: bug fixes, static‑analysis improvements, new extractors, documentation updates, and thoughtful design discussions. This guide explains how to contribute effectively while keeping IOCX predictable, secure, and maintainable. +--- + ## Project Philosophy IOCX is intentionally: -- Minimal — extremely small dependency footprint -- Secure — safe handling of untrusted input -- Deterministic — no network access, no non-deterministic behaviour -- Extensible — new static‑analysis modules can be added cleanly +- **Minimal** — extremely small dependency footprint +- **Secure** — safe handling of untrusted input +- **Deterministic** — no network access, no non‑deterministic behaviour +- **Extensible** — new static‑analysis modules can be added cleanly All contributions must align with these principles. +--- + ## Core vs Plugins -IOCX has a strict boundary between core functionality and plugin‑based extensions. +IOCX has a strict boundary between **core functionality** and **plugin‑based extensions**. This keeps the core predictable and universally safe while allowing users to extend IOCX for their own environments. ### What Belongs in the Core @@ -51,7 +55,7 @@ Plugins are for functionality that is: - optional or environment‑specific - based on external data - organisation‑specific -- user-maintained +- user‑maintained - likely to evolve independently Examples: @@ -65,6 +69,8 @@ If the information comes from the user’s environment, it belongs in a plugin. 
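The core/plugin boundary works because both built‑in extractors and plugins register themselves when their module is imported, a requirement this guide repeats for new extractors. A generic sketch of that register‑on‑import pattern (all names here are hypothetical; IOCX's real plugin API may differ):

```python
import re
from typing import Callable, Dict, List

# Hypothetical shared registry; IOCX's actual mechanism may differ.
REGISTRY: Dict[str, Callable[[str], List[str]]] = {}

def register(category: str):
    """Decorator that adds an extractor to the registry at import time."""
    def wrap(fn: Callable[[str], List[str]]) -> Callable[[str], List[str]]:
        REGISTRY[category] = fn
        return fn
    return wrap

@register("ipv4")
def extract_ipv4(text: str) -> List[str]:
    # Deterministic and side-effect-free: sorted, de-duplicated output.
    return sorted(set(re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", text)))
```

Merely importing the module populates `REGISTRY`, which is why extractors must otherwise be free of import‑time side effects.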
This separation keeps IOCX clean, predictable, and safe to run anywhere. +--- + ## How to Contribute ### Fix bugs @@ -80,12 +86,12 @@ Open an issue or submit a PR with: Regex‑based extractors live under: ``` -detectors/extractors/ +iocx/detectors/extractors/ ``` Please include: -- a clear, well-scoped regex +- a clear, well‑scoped regex - validation logic - test cases - example inputs @@ -102,7 +108,7 @@ Enhancements to metadata extraction, imports, sections, or resources are welcome - static - deterministic -- dependency-minimal +- dependency‑minimal ### Add synthetic test samples @@ -113,67 +119,69 @@ See the “Testing” section below. Better examples, diagrams, and explanations are always appreciated. -### Contribution Process +--- + +## Contribution Process -1. Fork the repository +1. **Fork the repository** ```bash git clone https://github.com/iocx-dev/iocx.git ``` -2. Create a feature branch +2. **Create a feature branch** ```bash git checkout -b feature/my-improvement ``` -3. Install locally +3. **Install locally** ```bash pip install -e . ``` -4. Run tests +4. **Run tests** ```bash pytest ``` -5. Run security checks +5. **Run security checks** ```bash bandit -r iocx -lll pip-audit --skip-editable ``` -6. Open a Pull Request +6. **Open a Pull Request** -- Target the main branch +- Target the `main` branch - Describe what you changed and why - Link any related issues CI will run automatically. +--- + ## Testing IOCX is designed to be **safe to develop on any machine**. 
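A synthetic fixture, a harmless file that merely contains IOC‑shaped strings, is the safest kind of test sample. A sketch of a fixture generator (every indicator is a reserved documentation value: RFC 2606 domains, RFC 5737 addresses, and the well‑known MD5 of the empty string, so the file is safe to commit):

```python
from pathlib import Path

# Harmless IOC-shaped strings only; nothing here is a real indicator.
FIXTURE = (
    "Contacting http://malicious.example.com/payload.bin\n"
    "Beacon to 203.0.113.9 every 60s\n"
    "Dropped file: C:\\Users\\victim\\AppData\\run.exe\n"
    "MD5: d41d8cd98f00b204e9800998ecf8427e\n"
)

def write_fixture(path: Path) -> Path:
    """Write a mixed-IOC text fixture for extractor tests."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(FIXTURE, encoding="utf-8")
    return path
```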
### Do NOT: -- Upload or commit real malware -- Submit password‑protected malware archives -- Include malicious payloads or exploit code -- Add samples requiring execution to analyse +- upload or commit real malware +- submit password‑protected malware archives +- include malicious payloads or exploit code +- add samples requiring execution to analyse ### Do: -- Use synthetic PE files -- Embed fake IOCs inside harmless executables -- Use benign Windows binaries for structural testing -- Use public test files like EICAR or GTUBE -- Add text files containing mixed IOCs +- use synthetic PE files +- embed fake IOCs inside harmless executables +- use benign Windows binaries for structural testing +- use public test files like EICAR or GTUBE +- add text files containing mixed IOCs If unsure, open an issue before submitting. @@ -183,26 +191,29 @@ All new features should include tests. Bug fixes should include a test that reproduces the issue. Tests live in: -```plaintext + +``` tests/ ``` We use pytest. +--- + ## Adding New Extractors Extractors live in: -```plaintext +``` iocx/detectors/extractors/ ``` To add one: -- Create a new file in that directory -- Follow existing patterns -- Ensure it registers itself on import -- Add tests under `tests/unit/extractors/` +- create a new file in that directory +- follow existing patterns +- ensure it registers itself on import +- add tests under `tests/unit/extractors/` Extractors must be: @@ -210,6 +221,8 @@ Extractors must be: - side‑effect‑free - safe for untrusted input +--- + ## Code Style We keep the codebase clean and consistent. @@ -225,20 +238,48 @@ ruff check iocx black iocx ``` +--- + ## Security If you discover a security issue, do not open a GitHub issue. +Follow the instructions in `SECURITY.md`. -Follow the instructions in SECURITY.md. +--- ## Code of Conduct Be respectful, constructive, and supportive. We aim for a collaborative, professional environment. 
-## License +--- + +## Licensing of Contributions + +By contributing to IOCX, you agree that: + +- Your contributions are licensed under the **Mozilla Public License 2.0 (MPL‑2.0)**. +- You grant the project maintainers the right to **dual‑license your contributions** under commercial terms as part of the IOCX open‑core model. +- You retain copyright to your contributions. + +This ensures: + +- the open‑source core remains healthy +- improvements remain open +- commercial customers can use IOCX under proprietary terms +- your work is properly attributed + +By submitting a contribution, you certify that you have the right to do so and that your contribution does not violate any third-party rights. + +--- + +## Trademark Notice + +Contributors may not use the IOCX name in a way that implies endorsement. +See [TRADEMARK_POLICY.md](TRADEMARK_POLICY.md) for details. +See [LICENSE](LICENSE) for full MPL-2.0 terms. -By contributing, you agree that your contributions are licensed under the project's MIT License. +--- ## Thank You diff --git a/LICENSE b/LICENSE index e6125ec..d0a1fa1 100644 --- a/LICENSE +++ b/LICENSE @@ -1,9 +1,373 @@ -MIT License +Mozilla Public License Version 2.0 +================================== -Copyright © 2026 MalX Labs (All rights reserved for the IOCX project identity and branding) +1. Definitions +-------------- -Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: +1.1. "Contributor" + means each individual or legal entity that creates, contributes to + the creation of, or owns Covered Software. 
-The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. +1.2. "Contributor Version" + means the combination of the Contributions of others (if any) used + by a Contributor and that particular Contributor's Contribution. -THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. +1.3. "Contribution" + means Covered Software of a particular Contributor. + +1.4. "Covered Software" + means Source Code Form to which the initial Contributor has attached + the notice in Exhibit A, the Executable Form of such Source Code + Form, and Modifications of such Source Code Form, in each case + including portions thereof. + +1.5. "Incompatible With Secondary Licenses" + means + + (a) that the initial Contributor has attached the notice described + in Exhibit B to the Covered Software; or + + (b) that the Covered Software was made available under the terms of + version 1.1 or earlier of the License, but not also under the + terms of a Secondary License. + +1.6. "Executable Form" + means any form of the work other than Source Code Form. + +1.7. "Larger Work" + means a work that combines Covered Software with other material, in + a separate file or files, that is not Covered Software. + +1.8. "License" + means this document. + +1.9. "Licensable" + means having the right to grant, to the maximum extent possible, + whether at the time of the initial grant or subsequently, any and + all of the rights conveyed by this License. + +1.10. 
"Modifications" + means any of the following: + + (a) any file in Source Code Form that results from an addition to, + deletion from, or modification of the contents of Covered + Software; or + + (b) any new file in Source Code Form that contains any Covered + Software. + +1.11. "Patent Claims" of a Contributor + means any patent claim(s), including without limitation, method, + process, and apparatus claims, in any patent Licensable by such + Contributor that would be infringed, but for the grant of the + License, by the making, using, selling, offering for sale, having + made, import, or transfer of either its Contributions or its + Contributor Version. + +1.12. "Secondary License" + means either the GNU General Public License, Version 2.0, the GNU + Lesser General Public License, Version 2.1, the GNU Affero General + Public License, Version 3.0, or any later versions of those + licenses. + +1.13. "Source Code Form" + means the form of the work preferred for making modifications. + +1.14. "You" (or "Your") + means an individual or a legal entity exercising rights under this + License. For legal entities, "You" includes any entity that + controls, is controlled by, or is under common control with You. For + purposes of this definition, "control" means (a) the power, direct + or indirect, to cause the direction or management of such entity, + whether by contract or otherwise, or (b) ownership of more than + fifty percent (50%) of the outstanding shares or beneficial + ownership of such entity. + +2. License Grants and Conditions +-------------------------------- + +2.1. 
Grants + +Each Contributor hereby grants You a world-wide, royalty-free, +non-exclusive license: + +(a) under intellectual property rights (other than patent or trademark) + Licensable by such Contributor to use, reproduce, make available, + modify, display, perform, distribute, and otherwise exploit its + Contributions, either on an unmodified basis, with Modifications, or + as part of a Larger Work; and + +(b) under Patent Claims of such Contributor to make, use, sell, offer + for sale, have made, import, and otherwise transfer either its + Contributions or its Contributor Version. + +2.2. Effective Date + +The licenses granted in Section 2.1 with respect to any Contribution +become effective for each Contribution on the date the Contributor first +distributes such Contribution. + +2.3. Limitations on Grant Scope + +The licenses granted in this Section 2 are the only rights granted under +this License. No additional rights or licenses will be implied from the +distribution or licensing of Covered Software under this License. +Notwithstanding Section 2.1(b) above, no patent license is granted by a +Contributor: + +(a) for any code that a Contributor has removed from Covered Software; + or + +(b) for infringements caused by: (i) Your and any other third party's + modifications of Covered Software, or (ii) the combination of its + Contributions with other software (except as part of its Contributor + Version); or + +(c) under Patent Claims infringed by Covered Software in the absence of + its Contributions. + +This License does not grant any rights in the trademarks, service marks, +or logos of any Contributor (except as may be necessary to comply with +the notice requirements in Section 3.4). + +2.4. 
Subsequent Licenses + +No Contributor makes additional grants as a result of Your choice to +distribute the Covered Software under a subsequent version of this +License (see Section 10.2) or under the terms of a Secondary License (if +permitted under the terms of Section 3.3). + +2.5. Representation + +Each Contributor represents that the Contributor believes its +Contributions are its original creation(s) or it has sufficient rights +to grant the rights to its Contributions conveyed by this License. + +2.6. Fair Use + +This License is not intended to limit any rights You have under +applicable copyright doctrines of fair use, fair dealing, or other +equivalents. + +2.7. Conditions + +Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted +in Section 2.1. + +3. Responsibilities +------------------- + +3.1. Distribution of Source Form + +All distribution of Covered Software in Source Code Form, including any +Modifications that You create or to which You contribute, must be under +the terms of this License. You must inform recipients that the Source +Code Form of the Covered Software is governed by the terms of this +License, and how they can obtain a copy of this License. You may not +attempt to alter or restrict the recipients' rights in the Source Code +Form. + +3.2. 
Distribution of Executable Form + +If You distribute Covered Software in Executable Form then: + +(a) such Covered Software must also be made available in Source Code + Form, as described in Section 3.1, and You must inform recipients of + the Executable Form how they can obtain a copy of such Source Code + Form by reasonable means in a timely manner, at a charge no more + than the cost of distribution to the recipient; and + +(b) You may distribute such Executable Form under the terms of this + License, or sublicense it under different terms, provided that the + license for the Executable Form does not attempt to limit or alter + the recipients' rights in the Source Code Form under this License. + +3.3. Distribution of a Larger Work + +You may create and distribute a Larger Work under terms of Your choice, +provided that You also comply with the requirements of this License for +the Covered Software. If the Larger Work is a combination of Covered +Software with a work governed by one or more Secondary Licenses, and the +Covered Software is not Incompatible With Secondary Licenses, this +License permits You to additionally distribute such Covered Software +under the terms of such Secondary License(s), so that the recipient of +the Larger Work may, at their option, further distribute the Covered +Software under the terms of either this License or such Secondary +License(s). + +3.4. Notices + +You may not remove or alter the substance of any license notices +(including copyright notices, patent notices, disclaimers of warranty, +or limitations of liability) contained within the Source Code Form of +the Covered Software, except that You may alter any license notices to +the extent required to remedy known factual inaccuracies. + +3.5. Application of Additional Terms + +You may choose to offer, and to charge a fee for, warranty, support, +indemnity or liability obligations to one or more recipients of Covered +Software. 
However, You may do so only on Your own behalf, and not on +behalf of any Contributor. You must make it absolutely clear that any +such warranty, support, indemnity, or liability obligation is offered by +You alone, and You hereby agree to indemnify every Contributor for any +liability incurred by such Contributor as a result of warranty, support, +indemnity or liability terms You offer. You may include additional +disclaimers of warranty and limitations of liability specific to any +jurisdiction. + +4. Inability to Comply Due to Statute or Regulation +--------------------------------------------------- + +If it is impossible for You to comply with any of the terms of this +License with respect to some or all of the Covered Software due to +statute, judicial order, or regulation then You must: (a) comply with +the terms of this License to the maximum extent possible; and (b) +describe the limitations and the code they affect. Such description must +be placed in a text file included with all distributions of the Covered +Software under this License. Except to the extent prohibited by statute +or regulation, such description must be sufficiently detailed for a +recipient of ordinary skill to be able to understand it. + +5. Termination +-------------- + +5.1. The rights granted under this License will terminate automatically +if You fail to comply with any of its terms. However, if You become +compliant, then the rights granted under this License from a particular +Contributor are reinstated (a) provisionally, unless and until such +Contributor explicitly and finally terminates Your grants, and (b) on an +ongoing basis, if such Contributor fails to notify You of the +non-compliance by some reasonable means prior to 60 days after You have +come back into compliance. 
Moreover, Your grants from a particular +Contributor are reinstated on an ongoing basis if such Contributor +notifies You of the non-compliance by some reasonable means, this is the +first time You have received notice of non-compliance with this License +from such Contributor, and You become compliant prior to 30 days after +Your receipt of the notice. + +5.2. If You initiate litigation against any entity by asserting a patent +infringement claim (excluding declaratory judgment actions, +counter-claims, and cross-claims) alleging that a Contributor Version +directly or indirectly infringes any patent, then the rights granted to +You by any and all Contributors for the Covered Software under Section +2.1 of this License shall terminate. + +5.3. In the event of termination under Sections 5.1 or 5.2 above, all +end user license agreements (excluding distributors and resellers) which +have been validly granted by You or Your distributors under this License +prior to termination shall survive termination. + +************************************************************************ +* * +* 6. Disclaimer of Warranty * +* ------------------------- * +* * +* Covered Software is provided under this License on an "as is" * +* basis, without warranty of any kind, either expressed, implied, or * +* statutory, including, without limitation, warranties that the * +* Covered Software is free of defects, merchantable, fit for a * +* particular purpose or non-infringing. The entire risk as to the * +* quality and performance of the Covered Software is with You. * +* Should any Covered Software prove defective in any respect, You * +* (not any Contributor) assume the cost of any necessary servicing, * +* repair, or correction. This disclaimer of warranty constitutes an * +* essential part of this License. No use of any Covered Software is * +* authorized under this License except under this disclaimer. 
* +* * +************************************************************************ + +************************************************************************ +* * +* 7. Limitation of Liability * +* -------------------------- * +* * +* Under no circumstances and under no legal theory, whether tort * +* (including negligence), contract, or otherwise, shall any * +* Contributor, or anyone who distributes Covered Software as * +* permitted above, be liable to You for any direct, indirect, * +* special, incidental, or consequential damages of any character * +* including, without limitation, damages for lost profits, loss of * +* goodwill, work stoppage, computer failure or malfunction, or any * +* and all other commercial damages or losses, even if such party * +* shall have been informed of the possibility of such damages. This * +* limitation of liability shall not apply to liability for death or * +* personal injury resulting from such party's negligence to the * +* extent applicable law prohibits such limitation. Some * +* jurisdictions do not allow the exclusion or limitation of * +* incidental or consequential damages, so this exclusion and * +* limitation may not apply to You. * +* * +************************************************************************ + +8. Litigation +------------- + +Any litigation relating to this License may be brought only in the +courts of a jurisdiction where the defendant maintains its principal +place of business and such litigation shall be governed by laws of that +jurisdiction, without reference to its conflict-of-law provisions. +Nothing in this Section shall prevent a party's ability to bring +cross-claims or counter-claims. + +9. Miscellaneous +---------------- + +This License represents the complete agreement concerning the subject +matter hereof. If any provision of this License is held to be +unenforceable, such provision shall be reformed only to the extent +necessary to make it enforceable. 
Any law or regulation which provides +that the language of a contract shall be construed against the drafter +shall not be used to construe this License against a Contributor. + +10. Versions of the License +--------------------------- + +10.1. New Versions + +Mozilla Foundation is the license steward. Except as provided in Section +10.3, no one other than the license steward has the right to modify or +publish new versions of this License. Each version will be given a +distinguishing version number. + +10.2. Effect of New Versions + +You may distribute the Covered Software under the terms of the version +of the License under which You originally received the Covered Software, +or under the terms of any subsequent version published by the license +steward. + +10.3. Modified Versions + +If you create software not governed by this License, and you want to +create a new license for such software, you may create and use a +modified version of this License if you rename the license and remove +any references to the name of the license steward (except to note that +such modified license differs from this License). + +10.4. Distributing Source Code Form that is Incompatible With Secondary +Licenses + +If You choose to distribute Source Code Form that is Incompatible With +Secondary Licenses under the terms of this version of the License, the +notice described in Exhibit B of this License must be attached. + +Exhibit A - Source Code Form License Notice +------------------------------------------- + + This Source Code Form is subject to the terms of the Mozilla Public + License, v. 2.0. If a copy of the MPL was not distributed with this + file, You can obtain one at https://mozilla.org/MPL/2.0/. + +If it is not possible or desirable to put the notice in a particular +file, then You may include the notice in a location (such as a LICENSE +file in a relevant directory) where a recipient would be likely to look +for such a notice. 
+ +You may add additional accurate notices of copyright ownership. + +Exhibit B - "Incompatible With Secondary Licenses" Notice +--------------------------------------------------------- + + This Source Code Form is "Incompatible With Secondary Licenses", as + defined by the Mozilla Public License, v. 2.0. diff --git a/README-pypi.md b/README-pypi.md index 1c408ee..c3384a5 100644 --- a/README-pypi.md +++ b/README-pypi.md @@ -1,115 +1,85 @@ -# IOCX — Static IOC Extraction Engine - +# **IOCX — Deterministic, Zero‑Risk IOC Extraction for Modern Security Pipelines** ### Official IOCX Project -This is the **official IOCX engine** for static IOC extraction and PE analysis. +**IOCX** is a high‑performance, deterministic static analysis engine for extracting Indicators of Compromise (IOCs) from binaries and text. +It exists for one reason: **to provide a fast, safe, predictable IOC extractor that DFIR teams and automation pipelines can trust.** -- **PyPI:** https://pypi.org/project/iocx/ -- **GitHub:** https://github.com/iocx-dev/iocx -- **Organisation:** https://github.com/iocx-dev -- **Website:** https://iocx.dev +- **PyPI:** [https://pypi.org/project/iocx/](https://pypi.org/project/iocx/) +- **GitHub:** [https://github.com/iocx-dev/iocx](https://github.com/iocx-dev/iocx) +- **Website:** [https://iocx.dev](https://iocx.dev) -IOCX is **not** an OSINT reputation checker, HTML report generator, or IP/domain scoring tool. -It is a **static analysis engine** focused on extracting Indicators of Compromise (IOCs) from binaries and text with deterministic, snapshot‑stable output. +IOCX is **not** an OSINT reputation checker or scoring tool. +It is a **binary‑aware IOC engine** built for DFIR, SOC automation, CI/CD, and threat‑intel ingestion. 
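Snapshot stability is easy to regression‑test: run the same input through the engine twice and compare the serialized output byte for byte. A sketch with a stand‑in extractor (swap in the real engine call; the regex here is only a placeholder):

```python
import json
import re

def extract(text: str) -> dict:
    # Stand-in for the real engine call: deterministic by construction
    # (sorted, de-duplicated values; no timestamps or random IDs).
    return {"ipv4": sorted(set(re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", text)))}

def snapshot(text: str) -> str:
    """Serialize with sorted keys so equal results compare byte-for-byte."""
    return json.dumps(extract(text), sort_keys=True)

def is_snapshot_stable(text: str) -> bool:
    # Two runs over the same input must serialize identically.
    return snapshot(text) == snapshot(text)
```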
--- -## What IOCX does - -IOCX is a fast, safe, deterministic engine for extracting Indicators of Compromise (IOCs) from: - -- Windows PE files -- Raw text -- Logs and unstructured data - -It performs **pure static analysis** — no execution, no sandboxing, no risk. - -## What's new in v0.7.1 - -### **Bare Domain Extractor Overhaul** -- Expanded **TLD allow‑list** and strengthened **BAD_TLD deny‑list** -- Refined boundary rules to reduce false positives in noisy text -- Added **punycode decoding**, Unicode script classification, and homoglyph/confusable detection -- Hardened regex for **predictable linear performance** under adversarial input -- New metadata fields: - - `punycode`, `punycode_decodes_to_unicode` - - `decoded_unicode` - - `contains_confusables` - - `script` - -### **Performance guarantees** -- **~150-300 MB/s** for individual detectors (domains, crypto, filepaths, IPs) -- **Strict linear scaling** across all detectors -- Pathological punycode, IPv6, and filepath inputs complete in **< 15 ms** -- End‑to‑end engine throughput: **20-30 MB/s** +## Why IOCX Exists -### **Heuristic engine and adversarial fixture expansion** -- Deterministic section overlap and alignment, optional header consistency, entrypoint mapping, data directory anomalies, and import directory validity heuristics -- Adversarial fixtures covering all new heuristics and IOC subsystems. 
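Several of the structural heuristics above (entrypoint mapping, section overlap, data‑directory bounds) rest on one primitive: translating an RVA to a file offset through the section table. A simplified sketch of that mapping (it ignores the header region and alignment edge cases a real validator must handle):

```python
from typing import List, NamedTuple, Optional

class Section(NamedTuple):
    virtual_address: int  # RVA at which the section is mapped
    virtual_size: int
    raw_pointer: int      # file offset of the section's raw data
    raw_size: int

def rva_to_offset(rva: int, sections: List[Section]) -> Optional[int]:
    """Translate an RVA to a file offset via the section table.

    Returns None for unmapped RVAs, including those landing in a
    section's zero-filled virtual tail (no backing bytes on disk).
    """
    for s in sections:
        span = max(s.virtual_size, s.raw_size)
        if s.virtual_address <= rva < s.virtual_address + span:
            delta = rva - s.virtual_address
            if delta >= s.raw_size:
                return None  # virtual-only region: zero-filled when mapped
            return s.raw_pointer + delta
    return None
```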
+Most IOC extractors are: -### **Documentation updates** -- New adversarial appendices -- New Performance guarantees -- Expanded schema‑contract guidance +- regex‑only +- non‑deterministic +- slow under adversarial input +- unaware of binary structure +- unstable across versions -## Recent changes +**IOCX fixes all of that.** -### v0.7.0 +It provides: -- **Deterministic heuristic engine** +- **snapshot‑stable output** +- **deterministic PE metadata extraction** +- **binary‑aware heuristics** +- **strict performance guarantees** +- **a stable JSON schema** +- **safe, static‑only analysis** -Anti‑debug APIs, TLS anomalies, packer‑like signals, RWX sections, import anomalies. +If you need predictable, automatable IOC extraction — IOCX is built for you. -- **First adversarial samples added** - -`heuristic_rich.exe`, `crypto_entropy_payload.exe`, `string_obfuscation_tricks.exe`. - -- **Snapshot‑based contract testing** - -Deterministic output for sections, imports, heuristics, and IOCs. - -- **Rich Header crash fixed** - -Deep hex‑encoding of nested byte structures prevents JSON serialization failures. +--- -- **Documentation updates** +## Version highlights (v0.7.3) -New appendices and deterministic‑output guidance. 
+- Major hardening of all PE structural validators +- Deterministic, snapshot‑stable output across malformed binaries +- Stronger section, entrypoint, RVA‑graph, TLS, and signature checks +- Corrected RVA→file‑offset mapping for overlay detection +- Improved entropy analysis with clearer, conservative signals +- Cleaner, consistent `ReasonCodes` across the engine +- Expanded structural + heuristic test coverage -### v0.6.0 +--- -- Stable JSON schema across all analysis levels -- Deterministic PE metadata (headers, TLS, optional header, signatures) -- Guaranteed IOC categories (always present, empty arrays when no matches) -- Formalised analysis levels: - - core behaviour → no analysis block - - basic → section layout + entropy - - deep → adds obfuscation heuristics - - full → extended metadata summaries -- Schema‑contract tests to prevent drift across releases +## **Performance** -## Schema stability +- **150–300 MB/s on raw text** +- **6–15 MB/s on typical PEs** +- **Predictable** even under worst‑case adversarial load. -IOCX guarantees a stable JSON schema across releases. JSON objects are unordered by definition, so consumers should rely on field presence and structure rather than positional ordering. 
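The entropy signals mentioned in these highlights reduce to Shannon entropy over byte frequencies. A minimal sketch of the computation (the packing threshold is illustrative, not IOCX's actual cut‑off):

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def looks_packed(region: bytes, threshold: float = 7.2) -> bool:
    # Illustrative cut-off: packed/encrypted regions tend to sit near 8.0.
    return shannon_entropy(region) >= threshold
```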
+--- ## Features -- Extracts IOCs from Windows PE files and raw text -- Detects URLs, domains, IPv4/IPv6, file paths, hashes, emails, Base64 -- Crypto wallet detection (Ethereum, Bitcoin) -- Deterministic output suitable for automation and CI/CD -- Multi-level analysis depth (`basic` → `full`) -- Minimal dependencies, safe for enterprise environments -- CLI and Python API -- Binary-aware static analysis with entropy, sections, imports, TLS, signatures +- Extracts IOCs from PE files and raw text +- Detects domains, URLs, IPv4/IPv6, file paths, hashes, emails, Base64 +- Crypto wallet detection (BTC, ETH) +- Deterministic, snapshot‑stable JSON output +- Multi‑level analysis depth (`basic` → `full`) +- Binary‑aware static analysis (entropy, sections, imports, TLS, signatures) +- Lightweight plugin system +- CLI + Python API + +--- -## Installation +## Install ```bash pip install iocx ``` -## CLI Usage +--- + +## CLI ```bash iocx suspicious.exe @@ -119,6 +89,8 @@ iocx suspicious.exe echo "Visit http://bad.example.com" | iocx - ``` +--- + ## Python API ```python @@ -129,37 +101,20 @@ results = engine.extract("suspicious.exe") print(results) ``` -## Why IOCX? - -- Static‑only design (never executes untrusted code) -- Binary‑aware IOC extraction -- Stable, predictable JSON schema -- High performance: ~25-30 MB/s end-to-end, with individual detectors reaching 150-450 MB/s throughput) -- Ideal for DFIR, SOC automation, CI/CD, and threat‑intel pipelines - -## Project identity & naming - -The name **IOCX** refers specifically to this project and its associated PyPI package and repositories under the **iocx-dev** organisation. 
- -Third‑party tools **must not**: - -- Use `iocx` as their repository name -- Present themselves as the IOCX engine -- Use the PyPI badge for this package in a way that implies authorship -- Imply official affiliation or endorsement without permission +--- -Community tools that integrate with IOCX are encouraged to use names like: +## Project Identity -- `iocx-` -- `iocx-plugin-` -- `iocx-extension-` +The name **IOCX** refers exclusively to this project and the repositories under **iocx-dev**. +Third‑party tools must not present themselves as the IOCX engine. -## Extensibility +Community integrations should use names like: -IOCX includes a lightweight plugin system for custom detectors, parsers, and transformation rules. Plugins can emit new IOC categories, override built‑in behaviour, or integrate IOCX into larger analysis pipelines. +- `iocx-` +- `iocx-extension-` -See the documentation for details on writing detectors and plugins. +--- ## License -MIT License +**MPL‑2.0** diff --git a/README.md b/README.md index 43091a7..13fd1a5 100644 --- a/README.md +++ b/README.md @@ -1,496 +1,268 @@ -# Official IOCX Project - -This is the **original IOCX engine** for static IOC extraction and PE analysis. -Any other repositories using the name "iocx" are **not affiliated** with this project. - -- PyPI: [https://pypi.org/project/iocx/](https://pypi.org/project/iocx/) -- Github: [https://github.com/iocx-dev/iocx](https://github.com/iocx-dev/iocx) -- Website: [https://iocx.dev/](https://iocx.dev/) +# IOCX +### **Deterministic, Zero‑Risk IOC Extraction for Modern Security Pipelines**

- - PyPI Version - - Coverage - Tests - Python Version - - License - - - Build Status - - - Contract tests - - - Performance Summary - + IOCX Demo

+

Static IOC extraction from a PE file using the IOCX CLI

+

- IOCX Demo -

-

- Static IOC extraction from a PE file using the IOCX CLI

-## IOCX — Static IOC Extraction for Binaries, Text, and Artifacts - -**Fast, safe, deterministic IOC extraction for DFIR, SOC automation, and large-scale threat analysis.** - -IOCX is a lightweight, extensible engine for extracting Indicators of Compromise (IOCs) and structural metadata using **pure static analysis**. No execution. No sandboxing. No risk. - -Built for: - -- DFIR workflows -- SOC automation -- Threat-intel pipelines -- CI/CD security checks -- Large‑scale batch processing - -IOCX is a core component of the MalX Labs ecosystem for scalable, modern threat‑analysis tooling. - -## Why IOCX? - -IOCX is designed for environments where **safety, determinism, and automation** matter. Unlike extractors that operate only on raw text, IOCX includes: - -- Binary‑aware static analysis -- A plugin-friendly rule system -- A stable JSON schema suitable for pipelines and long-term integrations - -## Key advantages - -- **Static‑only design** — never executes untrusted code -- **Binary parsing** — PE-aware extraction with section analysis and structural heuristics -- **Analysis level** — basic, deep, and full for performance-tuned workflows -- **Deterministic behaviour** — stable output and predictable performance -- **Extensible rule engine** — custom detectors, parsers, and plugins -- **Consistent JSON schema** — clean integration with SIEM/SOAR -- **Low dependency footprint** — safe for enterprise environments -- **Pipeline-ready** — fast start‑up, fast throughput - -## What IOCX *Is Not* - -To avoid confusion: - -- Not a sandbox -- Not a behavioural analysis tool -- Not an emulator -- Not an enrichment engine +# Official IOCX Project -IOCX is **static extraction only**, by design. +This is the original IOCX engine for deterministic static IOC extraction and PE analysis. +Any other repositories using the name "iocx" are **not affiliated** with this project. 
-## Use Cases +**Official links:** -### SOC & Incident Response -- Extract indicators from emails, alerts, or analyst clipboard text -- Parse IOCs from reports into structured JSON -- Safely inspect malware samples without execution +- PyPI: [https://pypi.org/project/iocx/](https://pypi.org/project/iocx/) +- Github: [https://github.com/iocx-dev/iocx](https://github.com/iocx-dev/iocx) +- Website: [https://iocx.dev/](https://iocx.dev/) -### Threat Intelligence Processing -- Normalise indicators from feeds -- Batch‑process unstructured text -- Build enrichment pipelines on top of deterministic output +--- -### CI/CD & DevSecOps -- Scan binaries for embedded indicators before publishing -- Integrate IOC extraction into automated checks -- Detect accidental inclusion of URLs or addresses in builds +# Why IOCX Matters -### Bulk Automation & Scripting -- Pipe logs or artifacts through IOCX -- Use the Python API for ETL or batch workflows -- Extend with custom detectors for internal patterns +Modern malware is **adversarial by default** — malformed, evasive, and engineered to break naive extractors. -## Version Highlights +- **Binary‑unaware tools** collapse under malformed PEs +- **Sandboxes** are unsafe and unusable in CI/CD +- **Reproducibility** is essential for automated pipelines -### v0.7.2 — Dependency Fix +**IOCX is built for environments where correctness and determinism actually matter.** -A small patch release correcting a missing dependency: +--- -- Added required `idna` dependency for punycode and Unicode domain handling -- No behavioural changes to extractors -- No schema changes -- **Fully compatible with v0.7.1** +# The IOCX Engine -### v0.7.1 — Adversarial Heuristics Expansion & Parser Hardening +**IOCX is the official static IOC extraction engine** — a deterministic, binary‑aware system built for DFIR, SOC automation, CI/CD security, and large‑scale threat‑intel pipelines. 
-v0.7.1 strengthens IOCX’s PE analysis layer with **six new structural heuristics** and introduces a broad adversarial corpus to validate them. This release focuses on robustness, determinism, and resilience against malformed binaries and hostile IOC‑like strings.

+Unlike regex‑only extractors or sandbox‑dependent tools, IOCX delivers:

-- **New PE heuristics added**
-  - Section overlap detection
-  - Section alignment validation
-  - Optional‑header consistency checks
-  - Entrypoint → section mapping validation
-  - Data‑directory anomaly detection
-  - Import‑directory validity checks
-- **Expanded adversarial PE corpus**: malformed imports, corrupted RVAs, invalid optional headers, truncated Rich headers, overlapping sections, franken‑PE hybrids
-- **Adversarial fixtures for *all* IOC categories**: crypto, homoglyph domains, malformed URLs, broken IPs, long paths, noisy hashes, invalid base64, deceptive emails
-- **Domain, URL, crypto and hash **extractor hardening**
-- **Deterministic, JSON‑safe output**: all new samples snapshot‑validated

+- **pure static analysis**
+- **zero execution risk**
+- **stable, deterministic output**
+- **adversarial‑tested heuristics**

-This release improves IOCX’s **structural awareness**, **error resilience**, and **adversarial coverage**.

+It is a core component of the MalX Labs ecosystem for scalable, modern threat analysis.

-### v0.7.0 — Deterministic Heuristics & Adversarial Testing Foundation

+---

-- Deterministic heuristics: anti‑debug APIs, TLS anomalies, packer‑like behaviour, RWX sections, import anomalies.
-- Adversarial testing: initial Layer-3 samples validating heuristics, entropy analysis and IOC extraction.
-- Contract testing: deterministic snapshots for sections, imports, heuristics, and IOCs.
-- Bug fix: resolved a crash caused by non‑UTF8 Rich Header bytes
-- Docs: new deterministic‑output section and adversarial sample appendices.
+# Try IOCX in 10 Seconds -### v0.6.0 — Stable Output Schema, Deterministic PE Metadata, Contract‑Safe Analysis Levels +```bash +echo "http://malicious.example" | iocx - +``` -- Fully stable JSON schema -- Strict structural guarantees for `iocs`, `metadata`, and `analysis` -- Normalised PE metadata for deterministic output -- All IOC categories always present -- Formalised analysis‑level behaviour -- Snapshot‑contract tests to prevent schema drift +Or scan a PE file safely: -### v0.5.0 — Analysis Levels, PE Section Analysis, Obfuscation Hints +```bash +iocx suspicious.exe -a deep +``` -- New analysis‑level system -- PE structural analysis: section layout, raw/virtual sizes, entropy -- Obfuscation heuristics -- Clean, stable JSON schema +--- -### v0.4.0 — Plugin Architecture, Custom Detectors, Cleaner Internals +# Why IOCX Exists -- Plugin‑ready rule engine -- Unified detection flow -- Support for custom regex detectors +Security teams face three persistent problems: -### v0.3.0 — Stronger Architecture, New Crypto IOC Detection +1. **Regex extractors** break under adversarial input +2. **Sandboxing** is unsafe, slow, and unsuitable for automation +3. **Most IOC tools** are inconsistent, slow, or produce subtly different output between runs -- Ethereum & Bitcoin wallet detection +IOCX solves this with a **deterministic, static‑only engine** designed for automation, safety, and scale. -### v0.2.0 — High‑Reliability IP Detection +--- -- Major improvements to IPv4/IPv6 extraction +# What IOCX *Is Not* -## **Performance Profiles** +IOCX is intentionally **not**: -IOCX has **three distinct performance profiles**, each reflecting a different class of workload. -This separation gives DFIR, SOC, and CI/CD users a realistic understanding of how the engine behaves across text, normal binaries, and adversarial samples. +- a sandbox +- a behavioural analysis tool +- an emulator +- an enrichment engine -

- - IOCX Performance Profile - -

+It never executes untrusted code. +It never performs dynamic analysis. +It is **static‑only by design** — for safety, determinism, and CI/CD compatibility. -### **1. Raw IOC Extraction (Text, Logs, Buffers)** +--- -**Fast path — no PE parsing, no heuristics.** +# Design Philosophy -These benchmarks measure the raw detectors operating on flat buffers. -They represent the maximum throughput of the IOC extraction engine. +IOCX is engineered for the realities of modern malware, not the assumptions of legacy tools. -| Detector | 1 MB Time | Throughput | -|----------------|-----------|---------------| -| **Crypto** | 0.0037 s | **~270 MB/s** | -| **Filepaths** | 0.0040 s | **~250 MB/s** | -| **IP** | 0.0064 s | **~156 MB/s** | -| **Domains** | 0.0033 s | **~300 MB/s** | +### **1. Determinism over ambiguity** +Stable, reproducible output — no randomness, no volatility. -**Summary:** -- **~150–300 MB/s** sustained throughput -- **~0.003–0.006 s per MB** -- Linear scaling from 100 KB → 1.5 MB -- Worst‑case blobs (IPv6, ETH‑like, deep UNIX paths, punycode-like) remain sub‑millisecond to low‑millisecond +### **2. Static over dynamic** +Execution is unsafe. Static analysis is predictable, scalable, and CI‑friendly. -This is ideal for SOC pipelines, log processing, and bulk text extraction. +### **3. Adversarial‑first engineering** +Malformed PEs, corrupted RVAs, hostile strings — IOCX treats them as normal input. -### **2. Typical PE Files (~39 KB)** +### **4. Schema stability as a contract** +Downstream systems should never break on upgrade. -**Normal Windows executables with standard imports and minimal data.** +### **5. Performance without compromise** +150–300 MB/s on raw text. +6–15 MB/s on typical PEs. +Predictable even under worst‑case adversarial load. -Represents the cost of full PE parsing + IOC extraction on a clean, realistic binary. 
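The determinism guarantee above is easy to make concrete for downstream consumers. A minimal stdlib-only sketch (the `run_a`/`run_b` dicts are hand-written stand-ins for engine output, not real IOCX results): canonicalize the JSON with sorted keys before hashing, since JSON objects are unordered and key order may vary even when output is semantically identical.

```python
import hashlib
import json

def canonical_digest(result: dict) -> str:
    """Hash a result dict in a key-order-independent way."""
    canon = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()

# Semantically identical results with different key order
# must produce the same digest.
run_a = {"iocs": {"urls": [], "ips": ["192.0.2.123"]}, "type": "PE"}
run_b = {"type": "PE", "iocs": {"ips": ["192.0.2.123"], "urls": []}}

assert canonical_digest(run_a) == canonical_digest(run_b)
```

Comparing canonical digests across runs (or across machines in CI) is one way to enforce a "no volatile values" contract without diffing raw bytes.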
+---

-- **Typical PE:** 0.0132 s
-- **Typical PE (with heuristics):** 0.0153 s
-- **Throughput:** **~6–15 MB/s** (full engine)
-- **Heuristics:** usually none or minimal

+# What Makes IOCX Different

-This profile reflects what IOCX will see in CI/CD pipelines, internal tooling, and benign executables.

+| Capability | **IOCX** | Typical IOC Extractors | Sandbox / Dynamic Tools |
+|-----------|-----------|------------------------|--------------------------|
+| **Safety** | Zero‑execution, static‑only | Regex‑only, no binary safety | Executes untrusted code (high‑risk) |
+| **Determinism** | Fully deterministic output | Non‑deterministic under noise | Non‑deterministic by design |
+| **Binary Awareness** | Full PE parsing, heuristics | No binary support | Yes, but unsafe + slow |
+| **Adversarial Resilience** | Tested against malformed PEs, hostile strings | Easily bypassed | Often crashes or misclassifies |
+| **Performance** | 150–300 MB/s (text), 6–15 MB/s (PE) | Highly variable | Extremely slow |
+| **CI/CD Friendly** | Yes — safe, deterministic, fast | Partial | No — unsafe for pipelines |
+| **Schema Stability** | Guaranteed | Rare | None |

-### **3. Adversarial Dense PE (1.5 MB)**

+**In short:** IOCX is built for *adversarial reality*, not idealized input.
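The schema-stability guarantee in the table above has a practical payoff for consumers: every IOC category is always present, with empty categories emitted as empty arrays. A hedged sketch of a downstream consumer (the `result` dict is an abridged, hand-written stand-in for engine output):

```python
# All IOC categories are guaranteed present, so no .get() fallbacks
# or key-existence checks are needed downstream.
result = {
    "iocs": {
        "urls": ["http://not-a-real-domain.test/payload"],
        "domains": [],
        "ips": ["192.0.2.123"],
        "hashes": [],
        "emails": [],
        "filepaths": [],
        "base64": [],
        "crypto.btc": [],
        "crypto.eth": [],
    }
}

# Keep only the categories that actually matched.
non_empty = {cat: vals for cat, vals in result["iocs"].items() if vals}
print(sorted(non_empty))  # ['ips', 'urls']
```

Because the key set never changes between releases, this kind of consumer code survives upgrades without defensive checks.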
-**Worst‑case full‑engine workload.** +--- -A synthetic PE designed to stress: +# Use Cases -- section scanning -- RVA mapping -- import/TLS analysis -- heuristic engine -- IOC extraction across large, dense regions +### CI/CD & DevSecOps +- Scan binaries before release +- Detect accidental URLs, IPs, or secrets in builds +- Enforce security gates with zero execution risk -- **Dense PE:** 0.1977 s -- **Throughput:** **~7.6 MB/s** -- **Triggers:** TLS anomalies, structural anomalies, anti‑debug patterns +### SOC & Incident Response +- Extract indicators from alerts or analyst clipboard text +- Safely inspect malware samples without execution +- Normalize IOCs into structured JSON -This demonstrates IOCX’s stability and predictability under adversarial conditions. +### Threat Intelligence +- Process feeds at scale +- Parse unstructured reports +- Build enrichment pipelines on deterministic output -### **4. Full Engine (Non‑PE) End‑to‑End Path** +### Automation & Scripting +- Pipe logs or artifacts through IOCX +- Use the Python API for ETL or batch workflows +- Extend with custom detectors -For completeness, the full engine path on raw data (including overhead): +--- -- **1 MB end‑to‑end:** 0.0411 s +# Performance Profiles -This includes engine setup, routing, and output formatting — not just detector throughput. +### **1. Raw IOC Extraction (Text, Logs, Buffers)** +**150–300 MB/s** sustained throughput +Fast path — no PE parsing. 
-### **Summary Table** +| Detector | 1 MB Time | Throughput | +|----------|-----------|------------| +| Crypto | 0.0037 s | ~270 MB/s | +| Filepaths | 0.0040 s | ~250 MB/s | +| IP | 0.0064 s | ~156 MB/s | +| Domains | 0.0033 s | ~300 MB/s | -| Workload Type | Size | Time | Throughput | Notes | -|------------------------------------|--------|----------|---------------|---------------------------| -| **Raw IOC extraction (domains)** | 1 MB | 0.0033 s | **~300 MB/s** | Fast path | -| **Raw IOC extraction (crypto)** | 1 MB | 0.0037 s | **~270 MB/s** | Fast path | -| **Raw IOC extraction (filepaths)** | 1 MB | 0.0040 s | **~250 MB/s** | Fast path | -| **Raw IOC extraction (IP)** | 1 MB | 0.0064 s | **~156 MB/s** | Fast path | -| **Typical PE** | 39 KB | 0.0132 s | **6–15 MB/s** | Normal binaries | -| **Typical PE + heuristics** | 39 KB | 0.0153 s | **6–15 MB/s** | Full analysis | -| **Adversarial dense PE** | 1.5 MB | 0.1977 s | **~7.6 MB/s** | Worst‑case | -| **Full engine (non‑PE)** | 1 MB | 0.0411 s | **~24 MB/s** | Includes routing/overhead | +--- -### **Interpretation** +### **2. Typical PE Files (~39 KB)** +- **0.0132 s** (typical) +- **0.0153 s** (with heuristics) +- **6–15 MB/s** throughput -- IOCX is **extremely fast** on raw text and log data (150–300 MB/s). -- IOCX is **fast and predictable** on normal Windows binaries (~13–15 ms). -- IOCX remains **stable and linear** even on adversarial PE files designed to stress the engine. -- No pathological slowdowns, no exponential behaviour, no regex backtracking stalls. +--- -This three‑tier model provides a realistic, defensible performance profile for DFIR, SOC automation, and CI/CD environments. +### **3. Adversarial Dense PE (1.5 MB)** +- **0.1977 s** +- **~7.6 MB/s** throughput +- Triggers TLS anomalies, structural anomalies, anti‑debug patterns -## Example JSON Output +--- -
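The throughput column in the detector table is simply buffer size divided by wall time. A quick arithmetic check using the 1 MB timings copied from the table:

```python
# Throughput (MB/s) = buffer size (MB) / wall time (s),
# using the 1 MB detector timings from the table above.
timings_s = {
    "crypto": 0.0037,
    "filepaths": 0.0040,
    "ip": 0.0064,
    "domains": 0.0033,
}

throughput_mbps = {name: round(1.0 / t) for name, t in timings_s.items()}
print(throughput_mbps)
# {'crypto': 270, 'filepaths': 250, 'ip': 156, 'domains': 303}
```

The domains figure works out to 303 MB/s, consistent with the table's rounded "~300 MB/s".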
-Show Example JSON Output -
+### **4. Full Engine (Non‑PE)** +- **1 MB:** 0.0411 s -```json -$ iocx chaos_corpus.json -{ - "file": "examples/samples/structured/chaos_corpus.json", - "type": "text", - "iocs": { - "urls": [ - "http://[2001:db8::1]:443" - ], - "domains": [], - "ips": [ - "2001:db8::1", - "10.0.0.1", - "192.168.1.10", - "fe80::dead:beef%eth0", - "1.2.3.4", - "fe80::1%eth0", - "192.168.1.110", - "fe80::1%eth0fe80", - "::2%eth1", - "2001:db8::" - ], - "hashes": [], - "emails": [], - "filepaths": [], - "base64": [], - "crypto.btc": [], - "crypto.eth": [] - }, - "metadata": {} -} +--- -``` +# Version Highlights -
-Chaos Corpus: Input → Extracted Output → Explanation +Show Version History
-| Input | Extracted Output | Explanation | -|---------------------------------------|------------------------------------------|---------------------------------------------| -| fe80::dead:beef%eth0/garbage | fe80::dead:beef%eth0 | Salvaged valid IPv6, junk ignored. | -| xxx192.168.1.10yyy | 192.168.1.10 | IPv4 inside junk text. | -| DROP:client=10.0.0.1;;;ERR | 10.0.0.1 | IPv4 from noisy log field. | -| [2001:db8::1]::::443 | 2001:db8::1 | IPv6 and IPv6+port extracted. | -| | 2001:db8::1:443 | | -| GET http://[2001:db8::1]:443/index | http://[2001:db8::1]:443 | URL with IPv6 parsed correctly. | -| udp://[fe80::1%eth0]::::53 | fe80::1%eth0 | Concatenated IPv6 split up. | -| 192.168.1.110.0.0.1 | 192.168.1.110 | Combined IP segment salvaged. | -| fe80::1%eth0fe80::2%eth1 | fe80::1%eth0fe80, ::2%eth1 | Concatenated IPv6 split up. | -| 2001:db8::12001:db8::2 | 2001:db8:: | Longest valid IPv6 prefix found. | -| 256.256.256:256 | — | Invalid indicator ignored. | -
- -## Project Identity & Naming - -IOCX is the name of the official static IOC extraction engine published on: - -- **PyPI**: https://pypi.org/project/iocx/ -- **GitHub**: https://github.com/iocx-dev/iocx - -The IOCX name, branding, and project identity refer **exclusively** to this project and its associated packages, documentation, and releases. - -To protect users from confusion and maintain a healthy ecosystem: - -### What third‑party projects may NOT do - -- Use `iocx` as the name of their repository -- Publish tools named “iocx” that are not this project -- Present themselves as the creators or maintainers of IOCX -- Use the PyPI badge for the official `iocx` package -- Imply official affiliation or endorsement without permission - -These actions mislead users and violate the identity of the project. - -### Allowed & encouraged - -Third‑party tools, plugins, and integrations are welcome. -To avoid confusion, they should follow this naming pattern: - -- `iocx-` -- `iocx-plugin-` -- `iocx-extension-` - -Examples: - -- `iocx-osint-enricher` -- `iocx-detector-custom` +### **v0.7.3 — Structural Correctness & Deterministic Heuristics** +- Major hardening of all PE structural validators +- Deterministic, snapshot‑stable behaviour +- Clear, consistent ReasonCodes +- Stronger heuristics built on structural truth -### Why this matters +--- -IOCX is used in DFIR, SOC automation, CI/CD pipelines, and threat‑intel workflows. 
-Clear naming ensures: +### **v0.7.2 — Dependency Fix** +- Added missing `idna` dependency +- No behavioural or schema changes -- Users know which tool is the **official** IOCX engine -- Third‑party tools are discoverable without causing confusion -- The ecosystem grows in a structured, healthy way +--- -If you are building something that integrates with IOCX and want guidance on naming or attribution, feel free to open an issue +### **v0.7.1 — Adversarial Heuristics Expansion & Parser Hardening** +- Six new PE heuristics +- Expanded adversarial PE corpus +- Hardened domain/URL/crypto/hash extractors +- Deterministic snapshot‑validated output -## Official IOCX Repositories +--- -- Core Engine: https://github.com/iocx-dev/iocx -- Plugins Meta‑Repo: https://github.com/iocx-dev/iocx-plugins -- Documentation: https://github.com/iocx-dev/iocx/tree/main/docs/specs -- PyPI Package: https://pypi.org/project/iocx/ +### **v0.7.0 — Deterministic Heuristics & Adversarial Testing Foundation** +- Deterministic heuristics +- Layer‑3 adversarial samples +- Snapshot‑contract tests +- Rich Header crash fix -## Features +--- -### IOC Extraction - -- URLs -- Domains -- IPv4 / IPv6 addresses -- File paths (Windows, Linux, UNC, env-vars) -- Hashes (MD5 / SHA1 / SHA256 / SHA512 / Generic Hex) -- Email addresses -- Base64 strings -- Crypto wallets (Ethereum / Bitcoin) - -### Binary-aware Static Analysis - -- Windows PE files (`.exe`, `.dll`) -- Extracted strings from binaries -- Imports, sections, resources, metadata -- **Analysis levels:** - - `basic` - section layout + entropy - - `deep` - adds obfuscation heuristics - - `full` - extended analysis stub (*future-ready*) - -### Performance & Caching - -- Fast startup and throughput -- Optional caching for repeated scans -- Suitable for CI/CD and large batch workflows - -### Developer‑Friendly - -- Clean, stable JSON output -- CLI + Python API -- Modular, extensible rule system -- Minimal dependency footprint - -### Security‑First - -- 
Zero malware execution -- Safe for untrusted input -- Deterministic behaviour for pipelines - -### Why Static Only? - -Static analysis ensures **safety**, **determinism**, and **CI‑friendly operation**. No sandboxing, no execution, and no risk of triggering malware behaviour. - -## Output Schema (v0.6.0) - -IOCX v0.6.0 defines a stable, deterministic JSON schema designed for DFIR, SOC automation, and threat‑intel pipelines. The schema is intentionally simple, predictable, and safe for long‑term integrations. - -The top‑level structure contains three blocks: - -- `iocs` — extracted indicators -- `metadata` — structural information about the artifact -- `analysis` — optional deeper inspection depending on analysis level - -This structure is identical across all input types, with PE‑specific fields populated only when applicable. - -### IOC Categories - -The `iocs` block always contains the same keys, regardless of analysis level: - -- `urls` -- `domains` -- `ips` -- `hashes` -- `emails` -- `filepaths` -- `base64` -- `crypto.btc` -- `crypto.eth` - -Each category is always an array. Empty categories are returned as empty arrays to ensure predictable downstream parsing. - -### Metadata Categories - -The metadata block contains structural information about the file. For PE files, this includes: - -- Imports and import details -- Sections -- Resources and resource strings -- TLS directory -- Header and optional header -- Rich header -- Signatures - -These fields are always present, even when empty. Metadata is **independent of analysis level** and is always returned in full. +### **v0.6.0 — Stable Output Schema & Deterministic Metadata** +- Fully stable JSON schema +- Normalised PE metadata +- Formalised analysis levels -### Analysis Levels +--- -The `analysis` block is the only part of the schema that changes based on the selected analysis level. 
+### **v0.5.0 — Analysis Levels, PE Section Analysis, Obfuscation Hints** +- New analysis‑level system +- PE structural analysis +- Obfuscation heuristics -- **basic** — section layout + entropy -- **deep** — adds obfuscation heuristics -- **full** — adds extended metadata summaries +--- -This tiered design allows users to trade off performance vs. depth without changing their downstream parsing logic. +### **v0.4.0 — Plugin Architecture** +- Plugin‑ready rule engine +- Unified detection flow -### Deterministic Output +--- -IOCX v0.6.0 guarantees: +### **v0.3.0 — Crypto IOC Detection** +- Ethereum & Bitcoin wallet detection -- Stable keys -- Stable types -- No volatile values in minimal modes -- Deterministic behaviour across runs and platforms +--- -This makes IOCX safe for SIEM/SOAR ingestion, CI/CD pipelines, and large‑scale batch processing. +### **v0.2.0 — High‑Reliability IP Detection** +- Major IPv4/IPv6 improvements -### Schema stability + -IOCX guarantees a stable JSON schema, not a guaranteed ordering of keys within objects. JSON objects are defined as unordered maps, so consumers should rely on field presence and structure rather than positional ordering. All fields, types, and structural relationships remain consistent across versions, even if internal key order changes. +--- -## Quickstart +# Quickstart ### Install ```bash @@ -507,21 +279,10 @@ iocx suspicious.exe echo "Visit http://bad.example.com" | iocx - ``` -### Extract from a log file -```bash -iocx alerts.log -``` - ### Enable PE analysis ```bash iocx suspicious.exe -a ``` -Or choose a specific level: -```bash -iocx suspicious.exe -a basic -iocx suspicious.exe -a deep -iocx suspicious.exe -a full -``` ### Python API ```python @@ -531,209 +292,215 @@ engine = Engine() results = engine.extract("suspicious.exe") print(results) ``` + +--- + +# Example Output + +> IOCX produces structured, deterministic JSON that includes IOCs, PE metadata, section analysis, heuristics, and obfuscation indicators. 
+> +> The example below is an abridged output from a real adversarial PE sample. It demonstrates the shape and depth of the schema while keeping the size manageable for documentation purposes. +
Show Example JSON Output -
- ```json { - "file": "suspicious.exe", - "type": "PE", - "iocs": { - "urls": [], - "domains": [], - "ips": [], - "hashes": [], - "emails": [], - "filepaths": [ - "C:\\Windows\\System32\\cmd.exe", - "D:\\Temp\\payload.bin", - "E:/Users/Bob/AppData/Roaming/evil.dll", - "F:\\Program Files\\SomeApp\\bin\\run.exe", - "C:\\Users\\Alice\\Desktop\\notes.txt", - "Z:\\Archive\\2024\\logs\\system.log", - "\\\\SERVER01\\share\\dropper.exe", - "\\\\192.168.1.44\\c$\\Windows\\Temp\\run.ps1", - "\\\\FILESRV\\public\\docs\\report.pdf", - "\\\\NAS01\\data\\backups\\2024\\config.json", - "/usr/bin/python3.11", - "/etc/passwd", - "/var/lib/docker/overlay2/abc123/config.v2.json", - "/tmp/x1/x2/x3/x4/x5/script.sh", - "/opt/tools/bin/runner", - "/home/alice/.config/evil.sh", - ".\\payload.exe", - "..\\lib\\config.json", - "./run.sh", - "../bin/loader.so", - ".\\scripts\\install.ps1", - "%APPDATA%\\Microsoft\\Windows\\Start Menu\\Programs\\Startup\\evil.lnk", - "%TEMP%\\payload.exe", - "%USERPROFILE%\\Downloads\\file.txt", - "$HOME/.config/evil.sh", - "$HOME/bin/run.sh", - "$TMPDIR/cache/tmp123.bin", - "C:\\Windows\\Temp\\payload.bin", - "/home/alice/.config/evil" - ], - "base64": [], - "crypto.btc": [], - "crypto.eth": [] - }, - "metadata": { - "file_type": "PE", - "imports": [ - "KERNEL32.dll", - "msvcrt.dll" - ], - "sections": [ - ".text", - ".data", - ".rdata", - ".pdata", - ".xdata", - ".bss", - ".idata", - ".CRT", - ".tls", - ".rsrc", - ".reloc" - ], - "resource_strings": [ - "C:\\Windows\\System32\\cmd.exe", - "\\\\SERVER01\\share\\dropper.exe", - "/home/alice/.config/evil.sh@%APPDATA%\\Microsoft\\Windows\\Start Menu\\Programs\\Startup\\evil.lnk" - ] - }, - "analysis": { - "sections": [ - { - "name": ".text", - "raw_size": 7168, - "virtual_size": 6712, - "characteristics": 1610612832, - "entropy": 5.790750971742716 - }, - { - "name": ".data", - "raw_size": 512, - "virtual_size": 464, - "characteristics": 3221225536, - "entropy": 2.094202310841767 - }, - { - "name": ".rdata", - 
"raw_size": 3584, - "virtual_size": 3408, - "characteristics": 1073741888, - "entropy": 4.545752258688727 - }, - { - "name": ".pdata", - "raw_size": 1024, - "virtual_size": 540, - "characteristics": 1073741888, - "entropy": 2.327719716055491 - }, - { - "name": ".xdata", - "raw_size": 512, - "virtual_size": 488, - "characteristics": 1073741888, - "entropy": 4.1370410751038245 - }, - { - "name": ".bss", - "raw_size": 0, - "virtual_size": 384, - "characteristics": 3221225600, - "entropy": 0.0 - }, - { - "name": ".idata", - "raw_size": 1536, - "virtual_size": 1472, - "characteristics": 3221225536, - "entropy": 3.7542599473501452 - }, - { - "name": ".CRT", - "raw_size": 512, - "virtual_size": 96, - "characteristics": 3221225536, - "entropy": 0.2718922950073886 - }, - { - "name": ".tls", - "raw_size": 512, - "virtual_size": 16, - "characteristics": 3221225536, - "entropy": 0.0 - }, - { - "name": ".rsrc", - "raw_size": 512, - "virtual_size": 416, - "characteristics": 1073741888, - "entropy": 2.6481096709923975 - }, - { - "name": ".reloc", - "raw_size": 512, - "virtual_size": 188, - "characteristics": 1107296320, - "entropy": 2.2248162937403557 - } - ], - "obfuscation": [ - { - "value": "abnormal_section_layout_virtual_only", - "start": 0, - "end": 0, - "category": "obfuscation_hint", - "metadata": { - "section": ".bss", - "raw_size": 0, - "virtual_size": 384 + "file": "heuristic_rich.full.exe", + "type": "PE", + "iocs": { + "urls": ["http://not-a-real-domain.test/payload"], + "domains": ["example-malware.com"], + "ips": ["192.0.2.123"], + "hashes": [ + "abcd1234ef567890abcd1234ef567890", + "1234567890", + "3333333333333333" + ], + "filepaths": [ + "/usr/src/mingw-w64-11.0.1-3build1/mingw-w64-crt/crt/crtexe.c", + "/usr/x86_64-w64-mingw32/include", + "/usr/src/mingw-w64-11.0.1-3build1/mingw-w64-crt/crt/pseudo-reloc.c" + ] + }, + "metadata": { + "file_type": "PE", + "imports": ["KERNEL32.dll", "msvcrt.dll", "USER32.dll"], + "sections": [ + ".text", ".data", ".rwx", ".rdata", 
+ "UPX0", ".pdata", ".xdata", ".tls" + ], + "resources": [], + "resource_strings": [], + "delayed_imports": [], + "bound_imports": [], + "exports": [], + "signatures": [], + "has_signature": false, + "tls": { + "start_address": 5368758272, + "end_address": 5368758280, + "callbacks": 5368754232 + }, + "header": { + "entry_point": 5088, + "image_base": 5368709120, + "machine": "AMD64", + "subsystem": "Windows GUI" + }, + "optional_header": { + "section_alignment": 4096, + "file_alignment": 512, + "size_of_image": 155648 } - } - ] - } + }, + "analysis": { + "sections": [ + { "name": ".text", "entropy": 5.92 }, + { "name": ".rwx", "entropy": 0 }, + { "name": "UPX0", "entropy": 0.34 }, + { "name": ".rdata", "entropy": 4.03 } + ], + "obfuscation": [ + { + "value": "abnormal_section_layout_virtual_only", + "category": "obfuscation_hint", + "metadata": { + "section": ".bss", + "raw_size": 0, + "virtual_size": 384 + } + } + ], + "extended": [ + { + "value": "summary", + "category": "pe_metadata", + "metadata": { + "dll_count": 3, + "import_count": 45, + "resource_count": 0, + "has_tls": true, + "has_signature": false + } + } + ], + "heuristics": [ + { + "value": "packer_suspected", + "metadata": { + "reason": "packer_section_name", + "section": "UPX0" + } + }, + { + "value": "anti_debug_heuristic", + "metadata": { + "reason": "anti_debug_api_import", + "dll": "kernel32.dll", + "function": "CheckRemoteDebuggerPresent" + } + }, + { + "value": "anti_debug_heuristic", + "metadata": { + "reason": "timing_api_import", + "dll": "kernel32.dll", + "function": "GetTickCount" + } + }, + { + "value": "pe_structure_anomaly", + "metadata": { + "reason": "section_overlaps_headers", + "section": ".bss", + "raw_address": 0, + "size_of_headers": 1536 + } + }, + { + "value": "pe_structure_anomaly", + "metadata": { + "reason": "data_directory_overlap", + "directory_a": "IMAGE_DIRECTORY_ENTRY_IMPORT", + "directory_b": "IMAGE_DIRECTORY_ENTRY_IAT" + } + } + ] + } } - ```
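The per-section `entropy` values in the example output above are Shannon entropy in bits per byte, ranging from 0.0 (constant data, as in zero-filled regions) to 8.0 (uniformly distributed data, typical of packed or encrypted sections). IOCX's internal implementation isn't reproduced here; this is the standard formula as a stdlib-only sketch:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return max(0.0, -sum((c / n) * math.log2(c / n) for c in counts.values()))

print(shannon_entropy(b"\x00" * 512))          # 0.0  (flat, like a zeroed section)
print(shannon_entropy(bytes(range(256)) * 2))  # 8.0  (uniform, packer-like)
```

Note that in the sample output the `packer_suspected` heuristic is triggered by the `UPX0` section name; entropy is a complementary signal rather than the sole basis for such verdicts.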
-## Architecture -```plaintext +--- -iocx/ -│ -├── examples/ # Sample files + generators -├── docs/ # Detector contracts, overlap suppression rules, and plugin authoring guidelines -├── tests/ # Unit, integration, fuzz, robustness, contract, and performance tests -├── iocx - ├── detectors/ # Regex-based IOC detectors - ├── parsers/ # PE parsing, string extraction - ├── plugins/ # Plugin API and registry - ├── cli/ # Command-line interface - ├── analysis/ # PE static-analysis modules +# Architecture ``` +iocx/ +├── examples/ +├── docs/ +├── tests/ +└── iocx + ├── detectors/ + ├── parsers/ + ├── plugins/ + ├── cli/ + └── analysis/ +``` + +--- + +# Plugin Ecosystem & Extensibility + +IOCX is designed to be extended safely and predictably. +Plugins are **first‑class citizens**, validated by the same deterministic snapshot tests as the core engine. + +You can build: + +- custom IOC detectors +- custom regex rules +- binary‑aware plugins +- internal heuristics +- pipeline‑specific extractors + +See: + +- `docs/specs/overlap-suppression.md` +- `docs/specs/plugin-authoring-guidelines.md` + +--- + +# Ecosystem Overview + +IOCX is more than a single binary — it’s a modular ecosystem: + +- **Core Engine** — deterministic IOC extraction + PE analysis +- **Plugin System** — custom detectors and analysis modules +- **Adversarial Corpus** — malformed PEs, hostile strings, fuzz samples +- **Snapshot Testing Framework** — ensures deterministic output +- **Performance Benchmarks** — enforced in CI +- **Documentation Suite** — specs, contracts, and plugin guides + +--- + +# Who Uses IOCX? -The engine is intentionally modular so components can be extended or replaced easily. 
+IOCX is used across: -## Extending IOCX +- DFIR teams +- SOC automation pipelines +- CI/CD security gates +- Threat‑intel platforms +- Malware research labs +- Security engineering teams -See `docs/specs/` for: +Anywhere indicators need to be extracted **safely**, **deterministically**, and **at scale**, IOCX fits. -- Detector contracts -- Overlap suppression rules -- Plugin authoring guidelines +--- -## Safe Testing (No Malware Required) +# Safe Testing (No Malware Required) All test samples are: @@ -742,33 +509,86 @@ All test samples are: - Publicly safe (EICAR, GTUBE) - Designed to avoid accidental malware handling -## Performance Guarantees +--- + +# Performance Guarantees + +IOCX enforces strict performance thresholds in CI to ensure: + +- No regex backtracking stalls +- No pathological slowdowns +- Stable performance across releases + +See: + +- `docs/performance.md` + +--- + +# Project Identity & Naming + +The name **IOCX** refers exclusively to the official engine published on: + +- PyPI: [https://pypi.org/project/iocx/](https://pypi.org/project/iocx/) +- GitHub: [https://github.com/iocx-dev/iocx](https://github.com/iocx-dev/iocx) + +### Not allowed +- Repositories named `iocx` +- Tools named “iocx” not part of this project +- Implying affiliation without permission + +### Allowed +- `iocx-` +- `iocx-extension-` +- `iocx-detector-` + +--- + +# Official IOCX Repositories + +- Core Engine: [https://github.com/iocx-dev/iocx](https://github.com/iocx-dev/iocx) +- Plugins Meta‑Repo: [https://github.com/iocx-dev/iocx-plugins](https://github.com/iocx-dev/iocx-plugins) +- Documentation: [https://github.com/iocx-dev/iocx/tree/main/docs/specs](https://github.com/iocx-dev/iocx/tree/main/docs/specs) +- PyPI Package: [https://pypi.org/project/iocx/](https://pypi.org/project/iocx/) + +--- + +# Roadmap + +IOCX development focuses on stability, extensibility, and deeper static‑analysis coverage. +The items below represent ongoing areas of work and exploration. 
-IOCX is engineered for high‑throughput, low‑latency analysis across normal, edge‑case, and adversarial inputs. -We maintain strict performance thresholds enforced in CI to ensure the engine remains fast and predictable across releases. +- **Extended PE heuristics** (delay‑load behaviour, structural anomalies, relocation patterns) +- **Selective suppression rules** for OSINT, DFIR, and threat‑intel workflows +- **ELF and Mach‑O metadata extraction** +- **Batch analysis mode** for multi‑artifact workflows +- **YARA‑style output modes** and enrichment hooks +- **Binary‑agnostic static analysis** +- **Cross‑platform plugin ecosystem** +- **Language bindings** for Rust, Go, and Node.js -See [Performance Guarantees](/docs/performance.md) +--- -## Contributing +# Contributing We welcome: -- New IOC detectors +- New detectors - Parser improvements -- Bug reports - Documentation updates -- Synthetic test samples +- Synthetic adversarial samples -See CONTRIBUTING.md for full guidelines. +See `CONTRIBUTING.md` for guidelines. -## Security +--- -If you discover a security issue, do not open a GitHub issue. -Please follow the instructions in SECURITY.md. +# Security -## License +If you discover a security issue, **do not open a GitHub issue**. +Follow the instructions in `SECURITY.md`. -Licensed under the MIT License. See LICENSE for details. +--- +# License -*The IOCX name and project identity refer exclusively to the IOCX engine maintained under the iocx-dev organisation* +MPL‑2.0 License — see `LICENSE`. diff --git a/SECURITY.md b/SECURITY.md index d98af05..0c1c87d 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -5,16 +5,22 @@ We take security seriously and aim to provide a trustworthy, minimal‑dependenc This document describes our security posture, how we handle vulnerabilities, and how to report issues responsibly. +--- + ## Supported Versions We currently support and maintain only the latest released version of IOCX. 
-| Version | Status | -|----------------|------------------| -| Latest release | Supported | -| Older versions | Unsupported | +| Version | Status | +|------------------|---------------| +| Latest release | Supported | +| Older versions | Unsupported | Security fixes are applied exclusively to the most recent version. +Security guarantees apply only to the official IOCX core. +Third-party plugins may introduce additional risk. + +--- ## Security Posture @@ -22,20 +28,27 @@ IOCX is designed with security and simplicity in mind. The tool processes untrus ### Minimal Runtime Dependencies -To reduce supply‑chain risk and minimise the attack surface, IOCX intentionally uses only two runtime dependencies: +To reduce supply‑chain risk and minimise the attack surface, IOCX intentionally uses only a small set of well‑audited runtime dependencies. Each dependency is selected for deterministic behaviour, stability, and ecosystem maturity. + +Current runtime dependencies: -- pefile - PE parsing -- python-magic - file‑type detection +- **pefile** — PE parsing and structural inspection +- **python‑magic** — file‑type detection via signature analysis +- **idna** — punycode decoding and Unicode domain normalisation -No additional libraries are required for core functionality. +No additional libraries are required for core functionality. IOCX performs: + +- no dynamic execution +- no network access +- no deserialisation of untrusted data ### Automated Security Scanning All commits and pull requests undergo automated security checks: -- pip‑audit — dependency vulnerability scanning -- Bandit — static analysis of Python code -- Pytest — full test suite execution +- **pip‑audit** — dependency vulnerability scanning +- **Bandit** — static analysis of Python code +- **Pytest** — full test suite execution These checks run in CI to catch regressions early. @@ -43,12 +56,12 @@ These checks run in CI to catch regressions early. IOCX is designed to process potentially malicious files safely. 
To reduce risk: -- No dynamic code execution -- No deserialization of untrusted data -- No network access -- Strict parsing of binary formats -- Defensive exception handling in extractors and parsers -- No mutation of input files +- no dynamic code execution +- no deserialization of untrusted data +- no network access +- strict parsing of binary formats +- defensive exception handling in extractors and parsers +- no mutation of input files ### No Elevated Privileges Required @@ -60,21 +73,25 @@ IOCX runs entirely in user space and does not require: This reduces the impact of potential vulnerabilities. +--- + ## Threat Model (Scope & Limitations) IOCX is a static extraction tool, not a sandbox or malware analysis framework. The following are out of scope: -- Detecting or preventing active exploitation -- Executing or emulating malware -- Analysing runtime behaviour -- Guaranteeing correctness of third‑party plugins -- Protecting against malicious Python environments or compromised dependencies +- detecting or preventing active exploitation +- executing or emulating malware +- analysing runtime behaviour +- guaranteeing correctness of third‑party plugins +- protecting against malicious Python environments or compromised dependencies Users should run IOCX in a controlled environment when analysing untrusted binaries. -Refer to the [threat model overview](/docs/security/threat-model.md) for Data Flow and STRIDE‑Oriented Threat Interaction Diagrams. +Refer to the threat‑model documentation for data‑flow diagrams and STRIDE‑oriented analysis. + +--- ## Reporting a Vulnerability @@ -82,20 +99,23 @@ We appreciate responsible disclosure and welcome reports from the community. 
### How to report -Please email: security@malx.io +Please email: **security@malx.io** Include: -- A clear description of the issue -- Steps to reproduce -- Potential impact -- Any suggested fixes or patches +- a clear description of the issue +- steps to reproduce +- potential impact +- any suggested fixes or patches -We aim to acknowledge reports within 72 hours. +We aim to acknowledge reports within **72 hours**. ### Do Not Open Public GitHub Issues -Please avoid filing public issues for security problems. This protects users while we investigate and patch the issue. +Please avoid filing public issues for security problems. +This protects users while we investigate and patch the issue. + +--- ## Vulnerability Disclosure Process @@ -106,13 +126,37 @@ Please avoid filing public issues for security problems. This protects users whi 5. We publish a security advisory (if applicable). 6. We credit the reporter (unless anonymity is requested). +--- + ## Responsible Disclosure We ask that reporters: -- Allow reasonable time for us to develop a fix -- Avoid exploiting the vulnerability beyond what is necessary for proof‑of‑concept -- Avoid accessing or modifying user data -- Refrain from public disclosure until a fix is released +- allow reasonable time for us to develop a fix +- avoid exploiting the vulnerability beyond what is necessary for proof‑of‑concept +- avoid accessing or modifying user data +- refrain from public disclosure until a fix is released We appreciate your help in keeping IOCX secure. + +--- + +## Commercial Customers + +Commercial licensees may receive: + +- priority security response +- extended support windows +- advance notification of critical issues +- access to patched builds before public release + +For enterprise security inquiries, contact: **security@malx.io** + +--- + +## Trademark Notice + +“IOCX” is a trademark of Peter James Weaver. +See [TRADEMARK_POLICY](TRADEMARK_POLICY.md) for permitted and restricted use of the IOCX name. 
+ +IOCX is licensed under the Mozilla Public License 2.0 (MPL-2.0). diff --git a/TRADEMARK_POLICY.md b/TRADEMARK_POLICY.md new file mode 100644 index 0000000..5adbcf0 --- /dev/null +++ b/TRADEMARK_POLICY.md @@ -0,0 +1,37 @@ +# **IOCX Trademark Policy** +**Version 1.0** + +“IOCX” is a trademark owned by Peter James Weaver. +This policy explains how the name may be used in open‑source and commercial contexts. + +## Permitted Use + +You may use the name “IOCX” to: + +- refer to the official open‑source IOCX project +- describe compatibility (e.g., “built with IOCX”, “uses IOCX”) +- reference IOCX in documentation, research, or discussion + +These uses must be accurate and non‑misleading. + +## Restricted Use + +You may **not** use the IOCX name in a way that: + +- suggests endorsement, affiliation, or official status +- brands a fork, derivative, or competing product as “IOCX” +- creates confusion about what is or isn’t the official project +- uses confusingly similar names (e.g., “IOCX‑Pro”, “IOCX‑Plus”) + +Forks must use different names. + +## Commercial Use + +Using the IOCX name in commercial products, services, or marketing requires permission. +This includes SaaS offerings, enterprise tools, or paid products branded with “IOCX”. + +## Contact + +For trademark questions or licensing requests, contact: + +**legal@malx.io** diff --git a/docs/security/threat-model.md b/docs/security/threat-model.md index e6c2e26..b57ac62 100644 --- a/docs/security/threat-model.md +++ b/docs/security/threat-model.md @@ -160,6 +160,27 @@ flowchart TD | **D** | Denial of Service | Regex backtracking attacks | Pre‑compiled safe regex patterns | | **E** | Elevation of Privilege | Plugin system abused | ``--dev`` mode opt‑in; no auto‑loading | +#### **Third‑Party Components (Domain Decoding)** + +IOCX uses the `idna` library for punycode decoding and Unicode domain normalization. 
This expands the dependency surface slightly, but the risk profile remains low: + +- Pure‑Python implementation with no native extensions +- No network access or external lookups +- Deterministic transformations only +- Widely used across the Python ecosystem +- Included in adversarial testing for malformed punycode, invalid labels, and Unicode edge cases + +**Threat considerations:** + +| STRIDE | Threat | Description | Mitigation | +|--------|------------------------|---------------------------------------------------------|-----------------------------------------------------------------------------| +| **S** | Spoofing | Unicode domains crafted to appear visually similar | Confusable detection; script classification; punycode round‑trip validation | +| **T** | Tampering | Malformed punycode strings attempting to break decoding | Exception handling; bounded decoding; adversarial fixtures | +| **R** | Repudiation | Incorrect decoding results | Deterministic transformations; snapshot tests | +| **I** | Information Disclosure | None — no external communication | Local‑only processing | +| **D** | Denial of Service | Extremely long or malformed labels | Length caps; strict punycode validation | +| **E** | Elevation of Privilege | None — no execution or dynamic loading | Pure‑function transformations only | + ### 6. 
Local Cache

| STRIDE | Threat | Description | Mitigation |
diff --git a/docs/specs/reason-codes.md b/docs/specs/reason-codes.md
new file mode 100644
index 0000000..4b58700
--- /dev/null
+++ b/docs/specs/reason-codes.md
@@ -0,0 +1,153 @@
+# **PE Structural Reason Codes**
+
+## **SECTION ANOMALIES**
+
+| Reason Code | What Triggers It | Example Malformed Pattern | Scope |
+|------------|------------------|---------------------------|--------|
+| **SECTION_RWX** | Section has both `MEM_EXECUTE` and `MEM_WRITE` | `.text` marked executable + writable | Per‑section |
+| **SECTION_NON_EXECUTABLE_CODE_LIKE** | `CNT_CODE` flag set but section not executable | `.text` with `CNT_CODE` but missing `MEM_EXECUTE` | Per‑section |
+| **SECTION_CODELIKE_NAME_NOT_EXECUTABLE** | Name looks like code (`.text`, `code`, etc.) but section not executable | `.text` with only `READ` | Per‑section |
+| **SECTION_NAME_NON_ASCII** | Section name contains non‑ASCII bytes | Name = `"\xFF\xFE\xFA\x00"` | Per‑section |
+| **SECTION_NAME_EMPTY_OR_PADDING** | Name is empty or only NUL/padding | Name = `"\x00\x00\x00\x00\x00\x00\x00\x00"` | Per‑section |
+| **SECTION_IMPOSSIBLE_FLAGS** | Section is discardable + executable + writable | `.text` with `MEM_DISCARDABLE \| MEM_EXECUTE \| MEM_WRITE` | Per‑section |
+| **SECTION_RAW_MISALIGNED** | `PointerToRawData % FileAlignment != 0` | Raw offset = 291, FileAlignment = 512 | Per‑section |
+| **SECTION_RAW_OVERLAP** | Raw ranges of two sections intersect | `.text` raw `[0x200–0x800)` overlaps `.rdata` raw `[0x300–0x900)` | Global (pairwise) |
+| **SECTION_OVERLAP** | Virtual address ranges intersect | `.text` VA `[0x1000–0x1800)` overlaps `.rdata` VA `[0x1400–0x1C00)` | Global (pairwise) |
+| **SECTION_OVERLAPS_HEADERS** | `PointerToRawData < SizeOfHeaders` | `.bss` raw offset = 0, `SizeOfHeaders = 1536` | Per‑section |
+| **SECTION_OUT_OF_ORDER_RAW** | Raw addresses not sorted ascending | Raw list = `[1536, 8192, 0, 19456...]` | Global |
+| **SECTION_OUT_OF_ORDER_VIRTUAL** | Virtual addresses not sorted ascending | VA list = `[0x2000, 0x1000]` | Global |
+| **SECTION_ZERO_LENGTH** | `virtual_size == 0` AND `raw_size == 0` | `.zero` section with no memory or file footprint | Per‑section |
+| **SECTION_DISCARDABLE_CODE** | Section is executable AND discardable | `.text` with `MEM_EXECUTE \| MEM_DISCARDABLE` | Per‑section |
+| **SECTION_FLAGS_INCONSISTENT** | Contradictory flags: code/write/exec without read | `.text` with `EXECUTE` but missing `READ` | Per‑section |
+
+---
+
+## **ENTRYPOINT ANOMALIES**
+
+| Reason Code | What Triggers It | Example Pattern | Scope |
+|------------|------------------|-----------------|--------|
+| **ENTRYPOINT_OUT_OF_BOUNDS** | EP does not map to any section | EP = `0x90000000`, SizeOfImage = 512 | Per‑file |
+| **ENTRYPOINT_SECTION_NOT_EXECUTABLE** | EP maps to non‑executable section | EP inside `.rdata` | Per‑file |
+| **ENTRYPOINT_IN_TRUNCATED_REGION** | EP beyond section’s virtual size | EP = `VA + VirtualSize + 1` | Per‑file |
+| **ENTRYPOINT_IN_OVERLAY** | EP maps to file offset ≥ overlay offset | EP raw offset = 0x5000, overlay = 0x4000 | Per‑file |
+| **ENTRYPOINT_ZERO_OR_NEGATIVE** | EP ≤ 0 | EP = 0 | Per‑file |
+| **ENTRYPOINT_IN_HEADERS** | EP < SizeOfHeaders | EP = 0x100, SizeOfHeaders = 0x400 | Per‑file |
+| **ENTRYPOINT_IN_NON_CODE_SECTION** | EP inside `.rsrc`, `.reloc`, or non‑code section | EP inside `.rsrc` | Per‑file |
+| **ENTRYPOINT_IN_DISCARDABLE_SECTION** | EP inside discardable section | EP inside `.upx0` with discardable flag | Per‑file |
+
+---
+
+## **OPTIONAL HEADER ANOMALIES**
+
+| Reason Code | What Triggers It | Example Malformed Pattern | Scope |
+|------------|------------------|---------------------------|--------|
+| **OPTIONAL_HEADER_INCONSISTENT_SIZE** | `max(section_end)` exceeds `SizeOfImage` | `.rsrc` ends at `0x3800`, `SizeOfImage = 0x2000` | Per‑file |
+| **OPTIONAL_HEADER_INVALID_SIZE_OF_HEADERS** | `SizeOfHeaders` misaligned OR smaller than required header size | `SizeOfHeaders = 2048`, `FileAlignment = 16384` | Per‑file |
+| **OPTIONAL_HEADER_INVALID_SECTION_ALIGNMENT** | `SectionAlignment < FileAlignment` OR not power‑of‑two | `SectionAlignment = 4096`, `FileAlignment = 16384` | Per‑file |
+| **OPTIONAL_HEADER_INVALID_FILE_ALIGNMENT** | Not power‑of‑two OR outside 512–64K range | `FileAlignment = 300` | Per‑file |
+| **OPTIONAL_HEADER_SIZE_FIELDS_INCONSISTENT** | SizeOfCode / SizeOfInit / SizeOfUninit smaller than section totals | `.text` raw = 0x600, `SizeOfCode = 0x200` | Per‑file |
+| **OPTIONAL_HEADER_IMAGE_BASE_MISALIGNED** | `ImageBase` not 64K aligned | `ImageBase = 0x12345` | Per‑file |
+| **OPTIONAL_HEADER_INVALID_NUMBER_OF_RVA_AND_SIZES** | `NumDirs` < actual directories OR > 16 | `NumDirs = 1`, actual = 3 | Per‑file |
+| **OPTIONAL_HEADER_SIZE_OF_IMAGE_MISALIGNED** | `SizeOfImage % SectionAlignment != 0` | `SizeOfImage = 512`, `SectionAlignment = 4096` | Per‑file |
+
+---
+
+## **RVA / DIRECTORY ANOMALIES**
+
+| Reason Code | What Triggers It | Example Pattern | Scope |
+|------------|------------------|-----------------|--------|
+| **DATA_DIRECTORY_INVALID_RANGE** | Directory has negative RVA or negative Size | RVA = –1, Size = 128 | Per‑directory |
+| **DATA_DIRECTORY_ZERO_SIZE_UNEXPECTED** | Directory is empty *(rva=0,size=0)* but this directory type is required to be non‑empty (currently none) | Import directory empty (if required) | Per‑directory |
+| **DATA_DIRECTORY_ZERO_RVA_NONZERO_SIZE** | Directory claims to exist but points to RVA 0 | Resource RVA = 0, Size = 256 | Per‑directory *(primary error, all others suppressed)* |
+| **DATA_DIRECTORY_IN_HEADERS** | Directory RVA lies inside the PE headers region | RVA = 0x100, SizeOfHeaders = 0x200 | Per‑directory |
+| **DATA_DIRECTORY_OUT_OF_RANGE** | Directory extends beyond `SizeOfImage` | RVA = 0x5000, Size = 0x2000, SizeOfImage = 0x4000 | Per‑directory *(primary error, mapping suppressed)* |
+| **DATA_DIRECTORY_IN_OVERLAY** | Directory maps to a raw offset ≥ overlay start | RVA maps to raw offset 0x6000, overlay starts at 0x5800 | Per‑directory |
+| **DATA_DIRECTORY_NOT_MAPPED_TO_SECTION** | Directory is in range but does not fall inside any section | RVA = 0x9000, Size = 0x200, no section covers it | Per‑directory *(suppressed for empty, zero‑RVA, out‑of‑range, zero‑length‑section)* |
+| **DATA_DIRECTORY_SPANS_MULTIPLE_SECTIONS** | Directory range overlaps more than one section | RVA = 0x1800, Size = 0x1000 spans .text → .rdata | Per‑directory |
+| **DATA_DIRECTORY_OVERLAP** | Two directories’ RVA ranges overlap | Import and IAT overlap | Global |
+| **IMPORT_RVA_INVALID** | Import RVA does not map to a valid import table structure (import validator) | Import RVA = 0x9000 | Per‑directory |
+
+---
+
+## **TLS ANOMALIES**
+
+| Reason Code | What Triggers It | Example Pattern | Scope |
+|------------|------------------|-----------------|--------|
+| **TLS_CALLBACK_OUTSIDE_RANGE** | Callback RVA not within the TLS directory’s `[start, end)` range | Callback = `0x5000`, TLS range = `0x4000–0x4100` | Per‑file |
+| **TLS_MULTIPLE_DIRECTORIES** | More than one TLS directory is present in the PE | Two `tls_directory` entries in `extended` | Per‑file |
+| **TLS_INVALID_RANGE** | TLS directory has `start > end` (structurally impossible) | Start = `0x6000`, End = `0x5000` | Per‑file |
+| **TLS_ZERO_LENGTH_DIRECTORY** | TLS directory exists but `start == end` (zero‑length region) | Start = `0x7000`, End = `0x7000` | Per‑file |
+| **TLS_CALLBACKS_MISSING** | TLS directory is non‑empty but callback pointer is `0` | Start = `0x4000`, End = `0x4100`, Callbacks = `0` | Per‑file |
+| **TLS_CALLBACK_NOT_MAPPED_TO_SECTION** | Callback RVA does not fall inside any section’s VA range | Callback = `0x90000000` (no section covers it) | Per‑file |
+| **TLS_CALLBACK_IN_NON_EXECUTABLE_SECTION** | Callback RVA maps to a section lacking `IMAGE_SCN_MEM_EXECUTE` | Callback in `.data` or `.rdata` | Per‑file |
+| **TLS_CALLBACK_IN_HEADERS** | Callback RVA falls inside the PE headers (`< SizeOfHeaders`) | Callback = `0x200`, SizeOfHeaders = `0x600` | Per‑file |
+| **TLS_CALLBACK_IN_OVERLAY** | Callback RVA maps to a raw offset beyond the last section (overlay) | Raw offset = `0x1F000`, overlay starts at `0x1E000` | Per‑file |
+| **TLS_CALLBACK_ARRAY_NOT_TERMINATED** *(optional future rule)* | Callback array exists but is not 0‑terminated | Callback list ends with non‑zero RVA | Per‑file |
+
+---
+
+## **SIGNATURE ANOMALIES**
+
+| Reason Code | What Triggers It | Example Pattern | Scope |
+|------------|------------------|-----------------|--------|
+| **SIGNATURE_FLAG_SET_BUT_NO_METADATA** | `IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY` set but no WIN_CERTIFICATE metadata present | Flag = 1, `signatures = []` | Per‑file |
+| **SIGNATURE_PRESENT_BUT_FLAG_NOT_SET** | Certificate metadata exists but the integrity flag is not set | `signatures = [ … ]`, flag = 0 | Per‑file |
+| **SIGNATURE_MULTIPLE_CERTIFICATES** | More than one WIN_CERTIFICATE structure present | Two or more entries in `signatures` | Per‑file |
+| **SIGNATURE_INVALID_LENGTH** | `dwLength` smaller than the WIN_CERTIFICATE header (8 bytes) or otherwise nonsensical | `dwLength = 4` | Per‑certificate |
+| **SIGNATURE_INVALID_REVISION** | `wRevision` not equal to 0x0100 or 0x0200 | `wRevision = 0x9999` | Per‑certificate |
+| **SIGNATURE_INVALID_TYPE** | `wCertificateType` not X.509 (1) or PKCS#7 (2) | `certificate_type = 0x1234` | Per‑certificate |
+| **SIGNATURE_OUT_OF_FILE_BOUNDS** | Certificate offset + size exceeds file size or begins before 0 | Offset = 0x200000, FileSize = 0x180000 | Per‑certificate |
+| **SIGNATURE_OVERLAPS_OTHER_DATA** | Certificate overlaps a section, overlay, or other critical region | Certificate at raw 0x4000 overlaps `.text` | Per‑certificate |
+
+---
+
+## **RESOURCE ANOMALIES**
+
+### **Resource Directory Anomalies**
+
+| Reason Code | What Triggers It | 
Example Pattern | Scope | +|------------|------------------|-----------------|--------| +| **RESOURCE_DIRECTORY_OUT_OF_BOUNDS** | A resource directory RVA/size lies outside the `.rsrc` section or outside `SizeOfImage` | Directory RVA = `0x90000000`, `.rsrc` ends at `0x400000` | Per‑file | +| **RESOURCE_DIRECTORY_LOOP** | Recursive directory traversal detects a cycle (malformed or malicious resource tree) | Directory A → B → A | Per‑file | +| **RESOURCE_DIRECTORY_ZERO_LENGTH** | A resource directory exists but has zero length or no valid entries | RVA = `0x3000`, size = `0` | Per‑file | + +### **Resource Entry / Data Anomalies** + +| Reason Code | What Triggers It | Example Pattern | Scope | +|------------|------------------|-----------------|--------| +| **RESOURCE_ENTRY_OUT_OF_BOUNDS** | A resource entry points to a data entry outside the `.rsrc` section or outside `SizeOfImage` | Entry RVA = `0x80000000` | Per‑file | +| **RESOURCE_DATA_OUT_OF_BOUNDS** | Resource data block lies outside the file or outside the `.rsrc` section | Data offset = `0x1F0000`, file size = `0x1E0000` | Per‑file | +| **RESOURCE_DATA_OVERLAPS_OTHER_DATA** | Two resource data blobs overlap in raw or virtual space | Data A: `0x2000–0x2400`, Data B: `0x2300–0x2500` | Per‑file | + +### **Resource String‑Table Anomalies** + +| Reason Code | What Triggers It | Example Pattern | Scope | +|------------|------------------|-----------------|--------| +| **RESOURCE_STRING_TABLE_CORRUPT** | String table length, offsets, or UTF‑16 entries are malformed or out of bounds | String count = 32 but table only contains 10 entries | Per‑file | + +--- + +## **ENTROPY ANOMALIES** + +| Reason Code | What Triggers It | Example Pattern | Scope | +|------------|------------------|-----------------|--------| +| **ENTROPY_HIGH_SECTION** | Section entropy ≥ 7.5 and size ≥ 1 KB | `.text` entropy = 7.9 | Per‑section | +| **ENTROPY_HIGH_OVERLAY** | Overlay entropy ≥ 7.5 and size ≥ 1 KB | Overlay entropy = 7.8 | Per‑file | 
+| **ENTROPY_UNIFORM_ACROSS_SECTIONS** | All sections have high entropy with very low variance | Mean = 7.7, stddev = 0.05 | Per‑file | +| **ENTROPY_VERY_LOW_SECTION** | Large section with entropy ≤ 0.2 (zero‑filled / padding abuse) | `.data` entropy = 0.03 | Per‑section | +| **ENTROPY_HIGH_RESOURCES** | Resource directory entropy ≥ 7.5 | `.rsrc` entropy = 7.9 | Per‑region | +| **ENTROPY_HIGH_RELOCATIONS** | Relocation table entropy ≥ 7.5 | `.reloc` entropy = 7.8 | Per‑region | +| **ENTROPY_HIGH_IMPORTS** | Import table entropy ≥ 7.5 | Import blob entropy = 7.7 | Per‑region | +| **ENTROPY_HIGH_TLS** | TLS directory entropy ≥ 7.5 | TLS entropy = 7.9 | Per‑region | +| **ENTROPY_HIGH_CERTIFICATE** | Certificate blob entropy ≥ 7.5 | WIN_CERTIFICATE entropy = 7.8 | Per‑region | + +--- + +## **PACKER HEURISTICS (Interpretation Layer)** + +| Reason Code | What Triggers It | Example Pattern | Scope | +|------------|------------------|-----------------|--------| +| **PACKER_SECTION_NAME** | Section name matches known packer patterns | `.upx0`, `.upx1`, `.aspack` | Per‑section | +| **PACKER_HIGH_ENTROPY_SECTION** | High entropy in code section | `.text` entropy = 7.8 | Per‑section | +| **PACKER_HIGH_ENTROPY_OVERLAY** | Overlay entropy high | Overlay = encrypted blob | Per‑file | +| **PACKER_UNIFORM_HIGH_ENTROPY_PATTERN** | All sections uniformly high entropy | UPX‑like packed binary | Per‑file | diff --git a/docs/specs/structural-validation-deterministic-heuristics.md b/docs/specs/structural-validation-deterministic-heuristics.md new file mode 100644 index 0000000..5dff250 --- /dev/null +++ b/docs/specs/structural-validation-deterministic-heuristics.md @@ -0,0 +1,285 @@ +# **IOCX Structural Validation & Deterministic Heuristics** +### *The Definitive Architecture of a Deterministic Static Analysis Engine* + +Modern malware analysis tools often rely on heuristics that are opaque, unstable, or dependent on runtime behaviour. IOCX takes a different approach. 
+It begins with a foundation of **deterministic structural validators** — pure, static, reproducible checks that establish the *truth* of a binary’s layout. +Only after structural truth is established do heuristics interpret that truth. + +This document explains the validator suite, the deterministic principles behind it, and how IOCX builds heuristics on top of a stable structural core. + +--- + +# **1. Philosophy: Deterministic Structural Truth** + +IOCX is built on a simple idea: + +> **If you cannot trust the structure of a binary, you cannot trust anything derived from it.** + +Every validator in IOCX is: + +- **Deterministic** — no randomness, no environment dependence, no external data. +- **Snapshot‑stable** — identical input → identical output, across machines and versions. +- **Adversarial‑robust** — safe under malformed, truncated, or intentionally corrupted binaries. +- **Side‑effect‑free** — pure functions, no mutation, no execution, no network. +- **Composable** — each validator produces structural truth; heuristics interpret it. + +This is the opposite of “guessing.” +This is **structural verification**. + +--- + +# **2. The Validator Suite** +Each validator inspects a different subsystem of the PE format. +Together, they form a complete structural model of the binary. + +Below is the definitive description of each validator and the structural invariants it enforces. + +--- + +# **2.1 Entropy Validator** +### *Detects anomalous entropy patterns across sections, overlays, and regions.* + +The entropy validator establishes: + +- High‑entropy sections (possible packing or encryption). +- Very low entropy in large sections (possible padding or corruption). +- High‑entropy overlays (packed payloads appended to the file). +- High entropy in specific regions (resources, relocations, imports, TLS, certificates). +- Uniform entropy across sections (indicative of packers that homogenise data). + +All thresholds are fixed constants. 
+All decisions are deterministic. +No entropy‑based heuristic is emitted here — only structural facts. + +--- + +# **2.2 Entrypoint Validator** +### *Verifies that the binary’s execution entrypoint is structurally valid.* + +This validator ensures: + +- EntryPoint is positive and non‑zero. +- EntryPoint is not inside headers. +- EntryPoint maps to a real section. +- EntryPoint is inside an executable section. +- EntryPoint is not inside `.rsrc`, `.reloc`, or other non‑code regions. +- EntryPoint is not inside discardable or zero‑length sections. +- EntryPoint does not map into overlay data. + +This validator is one of the strongest structural correctness checks in IOCX. +It prevents false heuristics by ensuring the EP is meaningful before interpretation. + +--- + +# **2.3 Optional Header Validator** +### *Validates the core invariants of the PE Optional Header.* + +This validator enforces: + +- `SizeOfImage` ≥ max section end. +- `SizeOfHeaders` aligned to `FileAlignment` and ≥ actual header size. +- `SectionAlignment` ≥ `FileAlignment` and power‑of‑two. +- `FileAlignment` power‑of‑two and within 512–64K. +- `SizeOfCode`, `SizeOfInitializedData`, `SizeOfUninitializedData` ≥ section totals. +- `ImageBase` 64K‑aligned. +- `NumberOfRvaAndSizes` within valid range and ≥ actual directories. +- `SizeOfImage` aligned to `SectionAlignment`. + +These checks ensure the binary’s declared layout matches its actual layout. + +--- + +# **2.4 Resources Validator** +### *Validates the entire resource tree: directories, entries, and data blobs.* + +This validator performs: + +- Recursive directory validation with loop detection. +- Bounds checking for every directory and data entry. +- Raw and virtual overlap detection with other sections. +- Overlay overlap detection. +- Zero‑length directory and zero‑length data detection. +- String table bounds validation. + +Resource trees are a common place for corruption and obfuscation. 
+This validator ensures the `.rsrc` section is structurally sane before heuristics interpret it. + +--- + +# **2.5 RVA Graph Validator** +### *Validates all PE data directories and their mapping to sections.* + +This validator enforces: + +- No negative RVAs or sizes. +- Zero‑RVA directories with non‑zero size are flagged. +- Directories must not lie inside headers. +- Directories must not exceed `SizeOfImage`. +- Directories must map to exactly one section. +- Directories must not span multiple sections. +- Directories must not overlap each other. +- Directories must not map into overlay data. +- Zero‑length sections are treated as invalid mapping targets. + +This validator is the backbone of structural correctness for imports, exports, resources, relocations, TLS, and security directories. + +--- + +# **2.6 Sections Validator** +### *Validates section flags, alignment, ordering, and overlap.* + +This validator enforces: + +- RWX sections (executable + writable). +- Code flag without executable flag. +- Code‑like names without executable flag. +- Non‑ASCII or padding section names. +- Impossible flag combinations (discardable + executable + writable). +- Raw alignment to `FileAlignment`. +- Sections overlapping headers. +- Zero‑length sections. +- Discardable executable sections. +- Contradictory flags (exec/write/code without read). +- Raw overlap between sections. +- Virtual overlap between sections. +- Raw and virtual ordering must be ascending. + +This validator is the structural heart of IOCX. +It ensures the section table is coherent, non‑overlapping, and meaningful. + +--- + +# **2.7 Signature Validator** +### *Validates WIN_CERTIFICATE structures.* + +This validator enforces: + +- Flag/metadata symmetry. +- Single certificate (multiple certificates flagged). +- Certificate length ≥ 8. +- Valid revision (0x0100 or 0x0200). +- Valid certificate type (0x0001 or 0x0002). +- Certificate within file bounds. +- Certificate not overlapping overlay. 
+- Certificate not overlapping any section.
+
+This ensures the Authenticode block is structurally valid before any trust decisions are made.
+
+---
+
+# **2.8 TLS Validator**
+### *Validates the TLS directory and callback structure.*
+
+This validator enforces:
+
+- At most one TLS directory.
+- The TLS directory has a valid start/end range.
+- The TLS callbacks pointer is non‑zero.
+- TLS callbacks lie inside the TLS range.
+- TLS callbacks map to a real section.
+- TLS callbacks lie in an executable section.
+- TLS callbacks are not inside headers.
+- TLS callbacks are not inside the overlay.
+
+TLS callbacks are a common malware trick; this validator ensures the structure is sound before heuristics interpret it.
+
+---
+
+# **3. Deterministic Heuristics Layer**
+### *Heuristics interpret structural truth — they never override it.*
+
+Once validators establish structural truth, heuristics interpret that truth to produce higher‑level signals.
+
+Heuristics include:
+
+## **3.1 Packer Heuristics**
+- UPX‑like section names.
+- High‑entropy sections (entropy ≥ 7.5 with raw size ≥ 1 KB).
+
+## **3.2 Anti‑Debug Heuristics**
+- Imports of anti‑debug APIs.
+- Imports of timing APIs.
+- RWX sections (structurally validated first).
+
+## **3.3 Import Anomaly Heuristics**
+- Large import tables.
+- A high ratio of ordinal‑only imports.
+- GUI‑subsystem binaries importing kernel‑mode DLLs.
+
+## **3.4 Structural Anomaly Heuristics**
+Every structural issue becomes a heuristic signal:
+
+```
+pe_structure_anomaly /
+```
+
+The exceptions are entropy‑specific issues, which are handled by the packer heuristics.
+
+This ensures:
+
+- No duplication.
+- No double‑counting.
+- No contradictory signals.
+- A clean separation between *facts* and *interpretation*.
+
+---
+
+# **4. Why Determinism Matters**
+
+Most IOC extractors and static analysis tools suffer from:
+
+- nondeterministic regex engines
+- inconsistent PE parsing
+- version‑to‑version instability
+- environment‑dependent behaviour
+- heuristic drift
+- false positives under adversarial input
+
+IOCX avoids all of this by design.
+
+### **Determinism gives you:**
+
+- **Snapshot‑stable output** — identical input → identical output.
+- **Reproducibility** — critical for DFIR, SOC automation, and CI/CD.
+- **Adversarial robustness** — malformed binaries cannot destabilise the engine.
+- **Predictable heuristics** — heuristics interpret structural truth, not guesses.
+- **Trustworthiness** — every detection is explainable and traceable.
+
+This is why IOCX is not “just another IOC extractor.”
+It is a **structural correctness engine**.
+
+---
+
+# **5. The IOCX Model: Structural Truth → Deterministic Heuristics → Reliable IOCs**
+
+The pipeline is:
+
+1. **Parse** the binary into a stable internal representation.
+2. **Validate** every subsystem with deterministic structural validators.
+3. **Record** structural issues in `analysis["structural"]`.
+4. **Interpret** structural truth with deterministic heuristics.
+5. **Extract** IOCs from a stable, verified structural model.
+
+This ensures IOC extraction is:
+
+- safe
+- predictable
+- automatable
+- reproducible
+- adversarial‑robust
+
+Exactly what DFIR teams, SOC pipelines, and CI/CD systems require.
+
+---
+
+# **6. The IOCX Guarantee**
+
+IOCX guarantees:
+
+> **No nondeterminism. No hidden heuristics. No unstable behaviour.
+> Just structural truth, interpreted deterministically.**
+
+This is the foundation of the entire engine.
+This is why IOCX is trusted.
+This is why its output is stable.
+This is why it scales to automation.
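
The structural‑truth → heuristic hand‑off described above can be sketched in a few lines of Python. This is a hedged illustration, not IOCX's actual implementation: the `Detection` shape, the `surface_structural` function name, the `issue`/`details` keys, and the entropy reason names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Illustrative detection record; IOCX's real Detection model may differ.
@dataclass
class Detection:
    value: str
    reason: str
    metadata: Dict[str, Any] = field(default_factory=dict)

# Entropy-specific issues are left to the packer heuristics instead,
# so they are skipped here to avoid double-counting (names are illustrative).
SKIP_ENTROPY = {"entropy_high_section", "entropy_high_overlay"}

def surface_structural(analysis: Dict[str, Any]) -> List[Detection]:
    """Turn validator facts in analysis["structural"] into anomaly signals."""
    out: List[Detection] = []
    structural = analysis.get("structural") or {}
    # Iterate categories in sorted order so output never depends on dict order.
    for category in sorted(structural):
        for issue in structural[category]:
            reason = issue.get("issue", "unknown_structural_issue")
            if reason in SKIP_ENTROPY:
                continue
            out.append(Detection(
                "pe_structure_anomaly",
                reason,
                dict(issue.get("details") or {}),
            ))
    return out

analysis = {
    "structural": {
        "sections": [{"issue": "section_overlap",
                      "details": {"a": ".text", "b": ".data"}}],
        "entropy": [{"issue": "entropy_high_section",
                     "details": {"section": "UPX0"}}],
    }
}
dets = surface_structural(analysis)
print([(d.value, d.reason) for d in dets])
# → [('pe_structure_anomaly', 'section_overlap')]
```

Because the function only reads recorded facts and never re-derives them, identical structural input always yields identical signals — the snapshot‑stability property claimed above.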
diff --git a/examples/__init__.py b/examples/__init__.py
index e69de29..2d97278 100644
--- a/examples/__init__.py
+++ b/examples/__init__.py
@@ -0,0 +1,3 @@
+# Copyright (c) 2026 MalX Labs and contributors
+# SPDX-License-Identifier: MPL-2.0
+
diff --git a/examples/generators/__init__.py b/examples/generators/__init__.py
index e69de29..2d97278 100644
--- a/examples/generators/__init__.py
+++ b/examples/generators/__init__.py
@@ -0,0 +1,3 @@
+# Copyright (c) 2026 MalX Labs and contributors
+# SPDX-License-Identifier: MPL-2.0
+
diff --git a/examples/generators/c/contract/layer3_adversarial/heuristic_rich.full.c b/examples/generators/c/contract/layer3_adversarial/heuristic_rich.full.c
new file mode 100644
index 0000000..dc895c1
--- /dev/null
+++ b/examples/generators/c/contract/layer3_adversarial/heuristic_rich.full.c
@@ -0,0 +1,56 @@
+#include <windows.h>
+
+// --- Fake IOC-like strings in .data section (harmless) ----------------------
+__attribute__((section(".data")))
+const char fake_iocs[][64] = {
+    "example-malware.com",                  // fake domain
+    "192.0.2.123",                          // TEST-NET-1 IP (reserved, safe)
+    "abcd1234ef567890abcd1234ef567890",     // fake MD5-like string
+    "FAKE-IOC-TEST-ONLY-1234567890",        // generic test marker
+    "hxxp://not-a-real-domain.test/payload" // safe obfuscated URL
+};
+
+// --- RWX section ------------------------------------------------------------
+__attribute__((section(".rwx")))
+volatile unsigned char rwx_buffer[2048];
+
+// --- UPX0 section with noisy data ------------------------------------------
+__attribute__((section("UPX0")))
+const unsigned char upx0_data[4096] = {
+    0x13,0x37,0x42,0x99,0xDE,0xAD,0xBE,0xEF,
+    0x01,0x23,0x45,0x67,0x89,0xAB,0xCD,0xEF,
+#define P(x) x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x
+    P(0xAA), P(0x55), P(0xCC), P(0x33),
+    P(0xF0), P(0x0F), P(0x5A), P(0xA5),
+#undef P
+};
+
+// --- Anti-debug + timing imports -------------------------------------------
+__declspec(dllimport) BOOL WINAPI IsDebuggerPresent(void);
+__declspec(dllimport) BOOL WINAPI CheckRemoteDebuggerPresent(HANDLE, PBOOL);
+__declspec(dllimport) VOID WINAPI OutputDebugStringA(LPCSTR);
+
+__declspec(dllimport) DWORD WINAPI GetTickCount(void);
+__declspec(dllimport) ULONGLONG WINAPI GetTickCount64(void);
+__declspec(dllimport) BOOL WINAPI QueryPerformanceCounter(LARGE_INTEGER*);
+
+// --- Minimal program logic --------------------------------------------------
+static void exercise_imports(void) {
+    volatile BOOL dbg = IsDebuggerPresent();
+    BOOL remote_dbg = FALSE;
+    CheckRemoteDebuggerPresent(GetCurrentProcess(), &remote_dbg);
+    OutputDebugStringA("heuristic_rich.full: debug string");
+
+    volatile DWORD t = GetTickCount();
+    volatile ULONGLONG t2 = GetTickCount64();
+    LARGE_INTEGER li;
+    QueryPerformanceCounter(&li);
+
+    rwx_buffer[0] = (unsigned char)t;
+}
+
+int WINAPI WinMain(HINSTANCE h, HINSTANCE p, LPSTR c, int n) {
+    exercise_imports();
+    MessageBoxA(NULL, "heuristic_rich.full.exe", "Heuristic Test", MB_OK);
+    return 0;
+}
diff --git a/examples/generators/go/generate_obfuscated_go_pe.py b/examples/generators/go/generate_obfuscated_go_pe.py
index 091abff..e6b5f12 100644
--- a/examples/generators/go/generate_obfuscated_go_pe.py
+++ b/examples/generators/go/generate_obfuscated_go_pe.py
@@ -1,3 +1,6 @@
+# Copyright (c) 2026 MalX Labs and contributors
+# SPDX-License-Identifier: MPL-2.0
+
 """
 generate_obfuscated_pe.py
 Creates a harmless Windows PE file with obfuscated IOCs for testing deobfuscation logic.
diff --git a/examples/generators/go/generate_synthetic_go_pe.py b/examples/generators/go/generate_synthetic_go_pe.py
index cd100d0..c5ab604 100644
--- a/examples/generators/go/generate_synthetic_go_pe.py
+++ b/examples/generators/go/generate_synthetic_go_pe.py
@@ -1,3 +1,6 @@
+# Copyright (c) 2026 MalX Labs and contributors
+# SPDX-License-Identifier: MPL-2.0
+
 """
 generate_synthetic_pe.py
 Creates a harmless Windows PE file containing embedded fake IOCs.
diff --git a/examples/generators/python/__init__.py b/examples/generators/python/__init__.py index e69de29..2d97278 100644 --- a/examples/generators/python/__init__.py +++ b/examples/generators/python/__init__.py @@ -0,0 +1,3 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + diff --git a/examples/generators/python/generate_analysis_fixtures.py b/examples/generators/python/generate_analysis_fixtures.py index 5667db3..30ce5d5 100644 --- a/examples/generators/python/generate_analysis_fixtures.py +++ b/examples/generators/python/generate_analysis_fixtures.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + #!/usr/bin/env python3 """ Generate synthetic PE fixtures for >=v.0.6.0 IOCX tests. diff --git a/examples/generators/python/generate_analysis_fixtures_v2.py b/examples/generators/python/generate_analysis_fixtures_v2.py index 6b75e80..87ee7be 100644 --- a/examples/generators/python/generate_analysis_fixtures_v2.py +++ b/examples/generators/python/generate_analysis_fixtures_v2.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import os import struct from pathlib import Path diff --git a/examples/generators/python/generate_obfuscated_pe.py b/examples/generators/python/generate_obfuscated_pe.py index b2d454a..3af7b49 100644 --- a/examples/generators/python/generate_obfuscated_pe.py +++ b/examples/generators/python/generate_obfuscated_pe.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + """ generate_obfuscated_pe.py Creates a harmless Windows PE file containing embedded fake IOCs. 
diff --git a/examples/generators/python/generate_synthetic_pe.py b/examples/generators/python/generate_synthetic_pe.py index f5e81ae..2ef074d 100644 --- a/examples/generators/python/generate_synthetic_pe.py +++ b/examples/generators/python/generate_synthetic_pe.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + """ generate_synthetic_pe.py Creates a harmless Windows PE file containing embedded fake IOCs. diff --git a/iocx/__init__.py b/iocx/__init__.py index e69de29..2d97278 100644 --- a/iocx/__init__.py +++ b/iocx/__init__.py @@ -0,0 +1,3 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + diff --git a/iocx/analysis/extended.py b/iocx/analysis/extended.py index c99afdf..e5e127e 100644 --- a/iocx/analysis/extended.py +++ b/iocx/analysis/extended.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from dataclasses import asdict from iocx.models import Detection diff --git a/iocx/analysis/heuristics.py b/iocx/analysis/heuristics.py index 6006714..e53b3e5 100644 --- a/iocx/analysis/heuristics.py +++ b/iocx/analysis/heuristics.py @@ -1,6 +1,9 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from typing import Any, Dict, List, Optional from iocx.models import Detection - +from iocx.reason_codes import ReasonCodes # Thresholds HIGH_ENTROPY_THRESHOLD = 7.5 @@ -36,6 +39,12 @@ "hal.dll", } +_SKIP_ENTROPY = { + ReasonCodes.ENTROPY_HIGH_SECTION, + ReasonCodes.ENTROPY_HIGH_OVERLAY, + ReasonCodes.ENTROPY_UNIFORM_ACROSS_SECTIONS, +} + def _det(value: str, reason: str, metadata: Optional[Dict[str, Any]] = None) -> Detection: return Detection( @@ -49,32 +58,25 @@ def _det(value: str, reason: str, metadata: Optional[Dict[str, Any]] = None) -> def _get_extended(analysis: Dict[str, Any], key: str) -> List[Dict[str, Any]]: return [ - e for e in analysis["extended"] + e for e in 
analysis.get("extended", []) if isinstance(e, dict) and e.get("value") == key and isinstance(e.get("metadata"), dict) ] -def _map_rva_to_section(sections: List[Dict[str, Any]], rva: int) -> Optional[Dict[str, Any]]: - for sec in sections: - va = sec.get("virtual_address") - vs = sec.get("virtual_size") - if not isinstance(va, int) or not isinstance(vs, int): - continue - if va <= rva < va + vs: - return sec - return None - - def _analyse_packer(metadata: Dict[str, Any], analysis: Dict[str, Any]) -> List[Detection]: out: List[Detection] = [] - for sec in analysis["sections"]: + for sec in analysis.get("sections", []): name = (sec.get("name") or "").lower() if "upx" in name: - out.append(_det("packer_suspected", "packer_section_name", {"section": sec["name"]})) + out.append(_det( + "packer_suspected", + "packer_section_name", + {"section": sec.get("name")}, + )) entropy = sec.get("entropy") raw_size = sec.get("raw_size") @@ -85,7 +87,7 @@ def _analyse_packer(metadata: Dict[str, Any], analysis: Dict[str, Any]) -> List[ "packer_suspected", "high_entropy_section", { - "section": sec["name"], + "section": sec.get("name"), "entropy": float(entropy), "raw_size": raw_size, }, @@ -94,32 +96,6 @@ def _analyse_packer(metadata: Dict[str, Any], analysis: Dict[str, Any]) -> List[ return out -def _analyse_tls(metadata: Dict[str, Any], analysis: Dict[str, Any]) -> List[Detection]: - out: List[Detection] = [] - - for entry in _get_extended(analysis, "tls_directory"): - meta = entry["metadata"] - start = meta.get("start_address") - end = meta.get("end_address") - callbacks = meta.get("callbacks") - - if start is None or end is None or callbacks is None: - continue - - if not (start <= callbacks < end): - out.append(_det( - "tls_callback_anomaly", - "callback_outside_tls_range", - { - "callbacks": callbacks, - "start_address": start, - "end_address": end, - }, - )) - - return out - - def _analyse_anti_debug(metadata: Dict[str, Any], analysis: Dict[str, Any]) -> List[Detection]: out: 
List[Detection] = [] @@ -141,7 +117,8 @@ def _analyse_anti_debug(metadata: Dict[str, Any], analysis: Dict[str, Any]) -> L {"dll": dll, "function": func}, )) - for sec in analysis["sections"]: + # RWX sections are now structurally validated, but still interesting for anti-debug + for sec in analysis.get("sections", []): chars = sec.get("characteristics") if not isinstance(chars, int): continue @@ -153,7 +130,7 @@ def _analyse_anti_debug(metadata: Dict[str, Any], analysis: Dict[str, Any]) -> L out.append(_det( "anti_debug_heuristic", "rwx_section", - {"section": sec["name"], "characteristics": chars}, + {"section": sec.get("name"), "characteristics": chars}, )) return out @@ -202,237 +179,38 @@ def _analyse_import_anomalies(metadata: Dict[str, Any], analysis: Dict[str, Any] return out -def _analyse_signature(metadata: Dict[str, Any]) -> List[Detection]: - out: List[Detection] = [] - - has_sig = bool(metadata.get("has_signature")) - sigs = metadata.get("signatures") or [] - - if has_sig and not sigs: - out.append(_det( - "signature_anomaly", - "signature_flag_set_but_no_metadata", - )) - - return out - - -def _analyse_section_overlap(metadata: Dict[str, Any], analysis: Dict[str, Any]) -> List[Detection]: +def _analyse_structural(analysis: Dict[str, Any]) -> List[Detection]: + """ + Interpret structural validator output from analysis["structural"] and + surface it as pe_structure_anomaly detections. 
+ """ out: List[Detection] = [] - sections = analysis.get("sections", []) - for i in range(len(sections)): - a = sections[i] - va_a = a.get("virtual_address") - vs_a = a.get("virtual_size") - if not isinstance(va_a, int) or not isinstance(vs_a, int): - continue - end_a = va_a + vs_a - - for j in range(i + 1, len(sections)): - b = sections[j] - va_b = b.get("virtual_address") - vs_b = b.get("virtual_size") - if not isinstance(va_b, int) or not isinstance(vs_b, int): - continue - end_b = va_b + vs_b - - if max(va_a, va_b) < min(end_a, end_b): - out.append( - _det( - "pe_structure_anomaly", - "section_overlap", - {"section_a": a.get("name"), "section_b": b.get("name")}, - ) - ) - - return out - - -def _analyse_section_alignment(metadata: Dict[str, Any], analysis: Dict[str, Any]) -> List[Detection]: - out: List[Detection] = [] - - opt = metadata.get("optional_header") or {} - file_alignment = opt.get("file_alignment") - if not isinstance(file_alignment, int) or file_alignment <= 0: + structural = analysis.get("structural") or {} + if not isinstance(structural, dict): return out - for sec in analysis.get("sections", []): - raw_addr = sec.get("raw_address") - raw_size = sec.get("raw_size") - if not isinstance(raw_addr, int) or not isinstance(raw_size, int): + for category, issues in structural.items(): + if not isinstance(issues, list): continue - if raw_addr % file_alignment != 0 or raw_size % file_alignment != 0: - out.append( - _det( - "pe_structure_anomaly", - "section_raw_misaligned", - { - "section": sec.get("name"), - "raw_address": raw_addr, - "raw_size": raw_size, - "file_alignment": file_alignment, - }, - ) - ) - - return out - - -def _analyse_optional_header_consistency(metadata: Dict[str, Any], analysis: Dict[str, Any]) -> List[Detection]: - out: List[Detection] = [] - - opt = metadata.get("optional_header") or {} - size_of_image = opt.get("size_of_image") - if not isinstance(size_of_image, int) or size_of_image <= 0: - return out - - max_end = 0 - for sec in 
analysis.get("sections", []): - va = sec.get("virtual_address") - vs = sec.get("virtual_size") - if not isinstance(va, int) or not isinstance(vs, int): - continue - max_end = max(max_end, va + vs) - - if max_end > size_of_image: - out.append( - _det( - "pe_structure_anomaly", - "optional_header_inconsistent_size", - {"size_of_image": size_of_image, "max_section_end": max_end}, - ) - ) - - return out - - -def _analyse_entrypoint_mapping(metadata: Dict[str, Any], analysis: Dict[str, Any]) -> List[Detection]: - out: List[Detection] = [] - - header_ext = _get_extended(analysis, "header") - if not header_ext: - return out - - ep = header_ext[0]["metadata"].get("entry_point") - if not isinstance(ep, int): - return out - - sections = analysis.get("sections", []) - if not sections: - return out - - if _map_rva_to_section(sections, ep) is None: - out.append( - _det( - "pe_structure_anomaly", - "entrypoint_out_of_bounds", - {"entry_point": ep}, - ) - ) - - return out - - -def _analyse_data_directory_anomalies(metadata: Dict[str, Any], analysis: Dict[str, Any]) -> List[Detection]: - out: List[Detection] = [] - - dirs = analysis.get("data_directories") or metadata.get("data_directories") - opt = metadata.get("optional_header") or {} - size_of_image = opt.get("size_of_image") - - if not isinstance(size_of_image, int) or not isinstance(dirs, list): - return out - - # Out-of-range and zero/size mismatch - for d in dirs: - rva = d.get("rva") - size = d.get("size") - name = d.get("name") or d.get("index") - if not isinstance(rva, int) or not isinstance(size, int): - continue - - if size > 0 and rva == 0: - out.append( - _det( - "pe_structure_anomaly", - "data_directory_zero_rva_nonzero_size", - {"directory": name, "rva": rva, "size": size}, - ) - ) - - if rva + size > size_of_image: - out.append( - _det( - "pe_structure_anomaly", - "data_directory_out_of_range", - { - "directory": name, - "rva": rva, - "size": size, - "size_of_image": size_of_image, - }, - ) - ) - - # Overlaps - 
for i in range(len(dirs)): - a = dirs[i] - rva_a = a.get("rva") - size_a = a.get("size") - if not isinstance(rva_a, int) or not isinstance(size_a, int): - continue - end_a = rva_a + size_a - - for j in range(i + 1, len(dirs)): - b = dirs[j] - rva_b = b.get("rva") - size_b = b.get("size") - if not isinstance(rva_b, int) or not isinstance(size_b, int): + for issue in issues: + if not isinstance(issue, dict): continue - end_b = rva_b + size_b - - if max(rva_a, rva_b) < min(end_a, end_b): - out.append( - _det( - "pe_structure_anomaly", - "data_directory_overlap", - { - "directory_a": a.get("name") or a.get("index"), - "directory_b": b.get("name") or b.get("index"), - }, - ) - ) - - return out - - -def _analyse_import_directory_validity(metadata: Dict[str, Any], analysis: Dict[str, Any]) -> List[Detection]: - out: List[Detection] = [] - dirs = analysis.get("data_directories") or metadata.get("data_directories") - sections = analysis.get("sections", []) - if not isinstance(dirs, list) or not sections: - return out + reason = issue.get("issue") + details = issue.get("details") or {} - for d in dirs: - name = (d.get("name") or "").lower() - idx = d.get("index") - if name == "import" or idx == 1: - rva = d.get("rva") - size = d.get("size") - if not isinstance(rva, int) or not isinstance(size, int): + if reason in _SKIP_ENTROPY: continue - if _map_rva_to_section(sections, rva) is None: - out.append( - _det( - "pe_structure_anomaly", - "import_rva_invalid", - {"rva": rva, "size": size}, - ) - ) + metadata = {**details} + + out.append(_det( + "pe_structure_anomaly", + reason or "unknown_structural_issue", + metadata, + )) return out @@ -440,17 +218,12 @@ def _analyse_import_directory_validity(metadata: Dict[str, Any], analysis: Dict[ def analyse_pe_heuristics(metadata: Dict[str, Any], analysis: Dict[str, Any]) -> List[Detection]: out: List[Detection] = [] + # Behavioural / semantic heuristics out.extend(_analyse_packer(metadata, analysis)) - out.extend(_analyse_tls(metadata, 
analysis)) out.extend(_analyse_anti_debug(metadata, analysis)) out.extend(_analyse_import_anomalies(metadata, analysis)) - out.extend(_analyse_signature(metadata)) - - out.extend(_analyse_section_overlap(metadata, analysis)) - out.extend(_analyse_section_alignment(metadata, analysis)) - out.extend(_analyse_optional_header_consistency(metadata, analysis)) - out.extend(_analyse_entrypoint_mapping(metadata, analysis)) - out.extend(_analyse_data_directory_anomalies(metadata, analysis)) - out.extend(_analyse_import_directory_validity(metadata, analysis)) + + # Structural anomalies + out.extend(_analyse_structural(analysis)) return out diff --git a/iocx/analysis/obfuscation.py b/iocx/analysis/obfuscation.py index 0f03675..aae13d2 100644 --- a/iocx/analysis/obfuscation.py +++ b/iocx/analysis/obfuscation.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from __future__ import annotations import math diff --git a/iocx/cli/__init__.py b/iocx/cli/__init__.py index fa81ada..972e120 100644 --- a/iocx/cli/__init__.py +++ b/iocx/cli/__init__.py @@ -1 +1,4 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + # empty file diff --git a/iocx/cli/main.py b/iocx/cli/main.py index a568a0d..c5eb842 100644 --- a/iocx/cli/main.py +++ b/iocx/cli/main.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import argparse import json import sys diff --git a/iocx/detectors/__init__.py b/iocx/detectors/__init__.py index 6bb8044..28aad08 100644 --- a/iocx/detectors/__init__.py +++ b/iocx/detectors/__init__.py @@ -1,2 +1,5 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from .registry import register_detector, get_detector, all_detectors from . 
import extractors diff --git a/iocx/detectors/extractors/__init__.py b/iocx/detectors/extractors/__init__.py index 3b180b5..c3918b2 100644 --- a/iocx/detectors/extractors/__init__.py +++ b/iocx/detectors/extractors/__init__.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + # Manually import each detector module so they register themselves from . import urls from . import ips diff --git a/iocx/detectors/extractors/base64.py b/iocx/detectors/extractors/base64.py index 3991d22..d116f38 100644 --- a/iocx/detectors/extractors/base64.py +++ b/iocx/detectors/extractors/base64.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import re import base64 import string diff --git a/iocx/detectors/extractors/crypto.py b/iocx/detectors/extractors/crypto.py index 29df267..9c5ef1f 100644 --- a/iocx/detectors/extractors/crypto.py +++ b/iocx/detectors/extractors/crypto.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import re import hashlib from ..registry import register_detector diff --git a/iocx/detectors/extractors/emails.py b/iocx/detectors/extractors/emails.py index 4031275..b4f2610 100644 --- a/iocx/detectors/extractors/emails.py +++ b/iocx/detectors/extractors/emails.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import re from ..registry import register_detector from ...models import Detection diff --git a/iocx/detectors/extractors/filepaths.py b/iocx/detectors/extractors/filepaths.py index 9f55121..4e3d792 100644 --- a/iocx/detectors/extractors/filepaths.py +++ b/iocx/detectors/extractors/filepaths.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import re from ..registry import register_detector from ...models import Detection diff --git a/iocx/detectors/extractors/hashes.py 
b/iocx/detectors/extractors/hashes.py index d63ff51..a7488c8 100644 --- a/iocx/detectors/extractors/hashes.py +++ b/iocx/detectors/extractors/hashes.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import re from ..registry import register_detector from ...models import Detection diff --git a/iocx/detectors/extractors/ips.py b/iocx/detectors/extractors/ips.py index 0cd92b0..4220e18 100644 --- a/iocx/detectors/extractors/ips.py +++ b/iocx/detectors/extractors/ips.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import re import ipaddress from ..registry import register_detector diff --git a/iocx/detectors/extractors/urls/__init__.py b/iocx/detectors/extractors/urls/__init__.py index ad6527b..549123d 100644 --- a/iocx/detectors/extractors/urls/__init__.py +++ b/iocx/detectors/extractors/urls/__init__.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from ...registry import register_detector from ....models import Detection from .strict_url import extract_strict_urls diff --git a/iocx/detectors/extractors/urls/bare_domain.py b/iocx/detectors/extractors/urls/bare_domain.py index 70a5608..e37618d 100644 --- a/iocx/detectors/extractors/urls/bare_domain.py +++ b/iocx/detectors/extractors/urls/bare_domain.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import re from ....models import Detection from .homoglyph_punycode import _punycode_decodes_to_unicode, _decode_punycode, _detect_script, _contains_confusables diff --git a/iocx/detectors/extractors/urls/deobfuscate.py b/iocx/detectors/extractors/urls/deobfuscate.py index df9856c..4ecd0be 100644 --- a/iocx/detectors/extractors/urls/deobfuscate.py +++ b/iocx/detectors/extractors/urls/deobfuscate.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + 
import re PATTERNS = [ diff --git a/iocx/detectors/extractors/urls/homoglyph_punycode.py b/iocx/detectors/extractors/urls/homoglyph_punycode.py index b8c6b01..5d7b3d5 100644 --- a/iocx/detectors/extractors/urls/homoglyph_punycode.py +++ b/iocx/detectors/extractors/urls/homoglyph_punycode.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import functools import idna import unicodedata diff --git a/iocx/detectors/extractors/urls/normalise.py b/iocx/detectors/extractors/urls/normalise.py index ae3486f..1f71c35 100644 --- a/iocx/detectors/extractors/urls/normalise.py +++ b/iocx/detectors/extractors/urls/normalise.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from urllib.parse import urlparse, urlunparse def normalise_url(url: str) -> str: diff --git a/iocx/detectors/extractors/urls/strict_url.py b/iocx/detectors/extractors/urls/strict_url.py index 779f8f4..710e617 100644 --- a/iocx/detectors/extractors/urls/strict_url.py +++ b/iocx/detectors/extractors/urls/strict_url.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import re from ....models import Detection diff --git a/iocx/detectors/registry.py b/iocx/detectors/registry.py index e8b8df9..a2c0a59 100644 --- a/iocx/detectors/registry.py +++ b/iocx/detectors/registry.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from importlib import import_module _DETECTORS = {} diff --git a/iocx/engine.py b/iocx/engine.py index 4190fd9..ca21cc7 100644 --- a/iocx/engine.py +++ b/iocx/engine.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from __future__ import annotations import os import logging @@ -7,12 +10,17 @@ from .utils import detect_file_type, FileType from .parsers.pe_parser import parse_pe, analyse_pe_sections, analyse_data_directories, 
sanitize_sections from .parsers.string_extractor import extract_strings +from .parsers.pe_resources import build_resource_structure from .detectors import all_detectors from .models import Detection, PluginContext from .plugins.loader import PluginLoader from .analysis.obfuscation import analyse_obfuscation from .analysis.extended import analyse_extended +from .validators import run_structural_validators from .analysis.heuristics import analyse_pe_heuristics +from .schemas.internal_schema import InternalMetadata +from .schemas.analysis import AnalysisDict +from .schemas.public_metadata import PublicMetadata @dataclass class EngineConfig: @@ -62,6 +70,8 @@ def __init__(self, config: Optional[EngineConfig] = None): self._analysis_level = self.config.analysis_level + self._internal_metadata: InternalMetadata = InternalMetadata() + # ---------- Public API ---------- def extract(self, path_or_text: str) -> Dict[str, Any]: @@ -88,7 +98,7 @@ def extract_from_text(self, text: str) -> Dict[str, Any]: # ---------- Pipeline stages ---------- - def _get_pe_metadata(self, path: str): + def _get_pe_metadata(self, path: str) -> Tuple[Any, PublicMetadata]: if not self.config.enable_cache: return parse_pe(path) if path not in self.cache.pe_metadata: @@ -106,6 +116,7 @@ def _get_strings(self, path: str) -> List[str]: def _pipeline_pe(self, path: str) -> Dict[str, Any]: pe, metadata = self._get_pe_metadata(path) + metadata: PublicMetadata strings = self._get_strings(path) strings.extend(metadata.get("resource_strings", [])) text = "\n".join(strings) @@ -115,6 +126,7 @@ def _pipeline_pe(self, path: str) -> Dict[str, Any]: obf = [] extended = None heuristics = [] + structural = [] # BASIC: section layout + entropy if analysis_level in ("basic", "deep", "full"): @@ -131,13 +143,25 @@ def _pipeline_pe(self, path: str) -> Dict[str, Any]: if analysis_level == "full": extended = analyse_extended(pe, metadata, text) - analysis_dict = { + file_size = len(pe.__data__) + overlay_offset = 
pe.get_overlay_data_start_offset() + if overlay_offset is None: + # No overlay → treat overlay as starting at EOF + overlay_offset = file_size + + analysis_dict: AnalysisDict = { "sections": section_analysis["sections"], "data_directories": section_analysis["data_directories"], "extended": extended or [], "obfuscation": [asdict(d) for d in obf], + "file_size": file_size, + "overlay_offset": overlay_offset, } + self._internal_metadata["resources_struct"] = build_resource_structure(pe) + internal: InternalMetadata = self._internal_metadata + structural = run_structural_validators(internal, metadata, analysis_dict) + analysis_dict["structural"] = structural heuristics = analyse_pe_heuristics(metadata, analysis_dict) raw = self._run_detectors(path, text) diff --git a/iocx/models.py b/iocx/models.py index b25e472..8979105 100644 --- a/iocx/models.py +++ b/iocx/models.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from dataclasses import dataclass, field from pathlib import Path from typing import Any, Dict diff --git a/iocx/parsers/__init__.py b/iocx/parsers/__init__.py index 0449216..fc8d88e 100644 --- a/iocx/parsers/__init__.py +++ b/iocx/parsers/__init__.py @@ -1,2 +1,5 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from .pe_parser import parse_pe from .string_extractor import extract_strings diff --git a/iocx/parsers/language_map.py b/iocx/parsers/language_map.py index a756ff0..18068a8 100644 --- a/iocx/parsers/language_map.py +++ b/iocx/parsers/language_map.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + PRIMARY_LANG = { 0x01: "ar", 0x02: "bg", 0x03: "ca", 0x04: "zh", 0x05: "cs", 0x06: "da", 0x07: "de", 0x08: "el", 0x09: "en", 0x0A: "es", diff --git a/iocx/parsers/pe_parser.py b/iocx/parsers/pe_parser.py index d870ec6..7c76d38 100644 --- a/iocx/parsers/pe_parser.py +++ b/iocx/parsers/pe_parser.py @@ -1,3 
+1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pefile import math import base64 diff --git a/iocx/parsers/pe_resources.py b/iocx/parsers/pe_resources.py new file mode 100644 index 0000000..198c1c5 --- /dev/null +++ b/iocx/parsers/pe_resources.py @@ -0,0 +1,108 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +from typing import Dict, Any, Optional + + +def build_resource_structure(pe) -> Optional[Dict[str, Any]]: + """ + Build a structural resource tree suitable for validation. + Returns None when the PE has no resource directory. + """ + if not hasattr(pe, "DIRECTORY_ENTRY_RESOURCE"): + return None + + # Resource directory entry index (IMAGE_DIRECTORY_ENTRY_RESOURCE = 2) + res_dir = pe.OPTIONAL_HEADER.DATA_DIRECTORY[2] + base_rva = res_dir.VirtualAddress + + root_dir = pe.DIRECTORY_ENTRY_RESOURCE + + def build_directory(node, entry_struct=None) -> Dict[str, Any]: + """ + node: pefile.ResourceDirData + entry_struct: the IMAGE_RESOURCE_DIRECTORY_ENTRY struct that pointed to this directory + """ + + # Directory RVA is derived from the entry that referenced it + if entry_struct: + # Mask off high bit (0x80000000) which indicates "is directory" + offset = entry_struct.OffsetToData & 0x7FFFFFFF + rva = base_rva + offset + else: + # Root directory: RVA is simply the base RVA + rva = base_rva + + # Directory size = 16-byte header + 8 bytes per entry + size = 16 + 8 * len(node.entries) + + entries = [] + for e in node.entries: + name = str(e.name) if getattr(e, "name", None) is not None else None + entry_id = getattr(e, "id", None) + + if hasattr(e, "directory") and e.directory is not None: + # Subdirectory + subdir = build_directory(e.directory, e.struct) + entries.append( + { + "name": name, + "id": entry_id, + "is_directory": True, + "directory": subdir, + "data_rva": None, + "data_size": None, + "raw_offset": None, + } + ) + else: + # Data entry + data = e.data + d = data.struct + data_rva = d.OffsetToData + data_size = d.Size + raw_offset = 
None + try: + raw_offset = pe.get_offset_from_rva(data_rva) + except Exception: + # Malformed entry: RVA does not map into the file + pass + + entries.append( + { + "name": name, + "id": entry_id, + "is_directory": False, + "directory": None, + "data_rva": data_rva, + "data_size": data_size, + "raw_offset": raw_offset, + } + ) + + return { + "rva": rva, + "size": size, + "entries": entries, + } + + root = build_directory(root_dir) + + # Collect string table entries (RT_STRING = 6) + string_tables = [] + try: + RT_STRING = 6 + for type_entry in root_dir.entries: + if getattr(type_entry, "id", None) == RT_STRING and hasattr(type_entry, "directory"): + for name_entry in type_entry.directory.entries: + if hasattr(name_entry, "directory"): + for lang_entry in name_entry.directory.entries: + if hasattr(lang_entry, "data"): + d = lang_entry.data.struct + string_tables.append( + { + "rva": d.OffsetToData, + "size": d.Size, + } + ) + except Exception: # pragma: no cover + pass + + return { + "root": root, + "string_tables": string_tables, + } diff --git a/iocx/parsers/string_extractor.py b/iocx/parsers/string_extractor.py index 6f02a4d..9dacac4 100644 --- a/iocx/parsers/string_extractor.py +++ b/iocx/parsers/string_extractor.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import re ASCII_RE = rb"[ -~]{%d,}" diff --git a/iocx/plugins/api.py b/iocx/plugins/api.py index 735aa62..07ddb90 100644 --- a/iocx/plugins/api.py +++ b/iocx/plugins/api.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from typing import Protocol, List from .metadata import PluginMetadata from iocx.models import Detection, PluginContext diff --git a/iocx/plugins/loader.py b/iocx/plugins/loader.py index f2a26d7..a306ab1 100644 --- a/iocx/plugins/loader.py +++ b/iocx/plugins/loader.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import importlib.metadata import importlib.util from pathlib import Path diff --git a/iocx/plugins/metadata.py 
b/iocx/plugins/metadata.py index c6bb35f..b1fc10c 100644 --- a/iocx/plugins/metadata.py +++ b/iocx/plugins/metadata.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from dataclasses import dataclass from typing import List diff --git a/iocx/plugins/registry.py b/iocx/plugins/registry.py index 6a3cd7b..b793b74 100644 --- a/iocx/plugins/registry.py +++ b/iocx/plugins/registry.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from typing import List, Dict from .api import IOCXPlugin diff --git a/iocx/reason_codes.py b/iocx/reason_codes.py new file mode 100644 index 0000000..6ff0d79 --- /dev/null +++ b/iocx/reason_codes.py @@ -0,0 +1,114 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +class ReasonCodes: + # --- Section anomalies --- + SECTION_RWX = "section_rwx" + SECTION_NON_EXECUTABLE_CODE_LIKE = "section_non_executable_code_like" + SECTION_NAME_NON_ASCII = "section_name_non_ascii" + SECTION_NAME_EMPTY_OR_PADDING = "section_name_empty_or_padding" + SECTION_IMPOSSIBLE_FLAGS = "section_impossible_flags" + SECTION_OVERLAP = "section_overlap" + SECTION_RAW_MISALIGNED = "section_raw_misaligned" + + SECTION_RAW_OVERLAP = "section_raw_overlap" + SECTION_OVERLAPS_HEADERS = "section_overlaps_headers" + SECTION_OUT_OF_ORDER_RAW = "section_out_of_order_raw" + SECTION_OUT_OF_ORDER_VIRTUAL = "section_out_of_order_virtual" + SECTION_ZERO_LENGTH = "section_zero_length" + SECTION_DISCARDABLE_CODE = "section_discardable_code" + SECTION_FLAGS_INCONSISTENT = "section_flags_inconsistent" + SECTION_CODELIKE_NAME_NOT_EXECUTABLE = "section_codelike_name_not_executable" + + # --- Entrypoint issues --- + ENTRYPOINT_OUT_OF_BOUNDS = "entrypoint_out_of_bounds" + ENTRYPOINT_SECTION_NOT_EXECUTABLE = "entrypoint_section_not_executable" + ENTRYPOINT_IN_TRUNCATED_REGION = "entrypoint_in_truncated_region" + ENTRYPOINT_IN_OVERLAY = 
"entrypoint_in_overlay" + + ENTRYPOINT_ZERO_OR_NEGATIVE = "entrypoint_zero_or_negative" + ENTRYPOINT_IN_HEADERS = "entrypoint_in_headers" + ENTRYPOINT_IN_NON_CODE_SECTION = "entrypoint_in_non_code_section" + ENTRYPOINT_IN_DISCARDABLE_SECTION = "entrypoint_in_discardable_section" + + # --- RVA / directory inconsistencies --- + DATA_DIRECTORY_ZERO_RVA_NONZERO_SIZE = "data_directory_zero_rva_nonzero_size" + DATA_DIRECTORY_OUT_OF_RANGE = "data_directory_out_of_range" + DATA_DIRECTORY_OVERLAP = "data_directory_overlap" + DATA_DIRECTORY_ZERO_SIZE_UNEXPECTED = "data_directory_zero_size_unexpected" + DATA_DIRECTORY_INVALID_RANGE = "data_directory_invalid_range" + DATA_DIRECTORY_IN_HEADERS = "data_directory_in_headers" + DATA_DIRECTORY_IN_OVERLAY = "data_directory_in_overlay" + DATA_DIRECTORY_NOT_MAPPED_TO_SECTION = "data_directory_not_mapped_to_section" + DATA_DIRECTORY_SPANS_MULTIPLE_SECTIONS = "data_directory_spans_multiple_sections" + IMPORT_RVA_INVALID = "import_rva_invalid" + + # --- Optional header anomalies --- + OPTIONAL_HEADER_INCONSISTENT_SIZE = "optional_header_inconsistent_size" + OPTIONAL_HEADER_INVALID_SIZE_OF_HEADERS = "optional_header_invalid_size_of_headers" + OPTIONAL_HEADER_INVALID_SECTION_ALIGNMENT = "optional_header_invalid_section_alignment" + OPTIONAL_HEADER_INVALID_FILE_ALIGNMENT = "optional_header_invalid_file_alignment" + OPTIONAL_HEADER_SIZE_FIELDS_INCONSISTENT = "optional_header_size_fields_inconsistent" + OPTIONAL_HEADER_IMAGE_BASE_MISALIGNED = "optional_header_image_base_misaligned" + OPTIONAL_HEADER_INVALID_NUMBER_OF_RVA_AND_SIZES = "optional_header_invalid_number_of_rva_and_sizes" + OPTIONAL_HEADER_SIZE_OF_IMAGE_MISALIGNED = "optional_header_size_of_image_misaligned" + + # --- TLS anomalies --- + TLS_MULTIPLE_DIRECTORIES = "tls_multiple_directories" + TLS_INVALID_RANGE = "tls_invalid_range" + TLS_ZERO_LENGTH_DIRECTORY = "tls_zero_length_directory" + TLS_CALLBACKS_MISSING = "tls_callbacks_missing" + + TLS_CALLBACK_OUTSIDE_RANGE = 
"callback_outside_tls_range" + TLS_CALLBACK_NOT_MAPPED_TO_SECTION = "tls_callback_not_mapped_to_section" + TLS_CALLBACK_IN_NON_EXECUTABLE_SECTION = "tls_callback_in_non_executable_section" + TLS_CALLBACK_IN_HEADERS = "tls_callback_in_headers" + TLS_CALLBACK_IN_OVERLAY = "tls_callback_in_overlay" + + # (future extension) + TLS_CALLBACK_ARRAY_NOT_TERMINATED = "tls_callback_array_not_terminated" + + # --- Signature anomalies --- + SIGNATURE_FLAG_SET_BUT_NO_METADATA = "signature_flag_set_but_no_metadata" + SIGNATURE_PRESENT_BUT_FLAG_NOT_SET = "signature_present_but_flag_not_set" + + SIGNATURE_MULTIPLE_CERTIFICATES = "signature_multiple_certificates" + + SIGNATURE_INVALID_LENGTH = "signature_invalid_length" + SIGNATURE_INVALID_REVISION = "signature_invalid_revision" + SIGNATURE_INVALID_TYPE = "signature_invalid_type" + + SIGNATURE_OUT_OF_FILE_BOUNDS = "signature_out_of_file_bounds" + SIGNATURE_OVERLAPS_OTHER_DATA = "signature_overlaps_other_data" + + # --- Entropy anomalies --- + ENTROPY_HIGH_SECTION = "entropy_high_section" + ENTROPY_HIGH_OVERLAY = "entropy_high_overlay" + ENTROPY_UNIFORM_ACROSS_SECTIONS = "entropy_uniform_across_sections" + + ENTROPY_VERY_LOW_SECTION = "entropy_very_low_section" + + ENTROPY_HIGH_RESOURCES = "entropy_high_resources" + ENTROPY_HIGH_RELOCATIONS = "entropy_high_relocations" + ENTROPY_HIGH_IMPORTS = "entropy_high_imports" + ENTROPY_HIGH_TLS = "entropy_high_tls" + ENTROPY_HIGH_CERTIFICATE = "entropy_high_certificate" + + # --- Resource directory anomalies --- + RESOURCE_DIRECTORY_OUT_OF_BOUNDS = "resource_directory_out_of_bounds" + RESOURCE_DIRECTORY_LOOP = "resource_directory_loop" + RESOURCE_ENTRY_OUT_OF_BOUNDS = "resource_entry_out_of_bounds" + RESOURCE_DIRECTORY_ZERO_LENGTH = "resource_directory_zero_length" + + # --- Resource data anomalies --- + RESOURCE_DATA_OUT_OF_BOUNDS = "resource_data_out_of_bounds" + RESOURCE_DATA_OVERLAPS_OTHER_DATA = "resource_data_overlaps_other_data" + + # --- Resource string-table anomalies --- + 
RESOURCE_STRING_TABLE_CORRUPT = "resource_string_table_corrupt" + + # --- Packer heuristics (interpretation layer) --- + PACKER_SECTION_NAME = "packer_section_name" + PACKER_HIGH_ENTROPY_SECTION = "high_entropy_section" + PACKER_HIGH_ENTROPY_OVERLAY = "high_entropy_overlay" + PACKER_UNIFORM_HIGH_ENTROPY_PATTERN = "uniform_high_entropy_pattern" diff --git a/iocx/schemas/__init__.py b/iocx/schemas/__init__.py new file mode 100644 index 0000000..2d97278 --- /dev/null +++ b/iocx/schemas/__init__.py @@ -0,0 +1,3 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + diff --git a/iocx/schemas/analysis.py b/iocx/schemas/analysis.py new file mode 100644 index 0000000..da95765 --- /dev/null +++ b/iocx/schemas/analysis.py @@ -0,0 +1,41 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +from typing import TypedDict, List, Dict, Any + +class SectionInfo(TypedDict): + name: str + raw_size: int + virtual_size: int + characteristics: int + entropy: float + raw_address: int + virtual_address: int + +class DataDirectoryInfo(TypedDict): + index: int + name: str | None + rva: int + size: int + +class ObfuscationHint(TypedDict): + value: str + start: int + end: int + category: str + metadata: Dict[str, Any] + +class ExtendedDetection(TypedDict): + value: str + start: int + end: int + category: str + metadata: Dict[str, Any] + +class AnalysisDict(TypedDict): + sections: List[SectionInfo] + data_directories: List[DataDirectoryInfo] + extended: List[ExtendedDetection] + obfuscation: List[ObfuscationHint] + file_size: int + overlay_offset: int diff --git a/iocx/schemas/internal_schema.py b/iocx/schemas/internal_schema.py new file mode 100644 index 0000000..6df9ef4 --- /dev/null +++ b/iocx/schemas/internal_schema.py @@ -0,0 +1,43 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +from typing import TypedDict, List, Dict, Any + +# ------------------------- +# Resource directory 
schema +# ------------------------- + +class ResourceDirectoryNode(TypedDict): + rva: int + size: int + entries: List[Any] # directory or data entries + + +class ResourceDataEntry(TypedDict): + is_directory: bool + data_rva: int + data_size: int + raw_offset: int + + +class ResourceDirectoryEntry(TypedDict): + is_directory: bool + directory: ResourceDirectoryNode + + +class ResourceStringTable(TypedDict): + rva: int + size: int + + +class ResourcesStruct(TypedDict): + root: ResourceDirectoryNode + string_tables: List[ResourceStringTable] + + +# ------------------------- +# Internal metadata schema +# ------------------------- + +class InternalMetadata(TypedDict, total=False): + resources_struct: ResourcesStruct diff --git a/iocx/schemas/public_metadata.py b/iocx/schemas/public_metadata.py new file mode 100644 index 0000000..8669915 --- /dev/null +++ b/iocx/schemas/public_metadata.py @@ -0,0 +1,87 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +from typing import TypedDict, List, Dict, Any, Optional + + +class TLSInfo(TypedDict, total=False): + start_address: Optional[int] + end_address: Optional[int] + callbacks: Optional[List[int]] + + +class HeaderInfo(TypedDict, total=False): + entry_point: Optional[int] + image_base: Optional[int] + subsystem: Optional[int] + timestamp: Optional[int] + machine: Optional[int] + characteristics: Optional[int] + + +class OptionalHeaderInfo(TypedDict, total=False): + section_alignment: Optional[int] + file_alignment: Optional[int] + size_of_image: Optional[int] + size_of_headers: Optional[int] + linker_version: Optional[str] + os_version: Optional[str] + subsystem_version: Optional[str] + + +class RichHeaderInfo(TypedDict, total=False): + # Rich headers vary widely; keep flexible + raw: Any + decoded: Any + + +class ImportEntry(TypedDict): + dll: str + function: Optional[str] + ordinal: Optional[int] + + +class ExportEntry(TypedDict): + name: Optional[str] + ordinal: Optional[int] + 
forwarder: Optional[str] + + +class ResourceEntry(TypedDict): + type: str + name: Optional[str] + language: Optional[str] + size: int + entropy: float + rva: int + raw_offset: int + + +class PublicMetadata(TypedDict, total=False): + file_type: str + + # High‑level lists + imports: List[str] + sections: List[Dict[str, Any]] + resources: List[ResourceEntry] + resource_strings: List[str] + + # Detailed import structures + import_details: List[ImportEntry] + delayed_imports: List[ImportEntry] + bound_imports: List[Dict[str, Any]] + + # Exports + exports: List[ExportEntry] + + # TLS + tls: TLSInfo + + # Headers + header: HeaderInfo + optional_header: OptionalHeaderInfo + header_end: Optional[int]  # file offset where headers + section table end, if the parser provides it + rich_header: Optional[RichHeaderInfo] + + # Signatures + signatures: List[Dict[str, Any]] + has_signature: bool diff --git a/iocx/utils.py b/iocx/utils.py index cc0201b..4c322fa 100644 --- a/iocx/utils.py +++ b/iocx/utils.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import magic class FileType: diff --git a/iocx/validators/__init__.py b/iocx/validators/__init__.py new file mode 100644 index 0000000..8422aeb --- /dev/null +++ b/iocx/validators/__init__.py @@ -0,0 +1,55 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +from typing import Dict, Any + +from .sections import validate_sections +from .entrypoint import validate_entrypoint +from .rva_graph import validate_rva_graph +from .optional_header import validate_optional_header +from .tls import validate_tls +from .signature import validate_signature +from .resources import validate_resources +from .entropy import validate_entropy + +STRUCTURAL_VALIDATORS = { + # Entrypoint mapping correctness + "entrypoint": validate_entrypoint, + # Section flags, names, alignment, overlap, impossible combinations + "sections": validate_sections, + # Optional header consistency (e.g., SizeOfImage) + "optional_header": validate_optional_header, + # RVA graph consistency 
(directory bounds, overlaps, out-of-range) + "data_directories": validate_rva_graph, + # TLS callback range correctness + "tls": validate_tls, + # Signature directory correctness + "signature": validate_signature, + # Resource directory correctness + "resources": validate_resources, + # Entropy metrics (high entropy sections, overlays, uniform patterns) + "entropy": validate_entropy, +} + +def run_structural_validators(internal, metadata, analysis): + """ + Run all structural validators in a deterministic order and return the + complete structural analysis dictionary. This output is attached to + analysis["structural"] and consumed by the heuristics layer. + + Each validator must return a List[StructuralIssue]. + """ + def call(validator): + deps = getattr(validator, "_depends_on", ("metadata", "analysis")) + + args = [] + if "internal" in deps: + args.append(internal) + if "metadata" in deps: + args.append(metadata) + if "analysis" in deps: + args.append(analysis) + + return validator(*args) + + return {name: call(fn) for name, fn in STRUCTURAL_VALIDATORS.items()} diff --git a/iocx/validators/decorators.py b/iocx/validators/decorators.py new file mode 100644 index 0000000..672cccf --- /dev/null +++ b/iocx/validators/decorators.py @@ -0,0 +1,16 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +from typing import Literal, Tuple + +Layer = Literal["internal", "metadata", "analysis"] + +def depends_on(*layers: Layer): + """ + Annotate a validator with the layers it requires. + Valid layers: "internal", "metadata", "analysis". + """ + def wrap(fn): + fn._depends_on: Tuple[Layer, ...] 
= layers + return fn + return wrap diff --git a/iocx/validators/entropy.py b/iocx/validators/entropy.py new file mode 100644 index 0000000..ea78300 --- /dev/null +++ b/iocx/validators/entropy.py @@ -0,0 +1,110 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +from typing import Dict, Any, List +from iocx.reason_codes import ReasonCodes +from iocx.validators.schema import StructuralIssue +from iocx.schemas.public_metadata import PublicMetadata +from iocx.schemas.analysis import AnalysisDict +from .decorators import depends_on + +HIGH_ENTROPY_THRESHOLD = 7.5 +LOW_ENTROPY_THRESHOLD = 0.2 +MIN_SECTION_SIZE_FOR_ENTROPY = 1024 +MIN_SECTION_SIZE_FOR_LOW_ENTROPY = 16384 # 16 KB – very conservative +MIN_OVERLAY_SIZE_FOR_ENTROPY = 1024 +UNIFORM_STDDEV_THRESHOLD = 0.15 + +@depends_on("metadata", "analysis") +def validate_entropy(metadata: PublicMetadata, analysis: AnalysisDict) -> List[StructuralIssue]: + issues: List[StructuralIssue] = [] + sections: List[Dict[str, Any]] = analysis.get("sections", []) or [] + + entropies: List[float] = [] + + # --------------------------------------------------------- + # 1) Per-section entropy checks + # --------------------------------------------------------- + for sec in sections: + name = sec.get("name") or "" + entropy = sec.get("entropy") + raw_size = sec.get("raw_size") + + if not isinstance(entropy, (int, float)) or not isinstance(raw_size, int): + continue + + e = float(entropy) + + if raw_size >= MIN_SECTION_SIZE_FOR_ENTROPY: + entropies.append(e) + + # High entropy + if e >= HIGH_ENTROPY_THRESHOLD: + issues.append(StructuralIssue( + issue=ReasonCodes.ENTROPY_HIGH_SECTION, + details={"section": name, "entropy": e, "raw_size": raw_size}, + )) + + # Very low entropy + if raw_size >= MIN_SECTION_SIZE_FOR_LOW_ENTROPY and e <= LOW_ENTROPY_THRESHOLD: + issues.append(StructuralIssue( + issue=ReasonCodes.ENTROPY_VERY_LOW_SECTION, + details={"section": name, "entropy": e, "raw_size": raw_size}, + )) 
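The thresholds above (`HIGH_ENTROPY_THRESHOLD = 7.5`, `LOW_ENTROPY_THRESHOLD = 0.2`) assume byte-level Shannon entropy on the usual 0.0–8.0 bits-per-byte scale. A minimal, illustrative sketch of that measure — not IOCX code; the real per-section values come from the PE parser:

```python
import math
from collections import Counter

# Illustrative thresholds mirroring the validator above
HIGH_ENTROPY_THRESHOLD = 7.5  # packed/encrypted-looking data
LOW_ENTROPY_THRESHOLD = 0.2   # padding-like data

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 .. 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

padding = bytes(1024)            # all zero bytes: entropy 0.0
mixed = bytes(range(256)) * 64   # every byte value equally frequent: entropy 8.0

assert shannon_entropy(padding) <= LOW_ENTROPY_THRESHOLD
assert shannon_entropy(mixed) >= HIGH_ENTROPY_THRESHOLD
```

Real code sections typically land in the 5–6.5 range, which is why only values at the extremes are surfaced as anomalies.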
+ + # --------------------------------------------------------- + # 2) Overlay entropy + # --------------------------------------------------------- + overlay_info = analysis.get("overlay") + if isinstance(overlay_info, dict): + ov_entropy = overlay_info.get("entropy") + ov_size = overlay_info.get("size") + + if isinstance(ov_entropy, (int, float)) and isinstance(ov_size, int): + e = float(ov_entropy) + if ov_size >= MIN_OVERLAY_SIZE_FOR_ENTROPY and e >= HIGH_ENTROPY_THRESHOLD: + issues.append(StructuralIssue( + issue=ReasonCodes.ENTROPY_HIGH_OVERLAY, + details={"entropy": e, "size": ov_size}, + )) + + # --------------------------------------------------------- + # 3) Region-specific entropy (optional) + # --------------------------------------------------------- + region_entropy = analysis.get("region_entropy") or {} + + region_map = { + "resources": ReasonCodes.ENTROPY_HIGH_RESOURCES, + "relocations": ReasonCodes.ENTROPY_HIGH_RELOCATIONS, + "imports": ReasonCodes.ENTROPY_HIGH_IMPORTS, + "tls": ReasonCodes.ENTROPY_HIGH_TLS, + "certificate": ReasonCodes.ENTROPY_HIGH_CERTIFICATE, + } + + for region, reason in region_map.items(): + info = region_entropy.get(region) + if isinstance(info, dict): + e = info.get("entropy") + size = info.get("size") + if isinstance(e, (int, float)) and isinstance(size, int): + if size >= MIN_SECTION_SIZE_FOR_ENTROPY and e >= HIGH_ENTROPY_THRESHOLD: + issues.append(StructuralIssue( + issue=reason, + details={"entropy": float(e), "size": size}, + )) + + # --------------------------------------------------------- + # 4) Uniform entropy across sections + # --------------------------------------------------------- + if len(entropies) >= 2: + mean = sum(entropies) / len(entropies) + var = sum((e - mean) ** 2 for e in entropies) / len(entropies) + stddev = var ** 0.5 + + if mean >= HIGH_ENTROPY_THRESHOLD and stddev <= UNIFORM_STDDEV_THRESHOLD: + issues.append(StructuralIssue( + issue=ReasonCodes.ENTROPY_UNIFORM_ACROSS_SECTIONS, + 
details={"mean_entropy": mean, "stddev_entropy": stddev, "count": len(entropies)}, + )) + + return issues diff --git a/iocx/validators/entrypoint.py b/iocx/validators/entrypoint.py new file mode 100644 index 0000000..78dc9df --- /dev/null +++ b/iocx/validators/entrypoint.py @@ -0,0 +1,187 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +from typing import Dict, Any, List, Optional +from iocx.reason_codes import ReasonCodes +from iocx.validators.schema import StructuralIssue +from iocx.schemas.public_metadata import PublicMetadata +from iocx.schemas.analysis import AnalysisDict +from .decorators import depends_on + +IMAGE_SCN_CNT_CODE = 0x00000020 +IMAGE_SCN_MEM_EXECUTE = 0x20000000 +IMAGE_SCN_MEM_WRITE = 0x80000000 +IMAGE_SCN_MEM_DISCARDABLE = 0x02000000 + + +def _map_rva_to_section(sections: List[Dict[str, Any]], rva: int) -> Optional[Dict[str, Any]]: + """ + Map an RVA to a section. + + Prefer raw-backed mapping when raw fields are available (so truncated or + zero-length virtual regions can still be associated with a section), but + fall back to virtual-address mapping when only VA/VS are present. + """ + for sec in sections: + va = sec.get("virtual_address") + vs = sec.get("virtual_size") + raw = sec.get("raw_address") + raw_size = sec.get("raw_size") + + # 1) If we have raw info, use raw-backed mapping + if isinstance(va, int) and isinstance(raw, int) and isinstance(raw_size, int): + delta = rva - va + if 0 <= delta < raw_size: + return sec + # continue to next section; no VA fallback here to avoid ambiguity + continue + + # 2) Fallback: pure VA/VS mapping when no raw info + if isinstance(va, int) and isinstance(vs, int): + if va <= rva < va + vs: + return sec + + return None + + +def _map_rva_to_file_offset(sections: List[Dict[str, Any]], rva: int) -> Optional[int]: + """ + Map an RVA to a file offset using section table. + Returns None if the RVA does not fall into any section or + if required fields are missing. 
+ """ + for sec in sections: + va = sec.get("virtual_address") + vs = sec.get("virtual_size") + raw = sec.get("raw_address") + raw_size = sec.get("raw_size") + + if not (isinstance(va, int) and isinstance(vs, int) and isinstance(raw, int) and isinstance(raw_size, int)): + continue + + if va <= rva < va + vs: + # Map RVA into the section's raw range + delta = rva - va + if 0 <= delta < raw_size: + return raw + delta + + return None + + +@depends_on("metadata", "analysis") +def validate_entrypoint(metadata: PublicMetadata, analysis: AnalysisDict) -> List[StructuralIssue]: + issues: List[StructuralIssue] = [] + + # --- Extract entrypoint from extended header --- + header_ext = [ + e for e in analysis.get("extended", []) + if isinstance(e, dict) and e.get("value") == "header" + ] + if not header_ext: + return issues + + header_meta = header_ext[0].get("metadata") or {} + ep = header_meta.get("entry_point") + if not isinstance(ep, int): + return issues + + # --- Optional header context (for headers / image bounds) --- + opt = metadata.get("optional_header") or {} + size_of_headers = opt.get("size_of_headers") + size_of_image = opt.get("size_of_image") + + # EP obviously bogus (zero or negative) + if ep <= 0: + issues.append(StructuralIssue( + issue=ReasonCodes.ENTRYPOINT_ZERO_OR_NEGATIVE, + details={"entry_point": ep}, + )) + + # EP inside headers (if we know SizeOfHeaders) + if isinstance(size_of_headers, int) and size_of_headers > 0 and ep < size_of_headers: + issues.append(StructuralIssue( + issue=ReasonCodes.ENTRYPOINT_IN_HEADERS, + details={"entry_point": ep, "size_of_headers": size_of_headers}, + )) + + sections = analysis.get("sections", []) or [] + if not sections: + return issues + + # --- A. 
EP must map to a valid section --- + sec = _map_rva_to_section(sections, ep) + if sec is None: + # If we know SizeOfImage, make it explicit that EP is within or beyond it + details: Dict[str, Any] = {"entry_point": ep} + if isinstance(size_of_image, int) and size_of_image > 0: + details["size_of_image"] = size_of_image + if ep >= size_of_image: + details["position"] = "beyond_size_of_image" + else: + details["position"] = "within_size_of_image_but_no_section" + issues.append(StructuralIssue( + issue=ReasonCodes.ENTRYPOINT_OUT_OF_BOUNDS, + details=details, + )) + return issues # cannot continue without a section + + name = (sec.get("name") or "").strip() + chars = sec.get("characteristics", 0) + + executable = bool(isinstance(chars, int) and (chars & IMAGE_SCN_MEM_EXECUTE)) + has_code = bool(isinstance(chars, int) and (chars & IMAGE_SCN_CNT_CODE)) + discardable = bool(isinstance(chars, int) and (chars & IMAGE_SCN_MEM_DISCARDABLE)) + + # --- B. Section must be executable --- + if not executable: + issues.append(StructuralIssue( + issue=ReasonCodes.ENTRYPOINT_SECTION_NOT_EXECUTABLE, + details={"entry_point": ep, "section": name, "characteristics": chars}, + )) + + # EP in non-code / non-standard section types (resources, relocations, etc.) + lower_name = name.lower() + if lower_name in {".rsrc", ".reloc"} or (not has_code and not executable): + issues.append(StructuralIssue( + issue=ReasonCodes.ENTRYPOINT_IN_NON_CODE_SECTION, + details={"entry_point": ep, "section": name, "characteristics": chars}, + )) + + # EP in discardable section + if discardable: + issues.append(StructuralIssue( + issue=ReasonCodes.ENTRYPOINT_IN_DISCARDABLE_SECTION, + details={"entry_point": ep, "section": name, "characteristics": chars}, + )) + + # --- C. 
EP must not fall into truncated or zero-length regions --- + va = sec.get("virtual_address") + vs = sec.get("virtual_size") + + if isinstance(vs, int) and vs == 0: + issues.append(StructuralIssue( + issue=ReasonCodes.ENTRYPOINT_IN_TRUNCATED_REGION, + details={"entry_point": ep, "section": name, "reason": "zero_length_section"}, + )) + elif isinstance(va, int) and isinstance(vs, int) and ep >= va + vs: + # Only emit the "beyond_virtual_size" variant if we didn't already flag zero-length + issues.append(StructuralIssue( + issue=ReasonCodes.ENTRYPOINT_IN_TRUNCATED_REGION, + details={"entry_point": ep, "section": name, "reason": "beyond_virtual_size"}, + )) + + # --- D. EP must not point into overlays (RVA → file offset) --- + overlay_offset = analysis.get("overlay_offset") + if isinstance(overlay_offset, int): + ep_file_offset = _map_rva_to_file_offset(sections, ep) + if isinstance(ep_file_offset, int) and ep_file_offset >= overlay_offset: + issues.append(StructuralIssue( + issue=ReasonCodes.ENTRYPOINT_IN_OVERLAY, + details={ + "entry_point": ep, + "entry_point_file_offset": ep_file_offset, + "overlay_offset": overlay_offset, + }, + )) + + return issues diff --git a/iocx/validators/optional_header.py b/iocx/validators/optional_header.py new file mode 100644 index 0000000..f70ad15 --- /dev/null +++ b/iocx/validators/optional_header.py @@ -0,0 +1,177 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +from typing import Dict, Any, List +from iocx.reason_codes import ReasonCodes +from iocx.validators.schema import StructuralIssue +from iocx.schemas.public_metadata import PublicMetadata +from iocx.schemas.analysis import AnalysisDict +from .decorators import depends_on + + +def _is_power_of_two(x: int) -> bool: + return x > 0 and (x & (x - 1)) == 0 + + +@depends_on("metadata", "analysis") +def validate_optional_header(metadata: PublicMetadata, analysis: AnalysisDict) -> List[StructuralIssue]: + issues: List[StructuralIssue] = [] + + 
opt = metadata.get("optional_header") or {} + sections = analysis.get("sections", []) or [] + + # Extract fields + size_of_image = opt.get("size_of_image") + size_of_headers = opt.get("size_of_headers") + section_alignment = opt.get("section_alignment") + file_alignment = opt.get("file_alignment") + size_of_code = opt.get("size_of_code") + size_of_init = opt.get("size_of_initialized_data") + size_of_uninit = opt.get("size_of_uninitialized_data") + image_base = opt.get("image_base") + num_dirs = opt.get("number_of_rva_and_sizes") + + # --------------------------------------------------------- + # 1) SizeOfImage vs max section end + # --------------------------------------------------------- + if isinstance(size_of_image, int) and size_of_image > 0: + max_end = 0 + for sec in sections: + va = sec.get("virtual_address") + vs = sec.get("virtual_size") + if isinstance(va, int) and isinstance(vs, int): + max_end = max(max_end, va + vs) + + if max_end > size_of_image: + issues.append(StructuralIssue( + issue=ReasonCodes.OPTIONAL_HEADER_INCONSISTENT_SIZE, + details={"size_of_image": size_of_image, "max_section_end": max_end}, + )) + + # --------------------------------------------------------- + # 2) SizeOfHeaders checks + # --------------------------------------------------------- + if isinstance(size_of_headers, int) and isinstance(file_alignment, int) and file_alignment > 0: + # Must be aligned to FileAlignment + if size_of_headers % file_alignment != 0: + issues.append(StructuralIssue( + issue=ReasonCodes.OPTIONAL_HEADER_INVALID_SIZE_OF_HEADERS, + details={"size_of_headers": size_of_headers, "file_alignment": file_alignment}, + )) + + # Must be >= end of headers + section table + # Compute minimal header size: DOS + PE + COFF + optional + section table + header_end = metadata.get("header_end") # Provided by parser if available + if isinstance(header_end, int) and size_of_headers < header_end: + issues.append(StructuralIssue( + 
issue=ReasonCodes.OPTIONAL_HEADER_INVALID_SIZE_OF_HEADERS, + details={"size_of_headers": size_of_headers, "required_minimum": header_end}, + )) + + # --------------------------------------------------------- + # 3) SectionAlignment checks + # --------------------------------------------------------- + if isinstance(section_alignment, int) and isinstance(file_alignment, int): + if section_alignment < file_alignment: + issues.append(StructuralIssue( + issue=ReasonCodes.OPTIONAL_HEADER_INVALID_SECTION_ALIGNMENT, + details={"section_alignment": section_alignment, "file_alignment": file_alignment}, + )) + + if not _is_power_of_two(section_alignment): + issues.append(StructuralIssue( + issue=ReasonCodes.OPTIONAL_HEADER_INVALID_SECTION_ALIGNMENT, + details={"section_alignment": section_alignment, "reason": "not_power_of_two"}, + )) + + # --------------------------------------------------------- + # 4) FileAlignment checks + # --------------------------------------------------------- + if isinstance(file_alignment, int): + if not _is_power_of_two(file_alignment): + issues.append(StructuralIssue( + issue=ReasonCodes.OPTIONAL_HEADER_INVALID_FILE_ALIGNMENT, + details={"file_alignment": file_alignment, "reason": "not_power_of_two"}, + )) + + # Microsoft recommends 512–64K + if file_alignment < 512 or file_alignment > 65536: + issues.append(StructuralIssue( + issue=ReasonCodes.OPTIONAL_HEADER_INVALID_FILE_ALIGNMENT, + details={"file_alignment": file_alignment, "reason": "out_of_range"}, + )) + + # --------------------------------------------------------- + # 5) SizeOfCode / SizeOfInitializedData / SizeOfUninitializedData consistency + # --------------------------------------------------------- + if isinstance(size_of_code, int) and isinstance(size_of_init, int) and isinstance(size_of_uninit, int): + total_code = 0 + total_init = 0 + total_uninit = 0 + + for sec in sections: + chars = sec.get("characteristics", 0) + raw = sec.get("raw_size") or 0 + vs = sec.get("virtual_size") 
or 0 + + if chars & 0x20: # CNT_CODE + total_code += raw + + if chars & 0x40: # CNT_INITIALIZED_DATA + total_init += raw + + if chars & 0x80: # CNT_UNINITIALIZED_DATA + total_uninit += vs + + if size_of_code < total_code or size_of_init < total_init or size_of_uninit < total_uninit: + issues.append(StructuralIssue( + issue=ReasonCodes.OPTIONAL_HEADER_SIZE_FIELDS_INCONSISTENT, + details={ + "size_of_code": size_of_code, + "computed_code": total_code, + "size_of_initialized_data": size_of_init, + "computed_initialized": total_init, + "size_of_uninitialized_data": size_of_uninit, + "computed_uninitialized": total_uninit, + }, + )) + + # --------------------------------------------------------- + # 6) ImageBase alignment (must be 64K aligned) + # --------------------------------------------------------- + if isinstance(image_base, int): + if image_base % 0x10000 != 0: + issues.append(StructuralIssue( + issue=ReasonCodes.OPTIONAL_HEADER_IMAGE_BASE_MISALIGNED, + details={"image_base": image_base}, + )) + + # --------------------------------------------------------- + # 7) NumberOfRvaAndSizes checks + # --------------------------------------------------------- + if isinstance(num_dirs, int): + if num_dirs < 0 or num_dirs > 16: + issues.append(StructuralIssue( + issue=ReasonCodes.OPTIONAL_HEADER_INVALID_NUMBER_OF_RVA_AND_SIZES, + details={"number_of_rva_and_sizes": num_dirs}, + )) + + # Ensure it covers all directories actually present + dirs = opt.get("data_directories") or [] + if len(dirs) > num_dirs: + issues.append(StructuralIssue( + issue=ReasonCodes.OPTIONAL_HEADER_INVALID_NUMBER_OF_RVA_AND_SIZES, + details={"number_of_rva_and_sizes": num_dirs, "actual_directories": len(dirs)}, + )) + + # --------------------------------------------------------- + # 8) SizeOfImage alignment + # --------------------------------------------------------- + if isinstance(size_of_image, int) and isinstance(section_alignment, int) and section_alignment > 0: + if size_of_image % 
section_alignment != 0: + issues.append(StructuralIssue( + issue=ReasonCodes.OPTIONAL_HEADER_SIZE_OF_IMAGE_MISALIGNED, + details={"size_of_image": size_of_image, "section_alignment": section_alignment}, + )) + + return issues diff --git a/iocx/validators/resources.py b/iocx/validators/resources.py new file mode 100644 index 0000000..55d2678 --- /dev/null +++ b/iocx/validators/resources.py @@ -0,0 +1,176 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +from typing import Dict, Any, List, Set +from iocx.reason_codes import ReasonCodes +from iocx.validators.schema import StructuralIssue +from iocx.schemas.internal_schema import InternalMetadata +from iocx.schemas.analysis import AnalysisDict +from .decorators import depends_on + +@depends_on("internal", "analysis") +def validate_resources(metadata: InternalMetadata, analysis: AnalysisDict) -> List[StructuralIssue]: + issues: List[StructuralIssue] = [] + + resources = metadata.get("resources_struct") + if not resources: + return issues # No resource directory → no issues + + sections = analysis["sections"] + file_size = analysis["file_size"] + overlay_offset = analysis["overlay_offset"] + + # --------------------------------------------------------- + # Locate .rsrc section + # --------------------------------------------------------- + # Guard: section names may be missing on malformed binaries + rsrc_section = next((sec for sec in sections if (sec.get("name") or "").lower() == ".rsrc"), None) + if rsrc_section is None: + return issues # No resource section → nothing to validate + + rsrc_va = rsrc_section["virtual_address"] + rsrc_vs = rsrc_section["virtual_size"] + rsrc_raw = rsrc_section["raw_address"] + rsrc_raw_size = rsrc_section["raw_size"] + + def rva_in_rsrc(rva: int, size: int = 0) -> bool: + return rsrc_va <= rva and (rva + size) <= (rsrc_va + rsrc_vs) + + def va_overlaps_section(start: int, size: int, sec: Dict[str, Any]) -> bool: + end = start + size + sec_start = sec["virtual_address"] + sec_end = sec_start + sec["virtual_size"] +
return max(start, sec_start) < min(end, sec_end) + + def raw_overlaps_section(raw_start: int, size: int, sec: Dict[str, Any]) -> bool: + end = raw_start + size + sec_start = sec["raw_address"] + sec_end = sec_start + sec["raw_size"] + return max(raw_start, sec_start) < min(end, sec_end) + + visited_dirs: Set[int] = set() + + # --------------------------------------------------------- + # Recursive directory validation + # --------------------------------------------------------- + def validate_directory(dir_node: Dict[str, Any]) -> None: + rva = dir_node["rva"] + size = dir_node["size"] + + # Skip if the directory is not inside .rsrc + if not rva_in_rsrc(rva, size): + return + + entries = dir_node["entries"] + + # Zero-length directory + if size == 0: + issues.append(StructuralIssue( + issue=ReasonCodes.RESOURCE_DIRECTORY_ZERO_LENGTH, + details={"rva": rva}, + )) + return + + # Loop detection + if rva in visited_dirs: + issues.append(StructuralIssue( + issue=ReasonCodes.RESOURCE_DIRECTORY_LOOP, + details={"rva": rva}, + )) + return + visited_dirs.add(rva) + + # Entries + for entry in entries: + if entry["is_directory"]: + target = entry["directory"] + target_rva = target["rva"] + + if not rva_in_rsrc(target_rva): + issues.append(StructuralIssue( + issue=ReasonCodes.RESOURCE_ENTRY_OUT_OF_BOUNDS, + details={"directory_rva": rva, "target_rva": target_rva}, + )) + continue + + validate_directory(target) + continue + + # ------------------------------ + # Data entry + # ------------------------------ + data_rva = entry["data_rva"] + data_size = entry["data_size"] + data_raw = entry["raw_offset"] + + # Zero-size data + if data_size == 0: + issues.append(StructuralIssue( + issue=ReasonCodes.RESOURCE_DATA_OUT_OF_BOUNDS, + details={"data_rva": data_rva, "data_size": data_size}, + )) + continue + + # RVA bounds + if not rva_in_rsrc(data_rva, data_size): + issues.append(StructuralIssue( + issue=ReasonCodes.RESOURCE_DATA_OUT_OF_BOUNDS, + details={"data_rva": data_rva, 
"data_size": data_size}, + )) + continue + + # Raw bounds + if data_raw < 0 or data_raw + data_size > file_size: + issues.append(StructuralIssue( + issue=ReasonCodes.RESOURCE_DATA_OUT_OF_BOUNDS, + details={"data_raw": data_raw, "data_size": data_size, "file_size": file_size}, + )) + continue + + # Overlay overlap (inclusive check) + if data_raw <= overlay_offset < data_raw + data_size: + issues.append(StructuralIssue( + issue=ReasonCodes.RESOURCE_DATA_OVERLAPS_OTHER_DATA, + details={"data_raw": data_raw, "data_size": data_size, "overlay_offset": overlay_offset}, + )) + + # Raw overlap with other sections + for sec in sections: + if sec is rsrc_section: + continue + if raw_overlaps_section(data_raw, data_size, sec): + issues.append(StructuralIssue( + issue=ReasonCodes.RESOURCE_DATA_OVERLAPS_OTHER_DATA, + details={"data_raw": data_raw, "data_size": data_size, "section": sec["name"]}, + )) + break + + # VA overlap with other sections + for sec in sections: + if sec is rsrc_section: + continue + if va_overlaps_section(data_rva, data_size, sec): + issues.append(StructuralIssue( + issue=ReasonCodes.RESOURCE_DATA_OVERLAPS_OTHER_DATA, + details={"data_rva": data_rva, "data_size": data_size, "section": sec["name"]}, + )) + break + + # --------------------------------------------------------- + # Validate root directory + # --------------------------------------------------------- + validate_directory(resources["root"]) + + # --------------------------------------------------------- + # String table validation + # --------------------------------------------------------- + for st in resources.get("string_tables", []): + rva = st["rva"] + size = st["size"] + if not rva_in_rsrc(rva, size): + issues.append(StructuralIssue( + issue=ReasonCodes.RESOURCE_STRING_TABLE_CORRUPT, + details={"rva": rva, "size": size}, + )) + break + + return issues diff --git a/iocx/validators/rva_graph.py b/iocx/validators/rva_graph.py new file mode 100644 index 0000000..239c7ab --- /dev/null +++ 
b/iocx/validators/rva_graph.py @@ -0,0 +1,170 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +from typing import Dict, Any, List + +from iocx.reason_codes import ReasonCodes +from iocx.validators.schema import StructuralIssue +from iocx.schemas.public_metadata import PublicMetadata +from iocx.schemas.analysis import AnalysisDict +from .decorators import depends_on + + +# No directories are strictly required to be non-zero. +REQUIRED_NONZERO_DIRS: set[str] = set() + + +@depends_on("metadata", "analysis") +def validate_rva_graph(metadata: PublicMetadata, analysis: AnalysisDict) -> List[StructuralIssue]: + issues: List[StructuralIssue] = [] + + dirs = analysis.get("data_directories") or metadata.get("data_directories") or [] + opt = metadata.get("optional_header") or {} + sections = analysis.get("sections", []) or [] + + size_of_image = opt.get("size_of_image") + size_of_headers = opt.get("size_of_headers") + overlay_offset = analysis.get("overlay_offset") + + if not isinstance(size_of_image, int): + return issues + + # Build section ranges + section_ranges = [] + zero_length_sections = set() + for sec in sections: + va = sec.get("virtual_address") + vs = sec.get("virtual_size") + name = sec.get("name") + if isinstance(va, int) and isinstance(vs, int): + section_ranges.append((va, va + vs, name)) + if vs == 0: + zero_length_sections.add(name) + + # --------------------------------------------------------- + # Directory validation + # --------------------------------------------------------- + for d in dirs: + rva = d.get("rva") + size = d.get("size") + name = d.get("name") or d.get("index") + + if not isinstance(rva, int) or not isinstance(size, int): + continue + + # 1) Negative values + if rva < 0 or size < 0: + issues.append(StructuralIssue( + issue=ReasonCodes.DATA_DIRECTORY_INVALID_RANGE, + details={"directory": name, "rva": rva, "size": size}, + )) + continue + + # 2) Empty directory (rva=0, size=0) + if rva == 0 and 
size == 0: + if name in REQUIRED_NONZERO_DIRS: + issues.append(StructuralIssue( + issue=ReasonCodes.DATA_DIRECTORY_ZERO_SIZE_UNEXPECTED, + details={"directory": name}, + )) + continue + + # 3) Zero-RVA + non-zero size → primary anomaly only + if rva == 0 and size > 0: + issues.append(StructuralIssue( + issue=ReasonCodes.DATA_DIRECTORY_ZERO_RVA_NONZERO_SIZE, + details={"directory": name, "rva": rva, "size": size}, + )) + continue + + # 4) Directory in headers + if isinstance(size_of_headers, int) and rva < size_of_headers: + issues.append(StructuralIssue( + issue=ReasonCodes.DATA_DIRECTORY_IN_HEADERS, + details={"directory": name, "rva": rva, "size_of_headers": size_of_headers}, + )) + + # 5) Out-of-range + out_of_range = False + if rva + size > size_of_image: + out_of_range = True + issues.append(StructuralIssue( + issue=ReasonCodes.DATA_DIRECTORY_OUT_OF_RANGE, + details={"directory": name, "rva": rva, "size": size, "size_of_image": size_of_image}, + )) + + # Skip mapping if out-of-range + if out_of_range: + continue + + # 6) Overlay detection + if isinstance(overlay_offset, int): + raw_offset = None + for va_start, va_end, sec_name in section_ranges: + if va_start <= rva < va_end: + sec = next(s for s in sections if s.get("name") == sec_name) + # Guard: raw_address may be absent on malformed sections + raw_base = sec.get("raw_address") + if isinstance(raw_base, int): + raw_offset = raw_base + (rva - va_start) + break + + if isinstance(raw_offset, int) and raw_offset >= overlay_offset: + issues.append(StructuralIssue( + issue=ReasonCodes.DATA_DIRECTORY_IN_OVERLAY, + details={"directory": name, "rva": rva, "raw_offset": raw_offset}, + )) + + # 7) Skip mapping if directory lands on a zero-length section + zero_length_hit = False + for va_start, va_end, sec_name in section_ranges: + if va_start == rva and va_start == va_end: + zero_length_hit = True + break + + if zero_length_hit: + continue + + # 8) Section mapping + mapped_sections = [] + for va_start, va_end, sec_name in section_ranges: + if rva < va_end and (rva + size) > va_start: + mapped_sections.append(sec_name) + + if
not mapped_sections: + issues.append(StructuralIssue( + issue=ReasonCodes.DATA_DIRECTORY_NOT_MAPPED_TO_SECTION, + details={"directory": name, "rva": rva, "size": size}, + )) + elif len(mapped_sections) > 1: + issues.append(StructuralIssue( + issue=ReasonCodes.DATA_DIRECTORY_SPANS_MULTIPLE_SECTIONS, + details={"directory": name, "sections": mapped_sections}, + )) + + # --------------------------------------------------------- + # Directory overlap detection + # --------------------------------------------------------- + for i in range(len(dirs)): + a = dirs[i] + rva_a = a.get("rva") + size_a = a.get("size") + if not isinstance(rva_a, int) or not isinstance(size_a, int): + continue + end_a = rva_a + size_a + + for j in range(i + 1, len(dirs)): + b = dirs[j] + rva_b = b.get("rva") + size_b = b.get("size") + if not isinstance(rva_b, int) or not isinstance(size_b, int): + continue + end_b = rva_b + size_b + + if max(rva_a, rva_b) < min(end_a, end_b): + issues.append(StructuralIssue( + issue=ReasonCodes.DATA_DIRECTORY_OVERLAP, + details={ + "directory_a": a.get("name") or a.get("index"), + "directory_b": b.get("name") or b.get("index"), + }, + )) + + return issues diff --git a/iocx/validators/schema.py b/iocx/validators/schema.py new file mode 100644 index 0000000..8de1df8 --- /dev/null +++ b/iocx/validators/schema.py @@ -0,0 +1,29 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +from typing import TypedDict, Dict, List, Any + + +class StructuralIssue(TypedDict, total=False): + """ + A single structural anomaly detected by a validator. + """ + issue: str # canonical reason code (from ReasonCodes) + details: Dict[str, Any] # structured metadata describing the anomaly + + +class StructuralAnalysis(TypedDict): + """ + The complete structural validation output attached to analysis["structural"]. + + Each key corresponds to a validator category and contains a list of + StructuralIssue objects. 
Validators must populate these lists deterministically. + """ + entrypoint: List[StructuralIssue] + sections: List[StructuralIssue] + optional_header: List[StructuralIssue] + data_directories: List[StructuralIssue] + resources: List[StructuralIssue] + tls: List[StructuralIssue] + signature: List[StructuralIssue] + imports: List[StructuralIssue] + entropy: List[StructuralIssue] diff --git a/iocx/validators/sections.py b/iocx/validators/sections.py new file mode 100644 index 0000000..6a4a27c --- /dev/null +++ b/iocx/validators/sections.py @@ -0,0 +1,239 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +from typing import Dict, Any, List +from iocx.reason_codes import ReasonCodes +from iocx.validators.schema import StructuralIssue +from iocx.schemas.public_metadata import PublicMetadata +from iocx.schemas.analysis import AnalysisDict +from .decorators import depends_on + +# PE section characteristics flags (subset) +IMAGE_SCN_CNT_CODE = 0x00000020 +IMAGE_SCN_MEM_EXECUTE = 0x20000000 +IMAGE_SCN_MEM_WRITE = 0x80000000 +IMAGE_SCN_MEM_DISCARDABLE = 0x02000000 +IMAGE_SCN_MEM_READ = 0x40000000 # needed for contradictory flag checks + +CODE_LIKE_NAMES = {".text", "text", "code"} + + +def _is_ascii_printable(name: str) -> bool: + try: + return all(32 <= ord(ch) < 127 for ch in name) + except TypeError: + return False + + +def _is_padding_name(name: str) -> bool: + stripped = name.strip("\x00").strip() + return stripped == "" + + +@depends_on("metadata", "analysis") +def validate_sections(metadata: PublicMetadata, analysis: AnalysisDict) -> List[StructuralIssue]: + issues: List[StructuralIssue] = [] + sections: List[Dict[str, Any]] = analysis.get("sections", []) or [] + + opt = metadata.get("optional_header") or {} + file_alignment = opt.get("file_alignment") + size_of_headers = opt.get("size_of_headers") + + # --------------------------------------------------------- + # Per‑section checks + # --------------------------------------------------------- + for sec in
sections: + name = (sec.get("name") or "").strip() + chars = sec.get("characteristics") + + if not isinstance(chars, int): + continue + + executable = bool(chars & IMAGE_SCN_MEM_EXECUTE) + writable = bool(chars & IMAGE_SCN_MEM_WRITE) + readable = bool(chars & IMAGE_SCN_MEM_READ) + has_code = bool(chars & IMAGE_SCN_CNT_CODE) + discardable = bool(chars & IMAGE_SCN_MEM_DISCARDABLE) + + raw_addr = sec.get("raw_address") + raw_size = sec.get("raw_size") + va = sec.get("virtual_address") + vs = sec.get("virtual_size") + + # 1) RWX sections + if executable and writable: + issues.append(StructuralIssue( + issue=ReasonCodes.SECTION_RWX, + details={"section": name, "characteristics": chars}, + )) + + # 2) Code flag but not executable + if has_code and not executable: + issues.append(StructuralIssue( + issue=ReasonCodes.SECTION_NON_EXECUTABLE_CODE_LIKE, + details={"section": name, "characteristics": chars}, + )) + + # 3) Code-like name but not executable + if name.lower() in CODE_LIKE_NAMES and not executable: + issues.append(StructuralIssue( + issue=ReasonCodes.SECTION_CODELIKE_NAME_NOT_EXECUTABLE, + details={"section": name, "characteristics": chars}, + )) + + # 4) Non-ASCII or deceptive names + if not _is_ascii_printable(name): + issues.append(StructuralIssue( + issue=ReasonCodes.SECTION_NAME_NON_ASCII, + details={"section": name}, + )) + elif _is_padding_name(name): + issues.append(StructuralIssue( + issue=ReasonCodes.SECTION_NAME_EMPTY_OR_PADDING, + details={"section": name}, + )) + + # 5) Impossible flag combinations + if discardable and executable and writable: + issues.append(StructuralIssue( + issue=ReasonCodes.SECTION_IMPOSSIBLE_FLAGS, + details={"section": name, "characteristics": chars}, + )) + + # 6) Raw alignment check + if ( + isinstance(file_alignment, int) + and isinstance(raw_addr, int) + and isinstance(raw_size, int) + and file_alignment > 0 + ): + if raw_addr % file_alignment != 0: + issues.append(StructuralIssue( + 
issue=ReasonCodes.SECTION_RAW_MISALIGNED, + details={ + "section": name, + "raw_address": raw_addr, + "raw_size": raw_size, + "file_alignment": file_alignment, + }, + )) + + # 7) Section overlaps headers + if ( + isinstance(size_of_headers, int) + and isinstance(raw_addr, int) + and raw_addr < size_of_headers + ): + issues.append(StructuralIssue( + issue=ReasonCodes.SECTION_OVERLAPS_HEADERS, + details={"section": name, "raw_address": raw_addr, "size_of_headers": size_of_headers}, + )) + + # 8) Zero-length section + if ( + isinstance(vs, int) + and isinstance(raw_size, int) + and vs == 0 + and raw_size == 0 + ): + issues.append(StructuralIssue( + issue=ReasonCodes.SECTION_ZERO_LENGTH, + details={"section": name}, + )) + + # 9) Discardable + executable (even without writable) + if discardable and executable: + issues.append(StructuralIssue( + issue=ReasonCodes.SECTION_DISCARDABLE_CODE, + details={"section": name, "characteristics": chars}, + )) + + # 10) Contradictory flags + if has_code and not readable: + issues.append(StructuralIssue( + issue=ReasonCodes.SECTION_FLAGS_INCONSISTENT, + details={"section": name, "reason": "code_without_read"}, + )) + if writable and not readable: + issues.append(StructuralIssue( + issue=ReasonCodes.SECTION_FLAGS_INCONSISTENT, + details={"section": name, "reason": "write_without_read"}, + )) + if executable and not readable: + issues.append(StructuralIssue( + issue=ReasonCodes.SECTION_FLAGS_INCONSISTENT, + details={"section": name, "reason": "exec_without_read"}, + )) + + # --------------------------------------------------------- + # Raw overlap detection + # --------------------------------------------------------- + for i in range(len(sections)): + a = sections[i] + raw_a = a.get("raw_address") + size_a = a.get("raw_size") + if not isinstance(raw_a, int) or not isinstance(size_a, int): + continue + end_a = raw_a + size_a + + for j in range(i + 1, len(sections)): + b = sections[j] + raw_b = b.get("raw_address") + size_b = 
b.get("raw_size") + if not isinstance(raw_b, int) or not isinstance(size_b, int): + continue + end_b = raw_b + size_b + + if max(raw_a, raw_b) < min(end_a, end_b): + issues.append(StructuralIssue( + issue=ReasonCodes.SECTION_RAW_OVERLAP, + details={"section_a": a.get("name"), "section_b": b.get("name")}, + )) + + # --------------------------------------------------------- + # Virtual overlap detection + # --------------------------------------------------------- + for i in range(len(sections)): + a = sections[i] + va_a = a.get("virtual_address") + vs_a = a.get("virtual_size") + if not isinstance(va_a, int) or not isinstance(vs_a, int): + continue + end_a = va_a + vs_a + + for j in range(i + 1, len(sections)): + b = sections[j] + va_b = b.get("virtual_address") + vs_b = b.get("virtual_size") + if not isinstance(va_b, int) or not isinstance(vs_b, int): + continue + end_b = va_b + vs_b + + if max(va_a, va_b) < min(end_a, end_b): + issues.append(StructuralIssue( + issue=ReasonCodes.SECTION_OVERLAP, + details={"section_a": a.get("name"), "section_b": b.get("name")}, + )) + + # --------------------------------------------------------- + # Ordering checks + # --------------------------------------------------------- + # Raw order + raw_addrs = [sec.get("raw_address") for sec in sections] + if all(isinstance(x, int) for x in raw_addrs): + if raw_addrs != sorted(raw_addrs): + issues.append(StructuralIssue( + issue=ReasonCodes.SECTION_OUT_OF_ORDER_RAW, + details={"raw_addresses": raw_addrs}, + )) + + # Virtual order + vas = [sec.get("virtual_address") for sec in sections] + if all(isinstance(x, int) for x in vas): + if vas != sorted(vas): + issues.append(StructuralIssue( + issue=ReasonCodes.SECTION_OUT_OF_ORDER_VIRTUAL, + details={"virtual_addresses": vas}, + )) + + return issues diff --git a/iocx/validators/signature.py b/iocx/validators/signature.py new file mode 100644 index 0000000..d8e0d82 --- /dev/null +++ b/iocx/validators/signature.py @@ -0,0 +1,117 @@ +# Copyright 
(c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +from typing import Dict, Any, List +from iocx.reason_codes import ReasonCodes +from iocx.validators.schema import StructuralIssue +from iocx.schemas.public_metadata import PublicMetadata +from iocx.schemas.analysis import AnalysisDict +from .decorators import depends_on + + +@depends_on("metadata", "analysis") +def validate_signature(metadata: PublicMetadata, analysis: AnalysisDict) -> List[StructuralIssue]: + issues: List[StructuralIssue] = [] + + has_sig = bool(metadata.get("has_signature")) + sigs = metadata.get("signatures") or [] + + # --------------------------------------------------------- + # 1) Flag/metadata symmetry + # --------------------------------------------------------- + if has_sig and not sigs: + issues.append(StructuralIssue( + issue=ReasonCodes.SIGNATURE_FLAG_SET_BUT_NO_METADATA, + details={}, + )) + return issues + + if not has_sig and sigs: + issues.append(StructuralIssue( + issue=ReasonCodes.SIGNATURE_PRESENT_BUT_FLAG_NOT_SET, + details={"count": len(sigs)}, + )) + # Continue validating the certificates anyway + + if not sigs: + return issues + + # --------------------------------------------------------- + # 2) Multiplicity + # --------------------------------------------------------- + if len(sigs) > 1: + issues.append(StructuralIssue( + issue=ReasonCodes.SIGNATURE_MULTIPLE_CERTIFICATES, + details={"count": len(sigs)}, + )) + + # --------------------------------------------------------- + # 3) Certificate sanity checks + # --------------------------------------------------------- + file_size = analysis.get("file_size") + sections = analysis.get("sections", []) or [] + overlay_offset = analysis.get("overlay_offset") + + for sig in sigs: + offset = sig.get("file_offset") + size = sig.get("length") + revision = sig.get("revision") + cert_type = sig.get("certificate_type") + + # Skip malformed metadata + if not isinstance(offset, int) or not isinstance(size, int): + 
continue + + # Length sanity + if size < 8: + issues.append(StructuralIssue( + issue=ReasonCodes.SIGNATURE_INVALID_LENGTH, + details={"length": size}, + )) + continue + + # Revision sanity + if revision not in (0x0100, 0x0200): + issues.append(StructuralIssue( + issue=ReasonCodes.SIGNATURE_INVALID_REVISION, + details={"revision": revision}, + )) + + # Type sanity + if cert_type not in (0x0001, 0x0002): + issues.append(StructuralIssue( + issue=ReasonCodes.SIGNATURE_INVALID_TYPE, + details={"certificate_type": cert_type}, + )) + + # --------------------------------------------------------- + # 4) Bounds checks + # --------------------------------------------------------- + if isinstance(file_size, int): + if offset < 0 or offset + size > file_size: + issues.append(StructuralIssue( + issue=ReasonCodes.SIGNATURE_OUT_OF_FILE_BOUNDS, + details={"offset": offset, "length": size, "file_size": file_size}, + )) + continue + + # Overlay check + if isinstance(overlay_offset, int) and offset < overlay_offset < offset + size: + issues.append(StructuralIssue( + issue=ReasonCodes.SIGNATURE_OVERLAPS_OTHER_DATA, + details={"offset": offset, "length": size, "overlay_offset": overlay_offset}, + )) + + # Section overlap check + for sec in sections: + raw = sec.get("raw_address") + raw_size = sec.get("raw_size") + if isinstance(raw, int) and isinstance(raw_size, int): + if max(offset, raw) < min(offset + size, raw + raw_size): + issues.append(StructuralIssue( + issue=ReasonCodes.SIGNATURE_OVERLAPS_OTHER_DATA, + details={"offset": offset, "length": size, "section": sec.get("name")}, + )) + break + + return issues diff --git a/iocx/validators/tls.py b/iocx/validators/tls.py new file mode 100644 index 0000000..7b9c0e1 --- /dev/null +++ b/iocx/validators/tls.py @@ -0,0 +1,137 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +from typing import Dict, Any, List, Optional +from iocx.reason_codes import ReasonCodes +from iocx.validators.schema import 
StructuralIssue +from iocx.schemas.public_metadata import PublicMetadata +from iocx.schemas.analysis import AnalysisDict +from .decorators import depends_on + + +def _map_rva_to_section(sections, rva) -> Optional[Dict[str, Any]]: + for sec in sections: + va = sec.get("virtual_address") + vs = sec.get("virtual_size") + if isinstance(va, int) and isinstance(vs, int): + if va <= rva < va + vs: + return sec + return None + + +@depends_on("metadata", "analysis") +def validate_tls(metadata: PublicMetadata, analysis: AnalysisDict) -> List[StructuralIssue]: + issues: List[StructuralIssue] = [] + + tls_entries = [ + e for e in analysis.get("extended", []) + if isinstance(e, dict) and e.get("value") == "tls_directory" + ] + + # --------------------------------------------------------- + # 1) Multiple TLS directories + # --------------------------------------------------------- + if len(tls_entries) > 1: + issues.append(StructuralIssue( + issue=ReasonCodes.TLS_MULTIPLE_DIRECTORIES, + details={"count": len(tls_entries)}, + )) + + if not tls_entries: + return issues + + # Only validate the first directory structurally + entry = tls_entries[0] + meta = entry.get("metadata") or {} + + start = meta.get("start_address") + end = meta.get("end_address") + callbacks = meta.get("callbacks") + + if not isinstance(start, int) or not isinstance(end, int) or not isinstance(callbacks, int): + return issues + + sections = analysis.get("sections", []) or [] + overlay_offset = analysis.get("overlay_offset") + # Guard: optional_header may be present but None on malformed input + size_of_headers = (metadata.get("optional_header") or {}).get("size_of_headers") + + # --------------------------------------------------------- + # 2) Range sanity + # --------------------------------------------------------- + if start == end: + issues.append(StructuralIssue( + issue=ReasonCodes.TLS_ZERO_LENGTH_DIRECTORY, + details={"start_address": start, "end_address": end}, + )) + return issues + + if start > end: + issues.append(StructuralIssue( + issue=ReasonCodes.TLS_INVALID_RANGE, +
details={"start_address": start, "end_address": end}, + )) + return issues + + # --------------------------------------------------------- + # 3) Missing callbacks + # --------------------------------------------------------- + if callbacks == 0: + issues.append(StructuralIssue( + issue=ReasonCodes.TLS_CALLBACKS_MISSING, + details={"start_address": start, "end_address": end}, + )) + return issues + + # --------------------------------------------------------- + # 4) Callback outside TLS range + # --------------------------------------------------------- + if not (start <= callbacks < end): + issues.append(StructuralIssue( + issue=ReasonCodes.TLS_CALLBACK_OUTSIDE_RANGE, + details={"callbacks": callbacks, "start_address": start, "end_address": end}, + )) + # Do not attempt further mapping - avoid cascading anomalies + return issues + + # --------------------------------------------------------- + # 5) Callback mapping + # --------------------------------------------------------- + sec = _map_rva_to_section(sections, callbacks) + if sec is None: + issues.append(StructuralIssue( + issue=ReasonCodes.TLS_CALLBACK_NOT_MAPPED_TO_SECTION, + details={"callbacks": callbacks}, + )) + return issues + + name = sec.get("name") + chars = sec.get("characteristics", 0) + executable = bool(chars & 0x20000000) + + if not executable: + issues.append(StructuralIssue( + issue=ReasonCodes.TLS_CALLBACK_IN_NON_EXECUTABLE_SECTION, + details={"callbacks": callbacks, "section": name}, + )) + + # --------------------------------------------------------- + # 6) Overlay / header checks + # --------------------------------------------------------- + if isinstance(size_of_headers, int) and callbacks < size_of_headers: + issues.append(StructuralIssue( + issue=ReasonCodes.TLS_CALLBACK_IN_HEADERS, + details={"callbacks": callbacks, "size_of_headers": size_of_headers}, + )) + + if isinstance(overlay_offset, int): + raw = sec.get("raw_address") + va = sec.get("virtual_address") + if isinstance(raw, int) 
and isinstance(va, int): + raw_offset = raw + (callbacks - va) + if raw_offset >= overlay_offset: + issues.append(StructuralIssue( + issue=ReasonCodes.TLS_CALLBACK_IN_OVERLAY, + details={"callbacks": callbacks, "raw_offset": raw_offset}, + )) + + return issues diff --git a/pyproject.toml b/pyproject.toml index 8a7937b..65044af 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,13 +1,14 @@ [project] name = "iocx" -version = "0.7.2" +version = "0.7.3" description = "Static IOC extraction engine for binaries, text, and logs." authors = [ { name = "MalX Labs" } ] readme = { file = "README-pypi.md", content-type = "text/markdown" } requires-python = ">=3.9" -license = { text = "MIT" } + +license = { text = "MPL-2.0" } keywords = [ "ioc", @@ -23,7 +24,9 @@ classifiers = [ "Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.11", "Programming Language :: Python :: 3.12", - "License :: OSI Approved :: MIT License", + + "License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)", + "Operating System :: OS Independent", "Topic :: Security", "Topic :: Software Development :: Libraries", diff --git a/tests/conftest.py b/tests/conftest.py index 8f1a0d6..7890b9b 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from dataclasses import dataclass diff --git a/tests/contract/snapshots/layer3_adversarial/broken_rva_addresses.full.json b/tests/contract/snapshots/layer3_adversarial/broken_rva_addresses.full.json index 5036077..9e93e75 100644 --- a/tests/contract/snapshots/layer3_adversarial/broken_rva_addresses.full.json +++ b/tests/contract/snapshots/layer3_adversarial/broken_rva_addresses.full.json @@ -132,11 +132,32 @@ "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "data_directory_out_of_range", - "directory": "IMAGE_DIRECTORY_ENTRY_IMPORT", - "rva": 36864, - "size": 512, - "size_of_image": 16384 + "reason": 
"entrypoint_in_overlay", + "entry_point": 4096, + "entry_point_file_offset": 512, + "overlay_offset": 392 + } + }, + { + "value": "pe_structure_anomaly", + "start": 0, + "end": 0, + "category": "pe_heuristic", + "metadata": { + "reason": "section_overlaps_headers", + "section": ".zero", + "raw_address": 0, + "size_of_headers": 512 + } + }, + { + "value": "pe_structure_anomaly", + "start": 0, + "end": 0, + "category": "pe_heuristic", + "metadata": { + "reason": "section_zero_length", + "section": ".zero" } }, { @@ -145,9 +166,24 @@ "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "import_rva_invalid", + "reason": "section_out_of_order_raw", + "raw_addresses": [ + 512, + 0 + ] + } + }, + { + "value": "pe_structure_anomaly", + "start": 0, + "end": 0, + "category": "pe_heuristic", + "metadata": { + "reason": "data_directory_out_of_range", + "directory": "IMAGE_DIRECTORY_ENTRY_IMPORT", "rva": 36864, - "size": 512 + "size": 512, + "size_of_image": 16384 } } ] diff --git a/tests/contract/snapshots/layer3_adversarial/corrupted_data_directories.full.json b/tests/contract/snapshots/layer3_adversarial/corrupted_data_directories.full.json index e609131..3ee05c1 100644 --- a/tests/contract/snapshots/layer3_adversarial/corrupted_data_directories.full.json +++ b/tests/contract/snapshots/layer3_adversarial/corrupted_data_directories.full.json @@ -118,6 +118,18 @@ } ], "heuristics": [ + { + "value": "pe_structure_anomaly", + "start": 0, + "end": 0, + "category": "pe_heuristic", + "metadata": { + "reason": "entrypoint_in_overlay", + "entry_point": 4096, + "entry_point_file_offset": 512, + "overlay_offset": 392 + } + }, { "value": "pe_structure_anomaly", "start": 0, @@ -167,17 +179,6 @@ "directory_a": "IMAGE_DIRECTORY_ENTRY_RESOURCE", "directory_b": "IMAGE_DIRECTORY_ENTRY_EXCEPTION" } - }, - { - "value": "pe_structure_anomaly", - "start": 0, - "end": 0, - "category": "pe_heuristic", - "metadata": { - "reason": "import_rva_invalid", - "rva": 0, - "size": 0 - } } ] } 
diff --git a/tests/contract/snapshots/layer3_adversarial/franken_malformed_pe.full.json b/tests/contract/snapshots/layer3_adversarial/franken_malformed_pe.full.json index be7057f..a3e6522 100644 --- a/tests/contract/snapshots/layer3_adversarial/franken_malformed_pe.full.json +++ b/tests/contract/snapshots/layer3_adversarial/franken_malformed_pe.full.json @@ -167,9 +167,10 @@ "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "section_overlap", - "section_a": ".text", - "section_b": ".rdata" + "reason": "entrypoint_out_of_bounds", + "entry_point": 12288, + "size_of_image": 8192, + "position": "beyond_size_of_image" } }, { @@ -204,9 +205,9 @@ "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "optional_header_inconsistent_size", - "size_of_image": 8192, - "max_section_end": 11776 + "reason": "section_raw_overlap", + "section_a": ".text", + "section_b": ".rdata" } }, { @@ -215,8 +216,9 @@ "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "entrypoint_out_of_bounds", - "entry_point": 12288 + "reason": "section_raw_overlap", + "section_a": ".data", + "section_b": ".rsrc" } }, { @@ -225,34 +227,47 @@ "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "data_directory_out_of_range", - "directory": "IMAGE_DIRECTORY_ENTRY_IMPORT", - "rva": 20480, - "size": 512, - "size_of_image": 8192 + "reason": "section_overlap", + "section_a": ".text", + "section_b": ".rdata" } }, + { "value": "pe_structure_anomaly", "start": 0, "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "data_directory_zero_rva_nonzero_size", - "directory": "IMAGE_DIRECTORY_ENTRY_RESOURCE", - "rva": 0, - "size": 256 + "reason": "optional_header_inconsistent_size", + "size_of_image": 8192, + "max_section_end": 11776 } }, + { "value": "pe_structure_anomaly", "start": 0, "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "import_rva_invalid", + "reason": "data_directory_out_of_range", + "directory": "IMAGE_DIRECTORY_ENTRY_IMPORT", "rva": 
20480, - "size": 512 + "size": 512, + "size_of_image": 8192 + } + }, + { + "value": "pe_structure_anomaly", + "start": 0, + "end": 0, + "category": "pe_heuristic", + "metadata": { + "reason": "data_directory_zero_rva_nonzero_size", + "directory": "IMAGE_DIRECTORY_ENTRY_RESOURCE", + "rva": 0, + "size": 256 } } ] diff --git a/tests/contract/snapshots/layer3_adversarial/franken_malformed_pe.pe32.full.json b/tests/contract/snapshots/layer3_adversarial/franken_malformed_pe.pe32.full.json index addbe8f..40ea7bc 100644 --- a/tests/contract/snapshots/layer3_adversarial/franken_malformed_pe.pe32.full.json +++ b/tests/contract/snapshots/layer3_adversarial/franken_malformed_pe.pe32.full.json @@ -167,9 +167,10 @@ "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "section_overlap", - "section_a": ".text", - "section_b": ".rdata" + "reason": "entrypoint_out_of_bounds", + "entry_point": 12288, + "size_of_image": 8192, + "position": "beyond_size_of_image" } }, { @@ -204,9 +205,9 @@ "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "optional_header_inconsistent_size", - "size_of_image": 8192, - "max_section_end": 11776 + "reason": "section_raw_overlap", + "section_a": ".text", + "section_b": ".rdata" } }, { @@ -215,8 +216,9 @@ "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "entrypoint_out_of_bounds", - "entry_point": 12288 + "reason": "section_raw_overlap", + "section_a": ".data", + "section_b": ".rsrc" } }, { @@ -225,11 +227,9 @@ "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "data_directory_out_of_range", - "directory": "IMAGE_DIRECTORY_ENTRY_IMPORT", - "rva": 20480, - "size": 512, - "size_of_image": 8192 + "reason": "section_overlap", + "section_a": ".text", + "section_b": ".rdata" } }, { @@ -238,10 +238,9 @@ "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "data_directory_zero_rva_nonzero_size", - "directory": "IMAGE_DIRECTORY_ENTRY_RESOURCE", - "rva": 0, - "size": 256 + "reason": 
"optional_header_inconsistent_size", + "size_of_image": 8192, + "max_section_end": 11776 } }, { @@ -250,9 +249,23 @@ "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "import_rva_invalid", + "reason": "data_directory_out_of_range", + "directory": "IMAGE_DIRECTORY_ENTRY_IMPORT", "rva": 20480, - "size": 512 + "size": 512, + "size_of_image": 8192 + } + }, + { + "value": "pe_structure_anomaly", + "start": 0, + "end": 0, + "category": "pe_heuristic", + "metadata": { + "reason": "data_directory_zero_rva_nonzero_size", + "directory": "IMAGE_DIRECTORY_ENTRY_RESOURCE", + "rva": 0, + "size": 256 } } ] diff --git a/tests/contract/snapshots/layer3_adversarial/heuristic_rich.full.json b/tests/contract/snapshots/layer3_adversarial/heuristic_rich.full.json index c4cc57f..5096817 100644 --- a/tests/contract/snapshots/layer3_adversarial/heuristic_rich.full.json +++ b/tests/contract/snapshots/layer3_adversarial/heuristic_rich.full.json @@ -667,18 +667,6 @@ "section": "UPX0" } }, - { - "value": "tls_callback_anomaly", - "start": 0, - "end": 0, - "category": "pe_heuristic", - "metadata": { - "reason": "callback_outside_tls_range", - "callbacks": 5368754232, - "start_address": 5368758272, - "end_address": 5368758280 - } - }, { "value": "anti_debug_heuristic", "start": 0, @@ -745,6 +733,50 @@ "function": "QueryPerformanceCounter" } }, + { + "value": "pe_structure_anomaly", + "start": 0, + "end": 0, + "category": "pe_heuristic", + "metadata": { + "reason": "section_overlaps_headers", + "section": ".bss", + "raw_address": 0, + "size_of_headers": 1536 + } + }, + { + "value": "pe_structure_anomaly", + "start": 0, + "end": 0, + "category": "pe_heuristic", + "metadata": { + "reason": "section_out_of_order_raw", + "raw_addresses": [ + 1536, + 8192, + 8704, + 10752, + 13312, + 17408, + 18432, + 0, + 19456, + 21504, + 22016, + 22528, + 23040, + 24576, + 70144, + 78336, + 86016, + 88576, + 89600, + 95744, + 100864 + ] + } + }, { "value": "pe_structure_anomaly", "start": 0, @@ -755,6 
+787,18 @@ "directory_a": "IMAGE_DIRECTORY_ENTRY_IMPORT", "directory_b": "IMAGE_DIRECTORY_ENTRY_IAT" } + }, + { + "value": "pe_structure_anomaly", + "start": 0, + "end": 0, + "category": "pe_heuristic", + "metadata": { + "reason": "callback_outside_tls_range", + "callbacks": 5368754232, + "start_address": 5368758272, + "end_address": 5368758280 + } } ] } diff --git a/tests/contract/snapshots/layer3_adversarial/invalid_optional_header.full.json b/tests/contract/snapshots/layer3_adversarial/invalid_optional_header.full.json index 6d913ed..7bc7009 100644 --- a/tests/contract/snapshots/layer3_adversarial/invalid_optional_header.full.json +++ b/tests/contract/snapshots/layer3_adversarial/invalid_optional_header.full.json @@ -108,6 +108,39 @@ } ], "heuristics": [ + { + "value": "pe_structure_anomaly", + "start": 0, + "end": 0, + "category": "pe_heuristic", + "metadata": { + "reason": "optional_header_invalid_size_of_headers", + "size_of_headers": 2048, + "file_alignment": 16384 + } + }, + { + "value": "pe_structure_anomaly", + "start": 0, + "end": 0, + "category": "pe_heuristic", + "metadata": { + "reason": "optional_header_invalid_section_alignment", + "section_alignment": 4096, + "file_alignment": 16384 + } + }, + { + "value": "pe_structure_anomaly", + "start": 0, + "end": 0, + "category": "pe_heuristic", + "metadata": { + "reason": "optional_header_size_of_image_misaligned", + "size_of_image": 512, + "section_alignment": 4096 + } + }, { "value": "pe_structure_anomaly", "start": 0, diff --git a/tests/contract/snapshots/layer3_adversarial/invalid_optional_header.pe32.full.json b/tests/contract/snapshots/layer3_adversarial/invalid_optional_header.pe32.full.json index f80cd71..e077361 100644 --- a/tests/contract/snapshots/layer3_adversarial/invalid_optional_header.pe32.full.json +++ b/tests/contract/snapshots/layer3_adversarial/invalid_optional_header.pe32.full.json @@ -118,6 +118,18 @@ } ], "heuristics": [ + { + "value": "pe_structure_anomaly", + "start": 0, + "end": 0, 
+ "category": "pe_heuristic", + "metadata": { + "reason": "entrypoint_out_of_bounds", + "entry_point": 2415919104, + "size_of_image": 512, + "position": "beyond_size_of_image" + } + }, { "value": "pe_structure_anomaly", "start": 0, @@ -131,6 +143,18 @@ "file_alignment": 16384 } }, + { + "value": "pe_structure_anomaly", + "start": 0, + "end": 0, + "category": "pe_heuristic", + "metadata": { + "reason": "section_overlaps_headers", + "section": ".text", + "raw_address": 512, + "size_of_headers": 2048 + } + }, { "value": "pe_structure_anomaly", "start": 0, @@ -148,8 +172,31 @@ "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "entrypoint_out_of_bounds", - "entry_point": 2415919104 + "reason": "optional_header_invalid_size_of_headers", + "size_of_headers": 2048, + "file_alignment": 16384 + } + }, + { + "value": "pe_structure_anomaly", + "start": 0, + "end": 0, + "category": "pe_heuristic", + "metadata": { + "reason": "optional_header_invalid_section_alignment", + "section_alignment": 4096, + "file_alignment": 16384 + } + }, + { + "value": "pe_structure_anomaly", + "start": 0, + "end": 0, + "category": "pe_heuristic", + "metadata": { + "reason": "optional_header_size_of_image_misaligned", + "size_of_image": 512, + "section_alignment": 4096 } }, { diff --git a/tests/contract/snapshots/layer3_adversarial/invalid_section_alignment.full.json b/tests/contract/snapshots/layer3_adversarial/invalid_section_alignment.full.json index 044fe2e..fb4ac34 100644 --- a/tests/contract/snapshots/layer3_adversarial/invalid_section_alignment.full.json +++ b/tests/contract/snapshots/layer3_adversarial/invalid_section_alignment.full.json @@ -137,9 +137,10 @@ "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "import_rva_invalid", - "rva": 0, - "size": 0 + "reason": "section_overlaps_headers", + "section": ".text", + "raw_address": 291, + "size_of_headers": 512 } } ] diff --git a/tests/contract/snapshots/layer3_adversarial/malformed_import_table.full.json 
b/tests/contract/snapshots/layer3_adversarial/malformed_import_table.full.json index bd2f93f..f393720 100644 --- a/tests/contract/snapshots/layer3_adversarial/malformed_import_table.full.json +++ b/tests/contract/snapshots/layer3_adversarial/malformed_import_table.full.json @@ -124,11 +124,10 @@ "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "data_directory_out_of_range", - "directory": "IMAGE_DIRECTORY_ENTRY_IMPORT", - "rva": 3735928559, - "size": 512, - "size_of_image": 12288 + "reason": "entrypoint_in_overlay", + "entry_point": 4096, + "entry_point_file_offset": 512, + "overlay_offset": 392 } }, { @@ -137,9 +136,11 @@ "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "import_rva_invalid", + "reason": "data_directory_out_of_range", + "directory": "IMAGE_DIRECTORY_ENTRY_IMPORT", "rva": 3735928559, - "size": 512 + "size": 512, + "size_of_image": 12288 } } ] diff --git a/tests/contract/snapshots/layer3_adversarial/overlapping_sections.full.json b/tests/contract/snapshots/layer3_adversarial/overlapping_sections.full.json index ccd2a62..c4fdf90 100644 --- a/tests/contract/snapshots/layer3_adversarial/overlapping_sections.full.json +++ b/tests/contract/snapshots/layer3_adversarial/overlapping_sections.full.json @@ -151,7 +151,19 @@ "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "section_overlap", + "reason": "entrypoint_in_overlay", + "entry_point": 4096, + "entry_point_file_offset": 512, + "overlay_offset": 392 + } + }, + { + "value": "pe_structure_anomaly", + "start": 0, + "end": 0, + "category": "pe_heuristic", + "metadata": { + "reason": "section_raw_overlap", "section_a": ".text", "section_b": ".data" } @@ -162,9 +174,9 @@ "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "optional_header_inconsistent_size", - "size_of_image": 12288, - "max_section_end": 14336 + "reason": "section_overlap", + "section_a": ".text", + "section_b": ".data" } }, { @@ -173,9 +185,9 @@ "end": 0, "category": "pe_heuristic", 
"metadata": { - "reason": "import_rva_invalid", - "rva": 0, - "size": 0 + "reason": "optional_header_inconsistent_size", + "size_of_image": 12288, + "max_section_end": 14336 } } ] diff --git a/tests/contract/snapshots/layer3_adversarial/packed_lookalike.full.json b/tests/contract/snapshots/layer3_adversarial/packed_lookalike.full.json index 938434b..4652727 100644 --- a/tests/contract/snapshots/layer3_adversarial/packed_lookalike.full.json +++ b/tests/contract/snapshots/layer3_adversarial/packed_lookalike.full.json @@ -208,17 +208,6 @@ "size_of_image": 16384, "max_section_end": 20480 } - }, - { - "value": "pe_structure_anomaly", - "start": 0, - "end": 0, - "category": "pe_heuristic", - "metadata": { - "reason": "import_rva_invalid", - "rva": 0, - "size": 0 - } } ] } diff --git a/tests/contract/snapshots/layer3_adversarial/truncated_rich_header.full.json b/tests/contract/snapshots/layer3_adversarial/truncated_rich_header.full.json index 637b8e5..d32a714 100644 --- a/tests/contract/snapshots/layer3_adversarial/truncated_rich_header.full.json +++ b/tests/contract/snapshots/layer3_adversarial/truncated_rich_header.full.json @@ -118,15 +118,16 @@ } ], "heuristics": [ - { + { "value": "pe_structure_anomaly", "start": 0, "end": 0, "category": "pe_heuristic", "metadata": { - "reason": "import_rva_invalid", - "rva": 0, - "size": 0 + "reason": "entrypoint_in_overlay", + "entry_point": 4096, + "entry_point_file_offset": 512, + "overlay_offset": 392 } } ] diff --git a/tests/contract/snapshots/layer3_adversarial/upx_name_only.full.json b/tests/contract/snapshots/layer3_adversarial/upx_name_only.full.json index 8669f54..d96ebf1 100644 --- a/tests/contract/snapshots/layer3_adversarial/upx_name_only.full.json +++ b/tests/contract/snapshots/layer3_adversarial/upx_name_only.full.json @@ -172,17 +172,6 @@ "reason": "packer_section_name", "section": ".upx1" } - }, - { - "value": "pe_structure_anomaly", - "start": 0, - "end": 0, - "category": "pe_heuristic", - "metadata": { - "reason": 
"import_rva_invalid", - "rva": 0, - "size": 0 - } } ] } diff --git a/tests/contract/test_pipeline.py b/tests/contract/test_pipeline.py index 8bc7460..807e524 100644 --- a/tests/contract/test_pipeline.py +++ b/tests/contract/test_pipeline.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + """ Contract‑Safe Snapshot Tests diff --git a/tests/contract/test_snapshot_contract.py b/tests/contract/test_snapshot_contract.py index 557ada9..5402c3e 100644 --- a/tests/contract/test_snapshot_contract.py +++ b/tests/contract/test_snapshot_contract.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import json import pytest from pathlib import Path diff --git a/tests/fuzz/extractors/crypto/test_crypto_fuzz.py b/tests/fuzz/extractors/crypto/test_crypto_fuzz.py index f98bf51..d9f5bd8 100644 --- a/tests/fuzz/extractors/crypto/test_crypto_fuzz.py +++ b/tests/fuzz/extractors/crypto/test_crypto_fuzz.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest import random import string diff --git a/tests/fuzz/extractors/domains/test_punycode_fuzz.py b/tests/fuzz/extractors/domains/test_punycode_fuzz.py index 2185933..995957e 100644 --- a/tests/fuzz/extractors/domains/test_punycode_fuzz.py +++ b/tests/fuzz/extractors/domains/test_punycode_fuzz.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import random import string import idna diff --git a/tests/fuzz/extractors/filepaths/test_filepaths_fuzz.py b/tests/fuzz/extractors/filepaths/test_filepaths_fuzz.py index a9f339f..71d406b 100644 --- a/tests/fuzz/extractors/filepaths/test_filepaths_fuzz.py +++ b/tests/fuzz/extractors/filepaths/test_filepaths_fuzz.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import random import string import pytest diff --git 
a/tests/fuzz/extractors/ips/test_ips_cidr_fuzz.py b/tests/fuzz/extractors/ips/test_ips_cidr_fuzz.py index 1400a17..c0508de 100644 --- a/tests/fuzz/extractors/ips/test_ips_cidr_fuzz.py +++ b/tests/fuzz/extractors/ips/test_ips_cidr_fuzz.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + """ CIDR-aware fuzzing for the IP extractor. diff --git a/tests/fuzz/extractors/ips/test_ips_corpus_guided.py b/tests/fuzz/extractors/ips/test_ips_corpus_guided.py index c49c5a9..55e59ed 100644 --- a/tests/fuzz/extractors/ips/test_ips_corpus_guided.py +++ b/tests/fuzz/extractors/ips/test_ips_corpus_guided.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + """ Corpus-guided fuzzing for the IP extractor. diff --git a/tests/fuzz/extractors/ips/test_ips_fuzz.py b/tests/fuzz/extractors/ips/test_ips_fuzz.py index 00492a6..427497b 100644 --- a/tests/fuzz/extractors/ips/test_ips_fuzz.py +++ b/tests/fuzz/extractors/ips/test_ips_fuzz.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest import random import string diff --git a/tests/integration/test_cli_file_input.py b/tests/integration/test_cli_file_input.py index 1c68a0c..d76cacf 100644 --- a/tests/integration/test_cli_file_input.py +++ b/tests/integration/test_cli_file_input.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import json import subprocess import pytest diff --git a/tests/integration/test_cli_real_binaries.py b/tests/integration/test_cli_real_binaries.py index f8b611b..8582b83 100644 --- a/tests/integration/test_cli_real_binaries.py +++ b/tests/integration/test_cli_real_binaries.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import json import subprocess from pathlib import Path diff --git a/tests/integration/test_cli_rsrc_iocs.py 
b/tests/integration/test_cli_rsrc_iocs.py index 50ae743..7385eea 100644 --- a/tests/integration/test_cli_rsrc_iocs.py +++ b/tests/integration/test_cli_rsrc_iocs.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import json import subprocess from pathlib import Path diff --git a/tests/integration/test_cli_text_input.py b/tests/integration/test_cli_text_input.py index 65ed214..7fd9f0f 100644 --- a/tests/integration/test_cli_text_input.py +++ b/tests/integration/test_cli_text_input.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import json import subprocess import pytest diff --git a/tests/integration/test_cli_windows_pe_iocs.py b/tests/integration/test_cli_windows_pe_iocs.py index 6caf564..eccba3c 100644 --- a/tests/integration/test_cli_windows_pe_iocs.py +++ b/tests/integration/test_cli_windows_pe_iocs.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import json import subprocess import pytest diff --git a/tests/integration/test_crypto_entropy_payload.py b/tests/integration/test_crypto_entropy_payload.py index 31b4dc2..7157b71 100644 --- a/tests/integration/test_crypto_entropy_payload.py +++ b/tests/integration/test_crypto_entropy_payload.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import json import subprocess from pathlib import Path diff --git a/tests/integration/test_engine_with_plugins.py b/tests/integration/test_engine_with_plugins.py index 68cac46..b974937 100644 --- a/tests/integration/test_engine_with_plugins.py +++ b/tests/integration/test_engine_with_plugins.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import types import pytest from pathlib import Path diff --git a/tests/integration/test_franken_malformed_pe.py b/tests/integration/test_franken_malformed_pe.py index 
7bbc0bc..de8cc9e 100644 --- a/tests/integration/test_franken_malformed_pe.py +++ b/tests/integration/test_franken_malformed_pe.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import json import subprocess import pytest @@ -36,15 +39,15 @@ def test_franken_expected_heuristics(franken_result): } expected = { - "section_overlap", - "section_raw_misaligned", - "optional_header_inconsistent_size", "entrypoint_out_of_bounds", + "optional_header_inconsistent_size", "data_directory_out_of_range", "data_directory_zero_rva_nonzero_size", - "import_rva_invalid", + "section_raw_misaligned", + "section_overlap", + "section_raw_overlap" } assert heur == expected @pytest.mark.integration diff --git a/tests/integration/test_full_pipeline_with_plugins.py b/tests/integration/test_full_pipeline_with_plugins.py index 1eae9a6..eabf2dd 100644 --- a/tests/integration/test_full_pipeline_with_plugins.py +++ b/tests/integration/test_full_pipeline_with_plugins.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import json, sys import subprocess from pathlib import Path diff --git a/tests/integration/test_pe_fixtures.py b/tests/integration/test_pe_fixtures.py index 6cae3de..8e38d2a 100644 --- a/tests/integration/test_pe_fixtures.py +++ b/tests/integration/test_pe_fixtures.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import json import subprocess import pathlib diff --git a/tests/integration/test_string_obfuscation_tricks.py b/tests/integration/test_string_obfuscation_tricks.py index 9bb271f..cca6217 100644 --- a/tests/integration/test_string_obfuscation_tricks.py +++ b/tests/integration/test_string_obfuscation_tricks.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import json import subprocess from pathlib import Path diff --git 
a/tests/performance/engine/test_engine_1mb_perf.py b/tests/performance/engine/test_engine_1mb_perf.py index d067915..66e0152 100644 --- a/tests/performance/engine/test_engine_1mb_perf.py +++ b/tests/performance/engine/test_engine_1mb_perf.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import time import pytest from iocx.engine import Engine diff --git a/tests/performance/engine/test_engine_dense_perf.py b/tests/performance/engine/test_engine_dense_perf.py index 11a4cfd..87569bc 100644 --- a/tests/performance/engine/test_engine_dense_perf.py +++ b/tests/performance/engine/test_engine_dense_perf.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import time import pytest from iocx.engine import Engine diff --git a/tests/performance/engine/test_engine_franken_perf.py b/tests/performance/engine/test_engine_franken_perf.py index 0fb2ecb..ce64aae 100644 --- a/tests/performance/engine/test_engine_franken_perf.py +++ b/tests/performance/engine/test_engine_franken_perf.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import time import pytest from iocx.engine import Engine diff --git a/tests/performance/engine/test_engine_typical_perf.py b/tests/performance/engine/test_engine_typical_perf.py index 635addf..a8fb0ea 100644 --- a/tests/performance/engine/test_engine_typical_perf.py +++ b/tests/performance/engine/test_engine_typical_perf.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import time import pytest from iocx.engine import Engine diff --git a/tests/performance/extractors/crypto/test_crypto_perf.py b/tests/performance/extractors/crypto/test_crypto_perf.py index 88c8053..98dcc88 100644 --- a/tests/performance/extractors/crypto/test_crypto_perf.py +++ b/tests/performance/extractors/crypto/test_crypto_perf.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 
MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest import time import random diff --git a/tests/performance/extractors/domains/test_domains_perf.py b/tests/performance/extractors/domains/test_domains_perf.py index 5dc60e3..6befe6d 100644 --- a/tests/performance/extractors/domains/test_domains_perf.py +++ b/tests/performance/extractors/domains/test_domains_perf.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest import time import random diff --git a/tests/performance/extractors/filepaths/test_filepaths_perf.py b/tests/performance/extractors/filepaths/test_filepaths_perf.py index e4296f8..753b720 100644 --- a/tests/performance/extractors/filepaths/test_filepaths_perf.py +++ b/tests/performance/extractors/filepaths/test_filepaths_perf.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest import time import random diff --git a/tests/performance/extractors/ips/test_ips_perf.py b/tests/performance/extractors/ips/test_ips_perf.py index f6b06cc..1330c99 100644 --- a/tests/performance/extractors/ips/test_ips_perf.py +++ b/tests/performance/extractors/ips/test_ips_perf.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest import time import random diff --git a/tests/robustness/extractors/crypto/test_crypto_robustness.py b/tests/robustness/extractors/crypto/test_crypto_robustness.py index 6c02b4d..4682cda 100644 --- a/tests/robustness/extractors/crypto/test_crypto_robustness.py +++ b/tests/robustness/extractors/crypto/test_crypto_robustness.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.crypto import extract diff --git a/tests/robustness/extractors/filepaths/test_backtracking_safety.py 
b/tests/robustness/extractors/filepaths/test_backtracking_safety.py index c2aa5c7..9e80f1a 100644 --- a/tests/robustness/extractors/filepaths/test_backtracking_safety.py +++ b/tests/robustness/extractors/filepaths/test_backtracking_safety.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.filepaths import extract diff --git a/tests/robustness/extractors/filepaths/test_filepaths_chaos.py b/tests/robustness/extractors/filepaths/test_filepaths_chaos.py index 7915f0b..140d907 100644 --- a/tests/robustness/extractors/filepaths/test_filepaths_chaos.py +++ b/tests/robustness/extractors/filepaths/test_filepaths_chaos.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.filepaths import extract from iocx.models import Detection diff --git a/tests/robustness/extractors/ips/test_ips_chaos.py b/tests/robustness/extractors/ips/test_ips_chaos.py index 0e14fd9..7e562e3 100644 --- a/tests/robustness/extractors/ips/test_ips_chaos.py +++ b/tests/robustness/extractors/ips/test_ips_chaos.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.ips import extract diff --git a/tests/unit/analysis/test_extended.py b/tests/unit/analysis/test_extended.py index 0a551b6..d0a1f37 100644 --- a/tests/unit/analysis/test_extended.py +++ b/tests/unit/analysis/test_extended.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.analysis.extended import analyse_extended diff --git a/tests/unit/analysis/test_heuristics.py b/tests/unit/analysis/test_heuristics.py index 7e76728..3f957a9 100644 --- a/tests/unit/analysis/test_heuristics.py +++ b/tests/unit/analysis/test_heuristics.py @@ -1,6 +1,10 @@ +# Copyright (c) 2026 MalX Labs and 
contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest -from iocx.analysis.heuristics import analyse_pe_heuristics, _analyse_tls, _map_rva_to_section, _analyse_section_overlap, _analyse_section_alignment, _analyse_optional_header_consistency, _analyse_data_directory_anomalies, _analyse_import_directory_validity -from iocx.models import Detection + +from iocx.analysis.heuristics import analyse_pe_heuristics, _analyse_structural +from iocx.validators import run_structural_validators def _find(dets, value, reason): @@ -10,6 +14,15 @@ def _find(dets, value, reason): return None +def build_analysis(sections=None, data_directories=None, extended=None, obfuscation=None): + return { + "sections": sections or [], + "data_directories": data_directories or [], + "extended": extended or [], + "obfuscation": obfuscation or [], + } + + def test_packer_high_entropy_section(): metadata = { "file_type": "PE", @@ -20,8 +33,8 @@ def test_packer_high_entropy_section(): "has_signature": False, } - analysis = { - "sections": [ + analysis = build_analysis( + sections=[ { "name": ".text", "raw_size": 4096, @@ -29,14 +42,13 @@ def test_packer_high_entropy_section(): "characteristics": 0x60000020, "entropy": 8.2, } - ], - "obfuscation": [], - "extended": [], - } + ] + ) + analysis["structural"] = run_structural_validators({}, metadata, analysis) dets = analyse_pe_heuristics(metadata, analysis) - d = _find(dets, "packer_suspected", "high_entropy_section") + d = _find(dets, "packer_suspected", "high_entropy_section") assert d is not None assert d.metadata["section"] == ".text" assert d.metadata["entropy"] == 8.2 @@ -52,8 +64,8 @@ def test_packer_upx_section_name(): "has_signature": False, } - analysis = { - "sections": [ + analysis = build_analysis( + sections=[ { "name": "UPX1", "raw_size": 2048, @@ -61,14 +73,13 @@ def test_packer_upx_section_name(): "characteristics": 0x60000020, "entropy": 6.0, } - ], - "obfuscation": [], - "extended": [], - } + ] + ) + analysis["structural"] = 
run_structural_validators({}, metadata, analysis) dets = analyse_pe_heuristics(metadata, analysis) - d = _find(dets, "packer_suspected", "packer_section_name") + d = _find(dets, "packer_suspected", "packer_section_name") assert d is not None assert d.metadata["section"] == "UPX1" @@ -83,10 +94,9 @@ def test_tls_callback_outside_range(): "has_signature": False, } - analysis = { - "sections": [], - "obfuscation": [], - "extended": [ + analysis = build_analysis( + sections=[{"name": ".text"}], + extended=[ { "value": "tls_directory", "start": 0, @@ -99,11 +109,12 @@ def test_tls_callback_outside_range(): }, } ], - } + ) + analysis["structural"] = run_structural_validators({}, metadata, analysis) dets = analyse_pe_heuristics(metadata, analysis) - d = _find(dets, "tls_callback_anomaly", "callback_outside_tls_range") + d = _find(dets, "pe_structure_anomaly", "callback_outside_tls_range") assert d is not None assert d.metadata["callbacks"] == 0x3000 @@ -121,8 +132,8 @@ def test_anti_debug_imports_and_rwx_section(): "has_signature": False, } - analysis = { - "sections": [ + analysis = build_analysis( + sections=[ { "name": ".rwx", "raw_size": 1024, @@ -130,16 +141,17 @@ def test_anti_debug_imports_and_rwx_section(): "characteristics": 0xA0000020, # EXECUTE + WRITE "entropy": 5.0, } - ], - "obfuscation": [], - "extended": [], - } + ] + ) + analysis["structural"] = run_structural_validators({}, metadata, analysis) dets = analyse_pe_heuristics(metadata, analysis) assert _find(dets, "anti_debug_heuristic", "anti_debug_api_import") assert _find(dets, "anti_debug_heuristic", "timing_api_import") - assert _find(dets, "anti_debug_heuristic", "rwx_section") + + # RWX is structural + assert _find(dets, "pe_structure_anomaly", "section_rwx") def test_import_anomalies_large_and_ordinal_ratio(): @@ -162,8 +174,9 @@ def test_import_anomalies_large_and_ordinal_ratio(): "has_signature": False, } - analysis = {"sections": [], "obfuscation": [], "extended": []} + analysis = 
build_analysis(sections=[{"name": ".text"}]) + analysis["structural"] = run_structural_validators({}, metadata, analysis) dets = analyse_pe_heuristics(metadata, analysis) assert _find(dets, "import_anomaly", "large_import_table") @@ -182,10 +195,9 @@ def test_import_anomaly_uncommon_dll_for_gui(): "has_signature": False, } - analysis = { - "sections": [], - "obfuscation": [], - "extended": [ + analysis = build_analysis( + sections=[{"name": ".text"}], + extended=[ { "value": "header", "start": 0, @@ -194,11 +206,12 @@ def test_import_anomaly_uncommon_dll_for_gui(): "metadata": {"subsystem_human": "Windows GUI"}, } ], - } + ) + analysis["structural"] = run_structural_validators({}, metadata, analysis) dets = analyse_pe_heuristics(metadata, analysis) - d = _find(dets, "import_anomaly", "uncommon_dll_for_gui_subsystem") + d = _find(dets, "import_anomaly", "uncommon_dll_for_gui_subsystem") assert d is not None assert d.metadata["dll"] == "ntoskrnl.exe" @@ -213,12 +226,12 @@ def test_signature_flag_without_metadata(): "has_signature": True, } - analysis = {"sections": [], "obfuscation": [], "extended": []} + analysis = build_analysis(sections=[{"name": ".text"}]) + analysis["structural"] = run_structural_validators({}, metadata, analysis) dets = analyse_pe_heuristics(metadata, analysis) - d = _find(dets, "signature_anomaly", "signature_flag_set_but_no_metadata") - assert d is not None + assert _find(dets, "pe_structure_anomaly", "signature_flag_set_but_no_metadata") def test_synthetic_triggers_all_heuristics(): @@ -227,46 +240,36 @@ def test_synthetic_triggers_all_heuristics(): "imports": ["KERNEL32.dll", "ntoskrnl.exe", "user32.dll"], "import_details": ( [ - # Anti-debug + timing {"dll": "KERNEL32.dll", "function": "IsDebuggerPresent", "ordinal": None}, {"dll": "KERNEL32.dll", "function": "GetTickCount", "ordinal": None}, - # Uncommon DLL for GUI subsystem {"dll": "ntoskrnl.exe", "function": "KeBugCheckEx", "ordinal": None}, ] - + [ - # Lots of ordinal-only imports to 
trigger large table + high ordinal ratio - {"dll": "user32.dll", "function": None, "ordinal": i} - for i in range(600) - ] + + [{"dll": "user32.dll", "function": None, "ordinal": i} for i in range(600)] ), "tls": {}, "signatures": [], - "has_signature": True, # triggers signature_anomaly + "has_signature": True, } - analysis = { - "sections": [ + analysis = build_analysis( + sections=[ { - # Triggers packer_section_name + high_entropy_section "name": "UPX0", "raw_size": 4096, "virtual_size": 4000, - "characteristics": 0xE0000020, # EXECUTE | READ | WRITE + "characteristics": 0xE0000020, "entropy": 8.6, }, { - # Triggers rwx_section "name": ".rwx", "raw_size": 2048, "virtual_size": 1800, - "characteristics": 0xA0000020, # EXECUTE | WRITE + "characteristics": 0xA0000020, "entropy": 5.0, }, ], - "obfuscation": [], - "extended": [ + extended=[ { - # Triggers tls_callback_anomaly "value": "tls_directory", "start": 0, "end": 0, @@ -274,22 +277,20 @@ def test_synthetic_triggers_all_heuristics(): "metadata": { "start_address": 0x1000, "end_address": 0x2000, - "callbacks": 0x3000, # outside range + "callbacks": 0x3000, }, }, { - # Triggers uncommon_dll_for_gui_subsystem "value": "header", "start": 0, "end": 0, "category": "pe_metadata", - "metadata": { - "subsystem_human": "Windows GUI", - }, + "metadata": {"subsystem_human": "Windows GUI"}, }, ], - } + ) + analysis["structural"] = run_structural_validators({}, metadata, analysis) dets = analyse_pe_heuristics(metadata, analysis) expected = { @@ -297,12 +298,12 @@ def test_synthetic_triggers_all_heuristics(): ("packer_suspected", "high_entropy_section"), ("anti_debug_heuristic", "anti_debug_api_import"), ("anti_debug_heuristic", "timing_api_import"), - ("anti_debug_heuristic", "rwx_section"), - ("tls_callback_anomaly", "callback_outside_tls_range"), + ("pe_structure_anomaly", "section_rwx"), + ("pe_structure_anomaly", "callback_outside_tls_range"), ("import_anomaly", "large_import_table"), ("import_anomaly", 
"high_ordinal_import_ratio"), ("import_anomaly", "uncommon_dll_for_gui_subsystem"), - ("signature_anomaly", "signature_flag_set_but_no_metadata"), + ("pe_structure_anomaly", "signature_flag_set_but_no_metadata"), } seen = {(d.value, d.metadata.get("reason")) for d in dets} @@ -310,160 +311,30 @@ def test_synthetic_triggers_all_heuristics(): assert pair in seen, f"Missing heuristic {pair}" -def test_tls_analysis_skips_incomplete_entries(): - analysis = { - "extended": [ - { - "value": "tls_directory", - "metadata": { - # Missing start_address, end_address, callbacks - # This forces the `continue` branch - } - } - ] - } - - detections = _analyse_tls({}, analysis) - - # No detections should be produced - assert detections == [] - - -def test_map_rva_to_section_skips_invalid_types(): - sections = [ - {"virtual_address": "not-an-int", "virtual_size": 100}, # triggers continue - {"virtual_address": 0x1000, "virtual_size": 0x200}, # valid section - ] - - rva = 0x1100 - result = _map_rva_to_section(sections, rva) - - assert result == sections[1] - +def test_analyse_structural_top_level_return(): + analysis = {"structural": 123} # not a dict → triggers early return + result = _analyse_structural(analysis) + assert result == [] -def test_analyse_section_overlap_skips_invalid_inner_section(): - sections = [ - # a = valid section - {"name": ".text", "virtual_address": 0x1000, "virtual_size": 0x200}, - # b = invalid section (triggers inner continue) - {"name": ".data", "virtual_address": "not-an-int", "virtual_size": 0x100}, - ] - - metadata = {} - analysis = {"sections": sections} - - out = _analyse_section_overlap(metadata, analysis) - - # No overlap detection should be produced - assert out == [] - - -def test_analyse_section_alignment_skips_invalid_section_fields(): - metadata = { - "optional_header": { - "file_alignment": 0x200 # valid alignment - } - } +def test_analyse_structural_continue_non_list_issues(): analysis = { - "sections": [ - # This section triggers the 
`continue` branch - {"name": ".bad", "raw_address": "oops", "raw_size": 100}, - - # This section is valid and should be processed normally - {"name": ".text", "raw_address": 0x400, "raw_size": 0x200}, - ] - } - - out = _analyse_section_alignment(metadata, analysis) - - # No misalignment here, so output should be empty - assert out == [] - - -def test_optional_header_consistency_skips_invalid_section_fields(): - metadata = { - "optional_header": { - "size_of_image": 0x3000 # valid, positive int + "structural": { + "weird_category": "not-a-list" # triggers first continue } } + result = _analyse_structural(analysis) + assert result == [] - analysis = { - "sections": [ - # This section triggers the `continue` branch - {"name": ".bad", "virtual_address": "oops", "virtual_size": 100}, - - # This section is valid and should be processed - {"name": ".text", "virtual_address": 0x1000, "virtual_size": 0x200}, - ] - } - - out = _analyse_optional_header_consistency(metadata, analysis) - - # max_end = 0x1000 + 0x200 = 0x1200 < size_of_image → no detection - assert out == [] - - -def test_data_directory_anomalies_skips_invalid_entries(): - metadata = { - "optional_header": { - "size_of_image": 0x3000 # valid positive int - } - } +def test_analyse_structural_continue_non_dict_issue(): analysis = { - "data_directories": [ - # This entry triggers the `continue` branch - {"name": "bad", "rva": "oops", "size": 100}, - - # This entry is valid and should be processed - {"name": "good", "rva": 0x1000, "size": 0x200}, - ] - } - - out = _analyse_data_directory_anomalies(metadata, analysis) - - # No anomaly here because rva+size < size_of_image - assert out == [] - - -def test_data_directory_anomalies_skips_invalid_inner_directory(): - metadata = { - "optional_header": { - "size_of_image": 0x3000 # valid, so the function enters the loop + "structural": { + "cat": [ + "not-a-dict", # triggers second continue + 123, # also triggers second continue + ] } } - - analysis = { - 
"data_directories": [ - # a = valid entry → outer loop does NOT continue - {"name": "A", "rva": 0x1000, "size": 0x200}, - - # b = invalid entry → triggers the inner continue - {"name": "B", "rva": "oops", "size": 0x100}, - ] - } - - out = _analyse_data_directory_anomalies(metadata, analysis) - - # No overlap detection should be produced - assert out == [] - - -def test_import_directory_validity_skips_invalid_rva_or_size(): - metadata = {} - analysis = { - "data_directories": [ - # This entry is treated as the import directory (idx == 1) - # but has invalid types → triggers the continue - {"index": 1, "name": "import", "rva": "oops", "size": 100}, - ], - # Must include at least one section or the function returns early - "sections": [{"name": ".text"}], - } - - out = _analyse_import_directory_validity(metadata, analysis) - - # No detection should be produced - assert out == [] - + result = _analyse_structural(analysis) + assert result == [] diff --git a/tests/unit/analysis/test_obfuscation.py b/tests/unit/analysis/test_obfuscation.py index da8a2e2..e79aeed 100644 --- a/tests/unit/analysis/test_obfuscation.py +++ b/tests/unit/analysis/test_obfuscation.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import math import pytest diff --git a/tests/unit/analysis/test_obfuscation_ext.py b/tests/unit/analysis/test_obfuscation_ext.py index 0276642..3b486a3 100644 --- a/tests/unit/analysis/test_obfuscation_ext.py +++ b/tests/unit/analysis/test_obfuscation_ext.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.analysis.obfuscation import analyse_obfuscation, _detect_high_entropy_sections, _looks_like_rot13, _non_printable_ratio, _detect_string_obfuscation from iocx.analysis.extended import analyse_extended diff --git a/tests/unit/cli/test_cli_basic.py b/tests/unit/cli/test_cli_basic.py index bab7ad7..683ff47 100644 --- 
a/tests/unit/cli/test_cli_basic.py +++ b/tests/unit/cli/test_cli_basic.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import subprocess import sys from pathlib import Path diff --git a/tests/unit/cli/test_cli_ext.py b/tests/unit/cli/test_cli_ext.py index 66f7164..4d78c29 100644 --- a/tests/unit/cli/test_cli_ext.py +++ b/tests/unit/cli/test_cli_ext.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import subprocess import sys from pathlib import Path diff --git a/tests/unit/detector/test_all_detectors.py b/tests/unit/detector/test_all_detectors.py index 9e7185f..710e297 100644 --- a/tests/unit/detector/test_all_detectors.py +++ b/tests/unit/detector/test_all_detectors.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.registry import all_detectors from iocx.engine import Engine, EngineConfig diff --git a/tests/unit/engine/test_engine.py b/tests/unit/engine/test_engine.py index 3195c21..dc50c96 100644 --- a/tests/unit/engine/test_engine.py +++ b/tests/unit/engine/test_engine.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import os import pytest @@ -329,9 +332,15 @@ def fake_extended(pe, meta, text): monkeypatch.setitem(engine._pipeline_pe.__globals__, "analyse_extended", fake_extended) # --- Patch Engine internal methods --- + class FakePE: + __data__ = b"\x00" * 4096 + def get_overlay_data_start_offset(self): + return None + + engine._get_pe = lambda path: FakePE()  # return an instance so bound methods work + engine._get_pe_metadata = lambda path: ( - {"pe": True}, + FakePE(), {"resource_strings": ["resA"], "meta": "x"}, ) diff --git a/tests/unit/engine/test_engine_enrichment.py b/tests/unit/engine/test_engine_enrichment.py index f36402f..1543918 100644 --- a/tests/unit/engine/test_engine_enrichment.py +++ b/tests/unit/engine/test_engine_enrichment.py @@
-1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.engine import Engine from iocx.models import Detection diff --git a/tests/unit/engine/test_engine_ext.py b/tests/unit/engine/test_engine_ext.py index bcb8e96..a531a12 100644 --- a/tests/unit/engine/test_engine_ext.py +++ b/tests/unit/engine/test_engine_ext.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.engine import Engine from iocx.models import Detection diff --git a/tests/unit/engine/test_engine_overlap_suppression.py b/tests/unit/engine/test_engine_overlap_suppression.py index d4a93e6..78abc93 100644 --- a/tests/unit/engine/test_engine_overlap_suppression.py +++ b/tests/unit/engine/test_engine_overlap_suppression.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.engine import Engine diff --git a/tests/unit/engine/test_engine_validators.py b/tests/unit/engine/test_engine_validators.py index c406c09..ec29099 100644 --- a/tests/unit/engine/test_engine_validators.py +++ b/tests/unit/engine/test_engine_validators.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.engine import Engine diff --git a/tests/unit/engine/test_internal_metadata_schema.py b/tests/unit/engine/test_internal_metadata_schema.py new file mode 100644 index 0000000..eb618c9 --- /dev/null +++ b/tests/unit/engine/test_internal_metadata_schema.py @@ -0,0 +1,27 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +import pytest +from iocx.schemas.internal_schema import InternalMetadata +from iocx.engine import Engine +from iocx import validators + +@pytest.fixture +def minimal_pe_path(request): + root = request.config.rootpath + return str(root / "tests" / "integration" / "fixtures" / "bin" / "pe_rsrc.exe") 
+ +def test_internal_metadata_schema(minimal_pe_path): + engine = Engine() + engine._analysis_level = "full" + + engine._pipeline_pe(minimal_pe_path) + + internal = engine._internal_metadata + + assert "resources_struct" in internal + root = internal["resources_struct"]["root"] + + assert isinstance(root["rva"], int) + assert isinstance(root["size"], int) + assert isinstance(root["entries"], list) diff --git a/tests/unit/extractors/base64/test_base64.py b/tests/unit/extractors/base64/test_base64.py index b7d1aa1..b0bcacb 100644 --- a/tests/unit/extractors/base64/test_base64.py +++ b/tests/unit/extractors/base64/test_base64.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.base64 import extract diff --git a/tests/unit/extractors/base64/test_base64_edge.py b/tests/unit/extractors/base64/test_base64_edge.py index 083e3da..96a3c07 100644 --- a/tests/unit/extractors/base64/test_base64_edge.py +++ b/tests/unit/extractors/base64/test_base64_edge.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest import base64 from iocx.detectors.extractors.base64 import extract diff --git a/tests/unit/extractors/base64/test_base64_ext.py b/tests/unit/extractors/base64/test_base64_ext.py index 5f5ca82..0c147d7 100644 --- a/tests/unit/extractors/base64/test_base64_ext.py +++ b/tests/unit/extractors/base64/test_base64_ext.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.base64 import extract diff --git a/tests/unit/extractors/crypto/test_crypto.py b/tests/unit/extractors/crypto/test_crypto.py index cb4c262..f659bf7 100644 --- a/tests/unit/extractors/crypto/test_crypto.py +++ b/tests/unit/extractors/crypto/test_crypto.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + 
from iocx.detectors.extractors.crypto import extract from iocx.models import Detection diff --git a/tests/unit/extractors/crypto/test_crypto_base58.py b/tests/unit/extractors/crypto/test_crypto_base58.py index f7685be..7f73b05 100644 --- a/tests/unit/extractors/crypto/test_crypto_base58.py +++ b/tests/unit/extractors/crypto/test_crypto_base58.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from iocx.detectors.extractors.crypto import extract, base58check_decode from iocx.models import Detection import pytest diff --git a/tests/unit/extractors/crypto/test_crypto_ext.py b/tests/unit/extractors/crypto/test_crypto_ext.py index c69a688..e2f0c24 100644 --- a/tests/unit/extractors/crypto/test_crypto_ext.py +++ b/tests/unit/extractors/crypto/test_crypto_ext.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from iocx.detectors.extractors.crypto import extract, is_valid_btc_address import hashlib diff --git a/tests/unit/extractors/crypto/test_crypto_noise.py b/tests/unit/extractors/crypto/test_crypto_noise.py index 4177991..f329384 100644 --- a/tests/unit/extractors/crypto/test_crypto_noise.py +++ b/tests/unit/extractors/crypto/test_crypto_noise.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.engine import Engine, EngineConfig from iocx.detectors.registry import all_detectors diff --git a/tests/unit/extractors/crypto/test_engine_crypto.py b/tests/unit/extractors/crypto/test_engine_crypto.py index e762406..64ef8a9 100644 --- a/tests/unit/extractors/crypto/test_engine_crypto.py +++ b/tests/unit/extractors/crypto/test_engine_crypto.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import iocx.engine import inspect from iocx.engine import Engine diff --git a/tests/unit/extractors/emails/test_emails.py 
b/tests/unit/extractors/emails/test_emails.py index 01f5a1d..e4aca27 100644 --- a/tests/unit/extractors/emails/test_emails.py +++ b/tests/unit/extractors/emails/test_emails.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.emails import extract diff --git a/tests/unit/extractors/filepaths/test_filepaths_env.py b/tests/unit/extractors/filepaths/test_filepaths_env.py index 906eacd..c366d82 100644 --- a/tests/unit/extractors/filepaths/test_filepaths_env.py +++ b/tests/unit/extractors/filepaths/test_filepaths_env.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.filepaths import extract diff --git a/tests/unit/extractors/filepaths/test_filepaths_relative.py b/tests/unit/extractors/filepaths/test_filepaths_relative.py index 4b0fe74..d182046 100644 --- a/tests/unit/extractors/filepaths/test_filepaths_relative.py +++ b/tests/unit/extractors/filepaths/test_filepaths_relative.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.filepaths import extract diff --git a/tests/unit/extractors/filepaths/test_filepaths_tilde.py b/tests/unit/extractors/filepaths/test_filepaths_tilde.py index 74f4020..8e10f51 100644 --- a/tests/unit/extractors/filepaths/test_filepaths_tilde.py +++ b/tests/unit/extractors/filepaths/test_filepaths_tilde.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.filepaths import extract diff --git a/tests/unit/extractors/filepaths/test_filepaths_unc.py b/tests/unit/extractors/filepaths/test_filepaths_unc.py index 5f8535c..b7eb2cf 100644 --- a/tests/unit/extractors/filepaths/test_filepaths_unc.py +++ b/tests/unit/extractors/filepaths/test_filepaths_unc.py @@ -1,3 +1,6 
@@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.filepaths import extract diff --git a/tests/unit/extractors/filepaths/test_filepaths_unicode.py b/tests/unit/extractors/filepaths/test_filepaths_unicode.py index c7c0595..c163053 100644 --- a/tests/unit/extractors/filepaths/test_filepaths_unicode.py +++ b/tests/unit/extractors/filepaths/test_filepaths_unicode.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.filepaths import extract diff --git a/tests/unit/extractors/filepaths/test_filepaths_unix_abs.py b/tests/unit/extractors/filepaths/test_filepaths_unix_abs.py index 7418e3e..a630eed 100644 --- a/tests/unit/extractors/filepaths/test_filepaths_unix_abs.py +++ b/tests/unit/extractors/filepaths/test_filepaths_unix_abs.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.filepaths import extract diff --git a/tests/unit/extractors/filepaths/test_filepaths_windows_abs.py b/tests/unit/extractors/filepaths/test_filepaths_windows_abs.py index 9c55f1d..c23ea3e 100644 --- a/tests/unit/extractors/filepaths/test_filepaths_windows_abs.py +++ b/tests/unit/extractors/filepaths/test_filepaths_windows_abs.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.filepaths import extract diff --git a/tests/unit/extractors/hashes/test_hashes.py b/tests/unit/extractors/hashes/test_hashes.py index e666a31..cf2f582 100644 --- a/tests/unit/extractors/hashes/test_hashes.py +++ b/tests/unit/extractors/hashes/test_hashes.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.hashes import extract diff --git 
a/tests/unit/extractors/ips/test_ips.py b/tests/unit/extractors/ips/test_ips.py index dbb0549..81b4d06 100644 --- a/tests/unit/extractors/ips/test_ips.py +++ b/tests/unit/extractors/ips/test_ips.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.ips import extract diff --git a/tests/unit/extractors/urls/test_bare_domain.py b/tests/unit/extractors/urls/test_bare_domain.py index c48626c..1f23f94 100644 --- a/tests/unit/extractors/urls/test_bare_domain.py +++ b/tests/unit/extractors/urls/test_bare_domain.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.urls.bare_domain import extract_bare_domains diff --git a/tests/unit/extractors/urls/test_deobfuscate.py b/tests/unit/extractors/urls/test_deobfuscate.py index 2cb1c48..5af0232 100644 --- a/tests/unit/extractors/urls/test_deobfuscate.py +++ b/tests/unit/extractors/urls/test_deobfuscate.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.urls.deobfuscate import deobfuscate_text diff --git a/tests/unit/extractors/urls/test_normalise.py b/tests/unit/extractors/urls/test_normalise.py index 4874b17..5a265fe 100644 --- a/tests/unit/extractors/urls/test_normalise.py +++ b/tests/unit/extractors/urls/test_normalise.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.urls.normalise import normalise_url diff --git a/tests/unit/extractors/urls/test_punycode.py b/tests/unit/extractors/urls/test_punycode.py index d0d19b8..0eef803 100644 --- a/tests/unit/extractors/urls/test_punycode.py +++ b/tests/unit/extractors/urls/test_punycode.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + 
import pytest from iocx.detectors.extractors.urls.bare_domain import _punycode_decodes_to_unicode, _detect_script diff --git a/tests/unit/extractors/urls/test_strict_url.py b/tests/unit/extractors/urls/test_strict_url.py index 83e63fd..6727cf0 100644 --- a/tests/unit/extractors/urls/test_strict_url.py +++ b/tests/unit/extractors/urls/test_strict_url.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.urls.strict_url import extract_strict_urls diff --git a/tests/unit/extractors/urls/test_super_detector.py b/tests/unit/extractors/urls/test_super_detector.py index 7a8fa91..fb7492c 100644 --- a/tests/unit/extractors/urls/test_super_detector.py +++ b/tests/unit/extractors/urls/test_super_detector.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.detectors.extractors.urls import extract diff --git a/tests/unit/extractors/urls/test_urls_init.py b/tests/unit/extractors/urls/test_urls_init.py index d5a52cc..df5b086 100644 --- a/tests/unit/extractors/urls/test_urls_init.py +++ b/tests/unit/extractors/urls/test_urls_init.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import iocx.detectors.extractors.urls as urls_extract diff --git a/tests/unit/parsers/test_pe_parser.py b/tests/unit/parsers/test_pe_parser.py index cd1f690..be4fa07 100644 --- a/tests/unit/parsers/test_pe_parser.py +++ b/tests/unit/parsers/test_pe_parser.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from types import SimpleNamespace diff --git a/tests/unit/parsers/test_pe_parser_extended.py b/tests/unit/parsers/test_pe_parser_extended.py index 26547fd..7dab37a 100644 --- a/tests/unit/parsers/test_pe_parser_extended.py +++ b/tests/unit/parsers/test_pe_parser_extended.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 
MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from types import SimpleNamespace from iocx.parsers.pe_parser import parse_pe diff --git a/tests/unit/parsers/test_pe_parser_sanitise.py b/tests/unit/parsers/test_pe_parser_sanitise.py index 0dad3c5..eb435df 100644 --- a/tests/unit/parsers/test_pe_parser_sanitise.py +++ b/tests/unit/parsers/test_pe_parser_sanitise.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.parsers.pe_parser import sanitize diff --git a/tests/unit/parsers/test_string_extractor.py b/tests/unit/parsers/test_string_extractor.py index 15fbcba..df79b2f 100644 --- a/tests/unit/parsers/test_string_extractor.py +++ b/tests/unit/parsers/test_string_extractor.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.parsers.string_extractor import ( extract_strings_from_bytes, diff --git a/tests/unit/plugins/test_plugin_cache.py b/tests/unit/plugins/test_plugin_cache.py index 1bac072..5e2c7c7 100644 --- a/tests/unit/plugins/test_plugin_cache.py +++ b/tests/unit/plugins/test_plugin_cache.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from iocx.engine import Engine def test_plugin_cache(tmp_path, monkeypatch): diff --git a/tests/unit/plugins/test_plugin_context.py b/tests/unit/plugins/test_plugin_context.py index 30dd7d0..f9ecb5f 100644 --- a/tests/unit/plugins/test_plugin_context.py +++ b/tests/unit/plugins/test_plugin_context.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from iocx.engine import Engine def test_plugin_context(tmp_path, monkeypatch): diff --git a/tests/unit/plugins/test_plugin_discovery.py b/tests/unit/plugins/test_plugin_discovery.py index d24635f..2bd58eb 100644 --- a/tests/unit/plugins/test_plugin_discovery.py +++ 
b/tests/unit/plugins/test_plugin_discovery.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import os from pathlib import Path from iocx.plugins.loader import PluginLoader diff --git a/tests/unit/plugins/test_plugin_execution.py b/tests/unit/plugins/test_plugin_execution.py index cdab4e6..efd6d7b 100644 --- a/tests/unit/plugins/test_plugin_execution.py +++ b/tests/unit/plugins/test_plugin_execution.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from pathlib import Path from iocx.engine import Engine, EngineConfig diff --git a/tests/unit/plugins/test_plugin_loader.py b/tests/unit/plugins/test_plugin_loader.py index 2bc7c4f..f857284 100644 --- a/tests/unit/plugins/test_plugin_loader.py +++ b/tests/unit/plugins/test_plugin_loader.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import types import importlib.metadata import importlib.util diff --git a/tests/unit/plugins/test_plugin_overlap.py b/tests/unit/plugins/test_plugin_overlap.py index 3584c3e..7b20245 100644 --- a/tests/unit/plugins/test_plugin_overlap.py +++ b/tests/unit/plugins/test_plugin_overlap.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from iocx.engine import Engine, EngineConfig def test_plugin_overlap_suppression(tmp_path, monkeypatch): diff --git a/tests/unit/plugins/test_plugin_transformer.py b/tests/unit/plugins/test_plugin_transformer.py index ce233c8..55f2101 100644 --- a/tests/unit/plugins/test_plugin_transformer.py +++ b/tests/unit/plugins/test_plugin_transformer.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + from iocx.engine import Engine, EngineConfig def test_plugin_transformer_runs_first(tmp_path, monkeypatch): diff --git a/tests/unit/utils/test_utils.py b/tests/unit/utils/test_utils.py index 
29b28b1..363652f 100644 --- a/tests/unit/utils/test_utils.py +++ b/tests/unit/utils/test_utils.py @@ -1,3 +1,6 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + import pytest from iocx.utils import detect_file_type, FileType diff --git a/tests/unit/validators/test_dependencies.py b/tests/unit/validators/test_dependencies.py new file mode 100644 index 0000000..7d6334f --- /dev/null +++ b/tests/unit/validators/test_dependencies.py @@ -0,0 +1,16 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +from iocx.validators import STRUCTURAL_VALIDATORS + +def test_all_structural_validators_declare_dependencies(): + missing = [] + + for name, fn in STRUCTURAL_VALIDATORS.items(): + if not hasattr(fn, "_depends_on"): + missing.append(name) + + assert not missing, ( + "All structural validators must declare dependencies via @depends_on. " + f"Missing: {', '.join(missing)}" + ) diff --git a/tests/unit/validators/test_validator_contracts.py b/tests/unit/validators/test_validator_contracts.py new file mode 100644 index 0000000..7016b61 --- /dev/null +++ b/tests/unit/validators/test_validator_contracts.py @@ -0,0 +1,109 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +import inspect +from iocx.validators import STRUCTURAL_VALIDATORS +from iocx.validators.schema import StructuralIssue + +def test_all_validators_have_depends_on(): + missing = [ + name for name, fn in STRUCTURAL_VALIDATORS.items() + if not hasattr(fn, "_depends_on") + ] + + assert not missing, ( + "All validators must declare dependencies via @depends_on. 
" + f"Missing: {', '.join(missing)}" + ) + +def test_validator_dependencies_match_signature(): + errors = [] + + for name, fn in STRUCTURAL_VALIDATORS.items(): + deps = getattr(fn, "_depends_on", ("metadata", "analysis")) + sig = inspect.signature(fn) + params = list(sig.parameters) + + if len(deps) != len(params): + errors.append((name, deps, params)) + + assert not errors, ( + "Validator dependency declarations must match function signatures. " + f"Errors: {errors}" + ) + +def test_dispatcher_argument_order(): + from iocx.validators import run_structural_validators + + class Marker: + pass + + internal = Marker() + metadata = Marker() + analysis = Marker() + + calls = {} + + # Monkeypatch validators to capture calls + patched = {} + for name, fn in STRUCTURAL_VALIDATORS.items(): + def make_wrapper(fn, name): + def wrapper(*args): + calls[name] = args + return [] + wrapper._depends_on = getattr(fn, "_depends_on", ("metadata", "analysis")) + return wrapper + + patched[name] = make_wrapper(fn, name) + + # Replace validators temporarily + original = STRUCTURAL_VALIDATORS.copy() + STRUCTURAL_VALIDATORS.clear() + STRUCTURAL_VALIDATORS.update(patched) + + try: + run_structural_validators(internal, metadata, analysis) + finally: + # Restore original validators + STRUCTURAL_VALIDATORS.clear() + STRUCTURAL_VALIDATORS.update(original) + + # Validate argument order + for name, args in calls.items(): + deps = patched[name]._depends_on + expected = [] + if "internal" in deps: + expected.append(internal) + if "metadata" in deps: + expected.append(metadata) + if "analysis" in deps: + expected.append(analysis) + + assert list(args) == expected, ( + f"Dispatcher passed incorrect args to {name}: " + f"expected {expected}, got {args}" + ) + +def test_validator_return_types(): + internal = {} + metadata = {} + analysis = {} + + for name, fn in STRUCTURAL_VALIDATORS.items(): + deps = getattr(fn, "_depends_on", ("metadata", "analysis")) + + args = [] + if "internal" in deps: + 
args.append(internal) + if "metadata" in deps: + args.append(metadata) + if "analysis" in deps: + args.append(analysis) + + result = fn(*args) + + assert isinstance(result, list), f"{name} must return a list" + for item in result: + assert isinstance(item, StructuralIssue), ( + f"{name} returned non‑StructuralIssue: {item}" + ) diff --git a/tests/unit/validators/test_validator_entropy.py b/tests/unit/validators/test_validator_entropy.py new file mode 100644 index 0000000..4b69486 --- /dev/null +++ b/tests/unit/validators/test_validator_entropy.py @@ -0,0 +1,162 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +import pytest +from iocx.validators.entropy import validate_entropy +from iocx.reason_codes import ReasonCodes + + +def make_issue_list(result): + return [i["issue"] for i in result] + + +# --------------------------------------------------------- +# 1) Continue branch (invalid entropy or raw_size) +# --------------------------------------------------------- + +def test_entropy_continue_branch(): + analysis = { + "sections": [ + {"name": ".text", "entropy": "bad", "raw_size": 2000}, # invalid entropy → continue + {"name": ".data", "entropy": 5.0, "raw_size": "bad"}, # invalid raw_size → continue + ] + } + issues = validate_entropy({}, analysis) + assert issues == [] + + +# --------------------------------------------------------- +# 2) High entropy section +# --------------------------------------------------------- + +def test_entropy_high_section(): + analysis = { + "sections": [ + {"name": ".text", "entropy": 8.0, "raw_size": 2000}, + ] + } + issues = validate_entropy({}, analysis) + assert ReasonCodes.ENTROPY_HIGH_SECTION in make_issue_list(issues) + + +# --------------------------------------------------------- +# 3) Very low entropy section +# --------------------------------------------------------- + +def test_entropy_very_low_section(): + analysis = { + "sections": [ + {"name": ".data", "entropy": 0.1, 
"raw_size": 20000}, # >= 16 KB + ] + } + issues = validate_entropy({}, analysis) + assert ReasonCodes.ENTROPY_VERY_LOW_SECTION in make_issue_list(issues) + + +# --------------------------------------------------------- +# 4) Overlay high entropy +# --------------------------------------------------------- + +def test_entropy_high_overlay(): + analysis = { + "sections": [], + "overlay": {"entropy": 8.0, "size": 2000}, + } + issues = validate_entropy({}, analysis) + assert ReasonCodes.ENTROPY_HIGH_OVERLAY in make_issue_list(issues) + + +# --------------------------------------------------------- +# 5) Region-specific entropy (all regions) +# --------------------------------------------------------- + +@pytest.mark.parametrize("region,reason", [ + ("resources", ReasonCodes.ENTROPY_HIGH_RESOURCES), + ("relocations", ReasonCodes.ENTROPY_HIGH_RELOCATIONS), + ("imports", ReasonCodes.ENTROPY_HIGH_IMPORTS), + ("tls", ReasonCodes.ENTROPY_HIGH_TLS), + ("certificate", ReasonCodes.ENTROPY_HIGH_CERTIFICATE), +]) +def test_entropy_region_specific(region, reason): + analysis = { + "region_entropy": { + region: {"entropy": 8.0, "size": 2000} + } + } + issues = validate_entropy({}, analysis) + assert reason in make_issue_list(issues) + + +# --------------------------------------------------------- +# 6) Uniform entropy across sections +# --------------------------------------------------------- + +def test_entropy_uniform_across_sections(): + analysis = { + "sections": [ + {"name": ".text", "entropy": 7.6, "raw_size": 2000}, + {"name": ".data", "entropy": 7.61, "raw_size": 2000}, + {"name": ".rdata", "entropy": 7.59, "raw_size": 2000}, + ] + } + issues = validate_entropy({}, analysis) + assert ReasonCodes.ENTROPY_UNIFORM_ACROSS_SECTIONS in make_issue_list(issues) + + +# --------------------------------------------------------- +# 7) No issues (baseline) +# --------------------------------------------------------- + +def test_entropy_no_issues(): + analysis = { + "sections": [ + 
{"name": ".text", "entropy": 5.0, "raw_size": 2000}, + {"name": ".data", "entropy": 4.0, "raw_size": 2000}, + ] + } + issues = validate_entropy({}, analysis) + assert issues == [] + + +# --------------------------------------------------------- +# 8) Mixed: high + low + overlay + region + uniform +# --------------------------------------------------------- + +def test_entropy_mixed_all_paths(): + analysis = { + "sections": [ + {"name": ".text", "entropy": 8.0, "raw_size": 2000}, # high + {"name": ".data", "entropy": 0.1, "raw_size": 20000}, # very low + {"name": ".rdata", "entropy": 7.6, "raw_size": 2000}, # normal + ], + "overlay": {"entropy": 8.0, "size": 2000}, + "region_entropy": { + "resources": {"entropy": 8.0, "size": 2000}, + } + } + + issues = validate_entropy({}, analysis) + codes = make_issue_list(issues) + + assert ReasonCodes.ENTROPY_HIGH_SECTION in codes + assert ReasonCodes.ENTROPY_VERY_LOW_SECTION in codes + assert ReasonCodes.ENTROPY_HIGH_OVERLAY in codes + assert ReasonCodes.ENTROPY_HIGH_RESOURCES in codes + + # Uniform entropy SHOULD NOT appear here + assert ReasonCodes.ENTROPY_UNIFORM_ACROSS_SECTIONS not in codes + + +def test_entropy_uniform_across_sections_wider_spread(): + analysis = { + "sections": [ + {"name": ".text", "entropy": 7.60, "raw_size": 2000}, + {"name": ".data", "entropy": 7.62, "raw_size": 2000}, + {"name": ".rdata", "entropy": 7.58, "raw_size": 2000}, + ] + } + + issues = validate_entropy({}, analysis) + codes = make_issue_list(issues) + + assert ReasonCodes.ENTROPY_UNIFORM_ACROSS_SECTIONS in codes diff --git a/tests/unit/validators/test_validator_entrypoint.py b/tests/unit/validators/test_validator_entrypoint.py new file mode 100644 index 0000000..aab9ccf --- /dev/null +++ b/tests/unit/validators/test_validator_entrypoint.py @@ -0,0 +1,267 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +import pytest +from iocx.validators.entrypoint import validate_entrypoint, _map_rva_to_file_offset +from 
iocx.reason_codes import ReasonCodes + + +def make_issue_list(result): + return [i["issue"] for i in result] + + +# --------------------------------------------------------- +# 1) No extended header → early return +# --------------------------------------------------------- + +def test_entrypoint_no_extended_header(): + metadata = {} + analysis = {"extended": []} + issues = validate_entrypoint(metadata, analysis) + assert issues == [] + + +# --------------------------------------------------------- +# 2) No entry_point → early return +# --------------------------------------------------------- + +def test_entrypoint_missing_entry_point(): + metadata = {} + analysis = {"extended": [{"value": "header", "metadata": {}}]} + issues = validate_entrypoint(metadata, analysis) + assert issues == [] + + +# --------------------------------------------------------- +# 3) EP <= 0 +# --------------------------------------------------------- + +def test_entrypoint_zero_or_negative(): + metadata = {"optional_header": {}} + analysis = { + "extended": [{"value": "header", "metadata": {"entry_point": 0}}], + "sections": [], + } + issues = validate_entrypoint(metadata, analysis) + assert ReasonCodes.ENTRYPOINT_ZERO_OR_NEGATIVE in make_issue_list(issues) + + +# --------------------------------------------------------- +# 4) EP inside headers +# --------------------------------------------------------- + +def test_entrypoint_in_headers(): + metadata = {"optional_header": {"size_of_headers": 300}} + analysis = { + "extended": [{"value": "header", "metadata": {"entry_point": 100}}], + "sections": [], + } + issues = validate_entrypoint(metadata, analysis) + assert ReasonCodes.ENTRYPOINT_IN_HEADERS in make_issue_list(issues) + + +# --------------------------------------------------------- +# 5) No sections → return after header checks +# --------------------------------------------------------- + +def test_entrypoint_no_sections(): + metadata = {"optional_header": {}} + analysis = { + 
"extended": [{"value": "header", "metadata": {"entry_point": 500}}], + "sections": [], + } + issues = validate_entrypoint(metadata, analysis) + assert issues == [] + + +# --------------------------------------------------------- +# 6) EP not mapping to any section +# --------------------------------------------------------- + +def test_entrypoint_out_of_bounds_within_image(): + metadata = {"optional_header": {"size_of_image": 2000}} + analysis = { + "extended": [{"value": "header", "metadata": {"entry_point": 1500}}], + "sections": [ + {"name": ".text", "virtual_address": 0, "virtual_size": 1000}, + ], + } + issues = validate_entrypoint(metadata, analysis) + assert ReasonCodes.ENTRYPOINT_OUT_OF_BOUNDS in make_issue_list(issues) + + +def test_entrypoint_out_of_bounds_beyond_image(): + metadata = {"optional_header": {"size_of_image": 1000}} + analysis = { + "extended": [{"value": "header", "metadata": {"entry_point": 2000}}], + "sections": [ + {"name": ".text", "virtual_address": 0, "virtual_size": 500}, + ], + } + issues = validate_entrypoint(metadata, analysis) + assert ReasonCodes.ENTRYPOINT_OUT_OF_BOUNDS in make_issue_list(issues) + + +# --------------------------------------------------------- +# 7) Section not executable +# --------------------------------------------------------- + +def test_entrypoint_section_not_executable(): + metadata = {"optional_header": {}} + analysis = { + "extended": [{"value": "header", "metadata": {"entry_point": 150}}], + "sections": [ + { + "name": ".text", + "virtual_address": 100, + "virtual_size": 100, + "characteristics": 0, # not executable + } + ], + } + issues = validate_entrypoint(metadata, analysis) + assert ReasonCodes.ENTRYPOINT_SECTION_NOT_EXECUTABLE in make_issue_list(issues) + + +# --------------------------------------------------------- +# 8) EP in non-code section +# --------------------------------------------------------- + +def test_entrypoint_in_non_code_section(): + metadata = {"optional_header": {}} + 
analysis = { + "extended": [{"value": "header", "metadata": {"entry_point": 150}}], + "sections": [ + { + "name": ".rsrc", + "virtual_address": 100, + "virtual_size": 100, + "characteristics": 0, # not code + } + ], + } + issues = validate_entrypoint(metadata, analysis) + assert ReasonCodes.ENTRYPOINT_IN_NON_CODE_SECTION in make_issue_list(issues) + + +# --------------------------------------------------------- +# 9) EP in discardable section +# --------------------------------------------------------- + +def test_entrypoint_in_discardable_section(): + metadata = {"optional_header": {}} + analysis = { + "extended": [{"value": "header", "metadata": {"entry_point": 150}}], + "sections": [ + { + "name": ".text", + "virtual_address": 100, + "virtual_size": 100, + "characteristics": 0x02000000 | 0x20000000, # discardable + exec + } + ], + } + issues = validate_entrypoint(metadata, analysis) + assert ReasonCodes.ENTRYPOINT_IN_DISCARDABLE_SECTION in make_issue_list(issues) + + +# --------------------------------------------------------- +# 10) Zero-length section +# --------------------------------------------------------- + +def test_entrypoint_zero_length_section(): + metadata = {"optional_header": {}} + analysis = { + "extended": [{"value": "header", "metadata": {"entry_point": 120}}], + "sections": [ + { + "name": ".text", + "virtual_address": 100, + "virtual_size": 0, # zero-length + "raw_address": 200, + "raw_size": 100, + "characteristics": 0x20000000, # executable + } + ], + } + + issues = validate_entrypoint(metadata, analysis) + assert ReasonCodes.ENTRYPOINT_IN_TRUNCATED_REGION in make_issue_list(issues) + + +# --------------------------------------------------------- +# 11) EP beyond virtual size +# --------------------------------------------------------- + +def test_entrypoint_beyond_virtual_size(): + metadata = {"optional_header": {}} + analysis = { + "extended": [{"value": "header", "metadata": {"entry_point": 180}}], + "sections": [ + { + "name": ".text", 
+ "virtual_address": 100, + "virtual_size": 50, # ends at 150 + "raw_address": 200, + "raw_size": 200, + "characteristics": 0x20000000, + } + ], + } + + issues = validate_entrypoint(metadata, analysis) + assert ReasonCodes.ENTRYPOINT_IN_TRUNCATED_REGION in make_issue_list(issues) + + +# --------------------------------------------------------- +# 12) EP in overlay +# --------------------------------------------------------- + +def test_entrypoint_in_overlay(): + metadata = {"optional_header": {}} + analysis = { + "overlay_offset": 450, + "extended": [{"value": "header", "metadata": {"entry_point": 200}}], + "sections": [ + { + "name": ".text", + "virtual_address": 100, + "virtual_size": 200, + "raw_address": 400, + "raw_size": 300, + "characteristics": 0x20000000, + } + ], + } + + issues = validate_entrypoint(metadata, analysis) + assert ReasonCodes.ENTRYPOINT_IN_OVERLAY in make_issue_list(issues) + + +def test_map_rva_to_file_offset_continue_branch(): + sections = [ + { + "virtual_address": 100, + "virtual_size": 50, + # raw_address missing → triggers continue + "raw_size": 100, + } + ] + + result = _map_rva_to_file_offset(sections, 120) + assert result is None + + +def test_map_rva_to_file_offset_return_none(): + sections = [ + { + "virtual_address": 100, + "virtual_size": 50, + "raw_address": 200, + "raw_size": 50, + } + ] + + # EP outside VA range → no match → return None + result = _map_rva_to_file_offset(sections, 999) + assert result is None diff --git a/tests/unit/validators/test_validator_optional_header.py b/tests/unit/validators/test_validator_optional_header.py new file mode 100644 index 0000000..bbff5b0 --- /dev/null +++ b/tests/unit/validators/test_validator_optional_header.py @@ -0,0 +1,207 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +import pytest +from iocx.validators.optional_header import validate_optional_header +from iocx.reason_codes import ReasonCodes + + +def make_issue_list(result): + return 
[i["issue"] for i in result] + + +# --------------------------------------------------------- +# 1) SizeOfImage < max section end +# --------------------------------------------------------- + +def test_optional_header_inconsistent_size_of_image(): + metadata = { + "optional_header": {"size_of_image": 200} + } + analysis = { + "sections": [ + {"virtual_address": 100, "virtual_size": 200}, # ends at 300 + ] + } + issues = validate_optional_header(metadata, analysis) + assert ReasonCodes.OPTIONAL_HEADER_INCONSISTENT_SIZE in make_issue_list(issues) + + +# --------------------------------------------------------- +# 2) SizeOfHeaders misaligned to FileAlignment +# --------------------------------------------------------- + +def test_optional_header_invalid_size_of_headers_alignment(): + metadata = { + "optional_header": { + "size_of_headers": 300, + "file_alignment": 256, + } + } + analysis = {"sections": []} + issues = validate_optional_header(metadata, analysis) + assert ReasonCodes.OPTIONAL_HEADER_INVALID_SIZE_OF_HEADERS in make_issue_list(issues) + + +# --------------------------------------------------------- +# 3) SizeOfHeaders < header_end +# --------------------------------------------------------- + +def test_optional_header_invalid_size_of_headers_header_end(): + metadata = { + "optional_header": { + "size_of_headers": 200, + "file_alignment": 200, + }, + "header_end": 300, + } + analysis = {"sections": []} + issues = validate_optional_header(metadata, analysis) + assert ReasonCodes.OPTIONAL_HEADER_INVALID_SIZE_OF_HEADERS in make_issue_list(issues) + + +# --------------------------------------------------------- +# 4) SectionAlignment < FileAlignment +# --------------------------------------------------------- + +def test_optional_header_invalid_section_alignment_less_than_file_alignment(): + metadata = { + "optional_header": { + "section_alignment": 256, + "file_alignment": 512, + } + } + analysis = {"sections": []} + issues = 
validate_optional_header(metadata, analysis) + assert ReasonCodes.OPTIONAL_HEADER_INVALID_SECTION_ALIGNMENT in make_issue_list(issues) + + +# --------------------------------------------------------- +# 5) SectionAlignment not power of two +# --------------------------------------------------------- + +def test_optional_header_invalid_section_alignment_not_power_of_two(): + metadata = { + "optional_header": { + "section_alignment": 300, # not power of two + "file_alignment": 256, + } + } + analysis = {"sections": []} + issues = validate_optional_header(metadata, analysis) + assert ReasonCodes.OPTIONAL_HEADER_INVALID_SECTION_ALIGNMENT in make_issue_list(issues) + + +# --------------------------------------------------------- +# 6) FileAlignment not power of two +# --------------------------------------------------------- + +def test_optional_header_invalid_file_alignment_not_power_of_two(): + metadata = { + "optional_header": { + "file_alignment": 300, # not power of two + } + } + analysis = {"sections": []} + issues = validate_optional_header(metadata, analysis) + assert ReasonCodes.OPTIONAL_HEADER_INVALID_FILE_ALIGNMENT in make_issue_list(issues) + + +# --------------------------------------------------------- +# 7) FileAlignment out of recommended range +# --------------------------------------------------------- + +def test_optional_header_invalid_file_alignment_out_of_range(): + metadata = { + "optional_header": { + "file_alignment": 128, # < 512 + } + } + analysis = {"sections": []} + issues = validate_optional_header(metadata, analysis) + assert ReasonCodes.OPTIONAL_HEADER_INVALID_FILE_ALIGNMENT in make_issue_list(issues) + + +# --------------------------------------------------------- +# 8) SizeOfCode / Init / Uninit inconsistent +# --------------------------------------------------------- + +def test_optional_header_size_fields_inconsistent(): + metadata = { + "optional_header": { + "size_of_code": 10, + "size_of_initialized_data": 10, + 
"size_of_uninitialized_data": 10, + } + } + analysis = { + "sections": [ + {"characteristics": 0x20, "raw_size": 50, "virtual_size": 0}, # code + {"characteristics": 0x40, "raw_size": 50, "virtual_size": 0}, # init + {"characteristics": 0x80, "raw_size": 0, "virtual_size": 50}, # uninit + ] + } + issues = validate_optional_header(metadata, analysis) + assert ReasonCodes.OPTIONAL_HEADER_SIZE_FIELDS_INCONSISTENT in make_issue_list(issues) + + +# --------------------------------------------------------- +# 9) ImageBase misaligned +# --------------------------------------------------------- + +def test_optional_header_image_base_misaligned(): + metadata = { + "optional_header": { + "image_base": 0x12345, # not 64K aligned + } + } + analysis = {"sections": []} + issues = validate_optional_header(metadata, analysis) + assert ReasonCodes.OPTIONAL_HEADER_IMAGE_BASE_MISALIGNED in make_issue_list(issues) + + +# --------------------------------------------------------- +# 10) NumberOfRvaAndSizes < 0 or > 16 +# --------------------------------------------------------- + +def test_optional_header_invalid_number_of_rva_and_sizes_range(): + metadata = { + "optional_header": { + "number_of_rva_and_sizes": 20, # > 16 + } + } + analysis = {"sections": []} + issues = validate_optional_header(metadata, analysis) + assert ReasonCodes.OPTIONAL_HEADER_INVALID_NUMBER_OF_RVA_AND_SIZES in make_issue_list(issues) + + +# --------------------------------------------------------- +# 11) NumberOfRvaAndSizes < actual directories +# --------------------------------------------------------- + +def test_optional_header_invalid_number_of_rva_and_sizes_too_small(): + metadata = { + "optional_header": { + "number_of_rva_and_sizes": 1, + "data_directories": [1, 2], # 2 dirs > 1 allowed + } + } + analysis = {"sections": []} + issues = validate_optional_header(metadata, analysis) + assert ReasonCodes.OPTIONAL_HEADER_INVALID_NUMBER_OF_RVA_AND_SIZES in make_issue_list(issues) + + +# 
--------------------------------------------------------- +# 12) SizeOfImage misaligned to SectionAlignment +# --------------------------------------------------------- + +def test_optional_header_size_of_image_misaligned(): + metadata = { + "optional_header": { + "size_of_image": 3000, + "section_alignment": 4096, + } + } + analysis = {"sections": []} + issues = validate_optional_header(metadata, analysis) + assert ReasonCodes.OPTIONAL_HEADER_SIZE_OF_IMAGE_MISALIGNED in make_issue_list(issues) diff --git a/tests/unit/validators/test_validator_resources.py b/tests/unit/validators/test_validator_resources.py new file mode 100644 index 0000000..f346b2f --- /dev/null +++ b/tests/unit/validators/test_validator_resources.py @@ -0,0 +1,341 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +import pytest +from iocx.validators.resources import validate_resources +from iocx.reason_codes import ReasonCodes + + +def make_issue_list(result): + return [i["issue"] for i in result] + + +def test_resources_no_resources_struct(): + metadata = {"resources_struct": None} + analysis = {} + issues = validate_resources(metadata, analysis) + assert issues == [] + + +def test_resources_no_rsrc_section(): + metadata = {"resources_struct": {"root": {}}} + analysis = { + "sections": [{"name": ".text"}], + "file_size": 1000, + "overlay_offset": 500, + } + issues = validate_resources(metadata, analysis) + assert issues == [] + + +def test_resources_zero_length_directory(): + metadata = { + "resources_struct": { + "root": {"rva": 100, "size": 0, "entries": []} + } + } + analysis = { + "sections": [{ + "name": ".rsrc", + "virtual_address": 100, + "virtual_size": 100, + "raw_address": 200, + "raw_size": 100, + }], + "file_size": 1000, + "overlay_offset": 500, + } + issues = validate_resources(metadata, analysis) + assert ReasonCodes.RESOURCE_DIRECTORY_ZERO_LENGTH in make_issue_list(issues) + + +def test_resources_directory_loop(): + loop = {"rva": 100, 
"size": 10, "entries": []} + loop["entries"] = [{"is_directory": True, "directory": loop}] + + metadata = {"resources_struct": {"root": loop}} + analysis = { + "sections": [{ + "name": ".rsrc", + "virtual_address": 100, + "virtual_size": 200, + "raw_address": 200, + "raw_size": 200, + }], + "file_size": 1000, + "overlay_offset": 500, + } + + issues = validate_resources(metadata, analysis) + assert ReasonCodes.RESOURCE_DIRECTORY_LOOP in make_issue_list(issues) + + +def test_resources_entry_out_of_bounds(): + metadata = { + "resources_struct": { + "root": { + "rva": 100, "size": 10, + "entries": [ + {"is_directory": True, + "directory": {"rva": 9999, "size": 10, "entries": []}} + ] + } + } + } + analysis = { + "sections": [{ + "name": ".rsrc", + "virtual_address": 100, + "virtual_size": 200, + "raw_address": 200, + "raw_size": 200, + }], + "file_size": 1000, + "overlay_offset": 500, + } + + issues = validate_resources(metadata, analysis) + assert ReasonCodes.RESOURCE_ENTRY_OUT_OF_BOUNDS in make_issue_list(issues) + + +def test_resources_zero_size_data(): + metadata = { + "resources_struct": { + "root": { + "rva": 100, "size": 10, + "entries": [ + {"is_directory": False, + "data_rva": 120, "data_size": 0, "raw_offset": 300} + ] + } + } + } + analysis = { + "sections": [{ + "name": ".rsrc", + "virtual_address": 100, + "virtual_size": 200, + "raw_address": 200, + "raw_size": 200, + }], + "file_size": 1000, + "overlay_offset": 500, + } + + issues = validate_resources(metadata, analysis) + assert ReasonCodes.RESOURCE_DATA_OUT_OF_BOUNDS in make_issue_list(issues) + + +def test_resources_rva_out_of_bounds(): + metadata = { + "resources_struct": { + "root": { + "rva": 100, "size": 10, + "entries": [ + {"is_directory": False, + "data_rva": 9999, "data_size": 10, "raw_offset": 300} + ] + } + } + } + analysis = { + "sections": [{ + "name": ".rsrc", + "virtual_address": 100, + "virtual_size": 200, + "raw_address": 200, + "raw_size": 200, + }], + "file_size": 1000, + 
"overlay_offset": 500, + } + + issues = validate_resources(metadata, analysis) + assert ReasonCodes.RESOURCE_DATA_OUT_OF_BOUNDS in make_issue_list(issues) + + +def test_resources_raw_out_of_bounds(): + metadata = { + "resources_struct": { + "root": { + "rva": 100, "size": 10, + "entries": [ + {"is_directory": False, + "data_rva": 120, "data_size": 50, "raw_offset": 980} + ] + } + } + } + analysis = { + "sections": [{ + "name": ".rsrc", + "virtual_address": 100, + "virtual_size": 200, + "raw_address": 200, + "raw_size": 200, + }], + "file_size": 1000, + "overlay_offset": 500, + } + + issues = validate_resources(metadata, analysis) + assert ReasonCodes.RESOURCE_DATA_OUT_OF_BOUNDS in make_issue_list(issues) + + +def test_resources_overlay_overlap(): + metadata = { + "resources_struct": { + "root": { + "rva": 100, "size": 10, + "entries": [ + {"is_directory": False, + "data_rva": 120, "data_size": 100, "raw_offset": 450} + ] + } + } + } + analysis = { + "sections": [{ + "name": ".rsrc", + "virtual_address": 100, + "virtual_size": 300, + "raw_address": 200, + "raw_size": 300, + }], + "file_size": 1000, + "overlay_offset": 500, + } + + issues = validate_resources(metadata, analysis) + assert ReasonCodes.RESOURCE_DATA_OVERLAPS_OTHER_DATA in make_issue_list(issues) + + +def test_resources_raw_overlap_other_section(): + metadata = { + "resources_struct": { + "root": { + "rva": 100, "size": 10, + "entries": [ + {"is_directory": False, + "data_rva": 120, "data_size": 50, "raw_offset": 250} + ] + } + } + } + analysis = { + "sections": [ + { + "name": ".rsrc", + "virtual_address": 100, + "virtual_size": 300, + "raw_address": 200, + "raw_size": 300, + }, + { + "name": ".text", + "virtual_address": 1000, + "virtual_size": 100, + "raw_address": 240, + "raw_size": 20, + } + ], + "file_size": 1000, + "overlay_offset": 900, + } + + issues = validate_resources(metadata, analysis) + assert ReasonCodes.RESOURCE_DATA_OVERLAPS_OTHER_DATA in make_issue_list(issues) + + +def 
test_resources_va_overlap_other_section(): + metadata = { + "resources_struct": { + "root": { + "rva": 100, "size": 10, + "entries": [ + {"is_directory": False, + "data_rva": 150, "data_size": 50, "raw_offset": 250} + ] + } + } + } + analysis = { + "sections": [ + { + "name": ".rsrc", + "virtual_address": 100, + "virtual_size": 300, + "raw_address": 200, + "raw_size": 300, + }, + { + "name": ".text", + "virtual_address": 140, + "virtual_size": 100, + "raw_address": 500, + "raw_size": 100, + } + ], + "file_size": 1000, + "overlay_offset": 900, + } + + issues = validate_resources(metadata, analysis) + assert ReasonCodes.RESOURCE_DATA_OVERLAPS_OTHER_DATA in make_issue_list(issues) + + +def test_resources_string_table_corrupt(): + metadata = { + "resources_struct": { + "root": {"rva": 100, "size": 10, "entries": []}, + "string_tables": [ + {"rva": 9999, "size": 20} + ] + } + } + analysis = { + "sections": [{ + "name": ".rsrc", + "virtual_address": 100, + "virtual_size": 300, + "raw_address": 200, + "raw_size": 300, + }], + "file_size": 1000, + "overlay_offset": 500, + } + + issues = validate_resources(metadata, analysis) + assert ReasonCodes.RESOURCE_STRING_TABLE_CORRUPT in make_issue_list(issues) + + +def test_resources_directory_outside_rsrc_skips_validation(): + metadata = { + "resources_struct": { + "root": { + "rva": 9999, # OUTSIDE .rsrc VA range + "size": 10, + "entries": [] + } + } + } + + analysis = { + "sections": [ + { + "name": ".rsrc", + "virtual_address": 100, + "virtual_size": 200, # .rsrc covers VA 100–300 + "raw_address": 200, + "raw_size": 200, + } + ], + "file_size": 5000, + "overlay_offset": 4000, + } + + issues = validate_resources(metadata, analysis) + + # Because the directory is outside .rsrc, validate_directory() returns immediately + # → no issues should be produced + assert issues == [] diff --git a/tests/unit/validators/test_validator_rva_graph.py b/tests/unit/validators/test_validator_rva_graph.py new file mode 100644 index 0000000..0a0304c 
--- /dev/null +++ b/tests/unit/validators/test_validator_rva_graph.py @@ -0,0 +1,257 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +import pytest +from iocx.validators.rva_graph import validate_rva_graph +from iocx.reason_codes import ReasonCodes + + +def make_issue_list(result): + return [i["issue"] for i in result] + + +# --------------------------------------------------------- +# 1) size_of_image missing → early return +# --------------------------------------------------------- + +def test_rva_graph_missing_size_of_image(): + metadata = {"optional_header": {}} + analysis = {} + issues = validate_rva_graph(metadata, analysis) + assert issues == [] + + +# --------------------------------------------------------- +# 2) malformed directory entry → first continue +# --------------------------------------------------------- + +def test_rva_graph_malformed_directory_entry(): + metadata = {"optional_header": {"size_of_image": 1000}} + analysis = { + "data_directories": [ + {"rva": "bad", "size": 10}, # triggers continue + ] + } + issues = validate_rva_graph(metadata, analysis) + assert issues == [] + + +# --------------------------------------------------------- +# 3) negative rva/size +# --------------------------------------------------------- + +def test_rva_graph_negative_values(): + metadata = {"optional_header": {"size_of_image": 1000}} + analysis = { + "data_directories": [ + {"name": "dir", "rva": -1, "size": 10}, + ] + } + issues = validate_rva_graph(metadata, analysis) + assert ReasonCodes.DATA_DIRECTORY_INVALID_RANGE in make_issue_list(issues) + + +# --------------------------------------------------------- +# 4) empty directory (0,0) +# --------------------------------------------------------- + +def test_rva_graph_empty_directory(): + metadata = {"optional_header": {"size_of_image": 1000}} + analysis = { + "data_directories": [ + {"name": "dir", "rva": 0, "size": 0}, + ] + } + issues = 
validate_rva_graph(metadata, analysis) + assert issues == [] + + +def test_rva_graph_empty_directory_unexpected(monkeypatch): + # Patch REQUIRED_NONZERO_DIRS to force the branch + from iocx.validators import rva_graph + monkeypatch.setattr(rva_graph, "REQUIRED_NONZERO_DIRS", {"dir"}) + + metadata = {"optional_header": {"size_of_image": 1000}} + analysis = { + "data_directories": [ + {"name": "dir", "rva": 0, "size": 0}, + ] + } + + issues = rva_graph.validate_rva_graph(metadata, analysis) + + assert ReasonCodes.DATA_DIRECTORY_ZERO_SIZE_UNEXPECTED in [ + i["issue"] for i in issues + ] + + +# --------------------------------------------------------- +# 5) zero-RVA non-zero size +# --------------------------------------------------------- + +def test_rva_graph_zero_rva_nonzero_size(): + metadata = {"optional_header": {"size_of_image": 1000}} + analysis = { + "data_directories": [ + {"name": "dir", "rva": 0, "size": 50}, + ] + } + issues = validate_rva_graph(metadata, analysis) + assert ReasonCodes.DATA_DIRECTORY_ZERO_RVA_NONZERO_SIZE in make_issue_list(issues) + + +# --------------------------------------------------------- +# 6) directory in headers +# --------------------------------------------------------- + +def test_rva_graph_in_headers(): + metadata = {"optional_header": {"size_of_image": 1000, "size_of_headers": 300}} + analysis = { + "data_directories": [ + {"name": "dir", "rva": 100, "size": 50}, + ] + } + issues = validate_rva_graph(metadata, analysis) + assert ReasonCodes.DATA_DIRECTORY_IN_HEADERS in make_issue_list(issues) + + +# --------------------------------------------------------- +# 7) out-of-range directory +# --------------------------------------------------------- + +def test_rva_graph_out_of_range(): + metadata = {"optional_header": {"size_of_image": 200}} + analysis = { + "data_directories": [ + {"name": "dir", "rva": 150, "size": 100}, + ] + } + issues = validate_rva_graph(metadata, analysis) + assert ReasonCodes.DATA_DIRECTORY_OUT_OF_RANGE 
in make_issue_list(issues) + + +# --------------------------------------------------------- +# 8) overlay detection +# --------------------------------------------------------- + +def test_rva_graph_overlay_detection(): + metadata = {"optional_header": {"size_of_image": 2000}} + analysis = { + "overlay_offset": 300, + "sections": [ + { + "name": ".text", + "virtual_address": 100, + "virtual_size": 500, + "raw_address": 200, + } + ], + "data_directories": [ + {"name": "dir", "rva": 250, "size": 10}, + ], + } + issues = validate_rva_graph(metadata, analysis) + assert ReasonCodes.DATA_DIRECTORY_IN_OVERLAY in make_issue_list(issues) + + +# --------------------------------------------------------- +# 9) zero-length section skip +# --------------------------------------------------------- + +def test_rva_graph_zero_length_section_skip(): + metadata = {"optional_header": {"size_of_image": 2000}} + analysis = { + "sections": [ + { + "name": ".empty", + "virtual_address": 1000, + "virtual_size": 0, + "raw_address": 500, + } + ], + "data_directories": [ + {"name": "dir", "rva": 1000, "size": 10}, # lands exactly on zero-length section + ], + } + issues = validate_rva_graph(metadata, analysis) + assert issues == [] + + +# --------------------------------------------------------- +# 10) not mapped to any section +# --------------------------------------------------------- + +def test_rva_graph_not_mapped_to_section(): + metadata = {"optional_header": {"size_of_image": 2000}} + analysis = { + "sections": [ + { + "name": ".text", + "virtual_address": 100, + "virtual_size": 100, + } + ], + "data_directories": [ + {"name": "dir", "rva": 500, "size": 10}, # outside section + ], + } + issues = validate_rva_graph(metadata, analysis) + assert ReasonCodes.DATA_DIRECTORY_NOT_MAPPED_TO_SECTION in make_issue_list(issues) + + +# --------------------------------------------------------- +# 11) spans multiple sections +# --------------------------------------------------------- + +def 
test_rva_graph_spans_multiple_sections(): + metadata = {"optional_header": {"size_of_image": 2000}} + analysis = { + "sections": [ + {"name": "A", "virtual_address": 100, "virtual_size": 100}, + {"name": "B", "virtual_address": 150, "virtual_size": 100}, + ], + "data_directories": [ + {"name": "dir", "rva": 120, "size": 100}, # overlaps A and B + ], + } + issues = validate_rva_graph(metadata, analysis) + assert ReasonCodes.DATA_DIRECTORY_SPANS_MULTIPLE_SECTIONS in make_issue_list(issues) + + +# --------------------------------------------------------- +# 12) directory overlap detection +# --------------------------------------------------------- + +def test_rva_graph_directory_overlap(): + metadata = {"optional_header": {"size_of_image": 2000}} + analysis = { + "data_directories": [ + {"name": "A", "rva": 100, "size": 100}, + {"name": "B", "rva": 150, "size": 100}, # overlaps A + ] + } + issues = validate_rva_graph(metadata, analysis) + assert ReasonCodes.DATA_DIRECTORY_OVERLAP in make_issue_list(issues) + + +def test_rva_graph_directory_overlap_inner_continue(): + metadata = {"optional_header": {"size_of_image": 2000}} + analysis = { + "data_directories": [ + { + "name": "A", + "rva": 100, + "size": 50, # valid → outer loop does NOT continue + }, + { + "name": "B", + "rva": "bad", # invalid → triggers inner continue + "size": 50, + }, + ] + } + + issues = validate_rva_graph(metadata, analysis) + + # No overlap issue should be produced because the inner loop continues + assert ReasonCodes.DATA_DIRECTORY_OVERLAP not in make_issue_list(issues) diff --git a/tests/unit/validators/test_validator_sections.py b/tests/unit/validators/test_validator_sections.py new file mode 100644 index 0000000..6a8c609 --- /dev/null +++ b/tests/unit/validators/test_validator_sections.py @@ -0,0 +1,430 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +import pytest +from iocx.validators.sections import validate_sections +from iocx.reason_codes 
import ReasonCodes + + +def make_issue_list(result): + return [i["issue"] for i in result] + + +# --------------------------------------------------------- +# 1) RWX section +# --------------------------------------------------------- + +def test_section_rwx(): + metadata = {} + analysis = { + "sections": [ + { + "name": ".text", + "characteristics": 0x20000000 | 0x80000000, # EXEC + WRITE + } + ] + } + issues = validate_sections(metadata, analysis) + assert ReasonCodes.SECTION_RWX in make_issue_list(issues) + + +# --------------------------------------------------------- +# 2) Code flag but not executable +# --------------------------------------------------------- + +def test_section_non_executable_code_like(): + metadata = {} + analysis = { + "sections": [ + { + "name": ".text", + "characteristics": 0x00000020, # CODE flag only + } + ] + } + issues = validate_sections(metadata, analysis) + assert ReasonCodes.SECTION_NON_EXECUTABLE_CODE_LIKE in make_issue_list(issues) + + +# --------------------------------------------------------- +# 3) Code-like name but not executable +# --------------------------------------------------------- + +def test_section_codelike_name_not_executable(): + metadata = {} + analysis = { + "sections": [ + { + "name": "text", + "characteristics": 0x0, # not executable + } + ] + } + issues = validate_sections(metadata, analysis) + assert ReasonCodes.SECTION_CODELIKE_NAME_NOT_EXECUTABLE in make_issue_list(issues) + + +# --------------------------------------------------------- +# 4) Non-ASCII name +# --------------------------------------------------------- + +def test_section_name_non_ascii(): + metadata = {} + analysis = { + "sections": [ + { + "name": "têxt", # non-ASCII + "characteristics": 0x20000000, + } + ] + } + issues = validate_sections(metadata, analysis) + assert ReasonCodes.SECTION_NAME_NON_ASCII in make_issue_list(issues) + + +def test_is_ascii_printable_typeerror_branch(): + class WeirdName: + def __iter__(self): + return 
iter([1, 2, 3]) # ord(1) → TypeError + + def strip(self): + return self # allow .strip() to succeed + + def lower(self): + return "not-code-like" + + metadata = {} + analysis = { + "sections": [ + { + "name": WeirdName(), + "characteristics": 0x40000000, # READ flag to avoid other issues + } + ] + } + + issues = validate_sections(metadata, analysis) + + # Because _is_ascii_printable() returned False via TypeError, + # we expect SECTION_NAME_NON_ASCII + assert ReasonCodes.SECTION_NAME_NON_ASCII in make_issue_list(issues) + +# --------------------------------------------------------- +# 5) Padding/empty name +# --------------------------------------------------------- + +def test_section_name_padding(): + metadata = {} + analysis = { + "sections": [ + { + "name": " ", + "characteristics": 0x20000000, + } + ] + } + issues = validate_sections(metadata, analysis) + assert ReasonCodes.SECTION_NAME_EMPTY_OR_PADDING in make_issue_list(issues) + + +# --------------------------------------------------------- +# 6) Impossible flag combinations (discardable + exec + write) +# --------------------------------------------------------- + +def test_section_impossible_flags(): + metadata = {} + analysis = { + "sections": [ + { + "name": ".x", + "characteristics": ( + 0x02000000 | # discardable + 0x20000000 | # exec + 0x80000000 # write + ), + } + ] + } + issues = validate_sections(metadata, analysis) + assert ReasonCodes.SECTION_IMPOSSIBLE_FLAGS in make_issue_list(issues) + + +# --------------------------------------------------------- +# 7) Raw misalignment +# --------------------------------------------------------- + +def test_section_raw_misaligned(): + metadata = {"optional_header": {"file_alignment": 512}} + analysis = { + "sections": [ + { + "name": ".data", + "characteristics": 0x20000000, + "raw_address": 123, # not aligned + "raw_size": 100, + } + ] + } + issues = validate_sections(metadata, analysis) + assert ReasonCodes.SECTION_RAW_MISALIGNED in make_issue_list(issues) 
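The raw-misalignment test above checks that a section's raw file offset sits on a `FileAlignment` boundary. A minimal sketch of that rule, using a hypothetical helper (not the actual `iocx` implementation), assuming malformed or zero alignment values are reported by other checks rather than here:

```python
def is_raw_aligned(raw_address, file_alignment):
    """Return True if raw_address sits on a file_alignment boundary.

    Hypothetical helper mirroring the rule exercised by
    test_section_raw_misaligned; not the iocx implementation.
    """
    if not isinstance(raw_address, int) or not isinstance(file_alignment, int):
        return True  # malformed metadata: don't flag here, other checks cover it
    if file_alignment <= 0:
        return True  # a zero/negative alignment is itself an optional-header issue
    return raw_address % file_alignment == 0
```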
+ + +# --------------------------------------------------------- +# 8) Section overlaps headers +# --------------------------------------------------------- + +def test_section_overlaps_headers(): + metadata = {"optional_header": {"size_of_headers": 300}} + analysis = { + "sections": [ + { + "name": ".data", + "characteristics": 0x20000000, + "raw_address": 100, # inside headers + "raw_size": 100, + } + ] + } + issues = validate_sections(metadata, analysis) + assert ReasonCodes.SECTION_OVERLAPS_HEADERS in make_issue_list(issues) + + +# --------------------------------------------------------- +# 9) Zero-length section +# --------------------------------------------------------- + +def test_section_zero_length(): + metadata = {} + analysis = { + "sections": [ + { + "name": ".empty", + "characteristics": 0x20000000, + "virtual_address": 1000, + "virtual_size": 0, + "raw_address": 2000, + "raw_size": 0, + } + ] + } + issues = validate_sections(metadata, analysis) + assert ReasonCodes.SECTION_ZERO_LENGTH in make_issue_list(issues) + + +# --------------------------------------------------------- +# 10) Discardable + executable (even without write) +# --------------------------------------------------------- + +def test_section_discardable_code(): + metadata = {} + analysis = { + "sections": [ + { + "name": ".text", + "characteristics": 0x02000000 | 0x20000000, # discardable + exec + } + ] + } + issues = validate_sections(metadata, analysis) + assert ReasonCodes.SECTION_DISCARDABLE_CODE in make_issue_list(issues) + + +# --------------------------------------------------------- +# 11) Contradictory flags +# --------------------------------------------------------- + +def test_section_flags_inconsistent_code_without_read(): + metadata = {} + analysis = { + "sections": [ + { + "name": ".text", + "characteristics": 0x00000020, # CODE but no READ + } + ] + } + issues = validate_sections(metadata, analysis) + assert ReasonCodes.SECTION_FLAGS_INCONSISTENT in 
make_issue_list(issues) + + +def test_section_flags_inconsistent_write_without_read(): + metadata = {} + analysis = { + "sections": [ + { + "name": ".data", + "characteristics": 0x80000000, # WRITE but no READ + } + ] + } + issues = validate_sections(metadata, analysis) + assert ReasonCodes.SECTION_FLAGS_INCONSISTENT in make_issue_list(issues) + + +def test_section_flags_inconsistent_exec_without_read(): + metadata = {} + analysis = { + "sections": [ + { + "name": ".text", + "characteristics": 0x20000000, # EXEC but no READ + } + ] + } + issues = validate_sections(metadata, analysis) + assert ReasonCodes.SECTION_FLAGS_INCONSISTENT in make_issue_list(issues) + + +# --------------------------------------------------------- +# 12) Raw overlap detection +# --------------------------------------------------------- + +def test_section_raw_overlap(): + metadata = {} + analysis = { + "sections": [ + { + "name": "A", + "characteristics": 0x20000000, + "raw_address": 100, + "raw_size": 100, + }, + { + "name": "B", + "characteristics": 0x20000000, + "raw_address": 150, # overlaps A + "raw_size": 100, + }, + ] + } + issues = validate_sections(metadata, analysis) + assert ReasonCodes.SECTION_RAW_OVERLAP in make_issue_list(issues) + + +def test_section_raw_overlap_inner_continue(): + metadata = {} + analysis = { + "sections": [ + { + "name": "A", + "characteristics": 0x40000000, # READ + "raw_address": 100, + "raw_size": 50, + }, + { + "name": "B", + "characteristics": 0x40000000, + "raw_address": "not-an-int", # triggers inner continue + "raw_size": 50, + }, + ] + } + + issues = validate_sections(metadata, analysis) + + # No overlap issues should be produced because the inner loop continues + assert ReasonCodes.SECTION_RAW_OVERLAP not in make_issue_list(issues) + +# --------------------------------------------------------- +# 13) Virtual overlap detection +# --------------------------------------------------------- + +def test_section_virtual_overlap(): + metadata = {} + 
analysis = { + "sections": [ + { + "name": "A", + "characteristics": 0x20000000, + "virtual_address": 1000, + "virtual_size": 200, + }, + { + "name": "B", + "characteristics": 0x20000000, + "virtual_address": 1100, # overlaps A + "virtual_size": 200, + }, + ] + } + issues = validate_sections(metadata, analysis) + assert ReasonCodes.SECTION_OVERLAP in make_issue_list(issues) + + +def test_section_virtual_overlap_inner_continue(): + metadata = {} + analysis = { + "sections": [ + { + "name": "A", + "characteristics": 0x40000000, # READ + "virtual_address": 1000, + "virtual_size": 100, + }, + { + "name": "B", + "characteristics": 0x40000000, + "virtual_address": "not-an-int", # triggers inner continue + "virtual_size": 200, + }, + ] + } + + issues = validate_sections(metadata, analysis) + + # No virtual overlap issue should be produced because the inner loop continues + assert ReasonCodes.SECTION_OVERLAP not in make_issue_list(issues) + + +# --------------------------------------------------------- +# 14) Raw ordering +# --------------------------------------------------------- + +def test_section_out_of_order_raw(): + metadata = {} + analysis = { + "sections": [ + {"name": "A", "characteristics": 0x20000000, "raw_address": 300}, + {"name": "B", "characteristics": 0x20000000, "raw_address": 100}, + ] + } + issues = validate_sections(metadata, analysis) + assert ReasonCodes.SECTION_OUT_OF_ORDER_RAW in make_issue_list(issues) + + +# --------------------------------------------------------- +# 15) Virtual ordering +# --------------------------------------------------------- + +def test_section_out_of_order_virtual(): + metadata = {} + analysis = { + "sections": [ + {"name": "A", "characteristics": 0x20000000, "virtual_address": 300}, + {"name": "B", "characteristics": 0x20000000, "virtual_address": 100}, + ] + } + issues = validate_sections(metadata, analysis) + assert ReasonCodes.SECTION_OUT_OF_ORDER_VIRTUAL in make_issue_list(issues) + + +# 
--------------------------------------------------------- +# 16) Clean case +# --------------------------------------------------------- + +def test_section_valid_no_issues(): + metadata = {"optional_header": {"file_alignment": 512, "size_of_headers": 100}} + analysis = { + "sections": [ + { + "name": ".text", + "characteristics": 0x20000000 | 0x40000000, # exec + read + "raw_address": 512, + "raw_size": 100, + "virtual_address": 0x1000, + "virtual_size": 100, + } + ] + } + issues = validate_sections(metadata, analysis) + assert issues == [] diff --git a/tests/unit/validators/test_validator_signatures.py b/tests/unit/validators/test_validator_signatures.py new file mode 100644 index 0000000..4a83154 --- /dev/null +++ b/tests/unit/validators/test_validator_signatures.py @@ -0,0 +1,173 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +import pytest +from iocx.validators.signature import validate_signature +from iocx.reason_codes import ReasonCodes +from iocx.validators.schema import StructuralIssue + + +def make_issue_list(result): + return [i["issue"] for i in result] + + +# --------------------------------------------------------- +# 1) Flag/metadata symmetry +# --------------------------------------------------------- + +def test_flag_set_but_no_metadata(): + metadata = {"has_signature": True, "signatures": []} + analysis = {} + issues = validate_signature(metadata, analysis) + assert make_issue_list(issues) == [ + ReasonCodes.SIGNATURE_FLAG_SET_BUT_NO_METADATA + ] + + +def test_signature_present_but_flag_not_set(): + metadata = {"has_signature": False, "signatures": [{"file_offset": 0, "length": 16}]} + analysis = {} + issues = validate_signature(metadata, analysis) + assert ReasonCodes.SIGNATURE_PRESENT_BUT_FLAG_NOT_SET in make_issue_list(issues) + + +def test_no_sigs_and_flag_false_returns_clean(): + metadata = {"has_signature": False, "signatures": []} + analysis = {} + issues = validate_signature(metadata, analysis) + 
assert issues == [] + + +# --------------------------------------------------------- +# 2) Multiplicity +# --------------------------------------------------------- + +def test_multiple_signatures_detected(): + metadata = { + "has_signature": True, + "signatures": [ + {"file_offset": 0, "length": 16}, + {"file_offset": 100, "length": 16}, + ], + } + analysis = {} + issues = validate_signature(metadata, analysis) + assert ReasonCodes.SIGNATURE_MULTIPLE_CERTIFICATES in make_issue_list(issues) + + +# --------------------------------------------------------- +# 3) Certificate sanity checks +# --------------------------------------------------------- + +def test_invalid_length(): + metadata = { + "has_signature": True, + "signatures": [{"file_offset": 0, "length": 4}], + } + analysis = {} + issues = validate_signature(metadata, analysis) + assert ReasonCodes.SIGNATURE_INVALID_LENGTH in make_issue_list(issues) + + +def test_invalid_revision(): + metadata = { + "has_signature": True, + "signatures": [{"file_offset": 0, "length": 16, "revision": 0x9999}], + } + analysis = {} + issues = validate_signature(metadata, analysis) + assert ReasonCodes.SIGNATURE_INVALID_REVISION in make_issue_list(issues) + + +def test_invalid_type(): + metadata = { + "has_signature": True, + "signatures": [{"file_offset": 0, "length": 16, "certificate_type": 0x9999}], + } + analysis = {} + issues = validate_signature(metadata, analysis) + assert ReasonCodes.SIGNATURE_INVALID_TYPE in make_issue_list(issues) + + +# --------------------------------------------------------- +# 4) Bounds checks +# --------------------------------------------------------- + +def test_signature_out_of_bounds(): + metadata = { + "has_signature": True, + "signatures": [{"file_offset": 900, "length": 200}], + } + analysis = {"file_size": 1000} + issues = validate_signature(metadata, analysis) + assert ReasonCodes.SIGNATURE_OUT_OF_FILE_BOUNDS in make_issue_list(issues) + + +def test_signature_overlaps_overlay(): + metadata 
= { + "has_signature": True, + "signatures": [{"file_offset": 100, "length": 200}], + } + analysis = {"overlay_offset": 150} + issues = validate_signature(metadata, analysis) + assert ReasonCodes.SIGNATURE_OVERLAPS_OTHER_DATA in make_issue_list(issues) + + +def test_signature_overlaps_section(): + metadata = { + "has_signature": True, + "signatures": [{"file_offset": 100, "length": 200}], + } + analysis = { + "sections": [ + {"name": ".text", "raw_address": 150, "raw_size": 50} + ] + } + issues = validate_signature(metadata, analysis) + assert ReasonCodes.SIGNATURE_OVERLAPS_OTHER_DATA in make_issue_list(issues) + + +# --------------------------------------------------------- +# 5) Clean case +# --------------------------------------------------------- + +def test_valid_signature_no_issues(): + metadata = { + "has_signature": True, + "signatures": [{ + "file_offset": 100, + "length": 64, + "revision": 0x0200, + "certificate_type": 0x0001, + }], + } + analysis = { + "file_size": 1000, + "overlay_offset": 2000, + "sections": [], + } + issues = validate_signature(metadata, analysis) + assert issues == [] + +# --------------------------------------------------------- +# 6) Malformed case +# --------------------------------------------------------- + +def test_malformed_signature_metadata_skips_entry(): + metadata = { + "has_signature": True, + "signatures": [ + {"file_offset": "not-an-int", "length": 16}, # triggers continue + ], + } + + analysis = { + "file_size": 500, + "sections": [], + "overlay_offset": None, + } + + issues = validate_signature(metadata, analysis) + + # The malformed entry should be skipped entirely — no issues from it. 
+ assert issues == [] diff --git a/tests/unit/validators/test_validator_tls.py b/tests/unit/validators/test_validator_tls.py new file mode 100644 index 0000000..e12e2c8 --- /dev/null +++ b/tests/unit/validators/test_validator_tls.py @@ -0,0 +1,266 @@ +# Copyright (c) 2026 MalX Labs and contributors +# SPDX-License-Identifier: MPL-2.0 + +import pytest +from iocx.validators.tls import validate_tls +from iocx.reason_codes import ReasonCodes + + +def make_issue_list(result): + return [i["issue"] for i in result] + + +# --------------------------------------------------------- +# 1) No TLS entries +# --------------------------------------------------------- + +def test_no_tls_entries_returns_clean(): + metadata = {} + analysis = {"extended": []} + issues = validate_tls(metadata, analysis) + assert issues == [] + + +# --------------------------------------------------------- +# 2) Multiple TLS directories +# --------------------------------------------------------- + +def test_multiple_tls_directories(): + metadata = {} + analysis = { + "extended": [ + {"value": "tls_directory", "metadata": {}}, + {"value": "tls_directory", "metadata": {}}, + ] + } + issues = validate_tls(metadata, analysis) + assert ReasonCodes.TLS_MULTIPLE_DIRECTORIES in make_issue_list(issues) + + +# --------------------------------------------------------- +# 3) Malformed TLS metadata (early return) +# --------------------------------------------------------- + +def test_malformed_tls_metadata_skips_validation(): + metadata = {} + analysis = { + "extended": [ + {"value": "tls_directory", "metadata": { + "start_address": "bad", + "end_address": 200, + "callbacks": 150, + }} + ] + } + issues = validate_tls(metadata, analysis) + assert issues == [] + + +# --------------------------------------------------------- +# 4) Invalid range (start >= end) +# --------------------------------------------------------- + +def test_tls_invalid_range(): + metadata = {} + analysis = { + "extended": [ + {"value": 
"tls_directory", "metadata": { + "start_address": 300, + "end_address": 200, + "callbacks": 250, + }} + ] + } + issues = validate_tls(metadata, analysis) + assert ReasonCodes.TLS_INVALID_RANGE in make_issue_list(issues) + + +def test_tls_zero_length_directory(): + metadata = {} + analysis = { + "extended": [ + {"value": "tls_directory", "metadata": { + "start_address": 200, + "end_address": 200, + "callbacks": 200, + }} + ] + } + issues = validate_tls(metadata, analysis) + assert ReasonCodes.TLS_ZERO_LENGTH_DIRECTORY in make_issue_list(issues) + + +# --------------------------------------------------------- +# 5) Missing callbacks +# --------------------------------------------------------- + +def test_tls_callbacks_missing(): + metadata = {} + analysis = { + "extended": [ + {"value": "tls_directory", "metadata": { + "start_address": 100, + "end_address": 200, + "callbacks": 0, + }} + ] + } + issues = validate_tls(metadata, analysis) + assert ReasonCodes.TLS_CALLBACKS_MISSING in make_issue_list(issues) + + +# --------------------------------------------------------- +# 6) Callback outside TLS range +# --------------------------------------------------------- + +def test_tls_callback_outside_range(): + metadata = {} + analysis = { + "extended": [ + {"value": "tls_directory", "metadata": { + "start_address": 100, + "end_address": 200, + "callbacks": 500, + }} + ] + } + issues = validate_tls(metadata, analysis) + assert ReasonCodes.TLS_CALLBACK_OUTSIDE_RANGE in make_issue_list(issues) + + +# --------------------------------------------------------- +# 7) Callback not mapped to any section +# --------------------------------------------------------- + +def test_tls_callback_not_mapped_to_section(): + metadata = {} + analysis = { + "extended": [ + {"value": "tls_directory", "metadata": { + "start_address": 100, + "end_address": 200, + "callbacks": 150, + }} + ], + "sections": [], # no mapping possible + } + issues = validate_tls(metadata, analysis) + assert 
ReasonCodes.TLS_CALLBACK_NOT_MAPPED_TO_SECTION in make_issue_list(issues) + + +# --------------------------------------------------------- +# 8) Callback mapped to non-executable section +# --------------------------------------------------------- + +def test_tls_callback_in_non_executable_section(): + metadata = {} + analysis = { + "extended": [ + {"value": "tls_directory", "metadata": { + "start_address": 100, + "end_address": 200, + "callbacks": 150, + }} + ], + "sections": [ + { + "name": ".data", + "virtual_address": 100, + "virtual_size": 100, + "characteristics": 0x0, # NOT executable + } + ], + } + issues = validate_tls(metadata, analysis) + assert ReasonCodes.TLS_CALLBACK_IN_NON_EXECUTABLE_SECTION in make_issue_list(issues) + + +# --------------------------------------------------------- +# 9) Callback inside headers +# --------------------------------------------------------- + +def test_tls_callback_in_headers(): + metadata = { + "optional_header": {"size_of_headers": 300} + } + analysis = { + "extended": [ + {"value": "tls_directory", "metadata": { + "start_address": 100, + "end_address": 400, + "callbacks": 150, + }} + ], + "sections": [ + { + "name": ".text", + "virtual_address": 100, + "virtual_size": 300, + "characteristics": 0x20000000, # executable + } + ], + } + issues = validate_tls(metadata, analysis) + assert ReasonCodes.TLS_CALLBACK_IN_HEADERS in make_issue_list(issues) + + +# --------------------------------------------------------- +# 10) Callback inside overlay +# --------------------------------------------------------- + +def test_tls_callback_in_overlay(): + metadata = {} + analysis = { + "extended": [ + {"value": "tls_directory", "metadata": { + "start_address": 100, + "end_address": 400, + "callbacks": 150, + }} + ], + "overlay_offset": 120, # overlay starts inside section + "sections": [ + { + "name": ".text", + "virtual_address": 100, + "virtual_size": 300, + "raw_address": 100, + "raw_size": 300, + "characteristics": 0x20000000, + 
} + ], + } + issues = validate_tls(metadata, analysis) + assert ReasonCodes.TLS_CALLBACK_IN_OVERLAY in make_issue_list(issues) + + +# --------------------------------------------------------- +# 11) Clean case +# --------------------------------------------------------- + +def test_tls_valid_no_issues(): + metadata = { + "optional_header": {"size_of_headers": 50} + } + analysis = { + "extended": [ + {"value": "tls_directory", "metadata": { + "start_address": 100, + "end_address": 400, + "callbacks": 150, + }} + ], + "sections": [ + { + "name": ".text", + "virtual_address": 100, + "virtual_size": 300, + "raw_address": 100, + "raw_size": 300, + "characteristics": 0x20000000, # executable + } + ], + "overlay_offset": 999999, + } + issues = validate_tls(metadata, analysis) + assert issues == []
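The overlap tests above (raw overlap, virtual overlap, data-directory overlap) and their "inner continue" counterparts all exercise the same half-open interval logic: two ranges `[a, a+size_a)` and `[b, b+size_b)` intersect iff each starts before the other ends, with non-integer or zero-length inputs skipped. A minimal sketch of that check, as a hypothetical helper (not the actual `iocx` implementation):

```python
def ranges_overlap(a_start, a_size, b_start, b_size):
    """Return True if two half-open ranges [start, start+size) intersect.

    Hypothetical helper illustrating the interval logic these tests
    exercise; not the iocx implementation.
    """
    if not all(isinstance(v, int) for v in (a_start, a_size, b_start, b_size)):
        return False  # malformed field: skip the pair, as the "inner continue" tests expect
    if a_size <= 0 or b_size <= 0:
        return False  # zero-length ranges overlap nothing (see the skip tests)
    return a_start < b_start + b_size and b_start < a_start + a_size
```

Note that adjacent ranges (one ending exactly where the other begins) do not overlap under half-open semantics, which is why the clean-case fixtures with back-to-back sections produce no issues.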