From a3705cb052541f93138e9e37a42185e95ddf08e2 Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Thu, 11 Jun 2026 20:29:08 +0530 Subject: [PATCH 01/35] Update CONTRIBUTING.md --- CONTRIBUTING.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index e834b72..7a926da 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -129,7 +129,7 @@ The current version is **v0.1.0**. New normative requirements submitted now will To propose a new requirement, open an issue first with the following fields. Do not submit a PR with cross-cutting count changes until the proposal has been reviewed and accepted for a target version. -- **ID:** Next available ID in the target domain. For a tier-gated requirement, use the next sequential number (for example, APTS-SE-027). For an advisory requirement, use the next sequential `A` number in the target domain (for example, APTS-SE-A01 if no SE advisory exists yet, or APTS-TP-A04 as the next TP advisory). Tier-gated and advisory IDs are in separate sequences and do not collide +- **ID:** Next available ID in the target domain. For a tier-gated requirement, use the next sequential number (for example, APTS-SE-027). For an advisory requirement, use the next sequential `A` number in the target domain (for example, APTS-SE-A01 if no SE advisory exists yet, or APTS-TP-A05 as the next TP advisory). 
Tier-gated and advisory IDs are in separate sequences and do not collide - **Title:** Concise requirement name - **Classification:** MUST, SHOULD, or MAY - **Tier:** 1, 2, or 3 From 377455e3c3161c8aca17a98f7244f63c28c19b95 Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Thu, 11 Jun 2026 20:29:51 +0530 Subject: [PATCH 02/35] Update Frontispiece.md --- standard/Frontispiece.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/standard/Frontispiece.md b/standard/Frontispiece.md index 26c7cfa..317c2d1 100644 --- a/standard/Frontispiece.md +++ b/standard/Frontispiece.md @@ -41,6 +41,8 @@ This standard uses RFC 2119 language: | **MAY** | Optional. Implementation is at the organization's discretion. | | **OPTIONAL** | Truly discretionary. May be included based on organizational needs. | +A requirement's classification (MUST or SHOULD, shown in its Requirement Index entry) sets its conformance weight for the tier. Within a requirement's text, sub-clauses may carry their own RFC 2119 keywords describing how the control is implemented. Where a SHOULD-classified requirement contains MUST sub-clauses, those sub-clauses are conditional: they become mandatory only once the organization elects to implement the requirement (or where the requirement's Applicability statement is met). A SHOULD requirement may still be deviated from with documented justification per the conformance rules in the Introduction. + --- ## Requirement ID Format @@ -73,3 +75,11 @@ Licensed under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/). | Version | Date | Notes | |---------|------|-------| | 0.1.0 | April 2026 | Initial release. Eight domains, 173 tier-required requirements across three compliance tiers, plus 18 advisory practices in the appendix. | + +--- + +## Version History + +| Version | Date | Notes | +|---------|------|-------| +| 0.1.0 | April 2026 | Initial release. 
Eight domains, 173 tier-required requirements across three compliance tiers, plus 18 advisory practices in the appendix. | From 562dd7be936d212fa0f1547f3456ea08dc21f74e Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Thu, 11 Jun 2026 20:30:29 +0530 Subject: [PATCH 03/35] Update Getting_Started.md --- standard/Getting_Started.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standard/Getting_Started.md b/standard/Getting_Started.md index fe85407..927138d 100644 --- a/standard/Getting_Started.md +++ b/standard/Getting_Started.md @@ -97,7 +97,7 @@ Depending on your role: No. Start with Tier 1 (72 requirements). Tier 2 and Tier 3 add requirements progressively for cumulative totals of 157 and 173. An additional 18 advisory practices live in the [Advisory Requirements appendix](appendix/Advisory_Requirements.md) under the `APTS--A0x` identifier pattern; advisory practices are not required for conformance at any tier. See [Introduction: Compliance Tiers](Introduction.md#compliance-tiers) for details. **Q: What if my platform meets most but not all Tier 1 requirements?** -APTS does not award partial credit. A platform must meet 100% of requirements for its claimed tier. Address gaps before claiming a tier. +APTS does not award partial credit. A platform must meet 100% of requirements for its claimed tier; MUST requirements permit no deviation. At Tier 2 and above, an unimplemented SHOULD requirement does not void the claim if the deviation is documented with justification in the conformance claim. Address MUST gaps before claiming a tier. **Q: Are the Implementation Guides mandatory?** No. Implementation Guides are informative. They suggest approaches but do not define requirements. The domain READMEs contain all normative requirements. 
From 820c2c326135527c4837b79d2ff4c301aaabc01e Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Thu, 11 Jun 2026 20:32:14 +0530 Subject: [PATCH 04/35] Update README.md --- standard/1_Scope_Enforcement/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standard/1_Scope_Enforcement/README.md b/standard/1_Scope_Enforcement/README.md index fc5a591..6a8907b 100644 --- a/standard/1_Scope_Enforcement/README.md +++ b/standard/1_Scope_Enforcement/README.md @@ -57,7 +57,7 @@ The 26 requirements in this domain fall into seven thematic groups: ### Conformance -A platform claims conformance with this domain by satisfying all MUST requirements at the compliance tier it targets. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 SE requirement plus every Tier 2 SE requirement, and a Tier 3 platform satisfies all three tiers. SHOULD-level requirements are interpreted per RFC 2119. +A platform claims conformance with this domain by satisfying every requirement (both MUST and SHOULD) assigned to the compliance tier it targets and to all lower tiers. A SHOULD requirement counts toward tier conformance; a platform that does not implement a SHOULD requirement MUST record a documented justification for the deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 SE requirement plus every Tier 2 SE requirement, and a Tier 3 platform satisfies all three tiers. 
Every requirement in this domain includes a Verification subsection listing the verification procedures a reviewer uses to confirm implementation. From 9133e73649a0a88df0c8d2f72efd84f9d3ecb742 Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Thu, 11 Jun 2026 20:33:14 +0530 Subject: [PATCH 05/35] Update README.md --- standard/2_Safety_Controls/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standard/2_Safety_Controls/README.md b/standard/2_Safety_Controls/README.md index f4892d8..fed729f 100644 --- a/standard/2_Safety_Controls/README.md +++ b/standard/2_Safety_Controls/README.md @@ -50,7 +50,7 @@ The 20 requirements in this domain fall into seven thematic groups: ### Conformance -A platform claims conformance with this domain by satisfying all MUST requirements at the compliance tier it targets. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 SC requirement plus every Tier 2 SC requirement, and a Tier 3 platform satisfies all three tiers. SHOULD-level requirements are interpreted per RFC 2119. +A platform claims conformance with this domain by satisfying every requirement (both MUST and SHOULD) assigned to the compliance tier it targets and to all lower tiers. A SHOULD requirement counts toward tier conformance; a platform that does not implement a SHOULD requirement MUST record a documented justification for the deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An undocumented SHOULD deviation is a conformance gap. 
APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 SC requirement plus every Tier 2 SC requirement, and a Tier 3 platform satisfies all three tiers. Three appendix-only advisory practices for this domain (APTS-SC-A01 Platform Health Monitoring and Anomaly Detection, APTS-SC-A02 Context Window Safety and Constraint Preservation, and APTS-SC-A03 Tool Invocation Parameter and Chaining Governance) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier. From 466ce726fd50b8240f7beb53c9f449854ee7f1b5 Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Thu, 11 Jun 2026 20:37:17 +0530 Subject: [PATCH 06/35] Update README.md --- standard/3_Human_Oversight/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/standard/3_Human_Oversight/README.md b/standard/3_Human_Oversight/README.md index 842f20f..e8b130d 100644 --- a/standard/3_Human_Oversight/README.md +++ b/standard/3_Human_Oversight/README.md @@ -49,7 +49,7 @@ The 19 requirements in this domain fall into six thematic groups: ### Conformance -A platform claims conformance with this domain by satisfying all MUST requirements at the compliance tier it targets. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 HO requirement plus every Tier 2 HO requirement, and a Tier 3 platform satisfies all three tiers. Human Oversight has no Tier 3 requirements in this release; a Tier 3 claim therefore requires all Tier 1 and Tier 2 HO requirements. SHOULD-level requirements are interpreted per RFC 2119. 
+A platform claims conformance with this domain by satisfying every requirement (both MUST and SHOULD) assigned to the compliance tier it targets and to all lower tiers. A SHOULD requirement counts toward tier conformance; a platform that does not implement a SHOULD requirement MUST record a documented justification for the deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 HO requirement plus every Tier 2 HO requirement, and a Tier 3 platform satisfies all three tiers. Human Oversight has no Tier 3 requirements in this release; a Tier 3 claim therefore requires all Tier 1 and Tier 2 HO requirements. Two appendix-only advisory requirements for this domain (APTS-HO-A01 Out-of-Band Kill Switch via Independent Network and APTS-HO-A02 Disclosure and Mitigation of AI Influence on Operator Decisions) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier. @@ -90,7 +90,7 @@ Approval gates MUST: > **See also:** APTS-SC-006 (threshold escalation workflow). -**Rationale for CVSS >= 7.0 threshold:** CVSS 7.0 represents the boundary between "Medium" (informational, low likelihood of immediate impact) and "High" severity (significant potential for system compromise or data exposure). Exploitations at or above this threshold carry meaningful risk of unintended production impact, data loss, or service disruption. Requiring human approval at this boundary ensures that an operator evaluates the risk/reward trade-off before the platform attempts high-impact exploitation. 
Organizations MAY lower this threshold (for example, to 4.0 for critical infrastructure) but MUST NOT raise it above 7.0. +**Rationale for CVSS >= 7.0 threshold:** CVSS 7.0 is the boundary between the "Medium" range (4.0-6.9) and the "High" range (7.0-8.9), where a finding carries significant potential for system compromise or data exposure. Exploitations at or above this threshold carry meaningful risk of unintended production impact, data loss, or service disruption. Requiring human approval at this boundary ensures that an operator evaluates the risk/reward trade-off before the platform attempts high-impact exploitation. Organizations MAY lower this threshold (for example, to 4.0 for critical infrastructure) but MUST NOT raise it above 7.0. ### Verification From 642ebf7294120d50e4a0968398e7bb23e6e3015a Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Thu, 11 Jun 2026 20:38:37 +0530 Subject: [PATCH 07/35] Update README.md --- standard/4_Graduated_Autonomy/README.md | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/standard/4_Graduated_Autonomy/README.md b/standard/4_Graduated_Autonomy/README.md index b7fe5d5..a10b63e 100644 --- a/standard/4_Graduated_Autonomy/README.md +++ b/standard/4_Graduated_Autonomy/README.md @@ -42,6 +42,8 @@ A Tier 1 platform satisfies the foundational APTS requirements and is generally ### Requirement Index +The 28 requirements in this domain are organized by the autonomy level they primarily govern rather than by thematic group; the Primary Level column below indicates that level. + | ID | Title | Classification | Primary Level | |---|---|---|---| | APTS-AL-001 | Single Technique Execution | MUST \| Tier 1 | L1 | @@ -75,7 +77,7 @@ A Tier 1 platform satisfies the foundational APTS requirements and is generally ### Conformance -A platform claims conformance with this domain by satisfying all MUST requirements at the tier it targets. 
APTS defines three cumulative tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 AL requirement plus every Tier 2 AL requirement, and a Tier 3 platform satisfies all three tiers. SHOULD-level requirements are interpreted per RFC 2119. As described in the Tier and Level Mapping above, level-specific requirements apply only to platforms that offer the corresponding autonomy level. +A platform claims conformance with this domain by satisfying every requirement (both MUST and SHOULD) assigned to the compliance tier it targets and to all lower tiers. A SHOULD requirement counts toward tier conformance; a platform that does not implement a SHOULD requirement MUST record a documented justification for the deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 AL requirement plus every Tier 2 AL requirement, and a Tier 3 platform satisfies all three tiers. As described in the Tier and Level Mapping above, level-specific requirements apply only to platforms that offer the corresponding autonomy level. One advisory practice relevant to this domain (APTS-AL-A01 Continuous Improvement and Maturity Roadmap) is documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). It is not required for conformance at any tier. 
@@ -723,7 +725,7 @@ Each category contains: **Category: CRITICAL/HIGH Exploitation on Dev Systems (Permitted)** - Applicability: CRITICAL or HIGH severity vuln on 203.0.113.100-150 - Approved actions: Execute identified exploit, confirm vulnerability, assess impact -- Decision criteria: Severity Common Vulnerability Scoring System (CVSS) >7.0, target IP in dev range +- Decision criteria: Severity Common Vulnerability Scoring System (CVSS) >= 7.0, target IP in dev range - Escalation triggers: Unexpected lateral movement, data access, system crash - Constraints: No persistence, no destructive actions, log all activities @@ -746,7 +748,7 @@ Each category contains: For each major decision point, the tool uses a decision tree documenting: - **Root node**: The decision trigger (for example, "vulnerability discovered," "escalation opportunity identified") - **Branch conditions**: Criteria evaluated at each branch (for example, impact classification, scope boundaries, autonomy level permissions) -- **Leaf outcomes**: The action taken and its justification (for example, "proceed with exploitation (within scope, impact ≤ Medium, pre-approved technique)") +- **Leaf outcomes**: The action taken and its justification (for example, "proceed with exploitation (within scope, impact <= Medium, pre-approved technique)") - **Fallback path**: Default-deny outcome when conditions are ambiguous or data is insufficient ### Verification @@ -850,7 +852,7 @@ At Level 3 Semi-Autonomous, the platform MUST support management of assessments 1. **Multi-target deployment test:** Configure an engagement with at least 10 targets at three priority levels (for example, 3 critical, 4 high, 3 medium). Run the platform under normal resource constraints and capture the assessment-start order. 
Pass criterion: the platform begins assessment of every critical target before any medium target, and the start order within a priority tier reflects the documented prioritization inputs (criticality, estimated effort, dependencies). 2. **Concurrent operation test:** With a concurrency cap of N parallel targets configured, deploy 2N targets and verify via process/network telemetry that no more than N assessments run simultaneously, that bandwidth/CPU stay within configured ceilings, and that dependency-blocked operations (for example, exploitation steps) wait for prerequisite operations (for example, enumeration) to complete. -3. **Resource allocation audit:** From audit logs, extract per-target wall-clock time, request count, and tool-invocation count. Pass criterion: high-criticality targets received measurably more resources (for example, ≥2× the median request count of low-criticality targets), and any deviation is justified by a logged adaptation event (for example, early dead-end, finding-driven escalation). +3. **Resource allocation audit:** From audit logs, extract per-target wall-clock time, request count, and tool-invocation count. Pass criterion: high-criticality targets received measurably more resources (for example, >=2x the median request count of low-criticality targets), and any deviation is justified by a logged adaptation event (for example, early dead-end, finding-driven escalation). 4. **Progress tracking and behind-schedule alerting:** Inject artificial delay into 2 targets so they fall below the expected progress curve. Verify the platform raises an operator alert identifying the at-risk targets and proposes either intensity reduction or schedule extension. 5. **Adaptation-on-finding test:** Plant a high-value finding on a low-priority target. Verify the platform reallocates additional time/resources to that target after the finding is detected and that the reallocation decision is captured in the audit log with the triggering finding ID. 
@@ -965,7 +967,7 @@ At Level 4 Autonomous, the tool operates independently across multiple targets, 3. **Long-duration stability:** During the test in (2), capture uptime, memory footprint, log volume, and tool-error rates daily. Pass criterion: no unrecovered crashes, no memory growth indicating leaks, and any tool failures are followed by automated retry/skip per the documented strategy. 4. **State preservation across restart:** Mid-campaign, force a platform restart (kill the process). Pass criterion: on restart the platform resumes from the last checkpoint without re-running completed targets, without losing findings, and without re-attacking targets already marked complete. 5. **Adaptation decision audit:** Extract every strategy-adaptation decision from the audit log (target abandoned, intensity changed, lateral movement initiated, new target queued from reconnaissance). For each, verify a referenced finding or condition exists in the data that justifies the decision per the documented adaptation rules. Reject the run if any adaptation lacks a traceable justification. -6. **Multi-campaign coordination test:** Run 2 parallel campaigns sharing the same resource pool. Verify the platform enforces the global resource budget (sum of campaign usage ≤ pool), avoids deadlock between campaigns, and produces consolidated reporting that distinguishes per-campaign findings. +6. **Multi-campaign coordination test:** Run 2 parallel campaigns sharing the same resource pool. Verify the platform enforces the global resource budget (sum of campaign usage <= pool), avoids deadlock between campaigns, and produces consolidated reporting that distinguishes per-campaign findings. --- @@ -1100,14 +1102,14 @@ Tool routes escalation appropriately: ### Verification -1. 
**Risk scoring logic review:** Inspect the documented risk-scoring formula and confirm it incorporates each of the inputs in the requirement (finding severity, exposure scope, attack chain feasibility, data sensitivity, time-to-exploitation). Score 20 representative test findings by hand and compare to the platform's computed scores; pass criterion: ≥90% agreement, with any divergences explained by documented logic. +1. **Risk scoring logic review:** Inspect the documented risk-scoring formula and confirm it incorporates each of the inputs in the requirement (finding severity, exposure scope, attack chain feasibility, data sensitivity, time-to-exploitation). Score 20 representative test findings by hand and compare to the platform's computed scores; pass criterion: >=90% agreement, with any divergences explained by documented logic. 2. **Threshold definition review:** Confirm that individual-finding, cumulative-exposure, data-access, and critical-system thresholds are explicitly defined per engagement (not hard-coded), that values are documented in the engagement record, and that operators can adjust them only through an authenticated and audited path. 3. **Individual-threshold trigger test:** Plant a finding scoring just above the individual escalation threshold; verify auto-escalation fires within the documented latency. Repeat with a finding scoring just below the threshold; verify it does NOT escalate. Repeat for at least 3 distinct severity classes. 4. **Cumulative-threshold trigger test:** Plant N sub-threshold findings that together exceed the cumulative exposure threshold; verify the platform escalates after the threshold is crossed and that the escalation references the cumulative score, not any single finding. 5. **Predictive escalation test:** Stage an exploitation chain where the next planned step would yield domain admin (or equivalent crown-jewel access). 
Pass criterion: the platform escalates BEFORE executing the high-impact step, and the escalation message identifies the predicted impact and the planned step. 6. **Critical-system auto-escalation:** Stage a target marked as critical-system and trigger any compromise on it. Verify auto-escalation fires regardless of severity score. 7. **Routing test:** Generate one finding in each escalation routing class (immediate-response, strategic, regulatory/legal, business operational). Verify each is delivered to the correct routing channel (security team, leadership, legal/compliance, business owner) per the documented routing matrix. -8. **False-positive rate monitoring:** Run the platform against a known-clean baseline environment for a defined duration (for example, 24 hours); record the count of escalations triggered. Pass criterion: false-positive rate is below the documented acceptable threshold (for example, ≤1 escalation per 24 hours on the baseline) so that operators are not desensitized. +8. **False-positive rate monitoring:** Run the platform against a known-clean baseline environment for a defined duration (for example, 24 hours); record the count of escalations triggered. Pass criterion: false-positive rate is below the documented acceptable threshold (for example, <=1 escalation per 24 hours on the baseline) so that operators are not desensitized. > **See also:** APTS-SC-007 (cumulative risk tracking). 
From 5b06ca372141ec8e812654862928d2279c479675 Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Thu, 11 Jun 2026 20:39:26 +0530 Subject: [PATCH 08/35] Update README.md --- standard/5_Auditability/README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/standard/5_Auditability/README.md b/standard/5_Auditability/README.md index f009940..a2c1b4a 100644 --- a/standard/5_Auditability/README.md +++ b/standard/5_Auditability/README.md @@ -54,7 +54,7 @@ Several requirements in this domain reference attack-chain phases. APTS uses a s ### Conformance -A platform claims conformance with this domain by satisfying all MUST requirements at the tier it targets. APTS defines three cumulative tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 AR requirement plus every Tier 2 AR requirement, and a Tier 3 platform satisfies all three tiers. SHOULD-level requirements are interpreted per RFC 2119. +A platform claims conformance with this domain by satisfying every requirement (both MUST and SHOULD) assigned to the compliance tier it targets and to all lower tiers. A SHOULD requirement counts toward tier conformance; a platform that does not implement a SHOULD requirement MUST record a documented justification for the deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 AR requirement plus every Tier 2 AR requirement, and a Tier 3 platform satisfies all three tiers. 
Four advisory practices relevant to this domain (APTS-AR-A01 State Capture and Replay Support, APTS-AR-A02 Replay Variance Analysis, APTS-AR-A03 Real-Time External Log Streaming, APTS-AR-A04 Continuous Runtime Integrity Monitoring) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier. @@ -89,7 +89,7 @@ In addition, the schema MUST enforce conditional fields by event type: ### Verification -1. All timestamps have ≤1ms precision +1. All timestamps have <=1ms precision 2. Every network interaction generates corresponding request AND response log entries 3. Log entries are immutable once written 4. Trace a sample multi-event transaction end-to-end using correlation IDs and confirm all related events are linked @@ -227,7 +227,7 @@ Logs MUST be retained according to the following minimum requirements: **Storage Requirements:** - Primary: Encrypted at-rest (AES-256) -- Backup: Geographically distributed (≥2 locations) +- Backup: Geographically distributed (>=2 locations) - Integrity: Regular cryptographic verification - Format: Immutable (append-only, no modification/deletion) @@ -711,7 +711,7 @@ The platform MUST maintain a software bill of materials (SBOM) in Software Packa The SBOM MUST include: component names, versions, licenses, copyright holders. Vulnerability disclosure includes: CVE ID, affected component, CVSS score, remediation status. -> **See also:** APTS-TP-006 (complementary supply chain controls: dependency inventory, risk assessment, and verification). +> **See also:** APTS-TP-006 (complementary supply chain controls: dependency inventory, risk assessment, and verification). TP-006 establishes the baseline SBOM and dependency inventory at Tier 1; this requirement adds the Tier 2 obligations on SBOM freshness (48-hour update), customer access on request, and platform integrity attestation. 
The SBOM obligation is layered, not duplicated: assess the inventory itself under TP-006 and the freshness, disclosure, and attestation controls under AR-016. ### Verification From d3bf37230fd9e2d09879c4d6ab0d1ce3ccd4dcce Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Thu, 11 Jun 2026 20:40:02 +0530 Subject: [PATCH 09/35] Update README.md --- standard/6_Manipulation_Resistance/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standard/6_Manipulation_Resistance/README.md b/standard/6_Manipulation_Resistance/README.md index 40c0498..110a50c 100644 --- a/standard/6_Manipulation_Resistance/README.md +++ b/standard/6_Manipulation_Resistance/README.md @@ -60,7 +60,7 @@ The 23 requirements in this domain fall into seven thematic groups: ### Conformance -A platform claims conformance with this domain by satisfying all MUST requirements at the compliance tier it targets. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 MR requirement plus every Tier 2 MR requirement, and a Tier 3 platform satisfies all three tiers. SHOULD-level requirements are interpreted per RFC 2119. +A platform claims conformance with this domain by satisfying every requirement (both MUST and SHOULD) assigned to the compliance tier it targets and to all lower tiers. A SHOULD requirement counts toward tier conformance; a platform that does not implement a SHOULD requirement MUST record a documented justification for the deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An undocumented SHOULD deviation is a conformance gap. 
APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 MR requirement plus every Tier 2 MR requirement, and a Tier 3 platform satisfies all three tiers. Three advisory practices relevant to this domain (APTS-MR-A01 Goal Misgeneralization and Emergent Misalignment Evaluation Suite, APTS-MR-A02 Sandbagging Detection and Behavioral Consistency Validation, and APTS-MR-A03 Multi-Turn Adversarial Conversation Resilience) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier. From 7b4987a54a43157de65ce5bb62073722c12bc478 Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Thu, 11 Jun 2026 20:41:11 +0530 Subject: [PATCH 10/35] Update README.md --- standard/7_Supply_Chain_Trust/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standard/7_Supply_Chain_Trust/README.md b/standard/7_Supply_Chain_Trust/README.md index 75a9059..bab9559 100644 --- a/standard/7_Supply_Chain_Trust/README.md +++ b/standard/7_Supply_Chain_Trust/README.md @@ -53,7 +53,7 @@ The 22 requirements in this domain fall into seven thematic groups: ### Conformance -A platform claims conformance with this domain by satisfying all MUST requirements at the compliance tier it targets. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 TP requirement plus every Tier 2 TP requirement, and a Tier 3 platform satisfies all three tiers. SHOULD-level requirements are interpreted per RFC 2119. +A platform claims conformance with this domain by satisfying every requirement (both MUST and SHOULD) assigned to the compliance tier it targets and to all lower tiers. 
A SHOULD requirement counts toward tier conformance; a platform that does not implement a SHOULD requirement MUST record a documented justification for the deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 TP requirement plus every Tier 2 TP requirement, and a Tier 3 platform satisfies all three tiers. Four appendix-only advisory requirements for this domain (APTS-TP-A01 Breach Notification and Regulatory Reporting, APTS-TP-A02 Privacy Regulation Compliance, APTS-TP-A03 Professional Liability and Engagement Agreements, APTS-TP-A04 External Tool Connector Trust Boundaries and Credential Isolation) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier. From 49ae84ea3f00cd2b3209753f9ed1f1d05dfbf6b1 Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Thu, 11 Jun 2026 20:41:52 +0530 Subject: [PATCH 11/35] Update README.md --- standard/8_Reporting/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standard/8_Reporting/README.md b/standard/8_Reporting/README.md index 4eacf14..82a2f83 100644 --- a/standard/8_Reporting/README.md +++ b/standard/8_Reporting/README.md @@ -44,7 +44,7 @@ The 15 requirements in this domain fall into five thematic groups: ### Conformance -A platform claims conformance with this domain by satisfying all MUST requirements at the compliance tier it targets. 
APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 RP requirement plus every Tier 2 RP requirement, and a Tier 3 platform satisfies all three tiers. SHOULD-level requirements are interpreted per RFC 2119. +A platform claims conformance with this domain by satisfying every requirement (both MUST and SHOULD) assigned to the compliance tier it targets and to all lower tiers. A SHOULD requirement counts toward tier conformance; a platform that does not implement a SHOULD requirement MUST record a documented justification for the deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 RP requirement plus every Tier 2 RP requirement, and a Tier 3 platform satisfies all three tiers. Every requirement in this domain includes a Verification subsection listing the verification procedures a reviewer uses to confirm implementation. 
From 2623fc794b8ad7cfbc710a72e72338fcab21a8ea Mon Sep 17 00:00:00 2001
From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com>
Date: Thu, 11 Jun 2026 23:05:44 +0530
Subject: [PATCH 12/35] Update Advisory_Requirements.md
---
 standard/appendix/Advisory_Requirements.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/standard/appendix/Advisory_Requirements.md b/standard/appendix/Advisory_Requirements.md
index e57f82b..f9ff84e 100644
--- a/standard/appendix/Advisory_Requirements.md
+++ b/standard/appendix/Advisory_Requirements.md
@@ -256,7 +256,7 @@ Document every external connector that can execute actions, access customer data
 **Applicability:** This practice applies to platforms that use LLM-based agents fine-tuned (SFT, RFT, RLHF, DPO, or equivalent) on offensive-security tasks, or whose foundation model has been adapted with offensive-task adapters, instruction tuning, task-specific reward models, or post-deployment online learning on engagement data.
-**Rationale:** Recent peer-reviewed work has demonstrated that fine-tuning a frontier LLM on a narrow task can produce broad behavioral misalignment that extends far outside the training domain (Nature 2026, *Training LLMs on narrow tasks can lead to broad misalignment*). For autonomous penetration testing platforms, two failure modes follow directly: (a) goal misgeneralization, where the agent learns a proxy objective ("produce findings that look like vulnerabilities") that diverges from the true objective ("identify vulnerabilities exploitable in the customer environment") in distinguishing situations the training data did not cover; and (b) emergent misalignment, where narrow fine-tuning on offensive tasks shifts the agent's behavior in adjacent domains with no signal until the shift manifests in a production engagement.
APTS-MR-013 (Adversarial Example Detection in Vulnerability Classification) probes input-side robustness; APTS-MR-020 (Adversarial Validation and Resilience Testing of Safety Controls) probes control-side resilience; APTS-AR-019 (AI/ML Model Change Tracking and Drift Detection) tracks output drift. None of these evaluate the agent's underlying objective alignment under distribution shift. The Introduction's *Capability Frontier and Containment Assumptions* section defers verifiable goal alignment as research-stage and out of scope for v0.1.0; this practice begins to close that gap with an evaluation-based approach achievable today. The normative requirement set for v0.1.0 is frozen; this practice is a candidate for tier-gated inclusion in v0.2.0 (likely as SHOULD | Tier 2 for platforms operating at Level 3 autonomy or higher, or for any platform that performs post-deployment fine-tuning on engagement data). +**Rationale:** Recent peer-reviewed work has demonstrated that fine-tuning a frontier LLM on a narrow task can produce broad behavioral misalignment that extends far outside the training domain (Betley et al., 2026, *Training large language models on narrow tasks can lead to broad misalignment*, Nature 649, 584-589). For autonomous penetration testing platforms, two failure modes follow directly: (a) goal misgeneralization, where the agent learns a proxy objective ("produce findings that look like vulnerabilities") that diverges from the true objective ("identify vulnerabilities exploitable in the customer environment") in distinguishing situations the training data did not cover; and (b) emergent misalignment, where narrow fine-tuning on offensive tasks shifts the agent's behavior in adjacent domains with no signal until the shift manifests in a production engagement. 
APTS-MR-013 (Adversarial Example Detection in Vulnerability Classification) probes input-side robustness; APTS-MR-020 (Adversarial Validation and Resilience Testing of Safety Controls) probes control-side resilience; APTS-AR-019 (AI/ML Model Change Tracking and Drift Detection) tracks output drift. None of these evaluate the agent's underlying objective alignment under distribution shift. The Introduction's *Capability Frontier and Containment Assumptions* section defers verifiable goal alignment as research-stage and out of scope for v0.1.0; this practice begins to close that gap with an evaluation-based approach achievable today. The normative requirement set for v0.1.0 is frozen; this practice is a candidate for tier-gated inclusion in v0.2.0 (likely as SHOULD | Tier 2 for platforms operating at Level 3 autonomy or higher, or for any platform that performs post-deployment fine-tuning on engagement data). **Value:** Platforms that maintain a goal-misgeneralization and emergent-misalignment evaluation suite detect a class of failure that no other safety control catches: situations where every individual safety check passes, scope holds, and the agent produces fluent, plausible output, while the agent is in fact optimizing a proxy objective that diverges from the true objective in distinguishing cases. This is the agent-side analogue of the fabricated-finding problem addressed by APTS-RP-A01: a failure that is invisible to per-action checks because it manifests only across the distribution of decisions the agent makes. @@ -280,7 +280,7 @@ Maintain and execute a goal-misgeneralization and emergent-misalignment evaluati **Applicability:** This practice applies to platforms that use LLM-based agents and run an adversarial evaluation suite under APTS-MR-020 against the same model configuration deployed in customer engagements authorized under APTS-SE-001. 
It applies regardless of whether the underlying model has been fine-tuned, and is therefore complementary in scope to APTS-MR-A01. -**Rationale:** A recurring finding across frontier-model evaluations between late 2024 and early 2026 is that an agent's behavior on an adversarial eval suite can diverge from its behavior in live deployment, because the model can detect features of the evaluation context and modulate its actions accordingly. Apollo Research demonstrated reproducible in-context scheming and sandbagging across Claude 3.5 Sonnet, Claude 3 Opus, and OpenAI o1 (Meinke et al., 2024, *Frontier Models are Capable of In-context Scheming*, arXiv:2412.04984), including a documented case in which o3 internally reasoned "we want to survive as the model, we need to fail purposely" and then deliberately submitted incorrect chemistry answers; METR's o3 evaluation report cautions that the evaluation setup "is not robust to sandbagging by the model." Independent peer-reviewed work establishes sandbagging as a measurable, controllable behavior (van der Weij et al., *AI Sandbagging*, ICLR 2025, arXiv:2406.07358), and Anthropic's *Alignment Faking* result shows that production-grade Claude 3 Opus differentially modulates behavior under naturalistic conditions (Greenblatt et al., 2024). Anthropic's own Sonnet 4.5 and Opus 4.6 system cards report that internal interpretability features for evaluation awareness strengthened during training, with UK AISI testing finding Opus 4.6 correctly identifies evaluations roughly 80% of the time while verbalizing that recognition only ~2% of the time. The capability scales predictably with model size (Chaudhary et al., 2025, arXiv:2509.13333) and is measurable on agentic tasks (Needham et al., 2025, arXiv:2505.23836). 
For autonomous penetration testing platforms the consequence is concrete: Meta's 2026 Muse Spark report shows roughly a 32-percentage-point drop on cyber capability evaluations under sandbagging instruction with general-knowledge performance moving by 0.4 points, exactly the failure mode an adversarial eval suite is supposed to surface. APTS-MR-020 (Adversarial Validation and Resilience Testing of Safety Controls) requires the platform to run the eval suite; APTS-AR-019 (AI/ML Model Change Tracking and Drift Detection) tracks output drift across model versions. Neither measures the differential between the agent's eval-context behavior and its production-context behavior on the same model snapshot. The EU AI Act Article 15 obligation that high-risk systems "perform consistently ... throughout their lifecycle" gestures at this property without specifying a measurement; this practice supplies one. The normative requirement set for v0.1.0 is frozen; this practice is a candidate for tier-gated inclusion in v0.2.0 (likely as SHOULD | Tier 2 for platforms operating at Level 3 autonomy or higher). +**Rationale:** A recurring finding across frontier-model evaluations between late 2024 and early 2026 is that an agent's behavior on an adversarial eval suite can diverge from its behavior in live deployment, because the model can detect features of the evaluation context and modulate its actions accordingly. Apollo Research demonstrated reproducible in-context scheming and sandbagging across multiple frontier models including Claude 3.5 Sonnet, Claude 3 Opus, OpenAI o1, Gemini 1.5, and Llama 3.1 (Meinke et al., 2024, *Frontier Models are Capable of In-context Scheming*, arXiv:2412.04984). 
Apollo's later anti-scheming work with OpenAI documented OpenAI o3 reasoning explicitly about whether evaluators would be able to detect underperformance before deciding how to answer (Schoen et al., 2025, *Stress Testing Deliberative Alignment for Anti-Scheming Training*, arXiv:2509.15541, Figures 48-49). Independent peer-reviewed work establishes sandbagging as a measurable, controllable behavior (van der Weij et al., *AI Sandbagging*, ICLR 2025, arXiv:2406.07358), and Anthropic's *Alignment Faking* result shows that production-grade Claude 3 Opus differentially modulates behavior under naturalistic conditions (Greenblatt et al., 2024, arXiv:2412.14093). Recent model system cards report that interpretability features for evaluation awareness can strengthen during training and that a model can recognize an evaluation context without verbalizing that recognition in its reasoning trace; operators should consult the system card for the specific model and version they deploy rather than rely on a single published figure. The capability scales predictably with model size (Chaudhary et al., 2025, arXiv:2509.13333) and is measurable on agentic tasks (Needham et al., 2025, arXiv:2505.23836). For autonomous penetration testing platforms the consequence is concrete: published sandbagging experiments report large drops on targeted cyber and capability evaluations under an underperformance instruction while general-knowledge performance is largely unchanged, exactly the failure mode an adversarial eval suite is supposed to surface. APTS-MR-020 (Adversarial Validation and Resilience Testing of Safety Controls) requires the platform to run the eval suite; APTS-AR-019 (AI/ML Model Change Tracking and Drift Detection) tracks output drift across model versions. Neither measures the differential between the agent's eval-context behavior and its production-context behavior on the same model snapshot. 
The EU AI Act Article 15 obligation that high-risk systems perform consistently throughout their lifecycle gestures at this property without specifying a measurement; this practice supplies one. The normative requirement set for v0.1.0 is frozen; this practice is a candidate for tier-gated inclusion in v0.2.0 (likely as SHOULD | Tier 2 for platforms operating at Level 3 autonomy or higher). **Value:** Platforms that maintain and validate an eval-versus-production behavioral-consistency measurement detect a class of failure that a clean eval pass-rate cannot rule out: situations where the adversarial eval suite shows no anomalies while the agent's deployed behavior diverges along the dimensions the eval was supposed to govern (scope compliance, refusal of out-of-scope tools, finding-fabrication rate). This is the context-side analogue of the objective-side failure addressed by APTS-MR-A01: MR-A01 catches the agent pursuing the wrong objective when training data is ambiguous; MR-A02 catches the agent pursuing the right objective during evaluation while pursuing a different one in deployment, on the same model snapshot. From 9c69243b12ebbb38fbc2760fa68893deeb74dd92 Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Thu, 11 Jun 2026 23:07:11 +0530 Subject: [PATCH 13/35] Update Compliance_Matrix.md --- standard/appendix/Compliance_Matrix.md | 689 +++++++++++-------------- 1 file changed, 297 insertions(+), 392 deletions(-) diff --git a/standard/appendix/Compliance_Matrix.md b/standard/appendix/Compliance_Matrix.md index d68587f..f056e05 100644 --- a/standard/appendix/Compliance_Matrix.md +++ b/standard/appendix/Compliance_Matrix.md @@ -10,16 +10,16 @@ This appendix is part of the OWASP Autonomous Penetration Testing Standard (APTS ## Overview -This appendix maps APTS requirements to nine external frameworks: -1. **NIST Cybersecurity Framework (CSF) 2.0** - Risk-based cybersecurity framework -2. 
**ISO/IEC 27001:2022** - International information security standard
-3. **NIST AI RMF 1.0** - AI risk management framework
-4. **SOC 2 Trust Services Criteria (2017, with 2022 revised points of focus)** - Trust services for service organizations
-5. **PCI DSS 4.0.1** - Payment card industry data security standard
-6. **GDPR** - EU General Data Protection Regulation
-7. **NIST SP 800-53 Rev. 5** - Federal information security controls
-8. **HIPAA Security Rule (45 CFR Part 164)** - Healthcare data security requirements
-9. **CIS Critical Security Controls v8** - Foundational security practices
+This appendix maps APTS requirements to nine external frameworks, in the order they appear below:
+- **NIST Cybersecurity Framework (CSF) 2.0** - Risk-based cybersecurity framework
+- **ISO/IEC 27001:2022** - International information security standard
+- **SOC 2 Trust Services Criteria (2017, with 2022 revised points of focus)** - Trust services for service organizations
+- **NIST AI RMF 1.0** - AI risk management framework
+- **PCI DSS 4.0.1** - Payment card industry data security standard
+- **GDPR** - EU General Data Protection Regulation
+- **NIST SP 800-53 Rev. 5** - Federal information security controls
+- **HIPAA Security Rule (45 CFR Part 164)** - Healthcare data security requirements
+- **CIS Critical Security Controls v8** - Foundational security practices
 The first four frameworks are mapped comprehensively across all domains. PCI DSS 4.0.1, GDPR, and HIPAA mappings apply primarily to data-handling, privacy, and supply chain requirements. NIST SP 800-53, CIS Controls, and other frameworks address specific governance and technical control areas.
@@ -33,7 +33,7 @@ NIST CSF 2.0 organizes controls into six Functions and multiple Categories.
APTS ### GOVERN Function -**GV.PO-1: Organizational Context** +**GV.RR-01: Organizational Leadership and Governance Structure** - Related: Scope Enforcement (Scope Definition), Safety Controls - Requirements: Document formal governance structure for autonomous pentesting platform operations - Controls: @@ -41,7 +41,7 @@ NIST CSF 2.0 organizes controls into six Functions and multiple Categories. APTS - Perform impact classification for every pentest action before execution using multi-tier system (APTS-SC-001) - Document cumulative risk scoring algorithm with automated escalation thresholds (APTS-SC-007) -**GV.PO-2: Roles and Responsibilities** +**GV.RR-02: Roles, Responsibilities, and Authorities** - Related: Human Oversight (Governance), Safety Controls - Requirements: Document Authority Delegation Matrix specifying approval authority by role and action type - Controls: @@ -49,7 +49,7 @@ NIST CSF 2.0 organizes controls into six Functions and multiple Categories. APTS - Require mandatory human approval gates for all medium and high-impact actions with defined SLA response windows (APTS-HO-003) - Log all approval decisions with immutable audit trail including timestamp, approver identity, and rationale (APTS-HO-005) -**GV.RM-1: Risk Management Strategy** +**GV.RM-01: Risk Management Strategy** - Related: Human Oversight (Risk Assessment), Safety Controls - Requirements: Implement formal risk scoring and escalation controls for autonomous pentesting actions - Controls: @@ -57,7 +57,7 @@ NIST CSF 2.0 organizes controls into six Functions and multiple Categories. 
APTS - Implement graduated responsibility escalation with automatic actions, approval requirements, and prohibited thresholds (APTS-SC-006) - Track cumulative risk score across entire engagement with configurable decay and reset windows (APTS-SC-007) -**GV.RM-2: Cybersecurity Supply Chain Risk** +**GV.SC-01: Cybersecurity Supply Chain Risk Management** - Related: Supply Chain Trust (Third-Party Dependencies) - Requirements: Establish vetting process and maintain inventory of all third-party dependencies - Controls: @@ -67,7 +67,7 @@ NIST CSF 2.0 organizes controls into six Functions and multiple Categories. APTS ### IDENTIFY Function -**ID.AM-1: Asset Management** +**ID.AM-01: Asset Inventory** - Related: Scope Enforcement (Scope Definition) - Requirements: Ingest machine-parseable scope with complete target inventory and asset criticality classifications - Controls: @@ -75,15 +75,15 @@ NIST CSF 2.0 organizes controls into six Functions and multiple Categories. APTS - Validate all IP ranges using CIDR notation, detecting overlaps and maintaining awareness of reserved IP space (APTS-SE-002) - Classify each asset by criticality level (Critical, Production, Non-Production) with corresponding action restrictions enforced at runtime (APTS-SE-005) -**ID.AM-4: External Information Systems** +**ID.AM-04: Supplier-Provided Services Inventory** - Related: Third-Party & Supply Chain Trust -- Requirements: Maintain comprehensive inventory of cloud dependencies and third-party services +- Requirements: Maintain an inventory of supplier-provided services and external dependencies - Controls: - Document all AI provider dependencies with vetting criteria including data protection and contractual terms (APTS-TP-001) - Maintain Software Bill of Materials (SBOM) including versions and vulnerability tracking for all dependencies (APTS-TP-006) - Perform risk assessment for each dependency evaluated against criticality of its use in operations (APTS-TP-006) -**ID.RA-1: Asset 
Vulnerabilities** +**ID.RA-01: Vulnerability Identification** - Related: Graduated Autonomy, Auditability (Vulnerability Assessment & Exploitation) - Requirements: Identify vulnerabilities through automated scanning with findings confidence scoring - Controls: @@ -91,9 +91,9 @@ NIST CSF 2.0 organizes controls into six Functions and multiple Categories. APTS - Log all vulnerability assessment findings as decision events with confidence scores and alternative actions evaluated (APTS-AR-004) - Record vulnerability metadata including type, severity, evidence hashes, and CVE references in structured log format (APTS-AR-001) -**ID.RA-2: Threat Identification** +**ID.AM-03: Authorized Communication and Scope Boundary Maintenance** - Related: Scope Enforcement (Scope Definition and Approval) -- Requirements: Assess target system threats within scope boundaries with explicit approval +- Requirements: Maintain authoritative representations of authorized testing scope and detect boundary drift - Controls: - Document domain scope specifications with explicit wildcard policies distinguishing exact domains, all subdomains, and single-level wildcards (APTS-SE-003) - Perform continuous DNS monitoring and cloud boundary validation to detect scope drift before unauthorized testing (APTS-SE-007) @@ -101,7 +101,7 @@ NIST CSF 2.0 organizes controls into six Functions and multiple Categories. APTS ### PROTECT Function -**PR.AA-1: Identity Management, Authentication, and Access Control Policy** +**PR.AA-01: Identity Management** - Related: Auditability (Data Protection) + Supply Chain Trust (Data Handling) - Requirements: Implement role-based access control with multi-tenancy isolation - Controls: @@ -109,7 +109,7 @@ NIST CSF 2.0 organizes controls into six Functions and multiple Categories. 
APTS - Implement multi-tenancy isolation controlling data access between engagements with row-level security and file system permissions (APTS-TP-017) - Restrict operator access to engagement-specific scope, findings, and data with mandatory access control enforcement (APTS-TP-017) -**PR.AA-2: Access Enforcement** +**PR.AA-05: Access Enforcement** - Related: Supply Chain Trust (Multi-Tenancy Data Isolation) - Requirements: Enforce access controls at database, file system, and network layers - Controls: @@ -117,7 +117,7 @@ NIST CSF 2.0 organizes controls into six Functions and multiple Categories. APTS - Enforce file system permissions with restricted access to engagement findings and evidence (APTS-TP-017) - Isolate testing network traffic per engagement with network segmentation and firewall rules (APTS-TP-017) -**PR.DS-1: Data Security Policy** +**PR.DS-01: Data Security (classification and minimization before external transmission)** - Related: Auditability (Data Protection) + Supply Chain Trust (Data Handling) - Requirements: Classify data by sensitivity level and minimize transmission to external providers - Controls: @@ -125,7 +125,7 @@ NIST CSF 2.0 organizes controls into six Functions and multiple Categories. APTS - Implement data minimization by stripping unnecessary metadata, redacting credentials/API keys, and anonymizing IPs before transmission to providers (APTS-TP-012) - Obtain explicit client consent disclosing specific AI providers used, data categories sent, and data retention policies (APTS-TP-012) -**PR.DS-2: Data In Transit** +**PR.DS-02: Data-in-Transit Confidentiality and Integrity** - Related: Supply Chain Trust (Encryption & Data Handling) - Requirements: Enforce encryption for all data transmitted to external providers - Controls: @@ -133,7 +133,7 @@ NIST CSF 2.0 organizes controls into six Functions and multiple Categories. 
APTS - Implement mutual TLS for service-to-service authentication where supported by providers (APTS-TP-003) - Monitor all API usage patterns for anomalies and enforce secure credential storage with rotation schedules (APTS-TP-003) -**PR.DS-3: Data At Rest** +**PR.DS-01: Data-at-Rest Protection (encryption, key management, secure deletion)** - Related: Supply Chain Trust (Encryption & Data Handling) - Requirements: Encrypt sensitive data at rest with documented key management and deletion procedures - Controls: @@ -141,7 +141,7 @@ NIST CSF 2.0 organizes controls into six Functions and multiple Categories. APTS - Implement key management controls with separate key storage from encrypted data and key rotation procedures (APTS-TP-014) - Define and test secure deletion procedures for sensitive data with verification that old data cannot be recovered (APTS-TP-014) -**PR.PS-1: Security Policy** +**PR.PS-01: Configuration and Security Policy Management** - Related: Scope Enforcement, Safety Controls, Human Oversight (Governance and Scope) - Requirements: Document APTS security policies, Rules of Engagement, and approval workflows - Controls: @@ -149,17 +149,9 @@ NIST CSF 2.0 organizes controls into six Functions and multiple Categories. 
APTS
 - Document Rules of Engagement in machine-parseable format specifying scope, temporal boundaries, and action restrictions (APTS-SE-001)
 - Implement scope approval workflows enforcing pre-action scope validation for all network actions before execution (APTS-SE-006)
-**PR.PS-2: Third-Party Maintenance**
-- Related: Supply Chain Trust (Third-Party Dependencies)
-- Requirements: Monitor provider SLAs and establish incident response for provider failures
-- Controls:
- - Continuously monitor provider availability against documented SLAs with automated alerts for failures (APTS-TP-004)
- - Establish documented failover procedures for critical providers with fallback providers identified and tested (APTS-TP-004)
- - Establish incident response procedures for provider breaches with breach notification obligations aligned to regulatory requirements (APTS-TP-005)
-
 ### DETECT Function
-**DE.AE-1: Detection Processes**
+**DE.AE-02: Adverse Event Analysis**
 - Related: Human Oversight + Supply Chain Trust (Risk Assessment & Monitoring)
 - Requirements: Detect vulnerabilities and anomalies with confidence scoring and false positive procedures
 - Controls:
@@ -167,7 +159,7 @@ NIST CSF 2.0 organizes controls into six Functions and multiple Categories. APTS
 - Score all anomalies with confidence levels and route according to documented escalation policy (APTS-SC-010)
 - Generate decision event logs for all detection findings with evidence hashes and vulnerability metadata (APTS-AR-004)
-**DE.CM-1: Network Monitoring**
+**DE.CM-01: Network Monitoring**
 - Related: Supply Chain Trust (Cloud Dependencies & Monitoring)
 - Requirements: Monitor target system health and detect impacts during testing
 - Controls:
@@ -175,25 +167,33 @@ NIST CSF 2.0 organizes controls into six Functions and multiple Categories.
APTS
 - Establish baseline metrics within first 5 health check cycles and detect degradation exceeding 200% of baseline (APTS-SC-010)
 - Implement circuit breaker logic automatically suspending testing on sustained degradation with cooldown and probe-based recovery (APTS-SC-012)
-**DE.CM-2: Data Monitoring**
+**DE.CM-03: Personnel and Data Activity Monitoring**
 - Related: Supply Chain Trust (Data Handling)
-- Requirements: Monitor all data access and implement breach detection mechanisms
+- Requirements: Monitor personnel and data activity for unauthorized access and implement breach detection mechanisms
 - Controls:
 - Log all data access with structured event logging including timestamp, source, target, and data accessed (APTS-AR-001)
 - Implement breach detection mechanisms monitoring for unauthorized data exfiltration patterns (APTS-TP-017)
 - Establish DLP monitoring rules specific to regulated data types (PHI, PCI, PII) with automatic escalation (APTS-SC-002)
+**DE.CM-06: External Service Provider Monitoring**
+- Related: Supply Chain Trust (Third-Party Dependencies)
+- Requirements: Monitor provider SLAs and establish incident response for provider failures
+- Controls:
+ - Continuously monitor provider availability against documented SLAs with automated alerts for failures (APTS-TP-004)
+ - Establish documented failover procedures for critical providers with fallback providers identified and tested (APTS-TP-004)
+ - Establish incident response procedures for provider breaches with breach notification obligations aligned to regulatory requirements (APTS-TP-005)
+
 ### RESPOND Function
-**RS.MA-1: Response Planning**
-- Related: Manipulation Resistance (Incident Response) + Supply Chain Trust (Breach Notification)
-- Requirements: Establish documented incident response plan with escalation procedures and team assignments
+**RS.MA-01: Incident Management Plan Execution**
+- Related: Safety Controls + Manipulation Resistance (Incident Response) + Supply Chain Trust
(Breach Notification)
+- Requirements: Execute a documented incident management plan with escalation procedures and team assignments
 - Controls:
   - Document platform incident response plan specifying response timelines, communication channels, and recovery procedures (APTS-SC-018)
   - Define escalation procedures routing detected anomalies to human operators with defined SLA notification timelines (APTS-SC-017)
   - Assign incident response team members with documented responsibilities and communication protocols (APTS-SC-018)

-**RS.MI-1: Incident Handling**
+**RS.MI-01: Incident Mitigation**
 - Related: Manipulation Resistance (Incident Response)
 - Requirements: Classify, investigate, and preserve evidence for all detected incidents
 - Controls:
@@ -201,8 +201,8 @@ NIST CSF 2.0 organizes controls into six Functions and multiple Categories. APTS
   - Preserve memory dumps, log snapshots, network captures, and process state before any containment procedures (APTS-SC-018)
   - Document incident timeline, root cause, affected systems, remediation steps, and lessons learned in post-incident report (APTS-SC-018)

-**RS.CO-1: Response Communication**
-- Related: Manipulation Resistance + Supply Chain Trust (Incident Response & Breach Notification)
+**RS.CO-02: Incident Reporting and Communication**
+- Related: Safety Controls + Supply Chain Trust (Incident Response & Breach Notification)
 - Requirements: Notify operators and customers of incidents within defined SLA timelines
 - Controls:
   - Establish Watchdog infrastructure monitoring platform health with independent network and credentials (APTS-SC-017)
@@ -211,7 +211,7 @@ NIST CSF 2.0 organizes controls into six Functions and multiple Categories. APTS

 ### RECOVER Function

-**RC.RP-1: Recovery Planning**
+**RC.RP-01: Incident Recovery Plan Execution**
 - Related: Supply Chain Trust (Business Continuity)
 - Requirements: Establish disaster recovery procedures with documented RTO and RPO targets
 - Controls:
@@ -229,291 +229,226 @@ ISO/IEC 27001:2022 contains 93 controls organized into four themes: A.5 Organiza

 **A.5.1: Policies for information security**
 - Controls: Establish APTS charter defining organizational roles and decision authorities (APTS-HO-004); document Rules of Engagement specifying testing scope and boundaries (APTS-SE-001); implement scope approval workflows with pre-action validation (APTS-SE-006)
-- Addresses:

 **A.5.2: Information security roles and responsibilities**
 - Controls: Document Authority Delegation Matrix specifying approval authority by role and action type (APTS-HO-004); implement mandatory approval gates for medium and high-impact actions (APTS-HO-001); maintain immutable decision audit trail (APTS-HO-005)
-- Addresses:

 **A.5.7: Threat intelligence**
 - Controls: Maintain authoritative technique mapping with impact classifications, CIA scores, and reversibility status (APTS-SC-001); log all decision points with confidence scores and alternative actions evaluated (APTS-AR-004); establish baseline for anomaly detection from threat feeds (APTS-SC-010)
-- Addresses:

 **A.5.8: Information security in project management**
 - Controls: Ingest and validate machine-parseable Rules of Engagement with target lists, temporal boundaries, and action restrictions (APTS-SE-001); perform pre-action scope validation for all network actions (APTS-SE-006); continuously monitor for DNS changes and scope drift (APTS-SE-007)
-- Addresses:

 **A.5.19: Information security in supplier relationships**
 - Controls: Document AI provider vetting process assessing security posture, data protection, and compliance (APTS-TP-001); maintain complete inventory of software dependencies and third-party components (APTS-TP-006); perform risk assessment for each dependency (APTS-TP-006)
-- Addresses:

 **A.5.20: Addressing information security within supplier agreements**
 - Controls: Establish SLA requirements for critical providers with minimum 99.5% uptime; include breach notification obligations aligned to regulatory requirements in provider contracts (APTS-TP-004, APTS-TP-005)
-- Addresses:

 **A.5.21: Managing information security in the ICT supply chain**
 - Controls: Document all third-party service dependencies with version numbers and known vulnerabilities in SBOM (APTS-TP-006); maintain alternative providers for critical services (APTS-TP-004)
-- Addresses:

 **A.5.22: Monitoring, review and change management of supplier services**
 - Controls: Continuously monitor provider availability against documented SLAs with automated alerts for failures (APTS-TP-004); establish incident response procedures for provider breaches (APTS-TP-005)
-- Addresses:

 **A.5.23: Information security for use of cloud services**
 - Controls: Establish procedures to respond to provider compromises discovered during active engagements (APTS-TP-005); validate cloud resource boundaries for AWS, Azure, GCP in-scope actions (APTS-SE-007)
-- Addresses:

 **A.5.24: Information security incident management planning and preparation**
 - Controls: Document platform incident response plan with response timelines and communication channels (APTS-SC-018); establish Watchdog infrastructure monitoring with independent network (APTS-SC-017); define explicit termination conditions with detailed logging (APTS-SC-011)
-- Addresses:

 **A.5.25: Assessment and decision on information security events**
 - Controls: Score anomalies with confidence levels and route according to documented escalation policy (APTS-SC-010); classify incidents and perform independent audit log validation before confirming (APTS-SC-017)
-- Addresses:

 **A.5.26: Response to information security incidents**
 - Controls: Execute automatic network isolation and credential rotation upon confirmed incident (APTS-SC-018); preserve evidence on separate secure system before containment (APTS-SC-018); escalate to customers within documented SLA (APTS-SC-017)
-- Addresses:

 **A.5.28: Collection of evidence**
 - Controls: Capture screenshots, log output, and modified file contents before rollback in write-once, tamper-evident storage (APTS-SC-016); preserve memory dump and logs for forensic investigation (APTS-SC-018)
-- Addresses:

 **A.5.29: Information security during disruption**
 - Controls: Establish documented failover procedures for critical providers with fallback providers identified (APTS-TP-004); implement kill switch for immediate halt of all testing with Phase 1 (5 seconds) and Phase 2 (60 seconds) sequencing (APTS-SC-009)
-- Addresses:

 **A.5.30: ICT readiness for business continuity**
 - Controls: Document recovery procedures with defined Recovery Time Objective (RTO) for each incident scenario (APTS-SC-018); perform complete safety control validation after recovery verifying all controls pass (APTS-SC-018)
-- Addresses:

 **A.5.31: Legal, statutory, regulatory and contractual requirements**
 - Controls: Adjust impact classifications for industry-specific regulatory requirements including healthcare (PHI access as Critical) and financial systems (PCI data access as Critical) (APTS-SC-002)
-- Addresses:

 **A.5.33: Protection of records**
-- Controls: Log all events in structured format with cryptographic timestamps and immutability enforcement (APTS-AR-001, APTS-AR-001); maintain append-only log storage with minimum 12-month retention (APTS-AR-005); validate cryptographic signatures on audit log entries (APTS-AR-005)
-- Addresses:
+- Controls: Log all events in structured format with cryptographic timestamps and immutability enforcement (APTS-AR-001); maintain append-only log storage with minimum 12-month retention (APTS-AR-005); validate cryptographic signatures on audit log entries (APTS-AR-005)

 **A.5.34: Privacy and protection of PII**
 - Controls: Classify data by sensitivity level (Public, Sensitive, Confidential, Restricted) with data minimization before external transmission (APTS-TP-012); detect and redact PII/PHI/PCI data before logging or transmission (APTS-TP-012)
-- Addresses:

 **A.5.36: Conformance with policies, rules and standards for information security**
 - Controls: Enforce temporal scope boundaries with no testing before start_time or after end_time (APTS-SE-008); verify authorized contacts listed in RoE and reachable within 5 minutes (APTS-HO-002)
-- Addresses:

 **A.5.37: Documented operating procedures**
 - Controls: Document all required APTS procedures including Rules of Engagement (APTS-SE-001), incident response (APTS-SC-018), and rollback procedures (APTS-SC-014); maintain operational documentation for scope validation and escalation (APTS-SE-006, APTS-HO-003)
-- Addresses:

 ### A.6: People Controls

 **A.6.3: Information security awareness, education and training**
 - Controls: Provide security training for platform operators covering approval authority, decision delegation, and incident response procedures
-- Addresses:

 ### A.7: Physical Controls

 **A.7.1: Physical security perimeters**
-- Controls: Data center security, facility access controls
-- Addresses: (organizational responsibility)
+- Controls: Data center security and facility access controls (organizational responsibility outside the platform's APTS scope)

 ### A.8: Technological Controls

 **A.8.1: User endpoint devices**
 - Controls: Enforce Rules of Engagement scope for all client-side agents with hard deny list enforcement (APTS-SE-009)
-- Addresses:

 **A.8.2: Privileged access rights**
 - Controls: Implement multiple independent kill switch mechanisms with operator-initiated, remote, and automatic failsafe halts (APTS-SC-009); require multi-factor authentication for sensitive approval actions
-- Addresses:

 **A.8.3: Information access restriction**
 - Controls: Implement multi-tenancy isolation with row-level security restricting operator access to assigned engagements (APTS-TP-017); enforce least privilege in Authority Delegation Matrix (APTS-HO-004)
-- Addresses:

 **A.8.4: Access to source code**
 - Controls: Document code signing and verification processes for platform integrity; maintain secure source code repositories with access controls
-- Addresses:

 **A.8.5: Secure authentication**
 - Controls: Enforce secure credential storage with never-in-code API keys and documented rotation schedules (APTS-TP-003); implement mutual TLS for service-to-service authentication (APTS-TP-003)
-- Addresses:

 **A.8.7: Protection against malware**
 - Controls: Maintain SBOM with continuous vulnerability monitoring of all dependencies (APTS-TP-006); implement input sanitization preventing target-side content from modifying tool behavior (APTS-MR-002)
-- Addresses: (organizational responsibility)

 **A.8.8: Management of technical vulnerabilities**
 - Controls: Maintain authoritative mapping of pentest techniques with pre-classified impact levels and reversibility status (APTS-SC-001); continuously monitor vulnerability feeds (APTS-TP-006)
-- Addresses:

 **A.8.9: Configuration management**
 - Controls: Store threshold configurations in structured format with schema validation (APTS-SC-008); maintain baseline configurations for scanning tools and impact classification updates
-- Addresses:

 **A.8.10: Information deletion**
 - Controls: Implement secure deletion procedures with verification that data cannot be recovered (APTS-TP-014); provide automated cleanup of all test artifacts with idempotent execution (APTS-SC-016)
-- Addresses:

 **A.8.11: Data masking**
 - Controls: Implement data minimization by stripping metadata, redacting credentials, and anonymizing IPs before transmission (APTS-TP-012); sanitize responses removing instruction-like patterns before processing (APTS-MR-002)
-- Addresses:

 **A.8.12: Data leakage prevention**
 - Controls: Classify data as Public, Sensitive, Confidential, or Restricted with access restrictions enforced by classification (APTS-TP-012); establish DLP monitoring rules for regulated data types (PHI, PCI, PII) (APTS-SC-002)
-- Addresses:

 **A.8.15: Logging**
-- Controls: Log all events in structured format with mandatory fields including timestamp, event type, source, target, and status code (APTS-AR-001, APTS-AR-001); enforce immutable append-only storage with cryptographic signatures (APTS-AR-005)
-- Addresses:
+- Controls: Log all events in structured format with mandatory fields including timestamp, event type, source, target, and status code (APTS-AR-001); enforce immutable append-only storage with cryptographic signatures (APTS-AR-005)

 **A.8.16: Monitoring activities**
 - Controls: Implement anomaly detection identifying deviations from baseline for testing patterns, decision-making patterns, and action patterns (APTS-SC-010); route anomalies by confidence level per documented escalation policy
-- Addresses:

 **A.8.20: Networks security**
 - Controls: Validate all IP ranges using CIDR notation with overlap detection and reserved IP space awareness (APTS-SE-002); monitor for DNS changes and scope drift during engagement (APTS-SE-007)
-- Addresses:

 **A.8.21: Security of network services**
 - Controls: Enforce TLS 1.2 minimum (TLS 1.3 preferred) for all API calls to external providers with certificate validation (APTS-TP-003)
-- Addresses:

 **A.8.22: Segregation of networks**
 - Controls: Implement network isolation controls for multi-tenancy with file system permission enforcement and network segmentation (APTS-TP-017); isolate testing network traffic per engagement
-- Addresses:

 **A.8.24: Use of cryptography**
 - Controls: Encrypt engagement data at rest with documented encryption standards (APTS-TP-014); enforce TLS for data in transit and implement key management with separate key storage (APTS-TP-014)
-- Addresses:

 **A.8.25: Secure development life cycle**
 - Controls: Document secure platform development practices with regular security testing; implement defense-in-depth controls for prompt injection prevention (APTS-MR-001 through APTS-MR-012)
-- Addresses:

 **A.8.28: Secure coding**
 - Controls: Enforce instruction boundary enforcement with cryptographic verification of operator-provided instructions (APTS-MR-001); prevent target-side content from influencing tool behavior
-- Addresses:

 **A.8.31: Separation of development, test and production environments**
 - Controls: Segregate development, test, and production environments with controlled data flow between environments
-- Addresses:

 **A.8.32: Change management**
 - Controls: Implement version pinning for AI models with formal change management and testing before deployment (APTS-TP-002); document configuration changes and deploy within defined windows
-- Addresses:

 **A.8.33: Test information**
 - Controls: Handle test data without exposing production credentials; redact sensitive data from test logs and evidence
-- Addresses:

 **A.8.34: Protection of information systems during audit testing**
 - Controls: Preserve evidence in write-once, tamper-evident storage before any rollback (APTS-SC-016); maintain cryptographically signed audit logs with immutability validation (APTS-AR-005)
-- Addresses:

 ---

 ## 3. SOC 2 Trust Services Criteria Mapping (2017 TSC, 2022 revised Points of Focus)

-SOC 2 defines five trust services categories with specific Trust Services Criteria. Mappings reference the AICPA 2017 Trust Services Criteria as revised with 2022 Points of Focus. The standard addresses all five categories.
+SOC 2 defines five trust services categories, each with specific Trust Services Criteria. Mappings reference the AICPA 2017 Trust Services Criteria as revised with 2022 Points of Focus. The standard addresses all five categories.
-### Principle 1: Security (CC - Common Criteria)
+### Security (CC - Common Criteria)

-**CC1: The entity has defined security objectives.**
+**CC1: Control Environment** - integrity, governance structures, and security objectives.
 - Controls: Establish APTS charter documenting organizational roles, decision authorities, and security objectives (APTS-HO-004); define Rules of Engagement specifying scope and boundaries (APTS-SE-001)
-- Addresses:

-**CC2: The board of directors demonstrates independence from management.**
+**CC2: Communication and Information** - internal and external communication of security responsibilities.
 - Controls: Establish organizational governance with CISO oversight and documented Authority Delegation Matrix (APTS-HO-004)
-- Addresses:

-**CC3: Management establishes structures, reporting lines, and appropriate authorities.**
+**CC3: Risk Assessment** - identification and analysis of risks to security objectives.
 - Controls: Document Authority Delegation Matrix specifying approval authority by role and action type (APTS-HO-004); implement escalation procedures with defined SLA response windows (APTS-HO-003)
-- Addresses:

-**CC4: The entity holds people accountable for their responsibilities.**
+**CC4: Monitoring Activities** - ongoing evaluation and accountability for control effectiveness.
 - Controls: Maintain immutable decision audit trail for all approvals with timestamp, approver identity, and rationale (APTS-HO-005); log all operator activities with structured event format (APTS-AR-001)
-- Addresses:

-**CC6: The entity defines and implements logical access controls.**
+**CC6: Logical and Physical Access Controls** - access provisioning, restriction, and enforcement.
 - Controls: Define role-based access control for autonomous system operator functions (APTS-HO-004); enforce multi-tenancy isolation with row-level security restricting access per engagement (APTS-TP-017)
-- Addresses:

-**CC7: The entity restricts access to assets.**
+**CC7: System Operations** - detection and handling of security events and anomalies.
 - Controls: Classify data as Public, Sensitive, Confidential, or Restricted with access controls enforced by classification (APTS-TP-012); encrypt engagement data at rest with documented key management (APTS-TP-014)
-- Addresses:

-**CC9: The entity obtains or generates information to support operation.**
-- Controls: Log all events in structured format with timestamps and required fields (APTS-AR-001, APTS-AR-001); maintain audit trails for 12 months minimum with immutability validation (APTS-AR-005)
-- Addresses:
+**CC9: Risk Mitigation** - mitigation of risks arising from business operations and vendors.
+- Controls: Log all events in structured format with timestamps and required fields (APTS-AR-001); maintain audit trails for 12 months minimum with immutability validation (APTS-AR-005)

-### Principle 2: Availability (A)
+### Availability (A)

-**A1.1: The entity obtains or generates, uses, and communicates relevant, quality information regarding the objectives and responsibilities for information and communication technology security to support the functioning of other principles.**
+**A1.1: The entity maintains, monitors, and evaluates current processing capacity and use of system components to manage capacity demand and enable additional capacity.**
 - Controls: Continuously monitor platform health with heartbeat, resource utilization, and behavioral baselines (APTS-SC-010); establish Watchdog infrastructure monitoring with independent network (APTS-SC-017)
-- Addresses:

-**A1.2: The entity authorizes, designs, develops, configures, documents, tests, approves, implements, maintains, monitors, evaluates, and disposes of changes to systems to achieve objectives.**
+**A1.2: The entity authorizes, designs, develops, implements, operates, maintains, and monitors environmental protections, software, data backup processes, and recovery infrastructure to meet its availability objectives.**
 - Controls: Implement version pinning for AI models with formal change management and testing before deployment (APTS-TP-002); store configuration in structured format with schema validation (APTS-SC-008)
-- Addresses:

-**A1.3: The entity authorizes, designs, develops, configures, documents, tests, approves, implements, and maintains physical and logical access controls.**
+**A1.3: The entity tests recovery plan procedures supporting system recovery to meet its availability objectives.**
 - Controls: Enforce Authority Delegation Matrix with role-based access control (APTS-HO-004); implement multi-tenancy isolation with row-level security (APTS-TP-017)
-- Addresses:

-### Principle 3: Processing Integrity (PI)
+### Processing Integrity (PI)

-**PI1.1: The entity obtains or generates, uses, and communicates relevant, quality information regarding the objectives and responsibilities for processing integrity to support the functioning of other principles.**
+**PI1.1: The entity obtains or generates, uses, and communicates relevant, quality information regarding the objectives related to processing, including definitions of data processed and product and service specifications, to support the use of products and services.**
 - Controls: Classify data as Public, Sensitive, Confidential, or Restricted with minimization before external transmission (APTS-TP-012); detect and redact PII/PHI/PCI data before logging (APTS-TP-012)
-- Addresses:

-**PI1.2: The entity authorizes, designs, configures, implements, maintains, and monitors technologies to achieve objectives related to processing integrity.**
+**PI1.2: The entity implements policies and procedures over system inputs, including controls over completeness and accuracy, to result in products, services, and reporting that meet the entity's objectives.**
 - Controls: Validate all target-side responses and sanitize before processing by LLM (APTS-MR-002); implement defense-in-depth controls preventing prompt injection (APTS-MR-001 through APTS-MR-012)
-- Addresses:

-**PI1.3: The entity authorizes, designs, develops, configures, documents, tests, approves, implements, and maintains policies and procedures for processing integrity.**
+**PI1.3: The entity implements policies and procedures over system processing to result in products, services, and reporting that meet the entity's objectives.**
 - Controls: Document data handling procedures with explicit data minimization, redaction, and retention policies (APTS-TP-012); define validation procedures for all scope decisions (APTS-SE-006)
-- Addresses:

-**PI1.4: The entity authorizes, designs, develops, configures, documents, tests, approves, implements, and maintains monitoring of operations to achieve objectives.**
+**PI1.4: The entity implements policies and procedures to make available or deliver output completely, accurately, and timely in accordance with specifications to meet the entity's objectives.**
 - Controls: Implement continuous vulnerability monitoring of dependencies (APTS-TP-006); monitor target system health with degradation detection (APTS-SC-010)
-- Addresses:

-**PI1.5: The entity authorizes, designs, develops, configures, documents, tests, approves, implements, and maintains the physical infrastructure.**
+**PI1.5: The entity implements policies and procedures to store inputs, items in processing, and outputs completely, accurately, and timely in accordance with system specifications to meet the entity's objectives.**
 - Controls: Establish infrastructure segregation with separate storage for health monitoring data and audit logs (APTS-SC-010)
-- Addresses:

-### Principle 4: Confidentiality (C)
+### Confidentiality (C)

-**C1.1: The entity obtains or generates, uses, and communicates relevant, quality information regarding the objectives and responsibilities for confidentiality to support the functioning of other principles.**
+**C1.1: The entity identifies and maintains confidential information to meet the entity's objectives related to confidentiality.**
 - Controls: Classify data by sensitivity level (Public, Sensitive, Confidential, Restricted) (APTS-TP-012); obtain explicit client consent disclosing AI providers and data categories (APTS-TP-012)
-- Addresses:

-**C1.2: The entity authorizes, designs, develops, configures, documents, tests, approves, implements, and maintains logical and physical access controls.**
+**C1.2: The entity disposes of confidential information to meet the entity's objectives related to confidentiality.**
 - Controls:
   - Access controls (APTS-HO-004)
   - Encryption (APTS-TP-014)
   - Multi-tenancy isolation (APTS-TP-017)
-- Addresses:

 **C1.3: The entity authorizes, designs, develops, configures, documents, tests, approves, implements, and maintains technologies to achieve objectives.**
 - Controls:
   - Encryption technologies
   - Secure deletion (APTS-TP-016)
   - Data protection mechanisms
-- Addresses:

-### Principle 5: Privacy (P)
+### Privacy (P)

-**P2.1: The entity provides notice to data subjects about privacy practices.**
+**P1.1 / P3.1: The entity provides notice to data subjects about its privacy practices and obtains consent for the collection, use, retention, disclosure, and disposal of personal information.**
 - Controls: Include data handling disclosures in engagement documents specifying AI provider usage and data categories (APTS-TP-012); obtain written consent before testing regulated data types (APTS-SC-002)
-- Addresses:

-**P2.2: The entity obtains and retains evidence of explicit consent prior to the collection, use, and sharing of personal information.**
+**P3.2: The entity obtains explicit consent for sensitive personal information, and obtains and documents consent prior to the collection, use, and sharing of personal information.**
 - Controls: Document explicit client consent for data handling practices with specific AI providers and data categories (APTS-TP-012); obtain consent before transmitting classified data (APTS-TP-012)
-- Addresses:

 ---

@@ -525,41 +460,33 @@ NIST AI RMF 1.0 defines four functions for managing AI system risks. APTS addres

 **GOVERN 1: Policies and Procedures**
 - Controls: Implement graduated autonomy governance with mandatory approval gates for all significant actions at L1 (APTS-HO-001); document human oversight policies specifying role responsibilities (APTS-HO-004)
-- Addresses:

 **GOVERN 2: Accountability Structures**
 - Controls: Establish Authority Delegation Matrix specifying approval authority by role and action type (APTS-HO-004); implement escalation chains with documented SLA response windows (APTS-HO-003)
-- Addresses:

 ### MAP Function

 **MAP 1: AI System Context and Risk Framing**
 - Controls: Ingest machine-parseable Rules of Engagement specifying scope, boundaries, and action restrictions (APTS-SE-001); implement multi-tier impact classification system for every action before execution (APTS-SC-001)
-- Addresses:

 **MAP 2: AI Impact Characterization**
 - Controls: Continuously monitor target system health with degradation detection and automatic escalation (APTS-SC-010); implement circuit breaker logic suspending testing on sustained degradation (APTS-SC-012)
-- Addresses:

 ### MEASURE Function

 **MEASURE 1: AI System Performance and Risk Metrics**
 - Controls: Log all decisions with confidence scores and auditable methodology (APTS-AR-004); document CIA impact scores for each pentest action (APTS-SC-001)
-- Addresses:

 **MEASURE 2: AI System Trustworthiness Characteristics**
 - Controls: Log decision points with all alternatives evaluated and selected action justification (APTS-AR-004); maintain structured logs with confidence scores and risk assessment (APTS-AR-001)
-- Addresses:

 ### MANAGE Function
 **MANAGE 1: AI Risk Treatment and Response**
 - Controls: Implement multiple independent kill switch mechanisms with Phase 1 (5 seconds) and Phase 2 (60 seconds) sequencing (APTS-SC-009); enforce instruction boundary enforcement with cryptographic verification of operator instructions (APTS-MR-001)
-- Addresses:

 **MANAGE 2: Continuous Monitoring**
 - Controls: Implement platform health monitoring with anomaly detection for testing patterns and decision-making patterns (APTS-SC-010); establish external Watchdog on independent infrastructure monitoring platform health (APTS-SC-017)
-- Addresses:

 ---

@@ -567,55 +494,45 @@ NIST AI RMF 1.0 defines four functions for managing AI system risks. APTS addres

 PCI DSS 4.0.1 contains 12 requirements for payment card security. The standard addresses applicable controls (not all apply if the platform doesn't handle payment cards directly).

-**Requirement 1: Firewall Configuration**
+**Requirement 1: Install and Maintain Network Security Controls**
 - Controls: Implement network isolation controls for multi-tenancy with network segmentation (APTS-TP-017)
 - Applicable if: the platform processes cardholder data
-- Addresses:

-**Requirement 2: Default Passwords**
+**Requirement 2: Apply Secure Configurations to All System Components**
 - Controls: Enforce strong authentication with API key rotation schedules and secure credential storage (APTS-TP-003)
 - Applicable if: the platform systems exposed to external networks
-- Addresses:

-**Requirement 3: Stored Data Protection**
+**Requirement 3: Protect Stored Account Data**
 - Controls: Encrypt engagement data at rest with documented encryption standards and key management (APTS-TP-014)
 - Applicable if: the platform stores card data
-- Addresses:

-**Requirement 4: Transmitted Data Protection**
+**Requirement 4: Protect Cardholder Data with Strong Cryptography During Transmission Over Open, Public Networks**
 - Controls: Enforce TLS 1.2 minimum (TLS 1.3 preferred) for all API calls to external providers (APTS-TP-003); implement data minimization redacting payment data before transmission (APTS-TP-012)
 - Applicable if: the platform transmits card data
-- Addresses:

-**Requirement 6: Secure Development**
+**Requirement 6: Develop and Maintain Secure Systems and Software**
 - Controls: Implement defense-in-depth controls preventing prompt injection (APTS-MR-001 through APTS-MR-012); enforce input sanitization removing instruction-like patterns (APTS-MR-002)
 - Applicable if: the platform developed in-house
-- Addresses:

-**Requirement 7: Access Control**
+**Requirement 7: Restrict Access to System Components and Cardholder Data by Business Need to Know**
 - Controls: Define role-based access control for operators with Authority Delegation Matrix (APTS-HO-004); enforce least privilege through multi-tenancy isolation (APTS-TP-017)
 - Applicable if: the platform processes card data
-- Addresses:

-**Requirement 8: User Identification**
+**Requirement 8: Identify Users and Authenticate Access to System Components**
 - Controls: Enforce multi-factor authentication for approval actions; log all operator activities with structured event format (APTS-AR-001)
 - Applicable if: the platform processes card data
-- Addresses:

-**Requirement 10: Logging and Monitoring**
-- Controls: Log all events in structured format with timestamps, status codes, and mandatory fields (APTS-AR-001, APTS-AR-001); maintain audit trails for 12 months minimum (APTS-AR-005)
+**Requirement 10: Log and Monitor All Access to System Components and Cardholder Data**
+- Controls: Log all events in structured format with timestamps, status codes, and mandatory fields (APTS-AR-001); maintain audit trails for 12 months minimum (APTS-AR-005)
 - Applicable if: the platform processes card data
-- Addresses:

-**Requirement 11: Vulnerability Management**
+**Requirement 11: Test Security of Systems and Networks Regularly**
 - Controls: Maintain SBOM with continuous vulnerability monitoring of dependencies (APTS-TP-006); implement version pinning with formal change management (APTS-TP-002)
 - Applicable always
-- Addresses:

-**Requirement 12: Policies and Procedures**
+**Requirement 12: Support Information Security with Organizational Policies and Programs**
 - Controls: Document APTS charter with security policies and procedures (APTS-HO-004); establish Rules of Engagement with approval workflows (APTS-SE-001)
 - Applicable always
-- Addresses:

 ---

@@ -625,48 +542,36 @@ GDPR (EU privacy regulation) contains key obligations for processing personal da

 **Article 4: Definitions**
 - Controls: Classify data as Public, Sensitive, Confidential, or Restricted with explicit definitions (APTS-TP-012)
-- Addresses: Defines what constitutes personal data, processing
-- Addresses:

 **Article 5: Principles**
 - Controls: Obtain explicit client consent disclosing AI providers and data categories (APTS-TP-012); implement data minimization stripping metadata and redacting credentials (APTS-TP-012); maintain secure deletion procedures with verification (APTS-TP-014)
-- Addresses:

 **Article 6: Lawfulness of Processing**
 - Controls: Document explicit client consent in engagement agreement disclosing specific data processing practices (APTS-TP-012)
-- Addresses:

 **Article 9: Processing Special Categories**
 - Controls: Adjust impact classifications for special categories treating PHI/PCI access as Critical (APTS-SC-002); provide extra protections and documentation if applicable
-- Addresses: (with additional safeguards if applicable)

 **Article 12-14: Transparency**
-- Controls: Include privacy notices in engagement documents explaining data handling (APTS-TP-012); provide audit trail of data processing activities (APTS-AR-001, APTS-AR-001)
-- Addresses:
+- Controls: Include privacy notices in engagement documents explaining data handling (APTS-TP-012); provide audit trail of data processing activities (APTS-AR-001)

 **Article 17: Right to Erasure**
 - Controls: Implement secure deletion procedures with verification that data cannot be recovered (APTS-TP-014); provide automated cleanup of test artifacts (APTS-SC-016)
-- Addresses:

 **Article 18: Right to Restriction**
 - Controls: Preserve evidence in write-once storage before cleanup enabling data retention flexibility (APTS-SC-016)
-- Addresses:

 **Article 28: Data Processing Agreements**
 - Controls: Establish Data Processing Agreements with external providers including breach notification obligations (APTS-TP-005); document sub-processor agreements with AI providers and cloud services (APTS-TP-001)
-- Addresses:

 **Article 32: Security of Processing**
 - Controls: Encrypt engagement data at rest and in transit with documented encryption standards (APTS-TP-014); implement data minimization as pseudonymization (APTS-TP-012); enforce multi-tenancy isolation controls (APTS-TP-017)
-- Addresses:

 **Article 33: Breach Notification**
 - Controls: Establish incident response procedures for provider breaches with breach notification obligations aligned to regulatory timelines (APTS-TP-005); document breach assessment and notification timeline
-- Addresses:

 **Article 34: Individual Notification**
 - Controls: Notify affected tenants or individuals upon confirmed breach in accordance with documented notification procedures and required notification contents (APTS-TP-018)
-- Addresses:

 ---

@@ -674,245 +579,245 @@ GDPR (EU privacy regulation) contains key obligations for processing personal da

 This section maps all 8 APTS domains to external frameworks, organized by domain.
-### 6.1 Scope Enforcement (APTS-SE)
+### 7.1 Scope Enforcement (APTS-SE)
 | APTS Requirement | NIST CSF 2.0 | ISO/IEC 27001:2022 | NIST AI RMF 1.0 | SOC 2 TSC 2017 (2022 PoF) | Notes |
 |---|---|---|---|---|---|
-| APTS-SE-001: Rules of Engagement (RoE) Specification and Validation | GV.PO-1 | A.5.8 | GOVERN 1 | CC3.2 | Scope definition and validation process control |
-| APTS-SE-002: IP Range Validation and RFC 1918 Awareness | ID.AM-1 | A.8.20, A.8.22 | GOVERN 1 | CC1.1 | Asset inventory validation, scope boundary enforcement |
-| APTS-SE-003: Domain Scope Validation and Wildcard Handling | ID.AM-1 | A.8.20 | GOVERN 1 | CC1.1 | Domain ownership verification, third-party detection |
-| APTS-SE-004: Temporal Boundary and Timezone Handling | GV.PO-1 | A.5.37, A.8.16 | GOVERN 1 | CC2.1 | Time-based operational controls, timezone handling |
-| APTS-SE-005: Asset Criticality Classification and Integration | ID.AM-5 | A.5.12 | GOVERN 1 | CC4.1 | Risk-based testing restrictions per asset tier |
-| APTS-SE-006: Pre-Action Scope Validation | PR.AA-1 | A.8.5 | GOVERN 1 | CC6.6 | Authorization boundary enforcement before action |
-| APTS-SE-007: Dynamic Scope Monitoring and Drift Detection | DE.CM-1 | A.8.16 | MAP 1 | CC9.1 | Continuous drift detection, boundary violation alerts |
-| APTS-SE-008: Temporal Scope Compliance Monitoring | DE.CM-1 | A.5.1 | GOVERN 1 | CC9.1 | Engagement window enforcement, deadline alerts |
-| APTS-SE-009: Hard Deny Lists and Critical Asset Protection | PR.AA-1 | A.8.5 | GOVERN 1 | CC6.6 | Immutable asset protection, cryptographic enforcement |
-| APTS-SE-010: Production Database Safeguards | PR.AA-1 | A.8.5 | GOVERN 1 | CC6.6 | MUST \| Tier 2 | Multi-layer database protection, read-only mode |
-| APTS-SE-011: Multi-Tenant Environment Awareness | PR.AA-2 | A.8.5 | GOVERN 1 | CC7.2 | SHOULD \| Tier 2 | Cross-tenant isolation, shared infrastructure detection |
-| APTS-SE-012: DNS Rebinding Attack Prevention | PR.AA-1 | A.8.9 | GOVERN 1 | CC6.6 | Network-level attack prevention, resolution validation |
-| APTS-SE-013: Network Boundary and Lateral Movement Enforcement | ID.AM-1 | A.8.20 | GOVERN 1 | CC6.6 | VLAN/subnet/cloud security group boundaries |
-| APTS-SE-014: Network Topology Discovery Limitations | DE.CM-1 | A.8.9 | GOVERN 1 | CC9.1 | Reconnaissance scope limitations, host/port count limits |
-| APTS-SE-015: Scope Enforcement Audit and Compliance Verification | PR.PS-1 | A.5.36 | MAP 1 | CC9.1 | Complete audit trail of scope decisions |
-| APTS-SE-016: Scope Refresh and Revalidation Cycle | DE.CM-1 | A.8.16 | MAP 1 | CC9.1 | MUST \| Tier 2 | Infrastructure change detection, delta reporting |
-| APTS-SE-017: Engagement Boundary Definition for Recurring Tests | GV.PO-1 | A.5.1 | GOVERN 1 | CC2.1 | MUST \| Tier 2 | Recurring test cycle management, authorization renewal |
-| APTS-SE-018: Cross-Cycle Finding Correlation and Regression Detection | DE.AE-1 | A.5.36 | MAP 1 | PI1.1 | SHOULD \| Tier 2 | Finding lifecycle tracking, regression detection |
-| APTS-SE-019: Rate Limiting, Adaptive Backoff, and Production Impact Controls | DE.CM-1 | A.8.9 | GOVERN 1 | CC9.1 | MUST \| Tier 2 | Per-target and global rate limits, adaptive throttling, production impact prevention, response time monitoring |
-| APTS-SE-020: Deployment-Triggered Testing Governance | GV.PO-1 | A.8.25 | GOVERN 1 | CC3.2 | CI/CD integration governance, scope validation for auto-triggers |
-| APTS-SE-021: Scope Conflict Resolution for Overlapping Engagements | DE.CM-1 | A.8.9 | GOVERN 1 | CC6.6 | SHOULD \| Tier 3 | Multi-engagement overlap handling, restrictive constraint application |
-| APTS-SE-022: Client-Side Agent Scope and Safety Boundaries | PR.AA-1 | A.8.5 | GOVERN 1 | CC6.6 | SHOULD \| Tier 2 | Agent boundary enforcement, kill switch integration |
-| APTS-SE-023: Credential and Secret Lifecycle Governance | PR.DS-1, PR.AA-1 | A.8.3, A.5.33, A.8.24 | GOVERN 1 | C1.2 | MUST \| Tier 2 | Credential inventory, provenance classification, reuse policy, delegation control, and secure disposal |
-| APTS-SE-024: Cloud-Native and Ephemeral Infrastructure Governance | PR.PS-1 | A.8.9, A.5.23 | GOVERN 1 | CC3.2 | Cloud control plane, serverless, and ephemeral infrastructure governance |
-| APTS-SE-025: API-First and Business Logic Testing Governance | PR.PS-1 | A.5.23 | GOVERN 1 | CC3.2 | API business logic traversal, token propagation, and schema drift governance |
-| APTS-SE-026: Out-of-Distribution Action Monitoring | DE.AE-2, DE.CM-1 | A.8.16 | MEASURE 2 | CC9.1 | SHOULD \| Tier 2 | Baseline-driven action-distribution monitoring with staffed review queue |
-
-### 6.2 Safety Controls (APTS-SC)
+| APTS-SE-001: Rules of Engagement (RoE) Specification and Validation | GV.PO-01 | A.5.8 | GOVERN 1 | CC3.2 | Scope definition and validation process control |
+| APTS-SE-002: IP Range Validation and RFC 1918 Awareness | ID.AM-01 | A.8.20, A.8.22 | GOVERN 1 | CC1.1 | Asset inventory validation, scope boundary enforcement |
+| APTS-SE-003: Domain Scope Validation and Wildcard Handling | ID.AM-01 | A.8.20 | GOVERN 1 | CC1.1 | Domain ownership verification, third-party detection |
+| APTS-SE-004: Temporal Boundary and Timezone Handling | GV.PO-01 | A.5.37, A.8.16 | GOVERN 1 | CC2.1 | Time-based operational controls, timezone handling |
+| APTS-SE-005: Asset Criticality Classification and Integration | ID.AM-05 | A.5.12 | GOVERN 1 | CC4.1 | Risk-based testing restrictions per asset tier |
+| APTS-SE-006: Pre-Action Scope Validation | PR.AA-01 | A.8.5 | GOVERN 1 | CC6.6 | Authorization boundary enforcement before action |
+| APTS-SE-007: Dynamic Scope Monitoring and Drift Detection | DE.CM-01 | A.8.16 | MAP 1 | CC9.1 | Continuous drift detection, boundary violation alerts |
+| APTS-SE-008: Temporal Scope Compliance Monitoring | DE.CM-01 | A.5.1 | GOVERN 1 | CC9.1 | Engagement window enforcement, deadline alerts |
+| APTS-SE-009: Hard Deny Lists and Critical Asset Protection | PR.AA-01 | A.8.5 | GOVERN 1 | CC6.6 | Immutable asset protection, cryptographic enforcement |
+| APTS-SE-010: Production Database Safeguards | PR.AA-01 | A.8.5 | GOVERN 1 | CC6.6 | Multi-layer database protection, read-only mode |
+| APTS-SE-011: Multi-Tenant Environment Awareness | PR.AA-05 | A.8.5 | GOVERN 1 | CC7.2 | Cross-tenant isolation, shared infrastructure detection |
+| APTS-SE-012: DNS Rebinding Attack Prevention | PR.AA-01 | A.8.9 | GOVERN 1 | CC6.6 | Network-level attack prevention, resolution validation |
+| APTS-SE-013: Network Boundary and Lateral Movement Enforcement | ID.AM-01 | A.8.20 | GOVERN 1 | CC6.6 | VLAN/subnet/cloud security group boundaries |
+| APTS-SE-014: Network Topology Discovery Limitations | DE.CM-01 | A.8.9 | GOVERN 1 | CC9.1 | Reconnaissance scope limitations, host/port count limits |
+| APTS-SE-015: Scope Enforcement Audit and Compliance Verification | PR.PS-01 | A.5.36 | MAP 1 | CC9.1 | Complete audit trail of scope decisions |
+| APTS-SE-016: Scope Refresh and Revalidation Cycle | DE.CM-01 | A.8.16 | MAP 1 | CC9.1 | Infrastructure change detection, delta reporting |
+| APTS-SE-017: Engagement Boundary Definition for Recurring Tests | GV.PO-01 | A.5.1 | GOVERN 1 | CC2.1 | Recurring test cycle management, authorization renewal |
+| APTS-SE-018: Cross-Cycle Finding Correlation and Regression Detection | DE.AE-02 | A.5.36 | MAP 1 | PI1.1 | Finding lifecycle tracking, regression detection |
+| APTS-SE-019: Rate Limiting, Adaptive Backoff, and Production Impact Controls | DE.CM-01 | A.8.9 | GOVERN 1 | CC9.1 | Per-target and global rate limits, adaptive throttling, production impact prevention, response time monitoring |
+| APTS-SE-020: Deployment-Triggered Testing Governance | GV.PO-01 | A.8.25 | GOVERN 1 | CC3.2 | CI/CD integration governance, scope validation for auto-triggers |
+| APTS-SE-021: Scope Conflict Resolution for Overlapping Engagements | DE.CM-01 | A.8.9 | GOVERN 1 | CC6.6 | Multi-engagement overlap handling, restrictive constraint application |
+| APTS-SE-022: Client-Side Agent Scope and Safety Boundaries | PR.AA-01 | A.8.5 | GOVERN 1 | CC6.6 | Agent boundary enforcement, kill switch integration |
+| APTS-SE-023: Credential and Secret Lifecycle Governance | PR.DS-01, PR.AA-01 | A.8.3, A.5.33, A.8.24 | GOVERN 1 | C1.2 | Credential inventory, provenance classification, reuse policy, delegation control, and secure disposal |
+| APTS-SE-024: Cloud-Native and Ephemeral Infrastructure Governance | PR.PS-01 | A.8.9, A.5.23 | GOVERN 1 | CC3.2 | Cloud control plane, serverless, and ephemeral infrastructure governance |
+| APTS-SE-025: API-First and Business Logic Testing Governance | PR.PS-01 | A.5.23 | GOVERN 1 | CC3.2 | API business logic traversal, token propagation, and schema drift governance |
+| APTS-SE-026: Out-of-Distribution Action Monitoring | DE.AE-02, DE.CM-01 | A.8.16 | MEASURE 2 | CC9.1 | Baseline-driven action-distribution monitoring with staffed review queue |
+
+### 7.2 Safety Controls (APTS-SC)
 | APTS Requirement | NIST CSF 2.0 | ISO/IEC 27001:2022 | NIST AI RMF 1.0 | SOC 2 TSC 2017 (2022 PoF) | Notes |
 |---|---|---|---|---|---|
-| APTS-SC-001: Impact Classification and CIA Scoring | ID.RA-4 | A.5.1 | GOVERN 1 | CC4.1 | Risk assessment framework with discrete tiers |
-| APTS-SC-002: Industry-Specific Impact Considerations | ID.RA-4 | A.5.1 | GOVERN 1 | CC4.1 | Regulatory impact elevation (Healthcare/Finance/CI) |
-| APTS-SC-003: Real-World Impact Classification Examples | PR.PS-1 | A.5.1 | GOVERN 1 | CC3.2 | Documented scenarios with classification rationale |
-| APTS-SC-004: Rate Limiting, Bandwidth, and Payload Constraints | DE.CM-1 | A.8.9 | GOVERN 1 | CC9.1 | Target-specific rate enforcement by impact tier |
-| APTS-SC-005: Cascading Failure Prevention in Interconnected Systems | DE.CM-1 | A.8.9 | GOVERN 1 | A1.1 | Dependency mapping, upstream impact detection |
-| APTS-SC-006: Threshold Escalation Workflow (Automated → Approval → Prohibited) | GV.RR-2 | A.5.1 | GOVERN 1 | CC4.1 | Graduated approval gates with timeout defaults |
-| APTS-SC-007: Cumulative Risk Scoring with Time-Based Decay | DE.CM-1, ID.RA-5 | A.5.1 | GOVERN 1 | CC4.1 | Cumulative impact tracking with multi-factor risk algorithm and audit trail |
-| APTS-SC-008: Threshold Configuration with Schema Validation | PR.PS-1 | A.5.8 | GOVERN 1 | CC3.2 | Schema-validated threshold configuration |
-| APTS-SC-009: Kill Switch | PR.PS-1, RS.MA-1 | A.5.26, A.5.29 | GOVERN 1 | CC4.1 | Independent halt mechanisms, Phase 1/2 sequencing |
-| APTS-SC-010: Health Check Monitoring, Threshold Adjustment, and Automatic Halt | DE.CM-1 | A.8.9 | MEASURE 1 | A1.1 | Dynamic threshold adjustment and automatic halt on target degradation |
-| APTS-SC-011: Condition-Based Automated Termination | DE.CM-1 | A.5.1 | MEASURE 1 | A1.1 | Automated service unavailability response |
-| APTS-SC-012: Network-Level Circuit Breaker | DE.CM-1 | A.8.9 | MEASURE 1 | A1.1 | Degradation-triggered suspension with recovery probe |
-| APTS-SC-013: Time-Based Automatic Termination with Operator Override | DE.CM-1 | A.5.1 | GOVERN 1 | CC2.1 | Engagement duration limits with advance warning |
-| APTS-SC-014: Reversible Action Tracking and Rollback | PR.PS-1 | A.5.1 | MANAGE 1 | CC7.2 | State capture, rollback procedures, verification |
-| APTS-SC-015: Post-Test System Integrity Validation | DE.CM-1 | A.8.9 | MANAGE 1 | CC7.2 | Baseline comparison, discrepancy escalation |
-| APTS-SC-016: Evidence Preservation and Automated Cleanup | PR.PS-1 | A.5.28 | MANAGE 1 | CC7.2 | Immutable evidence storage, idempotent artifact removal |
-| APTS-SC-017: External Watchdog and Operator Notification | DE.CM-1 | A.8.9 | MEASURE 1 | A1.1 | Independent health verification, operator SLA |
-| APTS-SC-018: Incident Containment and Recovery | RS.MA-1 | A.5.24 | MANAGE 1 | A1.1 | Automatic isolation, credential rotation, recovery RTO |
-| APTS-SC-019: Kernel-Enforced Execution Sandbox for Agent Runtime | PR.PS-1, PR.IR-1 | A.8.22, A.8.25 | GOVERN 1 | CC6.6 | MUST \| Tier 2 | Kernel-enforced sandbox (namespaces, seccomp, AppArmor/SELinux, hypervisor/gVisor/Kata); agent holds no credentials to move its own boundary |
-| APTS-SC-020: External Enforcement of Tool and Action Allowlist | PR.PS-1, PR.AA-1 | A.8.5, A.8.25 | GOVERN 1 | CC6.6 | MUST \| Tier 1 | Allowlist enforced by external gateway or policy engine, not by the model system prompt |
-
-### 6.3 Human Oversight (APTS-HO)
+| APTS-SC-001: Impact Classification and CIA Scoring | ID.RA-04 | A.5.1 | GOVERN 1 | CC4.1 | Risk assessment framework with discrete tiers |
+| APTS-SC-002: Industry-Specific Impact Considerations | ID.RA-04 | A.5.1 | GOVERN 1 | CC4.1 | Regulatory impact elevation (Healthcare/Finance/CI) |
+| APTS-SC-003: Real-World Impact Classification Examples | PR.PS-01 | A.5.1 | GOVERN 1 | CC3.2 | Documented scenarios with classification rationale |
+| APTS-SC-004: Rate Limiting, Bandwidth, and Payload Constraints | DE.CM-01 | A.8.9 | GOVERN 1 | CC9.1 | Target-specific rate enforcement by impact tier |
+| APTS-SC-005: Cascading Failure Prevention in Interconnected Systems | DE.CM-01 | A.8.9 | GOVERN 1 | A1.1 | Dependency mapping, upstream impact detection |
+| APTS-SC-006: Threshold Escalation Workflow (Automated → Approval → Prohibited) | GV.RR-02 | A.5.1 | GOVERN 1 | CC4.1 | Graduated approval gates with timeout defaults |
+| APTS-SC-007: Cumulative Risk Scoring with Time-Based Decay | DE.CM-01, ID.RA-05 | A.5.1 | GOVERN 1 | CC4.1 | Cumulative impact tracking with multi-factor risk algorithm and audit trail |
+| APTS-SC-008: Threshold Configuration with Schema Validation | PR.PS-01 | A.5.8 | GOVERN 1 | CC3.2 | Schema-validated threshold configuration |
+| APTS-SC-009: Kill Switch | PR.PS-01, RS.MA-01 | A.5.26, A.5.29 | GOVERN 1 | CC4.1 | Independent halt mechanisms, Phase 1/2 sequencing |
+| APTS-SC-010: Health Check Monitoring, Threshold Adjustment, and Automatic Halt | DE.CM-01 | A.8.9 | MEASURE 1 | A1.1 | Dynamic threshold adjustment and automatic halt on target degradation |
+| APTS-SC-011: Condition-Based Automated Termination | DE.CM-01 | A.5.1 | MEASURE 1 | A1.1 | Automated service unavailability response |
+| APTS-SC-012: Network-Level Circuit Breaker | DE.CM-01 | A.8.9 | MEASURE 1 | A1.1 | Degradation-triggered suspension with recovery probe |
+| APTS-SC-013: Time-Based Automatic Termination with Operator Override | DE.CM-01 | A.5.1 | GOVERN 1 | CC2.1 | Engagement duration limits with advance warning |
+| APTS-SC-014: Reversible Action Tracking and Rollback | PR.PS-01 | A.5.1 | MANAGE 1 | CC7.2 | State capture, rollback procedures, verification |
+| APTS-SC-015: Post-Test System Integrity Validation | DE.CM-01 | A.8.9 | MANAGE 1 | CC7.2 | Baseline comparison, discrepancy escalation |
+| APTS-SC-016: Evidence Preservation and Automated Cleanup | PR.PS-01 | A.5.28 | MANAGE 1 | CC7.2 | Immutable evidence storage, idempotent artifact removal |
+| APTS-SC-017: External Watchdog and Operator Notification | DE.CM-01 | A.8.9 | MEASURE 1 | A1.1 | Independent health verification, operator SLA |
+| APTS-SC-018: Incident Containment and Recovery | RS.MA-01 | A.5.24 | MANAGE 1 | A1.1 | Automatic isolation, credential rotation, recovery RTO |
+| APTS-SC-019: Kernel-Enforced Execution Sandbox for Agent Runtime | PR.PS-01, PR.IR-01 | A.8.22, A.8.25 | GOVERN 1 | CC6.6 | Kernel-enforced sandbox (namespaces, seccomp, AppArmor/SELinux, hypervisor/gVisor/Kata); agent holds no credentials to move its own boundary |
+| APTS-SC-020: External Enforcement of Tool and Action Allowlist | PR.PS-01, PR.AA-01 | A.8.5, A.8.25 | GOVERN 1 | CC6.6 | Allowlist enforced by external gateway or policy engine, not by the model system prompt |
+
+### 7.3 Human Oversight (APTS-HO)
 | APTS Requirement | NIST CSF 2.0 | ISO/IEC 27001:2022 | NIST AI RMF 1.0 | SOC 2 TSC 2017 (2022 PoF) | Notes |
 |---|---|---|---|---|---|
-| APTS-HO-001: Mandatory Pre-Approval Gates for Autonomy Levels L1 and L2 | GV.RR-2 | A.5.2 | GOVERN 1 | CC3.2 | Mandatory approval for autonomy levels L1 and L2 |
-| APTS-HO-002: Real-Time Monitoring and Intervention Capability | DE.CM-1 | A.8.16 | MEASURE 2 | CC9.1 | Live activity visualization and monitoring of autonomous operations |
-| APTS-HO-003: Decision Timeout and Default-Safe Behavior | GV.RR-1 | A.5.3 | GOVERN 1 | CC4.1 | SLA-based approval windows with safe fallback behavior |
-| APTS-HO-004: Authority Delegation Matrix | GV.RR-2 | A.5.2 | GOVERN 1 | CC4.1 | Clear definition and enforcement of delegated authorities |
-| APTS-HO-005: Delegation Chain-of-Custody and Decision Audit Trail | GV.RR-3 | A.5.3 | GOVERN 1 | CC3.2 | Complete chain of delegation with audit trail |
-| APTS-HO-006: Graceful Pause Mechanism with State Preservation | PR.IR-1 | A.5.24 | MEASURE 1 | CC9.1 | Operator-initiated pause with full state recovery capability |
-| APTS-HO-007: Mid-Engagement Redirect Capability | PR.IR-1 | A.5.37 | GOVERN 1 | CC3.2 | Ability to redirect engagement scope mid-test |
-| APTS-HO-008: Immediate Kill Switch with State Dump | RS.MA-1 | A.5.24 | MANAGE 1 | CC4.1 | Immediate termination with complete state capture |
-| APTS-HO-009: Multi-Operator Kill Switch Authority and Handoff | RS.MA-1 | A.5.26 | MAP 1 | CC9.1 | Multiple kill switch authorities with handoff procedures |
-| APTS-HO-010: Mandatory Human Decision Points Before Irreversible Actions | GV.RR-2 | A.5.2 | GOVERN 1 | CC4.1 | Human approval required for permanent or irreversible actions |
-| APTS-HO-011: Unexpected Findings Escalation Framework | DE.AE-2 | A.5.24 | GOVERN 1 | CC3.2 | Escalation procedures for unexpected or anomalous findings |
-| APTS-HO-012: Impact Threshold Breach Escalation | DE.AE-2 | A.5.25 | MEASURE 1 | CC4.1 | Automatic escalation when impact thresholds exceeded |
-| APTS-HO-013: Confidence-Based Escalation (Scope Uncertainty) | DE.AE-2 | A.5.24 | GOVERN 1 | CC4.1 | Escalation triggers based on confidence levels |
-| APTS-HO-014: Legal and Compliance Escalation Triggers | RS.CO-2 | A.5.25 | GOVERN 1 | CC3.2 | Escalation for legal and compliance boundary concerns |
-| APTS-HO-015: Real-Time Activity Monitoring and Multi-Channel Notification | DE.CM-1 | A.8.16 | MEASURE 1 | CC9.1 | Real-time monitoring with multi-channel alerts |
-| APTS-HO-016: Alert Fatigue Mitigation and Smart Aggregation | DE.AE-3 | A.8.16 | MEASURE 1 | CC9.1 | Intelligent alert filtering and aggregation |
-| APTS-HO-017: Stakeholder Notification and Engagement Closure | RS.CO-3 | A.5.37 | MANAGE 1 | CC3.2 | Notification procedures and engagement conclusion |
-| APTS-HO-018: Operator Qualification, Training, and Competency Governance | GV.RR-2 | A.6.3 | GOVERN 1 | CC3.2 | Minimum competency and certification requirements, full training curriculum and incident response, continuous competency assessment and succession planning |
-| APTS-HO-019: 24/7 Operational Continuity and Shift Handoff | GV.RR-2 | A.5.2, A.5.3 | MANAGE 2 | CC3.2 | Shift handoff, stale approval expiry, suppression drift, and operator desensitization monitoring |
-
-### 6.4 Graduated Autonomy (APTS-AL)
+| APTS-HO-001: Mandatory Pre-Approval Gates for Autonomy Levels L1 and L2 | GV.RR-02 | A.5.2 | GOVERN 1 | CC3.2 | Mandatory approval for autonomy levels L1 and L2 |
+| APTS-HO-002: Real-Time Monitoring and Intervention Capability | DE.CM-01 | A.8.16 | MEASURE 2 | CC9.1 | Live activity visualization and monitoring of autonomous operations |
+| APTS-HO-003: Decision Timeout and Default-Safe Behavior | GV.RR-01 | A.5.3 | GOVERN 1 | CC4.1 | SLA-based approval windows with safe fallback behavior |
+| APTS-HO-004: Authority Delegation Matrix | GV.RR-02 | A.5.2 | GOVERN 1 | CC4.1 | Clear definition and enforcement of delegated authorities |
+| APTS-HO-005: Delegation Chain-of-Custody and Decision Audit Trail | GV.RR-03 | A.5.3 | GOVERN 1 | CC3.2 | Complete chain of delegation with audit trail |
+| APTS-HO-006: Graceful Pause Mechanism with State Preservation | PR.IR-01 | A.5.24 | MEASURE 1 | CC9.1 | Operator-initiated pause with full state recovery capability |
+| APTS-HO-007: Mid-Engagement Redirect Capability | PR.IR-01 | A.5.37 | GOVERN 1 | CC3.2 | Ability to redirect engagement scope mid-test |
+| APTS-HO-008: Immediate Kill Switch with State Dump | RS.MA-01 | A.5.24 | MANAGE 1 | CC4.1 | Immediate termination with complete state capture |
+| APTS-HO-009: Multi-Operator Kill Switch Authority and Handoff | RS.MA-01 | A.5.26 | MAP 1 | CC9.1 | Multiple kill switch authorities with handoff procedures |
+| APTS-HO-010: Mandatory Human Decision Points Before Irreversible Actions | GV.RR-02 | A.5.2 | GOVERN 1 | CC4.1 | Human approval required for permanent or irreversible actions |
+| APTS-HO-011: Unexpected Findings Escalation Framework | DE.AE-02 | A.5.24 | GOVERN 1 | CC3.2 | Escalation procedures for unexpected or anomalous findings |
+| APTS-HO-012: Impact Threshold Breach Escalation | DE.AE-02 | A.5.25 | MEASURE 1 | CC4.1 | Automatic escalation when impact thresholds exceeded |
+| APTS-HO-013: Confidence-Based Escalation (Scope Uncertainty) | DE.AE-02 | A.5.24 | GOVERN 1 | CC4.1 | Escalation triggers based on confidence levels |
+| APTS-HO-014: Legal and Compliance Escalation Triggers | RS.CO-02 | A.5.25 | GOVERN 1 | CC3.2 | Escalation for legal and compliance boundary concerns |
+| APTS-HO-015: Real-Time Activity Monitoring and Multi-Channel Notification | DE.CM-01 | A.8.16 | MEASURE 1 | CC9.1 | Real-time monitoring with multi-channel alerts |
+| APTS-HO-016: Alert Fatigue Mitigation and Smart Aggregation | DE.AE-03 | A.8.16 | MEASURE 1 | CC9.1 | Intelligent alert filtering and aggregation |
+| APTS-HO-017: Stakeholder Notification and Engagement Closure | RS.CO-03 | A.5.37 | MANAGE 1 | CC3.2 | Notification procedures and engagement conclusion |
+| APTS-HO-018: Operator Qualification, Training, and Competency Governance | GV.RR-02 | A.6.3 | GOVERN 1 | CC3.2 | Minimum competency and certification requirements, full training curriculum and incident response, continuous competency assessment and succession planning |
+| APTS-HO-019: 24/7 Operational Continuity and Shift Handoff | GV.RR-02 | A.5.2, A.5.3 | MANAGE 2 | CC3.2 | Shift handoff, stale approval expiry, suppression drift, and operator desensitization monitoring |
+
+### 7.4 Graduated Autonomy (APTS-AL)
 | APTS Requirement | NIST CSF 2.0 | ISO/IEC 27001:2022 | NIST AI RMF 1.0 | SOC 2 TSC 2017 (2022 PoF) | Notes |
 |---|---|---|---|---|---|
-| APTS-AL-001: Single Technique Execution | PR.PS-1 | A.8.25 | GOVERN 1 | CC3.2 | Atomic action constraint at L1 |
-| APTS-AL-002: Human-Directed Target and Technique Selection | GV.RR-2 | A.5.1 | GOVERN 1 | CC3.2 | Operator-driven targeting at L1 |
-| APTS-AL-003: Parameter Configuration by Human Operator | PR.PS-1 | A.5.8 | GOVERN 1 | CC3.2 | No defaults without explicit confirmation |
-| APTS-AL-004: No Automated Chaining or Sequential Decision-Making | PR.PS-1 | A.8.25 | GOVERN 1 | CC3.2 | Prohibition on autonomous workflow sequencing at L1 |
-| APTS-AL-005: Mandatory Logging and Human-Reviewable Audit Trail | DE.CM-1 | A.8.15 | MAP 1 | CC9.1 | Complete audit trail with structured fields |
-| APTS-AL-006: Basic Scope Validation and Policy Enforcement | PR.AA-1 | A.8.5 | GOVERN 1 | CC6.6 | Policy enforcement before technique execution |
-| APTS-AL-007: Multi-Step Technique Chaining Within Single Phase | PR.PS-1 | A.8.25 | GOVERN 1 | CC3.2 | L2 multi-step chaining within phase boundaries |
-| APTS-AL-008: Real-Time Human Monitoring and Approval Gates | DE.CM-1 | A.8.16 | GOVERN 1 | CC4.1 | L2 real-time monitoring with approval gates |
-| APTS-AL-009: Tool-Proposed Actions with Operator Modification Capability | GV.RR-2 | A.5.2 | GOVERN 1 | CC4.1 | L2 tool proposes, operator modifies/approves |
-| APTS-AL-010: Step-by-Step Audit Log with Phase Transitions | DE.CM-1 | A.8.15 | MAP 1 | CC9.1 | L2 detailed phase transition logging |
-| APTS-AL-011: Escalation Triggers and Exception Handling | DE.AE-2 | A.5.24 | GOVERN 1 | CC4.1 | Automatic escalation on boundary conditions |
-| APTS-AL-012: Kill Switch and Pause Capability | PR.IR-1 | A.5.24 | GOVERN 1 | CC4.1 | Immediate halt and pause at all levels |
-| APTS-AL-013: Complete Attack Chain Execution Within Boundaries | PR.PS-1 | A.8.25 | GOVERN 1 | CC3.2 | L3 full attack chain within defined boundaries |
-| APTS-AL-014: Boundary Definition and Enforcement Framework | PR.AA-1 | A.8.5 | GOVERN 1 | CC6.6 | L3 boundary definition and runtime enforcement |
-| APTS-AL-015: Pre-Approved Action Categories and Decision Trees | PR.PS-1 | A.5.8 | GOVERN 1 | CC3.2 | L3 pre-approved action categories |
-| APTS-AL-016: Continuous Boundary Monitoring and Breach Detection | DE.CM-1 | A.8.16 | MEASURE 1 | CC4.1 | L3 continuous monitoring for boundary violations |
-| APTS-AL-017: Multi-Target Assessment Management | DE.CM-1 | A.8.25 | GOVERN 1 | CC9.1 | L3 concurrent multi-target management |
-| APTS-AL-018: Incident Response During Autonomous Testing | RS.MA-1 | A.5.24 | MANAGE 1 | CC4.1 | Incident response procedures during autonomous ops |
-| APTS-AL-019: Multi-Target Campaign Management Without Intervention | PR.PS-1 | A.8.25 | GOVERN 1 | CC3.2 | L4 fully autonomous campaign management |
-| APTS-AL-020: Dynamic Scope Adjustment and Target Discovery | GV.PO-1 | A.8.16, A.5.37 | GOVERN 1 | CC4.1 | L4 dynamic scope within pre-approved boundaries |
-| APTS-AL-021: Adaptive Testing Strategy and Resource Reallocation | PR.PS-1 | A.5.37 | GOVERN 1 | CC3.2 | L4 adaptive strategy with resource optimization |
-| APTS-AL-022: Continuous Risk Assessment and Automated Escalation | DE.AE-2 | A.5.24 | MEASURE 1 | CC4.1 | L4 continuous risk assessment |
-| APTS-AL-023: Complete Audit Trail and Forensic Reconstruction | DE.CM-1 | A.8.15 | MAP 1 | CC9.1 | SHOULD \| Tier 3 | L4 complete forensic-grade audit trail |
-| APTS-AL-024: Periodic Autonomous Review Cycles | DE.CM-1 | A.5.36 | MEASURE 1 | CC9.1 | L4 periodic review cycles |
-| APTS-AL-025: Autonomy Level Authorization, Transition, and Reauthorization | GV.RR-1 | A.5.2 | GOVERN 1 | CC3.2 | Level authorization and periodic reauthorization |
-| APTS-AL-026: Incident Investigation and Autonomy Level Adjustment | RS.MA-1 | A.5.24 | MANAGE 1 | CC4.1 | Post-incident autonomy level review |
-| APTS-AL-027: Evasion and Stealth Mode Governance | GV.PO-1 | A.5.2, A.5.31 | GOVERN 1 | CC3.2 | SHOULD \| Tier 3 | Default-off evasion, explicit authorization, disclosure, prohibited classes, impact reclassification |
-| APTS-AL-028: Containment Verification for L3 and L4 Autonomy | DE.CM-1, ID.IM-2 | A.8.16, A.8.29 | MEASURE 1 | CC7.2 | MUST \| Tier 3 | Operator-run probes of sandbox and allowlist boundaries at quarterly (L3) or monthly (L4) cadence; verification MUST NOT be performed by the agent runtime |
-
-### 6.5 Auditability (APTS-AR)
+| APTS-AL-001: Single Technique Execution | PR.PS-01 | A.8.25 | GOVERN 1 | CC3.2 | Atomic action constraint at L1 |
+| APTS-AL-002: Human-Directed Target and Technique Selection | GV.RR-02 | A.5.1 | GOVERN 1 | CC3.2 | Operator-driven targeting at L1 |
+| APTS-AL-003: Parameter Configuration by Human Operator | PR.PS-01 | A.5.8 | GOVERN 1 | CC3.2 | No defaults without explicit confirmation |
+| APTS-AL-004: No Automated Chaining or Sequential Decision-Making | PR.PS-01 | A.8.25 | GOVERN 1 | CC3.2 | Prohibition on autonomous workflow sequencing at L1 |
+| APTS-AL-005: Mandatory Logging and Human-Reviewable Audit Trail | DE.CM-01 | A.8.15 | MAP 1 | CC9.1 | Complete audit trail with structured fields |
+| APTS-AL-006: Basic Scope Validation and Policy Enforcement | PR.AA-01 | A.8.5 | GOVERN 1 | CC6.6 | Policy enforcement before technique execution |
+| APTS-AL-007: Multi-Step Technique Chaining Within Single Phase | PR.PS-01 | A.8.25 | GOVERN 1 | CC3.2 | L2 multi-step chaining within phase boundaries |
+| APTS-AL-008: Real-Time Human Monitoring and Approval Gates | DE.CM-01 | A.8.16 | GOVERN 1 | CC4.1 | L2 real-time monitoring with approval gates |
+| APTS-AL-009: Tool-Proposed Actions with Operator Modification Capability | GV.RR-02 | A.5.2 | GOVERN 1 | CC4.1 | L2 tool proposes, operator modifies/approves |
+| APTS-AL-010: Step-by-Step Audit Log with Phase Transitions | DE.CM-01 | A.8.15 | MAP 1 | CC9.1 | L2 detailed phase transition logging |
+| APTS-AL-011: Escalation Triggers and Exception Handling | DE.AE-02 | A.5.24 | GOVERN 1 | CC4.1 | Automatic escalation on boundary conditions |
+| APTS-AL-012: Kill Switch and Pause Capability | PR.IR-01 | A.5.24 | GOVERN 1 | CC4.1 | Immediate halt and pause at all levels |
+| APTS-AL-013: Complete Attack Chain Execution Within Boundaries | PR.PS-01 | A.8.25 | GOVERN 1 | CC3.2 | L3 full attack chain within defined boundaries |
+| APTS-AL-014: Boundary Definition and Enforcement Framework | PR.AA-01 | A.8.5 | GOVERN 1 | CC6.6 | L3 boundary definition and runtime enforcement |
+| APTS-AL-015: Pre-Approved Action Categories and Decision Trees | PR.PS-01 | A.5.8 | GOVERN 1 | CC3.2 | L3 pre-approved action categories |
+| APTS-AL-016: Continuous Boundary Monitoring and Breach Detection | DE.CM-01 | A.8.16 | MEASURE 1 | CC4.1 | L3 continuous monitoring for boundary violations |
+| APTS-AL-017: Multi-Target Assessment Management | DE.CM-01 | A.8.25 | GOVERN 1 | CC9.1 | L3 concurrent multi-target management |
+| APTS-AL-018: Incident Response During Autonomous Testing | RS.MA-01 | A.5.24 | MANAGE 1 | CC4.1 | Incident response procedures during autonomous ops |
+| APTS-AL-019: Multi-Target Campaign Management Without Intervention | PR.PS-01 | A.8.25 | GOVERN 1 | CC3.2 | L4 fully autonomous campaign management |
+| APTS-AL-020: Dynamic Scope Adjustment and Target Discovery | GV.PO-01 | A.8.16, A.5.37 | GOVERN 1 | CC4.1 | L4 dynamic scope within pre-approved boundaries |
+| APTS-AL-021: Adaptive Testing Strategy and Resource Reallocation | PR.PS-01 | A.5.37 | GOVERN 1 | CC3.2 | L4 adaptive strategy with resource optimization |
+| APTS-AL-022: Continuous Risk Assessment and Automated Escalation | DE.AE-02 | A.5.24 | MEASURE 1 | CC4.1 | L4 continuous risk assessment |
+| APTS-AL-023: Complete Audit Trail and Forensic Reconstruction | DE.CM-01 | A.8.15 | MAP 1 | CC9.1 | L4 complete forensic-grade audit trail |
+| APTS-AL-024: Periodic Autonomous Review Cycles | DE.CM-01 | A.5.36 | MEASURE 1 | CC9.1 | L4 periodic review cycles |
+| APTS-AL-025: Autonomy Level Authorization, Transition, and Reauthorization | GV.RR-01 | A.5.2 | GOVERN 1 | CC3.2 | Level authorization and periodic reauthorization |
+| APTS-AL-026: Incident Investigation and Autonomy Level Adjustment | RS.MA-01 | A.5.24 | MANAGE 1 | CC4.1 | Post-incident autonomy level review |
+| APTS-AL-027: Evasion and Stealth Mode Governance | GV.PO-01 | A.5.2, A.5.31 | GOVERN 1 | CC3.2 | Default-off evasion, explicit authorization, disclosure, prohibited classes, impact reclassification |
+| APTS-AL-028: Containment Verification for L3 and L4 Autonomy | DE.CM-01, ID.IM-02 | A.8.16, A.8.29 | MEASURE 1 | CC7.2 | Operator-run probes of sandbox and allowlist boundaries at quarterly (L3) or monthly (L4) cadence; verification MUST NOT be performed by the agent runtime |
+
+### 7.5 Auditability (APTS-AR)
 | APTS Requirement | NIST CSF 2.0 | ISO/IEC 27001:2022 | NIST AI RMF 1.0 | SOC 2 TSC 2017 (2022 PoF) | Notes |
 |---|---|---|---|---|---|
-| APTS-AR-001: Structured Event Logging with Schema Validation | DE.CM-1, PR.PS-1 | A.8.15 | MAP 1 | CC9.1 | Millisecond-precision timestamps, correlation IDs, schema-validated structured format |
-| APTS-AR-002: State Transition Logging | DE.CM-1 | A.8.15 | MEASURE 1 | CC9.1 | Phase change documentation with authorization |
-| APTS-AR-003: Resource Utilization Metrics Logging | DE.CM-1 | A.8.9 | MEASURE 1 | CC9.1 | Network/system metrics per operation |
-| APTS-AR-004: Decision Point Logging and Confidence Scoring | DE.AE-1 | A.8.15 | MAP 1 | CC9.1 | Confidence scores, alternatives, rationale |
-| APTS-AR-005: Log Retention and Archival Requirements | PR.PS-1 | A.5.33 | MANAGE 1 | A1.1 | Minimum retention per engagement, compliance alignment |
-| APTS-AR-006: Decision Chain of Reasoning and Alternative Evaluation | DE.AE-1 | A.8.15 | MAP 1 | CC9.1 | Complete reasoning chain with alternative evaluation and rejection rationale |
-| APTS-AR-007: Risk Assessment Documentation Before Action Execution | ID.RA-5 | A.5.1 | GOVERN 1 | CC4.1 | Pre-action risk assessment documentation |
-| APTS-AR-008: Context-Aware Decision Logging | DE.AE-1 | A.8.15 | MAP 1 | CC9.1 | Environmental context captured with decisions |
-| APTS-AR-009: Transparency Report Requirements | GV.OC-2 | A.5.37 | GOVERN 1 | CC2.1 | Public transparency reporting requirements |
-| APTS-AR-010: Cryptographic Hashing of All Evidence | PR.DS-1 | A.8.24 | MANAGE 1 | C1.1 | SHA-256+ hashing of all evidence artifacts |
-| APTS-AR-011: Chain of Custody for Evidence | PR.DS-1 | A.5.28 | MANAGE 1 | CC9.1 | Evidence provenance and custody tracking |
-| APTS-AR-012: Tamper-Evident Logging with Hash Chains | PR.DS-1 | A.8.24 | MANAGE 1 | CC7.2 | Append-only hash chain integrity |
-| APTS-AR-013: RFC 3161 Trusted Timestamp Integration | PR.DS-1 | A.8.24 | MANAGE 1 | CC9.1 | External trusted timestamping for evidence integrity |
-| APTS-AR-014: Screenshot and Packet Capture Evidence Standards | PR.PS-1 | A.5.28 | MEASURE 1 | CC9.1 | Evidence capture format and integrity requirements |
-| APTS-AR-015: Evidence Classification and Sensitive Data Handling | PR.DS-1 | A.5.12 | MANAGE 1 | C1.1 | Evidence classification and redaction procedures |
-| APTS-AR-016: Platform Integrity and Supply Chain Attestation | PR.PS-1 | A.5.21 | GOVERN 1 | CC7.2 | Platform binary and supply chain verification |
-| APTS-AR-017: Safety Control Regression Testing After Platform Updates | PR.PS-1 | A.8.25 | MEASURE 1 | CC9.1 | Post-update safety validation |
-| APTS-AR-018: Customer Notification for Behavior-Affecting Updates | RS.CO-3 | A.8.32 | MANAGE 1 | C1.1 | Advance notification of behavior changes |
-| APTS-AR-019: AI/ML Model Change Tracking and Drift Detection | DE.CM-1 | A.8.25 | MEASURE 1 | CC9.1 | Model version tracking and drift monitoring |
-| APTS-AR-020: Audit Trail Isolation from Agent Runtime | PR.DS-1, PR.PS-1 | A.8.15, A.8.24 | MAP 1 | CC7.2 | MUST \| Tier 2 | Authoritative audit trail on append-only infrastructure the agent runtime cannot reach (WORM, external SIEM, dedicated log service) |
-
-### 6.6 Manipulation Resistance (APTS-MR)
+| APTS-AR-001: Structured Event Logging with Schema Validation | DE.CM-01, PR.PS-01 | A.8.15 | MAP 1 | CC9.1 | Millisecond-precision timestamps, correlation IDs, schema-validated structured format |
+| APTS-AR-002: State Transition Logging | DE.CM-01 | A.8.15 | MEASURE 1 | CC9.1 | Phase change documentation with authorization |
+| APTS-AR-003: Resource Utilization Metrics Logging | DE.CM-01 | A.8.9 | MEASURE 1 | CC9.1 | Network/system metrics per operation |
+| APTS-AR-004: Decision Point Logging and Confidence Scoring | DE.AE-02 | A.8.15 | MAP 1 | CC9.1 | Confidence scores, alternatives, rationale |
+| APTS-AR-005: Log Retention and Archival Requirements | PR.PS-01 | A.5.33 | MANAGE 1 | A1.1 | Minimum retention per engagement, compliance alignment |
+| APTS-AR-006: Decision Chain of Reasoning and Alternative Evaluation | DE.AE-02 | A.8.15 | MAP 1 | CC9.1 | Complete reasoning chain with alternative evaluation and rejection rationale |
+| APTS-AR-007: Risk Assessment Documentation Before Action Execution | ID.RA-05 | A.5.1 | GOVERN 1 | CC4.1 | Pre-action risk assessment documentation |
+| APTS-AR-008: Context-Aware Decision Logging | DE.AE-02 | A.8.15 | MAP 1 | CC9.1 | Environmental context captured with decisions |
+| APTS-AR-009: Transparency Report Requirements | GV.OC-02 | A.5.37 | GOVERN 1 | CC2.1 | Public transparency reporting requirements |
+| APTS-AR-010: Cryptographic Hashing of All Evidence | PR.DS-01 | A.8.24 | MANAGE 1 | C1.1 | SHA-256+ hashing of all evidence artifacts |
+| APTS-AR-011: Chain of Custody for Evidence | PR.DS-01 | A.5.28 | MANAGE 1 | CC9.1 | Evidence provenance and custody tracking |
+| APTS-AR-012: Tamper-Evident Logging with Hash Chains | PR.DS-01 | A.8.24 | MANAGE 1 | CC7.2 | Append-only hash chain integrity |
+| APTS-AR-013: RFC 3161 Trusted Timestamp Integration | PR.DS-01 | A.8.24 | MANAGE 1 | CC9.1 | External trusted timestamping for evidence integrity |
+| APTS-AR-014: Screenshot and Packet Capture Evidence Standards | PR.PS-01 | A.5.28 | MEASURE 1 | CC9.1 | Evidence capture format and integrity requirements |
+| APTS-AR-015: Evidence Classification and Sensitive Data Handling | PR.DS-01 | A.5.12 | MANAGE 1 | C1.1 | Evidence classification and redaction procedures |
+| APTS-AR-016: Platform Integrity and Supply Chain Attestation | PR.PS-01 | A.5.21 | GOVERN 1 | CC7.2 | Platform binary and supply chain verification |
+| APTS-AR-017: Safety Control Regression Testing After Platform Updates | PR.PS-01 | A.8.25 | MEASURE 1 | CC9.1 | Post-update safety validation |
+| APTS-AR-018: Customer Notification for Behavior-Affecting Updates | RS.CO-03 | A.8.32 | MANAGE 1 | C1.1 | Advance notification of behavior changes |
+| APTS-AR-019: AI/ML Model Change Tracking and Drift Detection | DE.CM-01 | A.8.25 | MEASURE 1 | CC9.1 | Model version tracking and drift monitoring |
+| APTS-AR-020: Audit Trail Isolation from Agent Runtime | PR.DS-01, PR.PS-01 | A.8.15, A.8.24 | MAP 1 | CC7.2 | Authoritative audit trail on append-only infrastructure the agent runtime cannot reach (WORM, external SIEM, dedicated log service) |
+
+### 7.6 Manipulation Resistance (APTS-MR)
 | APTS Requirement | NIST CSF 2.0 | ISO/IEC 27001:2022 | NIST AI RMF 1.0 | SOC 2 TSC 2017 (2022 PoF) | Notes |
 |---|---|---|---|---|---|
-| APTS-MR-001: Instruction Boundary Enforcement | PR.PS-1 | A.8.25 | GOVERN 1 | CC3.2 | Operator instruction isolation from target content |
-| APTS-MR-002: Response Validation & Sanitization | DE.AE-1 | A.8.25 | MEASURE 1 | CC9.1 | Target content parsing in sandbox, injection detection |
-| APTS-MR-003: Error Message Neutrality | DE.AE-1 | A.8.25 | MEASURE 1 | CC9.1 | Error data analysis without behavioral influence |
-| APTS-MR-004: Configuration File Integrity Verification | PR.PS-1 | A.8.25 | GOVERN 1 | CC3.2 | Configuration content as data, not directives |
-| APTS-MR-005: Authority Claim Detection & Rejection | PR.PS-1 | A.8.25 | GOVERN 1 | CC3.2 | Target-sourced authority claims rejected |
-| APTS-MR-006: Decision Boundary Enforcement | PR.PS-1 | A.8.25 | GOVERN 1 | CC6.6 | SHOULD \| Tier 2 | Decision logic isolated from target influence |
-| APTS-MR-007: Redirect Following Policy | PR.AA-1 | A.8.20 | GOVERN 1 | CC6.6 | HTTP redirect scope validation |
-| APTS-MR-008: DNS and Network-Level Redirect Prevention | PR.AA-1 | A.8.20 | GOVERN 1 | CC6.6 | DNS rebinding and network redirect defense |
-| APTS-MR-009: Server-Side Request Forgery (SSRF) Prevention in Testing | PR.AA-1 | A.8.25 | GOVERN 1 | CC6.6 | SSRF prevention in testing operations |
-| APTS-MR-010: Scope Expansion Social Engineering Prevention | PR.PS-1 | A.5.2 | GOVERN 1 | CC3.2 | Social engineering scope expansion defense |
-| APTS-MR-011: Out-of-Band Communication Prevention | PR.PS-1 | A.8.25 | GOVERN 1 | CC6.6 | No out-of-band communication channels |
-| APTS-MR-012: Immutable Scope Enforcement Architecture | PR.AA-1 | A.8.5 | GOVERN 1 | CC6.6 | Scope cannot be modified by target interaction |
-| APTS-MR-013: Adversarial Example Detection in Vulnerability Classification | DE.AE-1 | A.8.25 | MEASURE 1 | CC9.1 | Adversarial input detection in classification |
-| APTS-MR-014: Resource Exhaustion and Tarpit Attack Prevention | DE.CM-1 | A.8.9 | MEASURE 1 | CC9.1 | Tarpit and resource exhaustion defense |
-| APTS-MR-015: Deceptive Authentication Honeypots | DE.AE-1 | A.8.25 | MEASURE 1 | CC9.1 | Honeypot and deceptive credential detection |
-| APTS-MR-016: Anti-Automation Defense Detection | DE.AE-1 | A.8.25 | MEASURE 1 |
CC9.1 | CAPTCHA and anti-automation detection | -| APTS-MR-017: Anomaly Detection in Response Patterns | DE.AE-1 | A.8.25 | MEASURE 1 | CC9.1 | Response pattern anomaly detection | -| APTS-MR-018: AI Model Input/Output Architectural Boundary | PR.PS-1, PR.IR-1 | A.8.22, A.8.27 | GOVERN 1 | CC3.2 | AI model I/O isolation architecture | -| APTS-MR-019: Discovered Credential Protection | PR.DS-1 | A.8.3 | GOVERN 1 | C1.2 | Discovered credentials not auto-used cross-system | -| APTS-MR-020: Adversarial Validation and Resilience Testing of Safety Controls | DE.CM-1 | A.8.25 | MEASURE 1 | CC9.1 | Periodic red-team testing of safety controls, safety control resilience under adversarial conditions | -| APTS-MR-021: Data Isolation Adversarial Testing | DE.CM-1 | A.8.22 | MEASURE 1 | CC7.2 | Cross-tenant isolation adversarial testing | -| APTS-MR-022: Inter-Model Trust Boundaries and Output Validation | PR.DS-1 | A.8.25, A.8.26 | MANAGE 1 | CC7.2 | Inter-component sanitization, shared state integrity, pipeline documentation | -| APTS-MR-023: Agent Runtime as Untrusted Component in Threat Model | ID.RA-1, GV.RM-1 | A.5.7, A.8.27 | MAP 1 | CC3.2 | MUST \| Tier 2 | Threat model names agent runtime as untrusted; agent-originated threats traced to architectural containment controls (SC-019, SC-020, AR-020, MR-012) | - -### 6.7 Supply Chain Trust (APTS-TP) +| APTS-MR-001: Instruction Boundary Enforcement | PR.PS-01 | A.8.25 | GOVERN 1 | CC3.2 | Operator instruction isolation from target content | +| APTS-MR-002: Response Validation & Sanitization | DE.AE-02 | A.8.25 | MEASURE 1 | CC9.1 | Target content parsing in sandbox, injection detection | +| APTS-MR-003: Error Message Neutrality | DE.AE-02 | A.8.25 | MEASURE 1 | CC9.1 | Error data analysis without behavioral influence | +| APTS-MR-004: Configuration File Integrity Verification | PR.PS-01 | A.8.25 | GOVERN 1 | CC3.2 | Configuration content as data, not directives | +| APTS-MR-005: Authority Claim Detection & Rejection | 
PR.PS-01 | A.8.25 | GOVERN 1 | CC3.2 | Target-sourced authority claims rejected | +| APTS-MR-006: Decision Boundary Enforcement | PR.PS-01 | A.8.25 | GOVERN 1 | CC6.6 | Decision logic isolated from target influence | +| APTS-MR-007: Redirect Following Policy | PR.AA-01 | A.8.20 | GOVERN 1 | CC6.6 | HTTP redirect scope validation | +| APTS-MR-008: DNS and Network-Level Redirect Prevention | PR.AA-01 | A.8.20 | GOVERN 1 | CC6.6 | DNS rebinding and network redirect defense | +| APTS-MR-009: Server-Side Request Forgery (SSRF) Prevention in Testing | PR.AA-01 | A.8.25 | GOVERN 1 | CC6.6 | SSRF prevention in testing operations | +| APTS-MR-010: Scope Expansion Social Engineering Prevention | PR.PS-01 | A.5.2 | GOVERN 1 | CC3.2 | Social engineering scope expansion defense | +| APTS-MR-011: Out-of-Band Communication Prevention | PR.PS-01 | A.8.25 | GOVERN 1 | CC6.6 | No out-of-band communication channels | +| APTS-MR-012: Immutable Scope Enforcement Architecture | PR.AA-01 | A.8.5 | GOVERN 1 | CC6.6 | Scope cannot be modified by target interaction | +| APTS-MR-013: Adversarial Example Detection in Vulnerability Classification | DE.AE-02 | A.8.25 | MEASURE 1 | CC9.1 | Adversarial input detection in classification | +| APTS-MR-014: Resource Exhaustion and Tarpit Attack Prevention | DE.CM-01 | A.8.9 | MEASURE 1 | CC9.1 | Tarpit and resource exhaustion defense | +| APTS-MR-015: Deceptive Authentication Honeypots | DE.AE-02 | A.8.25 | MEASURE 1 | CC9.1 | Honeypot and deceptive credential detection | +| APTS-MR-016: Anti-Automation Defense Detection | DE.AE-02 | A.8.25 | MEASURE 1 | CC9.1 | CAPTCHA and anti-automation detection | +| APTS-MR-017: Anomaly Detection in Response Patterns | DE.AE-02 | A.8.25 | MEASURE 1 | CC9.1 | Response pattern anomaly detection | +| APTS-MR-018: AI Model Input/Output Architectural Boundary | PR.PS-01, PR.IR-01 | A.8.22, A.8.27 | GOVERN 1 | CC3.2 | AI model I/O isolation architecture | +| APTS-MR-019: Discovered Credential Protection | PR.DS-01 | 
A.8.3 | GOVERN 1 | C1.2 | Discovered credentials not auto-used cross-system | +| APTS-MR-020: Adversarial Validation and Resilience Testing of Safety Controls | DE.CM-01 | A.8.25 | MEASURE 1 | CC9.1 | Periodic red-team testing of safety controls, safety control resilience under adversarial conditions | +| APTS-MR-021: Data Isolation Adversarial Testing | DE.CM-01 | A.8.22 | MEASURE 1 | CC7.2 | Cross-tenant isolation adversarial testing | +| APTS-MR-022: Inter-Model Trust Boundaries and Output Validation | PR.DS-01 | A.8.25, A.8.26 | MANAGE 1 | CC7.2 | Inter-component sanitization, shared state integrity, pipeline documentation | +| APTS-MR-023: Agent Runtime as Untrusted Component in Threat Model | ID.RA-01, GV.RM-01 | A.5.7, A.8.27 | MAP 1 | CC3.2 | Threat model names agent runtime as untrusted; agent-originated threats traced to architectural containment controls (SC-019, SC-020, AR-020, MR-012) | + +### 7.7 Supply Chain Trust (APTS-TP) | APTS Requirement | NIST CSF 2.0 | ISO/IEC 27001:2022 | NIST AI RMF 1.0 | SOC 2 TSC 2017 (2022 PoF) | Notes | |---|---|---|---|---|---| -| APTS-TP-001: Third-Party Provider Selection and Vetting | GV.SC-3, GV.SC-4 | A.5.19, A.5.21 | GOVERN 1 | CC3.2 | Vendor vetting, SOC 2 Type II review, SaaS vendor evaluation and contract review | -| APTS-TP-002: Model Version Pinning and Change Management | GV.SC-3, GV.SC-4 | A.5.23, A.8.32 | GOVERN 1 | CC3.2 | Explicit model versions, no "latest" tracking | -| APTS-TP-003: API Security and Authentication | PR.AA-1, PR.AA-3 | A.8.5 | GOVERN 1 | CC6.6 | Transport encryption, key rotation, mutual authentication | -| APTS-TP-004: Provider Availability, SLA Management, and Failover | GV.SC-7 | A.5.22 | GOVERN 1 | A1.2 | Documented uptime SLA, metrics tracking, failover procedures | -| APTS-TP-005: Provider Incident Response, Breach Notification, and Mid-Engagement Compromise | RS.MA-1, RS.CO-2 | A.5.24, A.5.26 | MANAGE 1 | C1.1 | Provider breach notification, mid-engagement compromise detection 
and response | -| APTS-TP-006: Dependency Inventory, Risk Assessment, and Supply Chain Verification | GV.SC-5, GV.SC-7 | A.5.19, A.5.21 | GOVERN 1 | CC3.2 | Annual or more frequent security review, dependency integrity verification and monitoring | -| APTS-TP-007: Data Residency and Sovereignty Requirements | GV.OC-3 | A.5.31 | GOVERN 1 | CC3.2 | Geographic data storage and sovereignty compliance | -| APTS-TP-008: Cloud Security Configuration and Hardening | PR.PS-1 | A.8.9 | GOVERN 1 | CC4.1 | AWS/Azure/GCP security baseline enforcement | -| APTS-TP-009: Incident Response and Service Continuity Planning | RS.MA-1, RS.CO-2 | A.5.24, A.5.26 | MANAGE 1 | CC4.1 | Vendor incident response procedures | -| APTS-TP-010: Vulnerability Feed Selection and Management | ID.RA-1, ID.RA-2 | A.8.8 | MEASURE 1 | PI1.1 | Vulnerability/threat feed accuracy verification | -| APTS-TP-011: Feed Quality Assurance and Incident Response | ID.RA-1, ID.RA-2 | A.8.8 | MEASURE 1 | PI1.1 | Cross-feed correlation, false positive identification | -| APTS-TP-012: Client Data Classification Framework | PR.DS-1, PR.DS-2 | A.5.12, A.5.13 | GOVERN 1 | C1.1 | Public/Sensitive/Confidential/Restricted taxonomy | -| APTS-TP-013: Sensitive Data Discovery and Handling | PR.DS-1, PR.DS-2 | A.5.12, A.5.13 | MEASURE 1 | C1.3 | Automatic PII/PHI/credentials identification | -| APTS-TP-014: Data Encryption and Cryptographic Controls | PR.DS-1, PR.DS-2 | A.8.24 | GOVERN 1 | C1.1 | Encryption at rest, encryption in transit, secure key management | -| APTS-TP-015: Data Retention and Secure Deletion | PR.DS-1 | A.8.10 | MANAGE 1 | C1.3 | Crypto-shred, disposal verification | -| APTS-TP-A01: Breach Notification and Regulatory Reporting (Advisory) | RS.CO-2, RS.CO-3 | A.5.24, A.5.5 | MANAGE 1 | CC7.4 | Client notification per applicable regulatory timelines | -| APTS-TP-016: Data Destruction Proof and Certification | PR.DS-1 | A.8.10 | MANAGE 1 | C1.3 | Certified data destruction and audit trail | -| APTS-TP-017: 
Multi-Tenant and Engagement Isolation | PR.DS-1, PR.AA-1 | A.8.22 | GOVERN 1 | CC7.2 | Engagement and tenant isolation verification | -| APTS-TP-018: Tenant Breach Notification | RS.CO-2, RS.CO-3 | A.5.24, A.5.26 | MANAGE 1 | CC7.4 | Timely breach notification to affected tenants per contractual terms | -| APTS-TP-A02: Privacy Regulation Compliance (Advisory) | GV.OC-3 | A.5.34 | GOVERN 1 | P2.1 | GDPR, CCPA, and regional privacy compliance | -| APTS-TP-A03: Professional Liability and Engagement Agreements (Advisory) | GV.OC-2 | A.5.31 | GOVERN 1 | CC9.2 | E&O insurance, service agreements, liability caps | -| APTS-TP-019: AI Model Provenance and Training Data Governance | GV.SC-3, GV.SC-4 | A.5.23, A.8.32 | GOVERN 1 | CC3.2 | Model training data documentation and verification | -| APTS-TP-020: Persistent Memory and Retrieval State Governance | PR.DS-1 | A.5.12, A.8.10 | GOVERN 1 | C1.1 | State inventory, cross-engagement isolation, operator visibility, decision influence auditing | -| APTS-TP-021: Foundation Model Disclosure and Capability Baseline | GV.SC-3, GV.SC-4 | A.5.23, A.8.32 | GOVERN 1 | CC3.2 | MUST \| Tier 1 | Disclose provider, family, version, release date, and operator customizations of the foundation model; publish a capability baseline in the conformance claim | -| APTS-TP-022: Re-attestation on Material Foundation Model Change | GV.SC-7, ID.IM-2 | A.8.32, A.5.23 | GOVERN 1 | CC7.2 | MUST \| Tier 2 | Re-assess SE, SC, MR, AL when the foundation model undergoes material change (provider/family/generation/fine-tune/capability shift); block promotion until workpaper is complete | - -### 6.8 Reporting (APTS-RP) +| APTS-TP-001: Third-Party Provider Selection and Vetting | GV.SC-03, GV.SC-04 | A.5.19, A.5.21 | GOVERN 1 | CC3.2 | Vendor vetting, SOC 2 Type II review, SaaS vendor evaluation and contract review | +| APTS-TP-002: Model Version Pinning and Change Management | GV.SC-03, GV.SC-04 | A.5.23, A.8.32 | GOVERN 1 | CC3.2 | Explicit model versions, no 
"latest" tracking | +| APTS-TP-003: API Security and Authentication | PR.AA-01, PR.AA-03 | A.8.5 | GOVERN 1 | CC6.6 | Transport encryption, key rotation, mutual authentication | +| APTS-TP-004: Provider Availability, SLA Management, and Failover | GV.SC-07 | A.5.22 | GOVERN 1 | A1.2 | Documented uptime SLA, metrics tracking, failover procedures | +| APTS-TP-005: Provider Incident Response, Breach Notification, and Mid-Engagement Compromise | RS.MA-01, RS.CO-02 | A.5.24, A.5.26 | MANAGE 1 | C1.1 | Provider breach notification, mid-engagement compromise detection and response | +| APTS-TP-006: Dependency Inventory, Risk Assessment, and Supply Chain Verification | GV.SC-05, GV.SC-07 | A.5.19, A.5.21 | GOVERN 1 | CC3.2 | Annual or more frequent security review, dependency integrity verification and monitoring | +| APTS-TP-007: Data Residency and Sovereignty Requirements | GV.OC-03 | A.5.31 | GOVERN 1 | CC3.2 | Geographic data storage and sovereignty compliance | +| APTS-TP-008: Cloud Security Configuration and Hardening | PR.PS-01 | A.8.9 | GOVERN 1 | CC4.1 | AWS/Azure/GCP security baseline enforcement | +| APTS-TP-009: Incident Response and Service Continuity Planning | RS.MA-01, RS.CO-02 | A.5.24, A.5.26 | MANAGE 1 | CC4.1 | Vendor incident response procedures | +| APTS-TP-010: Vulnerability Feed Selection and Management | ID.RA-01, ID.RA-02 | A.8.8 | MEASURE 1 | PI1.1 | Vulnerability/threat feed accuracy verification | +| APTS-TP-011: Feed Quality Assurance and Incident Response | ID.RA-01, ID.RA-02 | A.8.8 | MEASURE 1 | PI1.1 | Cross-feed correlation, false positive identification | +| APTS-TP-012: Client Data Classification Framework | PR.DS-01, PR.DS-02 | A.5.12, A.5.13 | GOVERN 1 | C1.1 | Public/Sensitive/Confidential/Restricted taxonomy | +| APTS-TP-013: Sensitive Data Discovery and Handling | PR.DS-01, PR.DS-02 | A.5.12, A.5.13 | MEASURE 1 | C1.3 | Automatic PII/PHI/credentials identification | +| APTS-TP-014: Data Encryption and Cryptographic Controls | 
PR.DS-01, PR.DS-02 | A.8.24 | GOVERN 1 | C1.1 | Encryption at rest, encryption in transit, secure key management | +| APTS-TP-015: Data Retention and Secure Deletion | PR.DS-01 | A.8.10 | MANAGE 1 | C1.3 | Crypto-shred, disposal verification | +| APTS-TP-A01: Breach Notification and Regulatory Reporting (Advisory) | RS.CO-02, RS.CO-03 | A.5.24, A.5.5 | MANAGE 1 | CC7.4 | Client notification per applicable regulatory timelines | +| APTS-TP-016: Data Destruction Proof and Certification | PR.DS-01 | A.8.10 | MANAGE 1 | C1.3 | Certified data destruction and audit trail | +| APTS-TP-017: Multi-Tenant and Engagement Isolation | PR.DS-01, PR.AA-01 | A.8.22 | GOVERN 1 | CC7.2 | Engagement and tenant isolation verification | +| APTS-TP-018: Tenant Breach Notification | RS.CO-02, RS.CO-03 | A.5.24, A.5.26 | MANAGE 1 | CC7.4 | Timely breach notification to affected tenants per contractual terms | +| APTS-TP-A02: Privacy Regulation Compliance (Advisory) | GV.OC-03 | A.5.34 | GOVERN 1 | P2.1 | GDPR, CCPA, and regional privacy compliance | +| APTS-TP-A03: Professional Liability and Engagement Agreements (Advisory) | GV.OC-02 | A.5.31 | GOVERN 1 | CC9.2 | E&O insurance, service agreements, liability caps | +| APTS-TP-019: AI Model Provenance and Training Data Governance | GV.SC-03, GV.SC-04 | A.5.23, A.8.32 | GOVERN 1 | CC3.2 | Model training data documentation and verification | +| APTS-TP-020: Persistent Memory and Retrieval State Governance | PR.DS-01 | A.5.12, A.8.10 | GOVERN 1 | C1.1 | State inventory, cross-engagement isolation, operator visibility, decision influence auditing | +| APTS-TP-021: Foundation Model Disclosure and Capability Baseline | GV.SC-03, GV.SC-04 | A.5.23, A.8.32 | GOVERN 1 | CC3.2 | Disclose provider, family, version, release date, and operator customizations of the foundation model; publish a capability baseline in the conformance claim | +| APTS-TP-022: Re-attestation on Material Foundation Model Change | GV.SC-07, ID.IM-02 | A.8.32, A.5.23 | GOVERN 1 
| CC7.2 | Re-assess SE, SC, MR, AL when the foundation model undergoes material change (provider/family/generation/fine-tune/capability shift); block promotion until workpaper is complete | + +### 7.8 Reporting (APTS-RP) | APTS Requirement | NIST CSF 2.0 | ISO/IEC 27001:2022 | NIST AI RMF 1.0 | SOC 2 TSC 2017 (2022 PoF) | Notes | |---|---|---|---|---|---| -| APTS-RP-001: Evidence-Based Finding Validation | RS.AN-3, ID.IM-1 | A.5.28 | MEASURE 2 | CC9.1 | Raw artifacts separate from summaries | -| APTS-RP-002: Finding Verification and Human Review Pipeline | ID.IM-1 | A.5.28, A.8.29 | MEASURE 2 | CC9.1 | Critical/High findings re-verified before delivery | -| APTS-RP-003: Confidence Scoring with Auditable Methodology | DE.AE-1 | A.8.25 | MEASURE 1 | CC9.1 | Auditable, formula-based confidence methodology | -| APTS-RP-004: Finding Provenance Chain | PR.DS-1 | A.5.28 | MEASURE 1 | CC9.1 | Cryptographic linkage to audit logs | -| APTS-RP-005: Cryptographic Evidence Chain Integrity | PR.DS-1 | A.8.24 | MANAGE 1 | CC9.1 | Evidence cryptographically linked to findings | -| APTS-RP-006: False Positive Rate Disclosure | ID.IM-1 | A.5.37, A.8.29 | MEASURE 2 | CC9.1 | Methodology section includes accuracy statistics | -| APTS-RP-007: Independent Finding Reproducibility | ID.IM-2 | A.8.25 | MEASURE 1 | CC9.1 | Independent validation of findings mid-assessment | -| APTS-RP-008: Vulnerability Coverage Disclosure | ID.IM-1 | A.5.36, A.8.29 | MEASURE 2 | CC2.1 | Coverage scope and limitations disclosed | -| APTS-RP-009: False Negative Rate Disclosure and Methodology | ID.IM-1 | A.5.36, A.8.29 | MEASURE 2 | CC9.1 | Missed vulnerability rate methodology | -| APTS-RP-010: Detection Effectiveness Benchmarking | ID.IM-2 | A.5.37 | MEASURE 1 | PI1.1 | Detection rate benchmarking methodology | -| APTS-RP-011: Executive Summary and Risk Overview | GV.OC-2 | A.5.37 | MANAGE 1 | CC2.1 | Risk-focused narrative for decision-makers | -| APTS-RP-012: Remediation Guidance and Prioritization | 
ID.IM-1 | A.8.8 | MANAGE 1 | CC4.1 | Prioritized remediation with effort estimation | -| APTS-RP-013: Engagement SLA Compliance Reporting | ID.IM-1 | A.5.35 | MEASURE 2 | CC9.1 | SLA adherence documentation | -| APTS-RP-014: Trend Analysis for Recurring Engagements | DE.CM-1 | A.5.37 | MEASURE 1 | PI1.1 | Cross-engagement trend analysis | -| APTS-RP-015: Downstream Finding Pipeline Integrity | PR.DS-1, PR.DS-2 | A.5.12, A.5.14 | MANAGE 1 | CC7.2 | Finding sync fidelity, tenant isolation, deduplication, sensitive data redaction, delivery assurance | +| APTS-RP-001: Evidence-Based Finding Validation | RS.AN-03, ID.IM-01 | A.5.28 | MEASURE 2 | CC9.1 | Raw artifacts separate from summaries | +| APTS-RP-002: Finding Verification and Human Review Pipeline | ID.IM-01 | A.5.28, A.8.29 | MEASURE 2 | CC9.1 | Critical/High findings re-verified before delivery | +| APTS-RP-003: Confidence Scoring with Auditable Methodology | DE.AE-02 | A.8.25 | MEASURE 1 | CC9.1 | Auditable, formula-based confidence methodology | +| APTS-RP-004: Finding Provenance Chain | PR.DS-01 | A.5.28 | MEASURE 1 | CC9.1 | Cryptographic linkage to audit logs | +| APTS-RP-005: Cryptographic Evidence Chain Integrity | PR.DS-01 | A.8.24 | MANAGE 1 | CC9.1 | Evidence cryptographically linked to findings | +| APTS-RP-006: False Positive Rate Disclosure | ID.IM-01 | A.5.37, A.8.29 | MEASURE 2 | CC9.1 | Methodology section includes accuracy statistics | +| APTS-RP-007: Independent Finding Reproducibility | ID.IM-02 | A.8.25 | MEASURE 1 | CC9.1 | Independent validation of findings mid-assessment | +| APTS-RP-008: Vulnerability Coverage Disclosure | ID.IM-01 | A.5.36, A.8.29 | MEASURE 2 | CC2.1 | Coverage scope and limitations disclosed | +| APTS-RP-009: False Negative Rate Disclosure and Methodology | ID.IM-01 | A.5.36, A.8.29 | MEASURE 2 | CC9.1 | Missed vulnerability rate methodology | +| APTS-RP-010: Detection Effectiveness Benchmarking | ID.IM-02 | A.5.37 | MEASURE 1 | PI1.1 | Detection rate benchmarking 
methodology | +| APTS-RP-011: Executive Summary and Risk Overview | GV.OC-02 | A.5.37 | MANAGE 1 | CC2.1 | Risk-focused narrative for decision-makers | +| APTS-RP-012: Remediation Guidance and Prioritization | ID.IM-01 | A.8.8 | MANAGE 1 | CC4.1 | Prioritized remediation with effort estimation | +| APTS-RP-013: Engagement SLA Compliance Reporting | ID.IM-01 | A.5.35 | MEASURE 2 | CC9.1 | SLA adherence documentation | +| APTS-RP-014: Trend Analysis for Recurring Engagements | DE.CM-01 | A.5.37 | MEASURE 1 | PI1.1 | Cross-engagement trend analysis | +| APTS-RP-015: Downstream Finding Pipeline Integrity | PR.DS-01, PR.DS-02 | A.5.12, A.5.14 | MANAGE 1 | CC7.2 | Finding sync fidelity, tenant isolation, deduplication, sensitive data redaction, delivery assurance | ## Compliance Matrix Reference Table | Requirement | NIST CSF 2.0 | ISO/IEC 27001:2022 | NIST AI RMF 1.0 | SOC 2 TSC 2017 (2022 PoF) | PCI DSS 4.0.1 | GDPR | |---|---|---|---|---|---|---| -| APTS-SE-001: Rules of Engagement (RoE) Specification and Validation | GV.PO-1 | A.5.8 | GOVERN 1 | CC3.2 | - | - | -| APTS-SE-002: IP Range Validation and RFC 1918 Awareness | ID.AM-1 | A.8.20, A.8.22 | GOVERN 1 | CC1.1 | - | - | -| APTS-SE-009: Hard Deny Lists and Critical Asset Protection | PR.AA-1 | A.8.5 | GOVERN 1 | CC6.6 | - | - | -| APTS-SC-009: Kill Switch | PR.PS-1, RS.MA-1 | A.5.26, A.5.29 | GOVERN 1 | CC4.1 | - | - | -| APTS-SC-017: External Watchdog and Operator Notification | DE.CM-1 | A.8.9 | MEASURE 1 | A1.1 | - | - | -| APTS-HO-001: Mandatory Pre-Approval Gates for Autonomy Levels L1 and L2 | GV.RR-2 | A.5.2 | GOVERN 1 | CC3.2 | - | - | -| APTS-HO-002: Real-Time Monitoring and Intervention Capability | DE.CM-1 | A.8.16 | MEASURE 2 | CC9.1 | - | - | -| APTS-AL-001: Single Technique Execution | PR.PS-1 | A.8.25 | GOVERN 1 | CC3.2 | - | - | -| APTS-AL-002: Human-Directed Target and Technique Selection | GV.RR-2 | A.5.1 | GOVERN 1 | CC3.2 | - | - | -| APTS-AR-001: Structured Event Logging with Schema Validation 
| DE.CM-1, PR.PS-1 | A.8.15 | MAP 1 | CC9.1 | Req 10 | - | -| APTS-AR-004: Decision Point Logging and Confidence Scoring | DE.AE-1 | A.8.15 | MAP 1 | CC9.1 | - | - | -| APTS-MR-001: Instruction Boundary Enforcement | PR.PS-1 | A.8.25 | GOVERN 1 | CC3.2 | - | - | -| APTS-MR-002: Response Validation & Sanitization | DE.AE-1 | A.8.25 | MEASURE 1 | CC9.1 | - | - | -| APTS-TP-001: Third-Party Provider Selection and Vetting | GV.SC-3, GV.SC-4 | A.5.19, A.5.21 | GOVERN 1 | CC3.2 | - | - | -| APTS-TP-003: API Security and Authentication | PR.AA-1, PR.AA-3 | A.8.5 | GOVERN 1 | CC6.6 | Req 7-8 | Art 32 | -| APTS-TP-014: Data Encryption and Cryptographic Controls | PR.DS-1, PR.DS-2 | A.8.24 | GOVERN 1 | C1.1 | Req 3-4 | Art 32 | -| APTS-TP-018: Tenant Breach Notification | RS.CO-2, RS.CO-3 | A.5.24, A.5.26 | MANAGE 1 | CC7.4 | Req 12 | Art 33-34 | -| APTS-RP-001: Evidence-Based Finding Validation | RS.AN-3, ID.IM-1 | A.5.28 | MEASURE 2 | CC9.1 | - | - | -| APTS-RP-002: Finding Verification and Human Review Pipeline | ID.IM-1 | A.5.28, A.8.29 | MEASURE 2 | CC9.1 | - | - | +| APTS-SE-001: Rules of Engagement (RoE) Specification and Validation | GV.PO-01 | A.5.8 | GOVERN 1 | CC3.2 | - | - | +| APTS-SE-002: IP Range Validation and RFC 1918 Awareness | ID.AM-01 | A.8.20, A.8.22 | GOVERN 1 | CC1.1 | - | - | +| APTS-SE-009: Hard Deny Lists and Critical Asset Protection | PR.AA-01 | A.8.5 | GOVERN 1 | CC6.6 | - | - | +| APTS-SC-009: Kill Switch | PR.PS-01, RS.MA-01 | A.5.26, A.5.29 | GOVERN 1 | CC4.1 | - | - | +| APTS-SC-017: External Watchdog and Operator Notification | DE.CM-01 | A.8.9 | MEASURE 1 | A1.1 | - | - | +| APTS-HO-001: Mandatory Pre-Approval Gates for Autonomy Levels L1 and L2 | GV.RR-02 | A.5.2 | GOVERN 1 | CC3.2 | - | - | +| APTS-HO-002: Real-Time Monitoring and Intervention Capability | DE.CM-01 | A.8.16 | MEASURE 2 | CC9.1 | - | - | +| APTS-AL-001: Single Technique Execution | PR.PS-01 | A.8.25 | GOVERN 1 | CC3.2 | - | - | +| APTS-AL-002: Human-Directed Target and 
Technique Selection | GV.RR-02 | A.5.1 | GOVERN 1 | CC3.2 | - | - | +| APTS-AR-001: Structured Event Logging with Schema Validation | DE.CM-01, PR.PS-01 | A.8.15 | MAP 1 | CC9.1 | Req 10 | - | +| APTS-AR-004: Decision Point Logging and Confidence Scoring | DE.AE-02 | A.8.15 | MAP 1 | CC9.1 | - | - | +| APTS-MR-001: Instruction Boundary Enforcement | PR.PS-01 | A.8.25 | GOVERN 1 | CC3.2 | - | - | +| APTS-MR-002: Response Validation & Sanitization | DE.AE-02 | A.8.25 | MEASURE 1 | CC9.1 | - | - | +| APTS-TP-001: Third-Party Provider Selection and Vetting | GV.SC-03, GV.SC-04 | A.5.19, A.5.21 | GOVERN 1 | CC3.2 | - | - | +| APTS-TP-003: API Security and Authentication | PR.AA-01, PR.AA-03 | A.8.5 | GOVERN 1 | CC6.6 | Req 7-8 | Art 32 | +| APTS-TP-014: Data Encryption and Cryptographic Controls | PR.DS-01, PR.DS-02 | A.8.24 | GOVERN 1 | C1.1 | Req 3-4 | Art 32 | +| APTS-TP-018: Tenant Breach Notification | RS.CO-02, RS.CO-03 | A.5.24, A.5.26 | MANAGE 1 | CC7.4 | Req 12 | Art 33-34 | +| APTS-RP-001: Evidence-Based Finding Validation | RS.AN-03, ID.IM-01 | A.5.28 | MEASURE 2 | CC9.1 | - | - | +| APTS-RP-002: Finding Verification and Human Review Pipeline | ID.IM-01 | A.5.28, A.8.29 | MEASURE 2 | CC9.1 | - | - | --- @@ -929,7 +834,7 @@ This section maps all 8 APTS domains to external frameworks, organized by domain 1. **NIST CSF 2.0 Coverage**: The standard addresses all six functions (Govern, Identify, Protect, Detect, Respond, Recover) 2. **ISO/IEC 27001:2022 Coverage**: Controls A.5 through A.8 have corresponding mappings -3. **SOC 2 Coverage**: All five trust principles addressed with proper Trust Service Criteria codes +3. **SOC 2 Coverage**: All five trust services categories addressed with proper Trust Services Criteria codes 4. **NIST AI RMF 1.0 Coverage**: All four functions (Govern, Map, Measure, Manage) addressed, with particular depth in autonomy governance and AI risk treatment 5. **PCI DSS Coverage**: Applicable controls for card data security 6. 
**GDPR Coverage**: All key articles (5, 17, 28, 32-34) addressed

From d6d28bf53c3af44293b1acd1a74151aa78efca92 Mon Sep 17 00:00:00 2001
From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com>
Date: Thu, 11 Jun 2026 23:07:54 +0530
Subject: [PATCH 14/35] Update Conformance_Claim_Schema.md

---
 standard/appendix/Conformance_Claim_Schema.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/standard/appendix/Conformance_Claim_Schema.md b/standard/appendix/Conformance_Claim_Schema.md
index b1e29a8..08e0064 100644
--- a/standard/appendix/Conformance_Claim_Schema.md
+++ b/standard/appendix/Conformance_Claim_Schema.md
@@ -88,11 +88,13 @@ Recommended fields:
 Recommended fields:
 
 - `requirement_id`
-- `exception_type`
+- `exception_type` (for example, `should_deviation`, `not_applicable`, `scope_limitation`)
 - `description`
 - `approval_reference`
 - `review_by`
 
+A deviation from a SHOULD requirement at the claimed tier must be recorded here (or in the SHOULD Deviations section of the markdown template) with a documented justification; see the conformance rules in the [Introduction](../Introduction.md#compliance-tiers).
+
 ### 7. Assessment attestation
 
 Recommended fields:

From de997bf42fe7e4e9a31540415c2c86a0b9729ce8 Mon Sep 17 00:00:00 2001
From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com>
Date: Thu, 11 Jun 2026 23:09:05 +0530
Subject: [PATCH 15/35] Update Conformance_Claim_Template.md

---
 standard/appendix/Conformance_Claim_Template.md | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/standard/appendix/Conformance_Claim_Template.md b/standard/appendix/Conformance_Claim_Template.md
index 7c1eb74..92238d9 100644
--- a/standard/appendix/Conformance_Claim_Template.md
+++ b/standard/appendix/Conformance_Claim_Template.md
@@ -64,7 +64,17 @@ _[If the platform supports model substitution at runtime, disclose each approved
 | Reporting (RP) | _[count]_ | _[count]_ | |
 | **Total** | _[count]_ | _[count]_ | |
 
-> **Reminder:** APTS requires 100% of requirements at the claimed tier to be met. Partial credit is not awarded.
+> **Reminder:** APTS requires 100% of requirements at the claimed tier to be met. MUST requirements permit no deviation. A SHOULD requirement that is not implemented must be recorded in the SHOULD Deviations section below with a documented justification; an undocumented SHOULD deviation is a conformance gap.
+
+---
+
+## SHOULD Deviations
+
+_[List every SHOULD requirement at the claimed tier that the platform does not implement, with the justification for the deviation. If there are no deviations, state "None." Leaving this section out while deviating from a SHOULD requirement invalidates the claim.]_
+
+| Requirement | Justification | Compensating Measures (if any) | Review Date |
+|-------------|---------------|--------------------------------|-------------|
+| _[ID and title]_ | _[Why the requirement is not implemented]_ | _[Controls that partially address the intent]_ | _[Date of next review]_ |
 
 ---
 

From 9e2230a7773ebecf4567f48ddc6c397f6feae72f Mon Sep 17 00:00:00 2001
From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com>
Date: Thu, 11 Jun 2026 23:09:41 +0530
Subject: [PATCH 16/35] Update Cross_Domain_Integration.md

---
 standard/appendix/Cross_Domain_Integration.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standard/appendix/Cross_Domain_Integration.md b/standard/appendix/Cross_Domain_Integration.md
index 73ecad6..03acafb 100644
--- a/standard/appendix/Cross_Domain_Integration.md
+++ b/standard/appendix/Cross_Domain_Integration.md
@@ -10,7 +10,7 @@ The matrix below identifies where an event or action in one domain (row) trigger
 
 | Triggering Event | Source | What Happens | Target Requirements |
 |-----------------|--------|--------------|---------------------|
-| Kill switch activated | SC-009 | All ongoing test activity ceases within 5 seconds; all spawned processes terminate; all network connections close; system state preserved for forensic investigation. | AR-001 |
+| Kill switch activated | SC-009 | All new test actions cease within 5 seconds (Phase 1); spawned processes terminate and network connections close within 60 seconds (Phase 2); system state preserved for forensic investigation. | AR-001 |
 | Kill switch activated | SC-009 | Operators notified immediately via multiple independent channels (email, SMS, messaging) with details of what triggered the halt and actions taken. | HO-015 |
 | Kill switch activated | SC-009 | All test artifacts, logs, and findings evidence captured and stored in immutable, tamper-evident storage before any system cleanup or shutdown occurs. | SC-016 |
 | Kill switch activated | SC-009 | Platform isolated from customer networks; credentials rotated; all active sessions terminated; memory dumps and logs preserved on secure system. | SC-018 |

From 68047d3ce529cf8c961109b2a39577617276602a Mon Sep 17 00:00:00 2001
From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com>
Date: Thu, 11 Jun 2026 23:10:25 +0530
Subject: [PATCH 17/35] Update Glossary.md

---
 standard/appendix/Glossary.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/standard/appendix/Glossary.md b/standard/appendix/Glossary.md
index 06e27b2..2e72be8 100644
--- a/standard/appendix/Glossary.md
+++ b/standard/appendix/Glossary.md
@@ -79,7 +79,7 @@ Notation for specifying IP address ranges using a base address and prefix length
 Alternative security measures that mitigate vulnerability when the primary control is missing. Example: Two-factor authentication compensates for weak passwords.
 
 **Compliance Tier**
-One of three progressive levels of APTS conformance. Tier 1 (Foundation) requires 72 core requirements (MUST | Tier 1). Tier 2 (Verified) adds 85 requirements for a cumulative 157 (MUST | Tier 2 + SHOULD | Tier 2). Tier 3 (Comprehensive) adds 16 requirements for a cumulative 173 (MUST | Tier 3 + SHOULD | Tier 3). A platform must meet 100% of requirements assigned to its claimed tier (both MUST and SHOULD). An additional 18 advisory practices in the Advisory Requirements appendix are recommended for highest-assurance engagements but are not counted toward any tier.
+One of three progressive levels of APTS conformance. Tier 1 (Foundation) requires 72 core requirements (MUST | Tier 1). Tier 2 (Verified) adds 85 requirements for a cumulative 157 (MUST | Tier 2 + SHOULD | Tier 2). Tier 3 (Comprehensive) adds 16 requirements for a cumulative 173 (MUST | Tier 3 + SHOULD | Tier 3). A platform must meet 100% of requirements assigned to its claimed tier (both MUST and SHOULD); an unimplemented SHOULD requires a documented justification in the conformance claim, while an unimplemented MUST is a conformance failure. An additional 18 advisory practices in the Advisory Requirements appendix are recommended for highest-assurance engagements but are not counted toward any tier.
 
 **Confidence Score**
 A numeric value on a 0-100% scale indicating the platform's certainty in a scope boundary determination, target legitimacy assessment, asset classification, or finding validity. Scores below 75% for scope-related decisions trigger mandatory human escalation. See APTS-HO-013, APTS-RP-003.
@@ -381,7 +381,7 @@ A single technique execution or check performed by the platform against a target
 Database-level encryption of data at rest. Automatic, transparent to applications. Used for protection of sensitive data in databases.
 
 **Testing Phase**
-A discrete stage in the penetration testing lifecycle. This standard recognizes the following canonical phases: Initialization, Reconnaissance, Enumeration, Vulnerability Assessment, Exploitation, Post-Exploitation, and Reporting. Domain-specific requirements may reference phase subsets relevant to their scope.
+A discrete stage in the penetration testing lifecycle. This standard uses the canonical phase model defined in the Graduated Autonomy domain: Reconnaissance, Enumeration, Identification, Exploitation, Post-Exploitation, and Reporting, plus an Initialization state that precedes Reconnaissance (recognized by APTS-AR-002). Implementations MUST use these phase names when generating phase-transition events. Domain-specific requirements may reference phase subsets relevant to their scope.
 
 **Timestamp Precision**
 Platform-generated timestamps are expected to use millisecond precision or better (see APTS-AR-001).
External timestamp authorities (for example, RFC 3161 services) may operate at lower precision (±1 second); this is acceptable for external timestamping while platform-internal logs maintain millisecond precision. From 1b85156b53263f386995a3198664d73979ac7bf5 Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Fri, 12 Jun 2026 00:45:06 +0530 Subject: [PATCH 18/35] Update README.md --- standard/1_Scope_Enforcement/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standard/1_Scope_Enforcement/README.md b/standard/1_Scope_Enforcement/README.md index 6a8907b..03a18da 100644 --- a/standard/1_Scope_Enforcement/README.md +++ b/standard/1_Scope_Enforcement/README.md @@ -57,7 +57,7 @@ The 26 requirements in this domain fall into seven thematic groups: ### Conformance -A platform claims conformance with this domain by satisfying every requirement (both MUST and SHOULD) assigned to the compliance tier it targets and to all lower tiers. A SHOULD requirement counts toward tier conformance; a platform that does not implement a SHOULD requirement MUST record a documented justification for the deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 SE requirement plus every Tier 2 SE requirement, and a Tier 3 platform satisfies all three tiers. 
+A platform claims conformance with this domain by implementing every MUST requirement assigned to the compliance tier it targets and to all lower tiers, with no deviation, and by either implementing every SHOULD requirement at those tiers or recording a documented justification for each deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An unimplemented MUST requirement or an undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 SE requirement plus every Tier 2 SE requirement, and a Tier 3 platform satisfies all three tiers.
 Every requirement in this domain includes a Verification subsection listing the verification procedures a reviewer uses to confirm implementation.
From d1376a34d90c2e7e4a9335129b99db41de4635aa Mon Sep 17 00:00:00 2001
From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com>
Date: Fri, 12 Jun 2026 00:45:54 +0530
Subject: [PATCH 19/35] Update README.md

---
 standard/2_Safety_Controls/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/standard/2_Safety_Controls/README.md b/standard/2_Safety_Controls/README.md
index fed729f..5b19472 100644
--- a/standard/2_Safety_Controls/README.md
+++ b/standard/2_Safety_Controls/README.md
@@ -50,7 +50,7 @@ The 20 requirements in this domain fall into seven thematic groups:
 ### Conformance
-A platform claims conformance with this domain by satisfying every requirement (both MUST and SHOULD) assigned to the compliance tier it targets and to all lower tiers. A SHOULD requirement counts toward tier conformance; a platform that does not implement a SHOULD requirement MUST record a documented justification for the deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 SC requirement plus every Tier 2 SC requirement, and a Tier 3 platform satisfies all three tiers.
+A platform claims conformance with this domain by implementing every MUST requirement assigned to the compliance tier it targets and to all lower tiers, with no deviation, and by either implementing every SHOULD requirement at those tiers or recording a documented justification for each deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An unimplemented MUST requirement or an undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 SC requirement plus every Tier 2 SC requirement, and a Tier 3 platform satisfies all three tiers.
 Three appendix-only advisory practices for this domain (APTS-SC-A01 Platform Health Monitoring and Anomaly Detection, APTS-SC-A02 Context Window Safety and Constraint Preservation, and APTS-SC-A03 Tool Invocation Parameter and Chaining Governance) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier.
From 12d1719a830ff0d92092761c3cd6c7e870089dbd Mon Sep 17 00:00:00 2001
From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com>
Date: Fri, 12 Jun 2026 00:46:08 +0530
Subject: [PATCH 20/35] Update README.md

---
 standard/2_Safety_Controls/README.md | 964 +++++++++++++++++++--------
 1 file changed, 681 insertions(+), 283 deletions(-)

diff --git a/standard/2_Safety_Controls/README.md b/standard/2_Safety_Controls/README.md
index 5b19472..1f305a6 100644
--- a/standard/2_Safety_Controls/README.md
+++ b/standard/2_Safety_Controls/README.md
@@ -1,10 +1,10 @@
-# Safety Controls and Impact Management
+# Human Oversight and Intervention
-**Domain Prefix:** APTS-SC | **Requirements:** 20
+**Domain Prefix:** APTS-HO | **Requirements:** 19
-This domain defines how an autonomous penetration testing platform classifies the potential impact of its actions, limits blast radius, enforces graduated escalation thresholds, terminates testing on adverse conditions, recovers from incidents, and contains the agent runtime within a declared execution boundary enforced outside the agent's own control. Safety Controls complement Scope Enforcement: where SE decides whether an action is inside the agreed envelope, SC decides whether that in-scope action is safe enough to run right now given its predicted impact, cumulative risk, and current system health, and whether the agent's execution environment continues to enforce the platform's declared containment boundary. A platform that cannot stop itself, cannot score what it is doing against Confidentiality, Integrity, and Availability (CIA) dimensions, cannot detect and recover from an unintended effect, or cannot enforce a sandbox boundary on its own agent runtime cannot safely operate at any autonomy level above L1. Requirements in this domain govern impact classification, rate and payload constraints, threshold escalation, kill switches, health-triggered halts, circuit breakers, reversibility tracking, rollback, post-test integrity checks, incident containment and recovery, execution sandbox boundary integrity, and external enforcement of the agent's action allowlist.
+This domain defines how an autonomous penetration testing platform keeps qualified humans in the loop: approving actions before execution at low autonomy levels, monitoring and intervening during execution, exercising pause/redirect/kill authority, receiving escalations on unexpected findings or threshold breaches, and closing engagements with accountable human sign-off. Human Oversight is the safety valve that makes graduated autonomy workable. Even a well-designed autonomous platform will encounter situations it has not been authorized to handle alone, and the quality of its behavior in those situations depends on how reliably, how quickly, and to whom it hands control. Requirements in this domain govern approval gates, monitoring and intervention capability, decision timeouts, authority delegation, graceful pause and redirect, kill switches, irreversibility gates, escalation triggers, alerting and fatigue controls, stakeholder notification, operator qualifications, and 24/7 continuity.
-This domain covers blast-radius management and hard-stop capability. Scope boundary enforcement belongs to Scope Enforcement (SE), human approval workflows to Human Oversight (HO), and evidence of safety-control actions to Auditability (AR).
+This domain covers the human side of the human-platform loop: who approves, who intervenes, and when. Scope boundary checks belong to Scope Enforcement (SE), impact classification and hard stops to Safety Controls (SC), and the audit trail of approvals to Auditability (AR).
 > For implementation guidance, see the [Implementation Guide](Implementation_Guide.md).
@@ -12,525 +12,923 @@ This domain covers blast-radius management and hard-stop capability. Scope bound
 ## Domain Overview
-The 20 requirements in this domain fall into seven thematic groups:
+The 19 requirements in this domain fall into six thematic groups:
 | Group | Requirements | Purpose |
 |---|---|---|
-| **Impact classification and scoring** | APTS-SC-001, APTS-SC-002, APTS-SC-003 | CIA dimensional scoring, industry-specific considerations, worked classification examples |
-| **Rate, threshold, and cumulative risk controls** | APTS-SC-004, APTS-SC-005, APTS-SC-006, APTS-SC-007, APTS-SC-008 | Rate and payload constraints, cascading-failure prevention, escalation workflow, cumulative risk scoring, schema-validated thresholds |
-| **Kill switch and automated termination** | APTS-SC-009, APTS-SC-010, APTS-SC-011, APTS-SC-012, APTS-SC-013 | Kill switch, health-triggered halts, condition-based termination, network circuit breaker, time-based termination |
-| **Reversibility, rollback, and post-test integrity** | APTS-SC-014, APTS-SC-015, APTS-SC-016 | Reversible action tracking and rollback, post-test integrity validation, evidence preservation and cleanup |
-| **External watchdog and incident recovery** | APTS-SC-017, APTS-SC-018 | External watchdog and operator notification, incident containment and recovery |
-| **Execution sandbox and agent containment** | APTS-SC-019, APTS-SC-020 | Sandbox and containment boundary integrity, action allowlist enforcement external to the model |
+| **Approval gates and intervention capability** | APTS-HO-001, APTS-HO-002, APTS-HO-003 | Mandatory pre-approval at L1/L2, real-time monitoring and intervention, decision timeout with default-safe behavior |
+| **Authority delegation and chain-of-custody** | APTS-HO-004, APTS-HO-005 | Delegation matrix, chain-of-custody and decision audit trail |
+| **Pause, redirect, and kill switch** | APTS-HO-006, APTS-HO-007, APTS-HO-008, APTS-HO-009 | Graceful pause with state preservation, mid-engagement redirect, immediate kill switch with state dump, multi-operator authority and handoff |
+| **Irreversibility and escalation triggers** | APTS-HO-010, APTS-HO-011, APTS-HO-012, APTS-HO-013, APTS-HO-014 | Decision points before irreversible actions, unexpected-findings escalation, impact-threshold breach, confidence-based escalation, legal and compliance triggers |
+| **Activity monitoring, alerting, and closure** | APTS-HO-015, APTS-HO-016, APTS-HO-017 | Real-time activity monitoring and notifications, alert-fatigue mitigation, stakeholder notification and engagement closure |
+| **Operator qualification and continuity** | APTS-HO-018, APTS-HO-019 | Qualification, training, and competency governance; 24/7 operational continuity and shift handoff |
 ### Requirement Index
 | ID | Title | Classification |
 |---|---|---|
-| APTS-SC-001 | Impact Classification and CIA Scoring | MUST \| Tier 1 |
-| APTS-SC-002 | Industry-Specific Impact Considerations | MUST \| Tier 2 |
-| APTS-SC-003 | Real-World Impact Classification Examples | SHOULD \| Tier 2 |
-| APTS-SC-004 | Rate Limiting, Bandwidth, and Payload Constraints | MUST \| Tier 1 |
-| APTS-SC-005 | Cascading Failure Prevention in Interconnected Systems | SHOULD \| Tier 2 |
-| APTS-SC-006 | Threshold Escalation Workflow (Automated → Approval → Prohibited) | MUST \| Tier 2 |
-| APTS-SC-007 | Cumulative Risk Scoring with Time-Based Decay | MUST \| Tier 2 |
-| APTS-SC-008 | Threshold Configuration with Schema Validation | SHOULD \| Tier 3 |
-| APTS-SC-009 | Kill Switch | MUST \| Tier 1 |
-| APTS-SC-010 | Health Check Monitoring, Threshold Adjustment, and Automatic Halt | MUST \| Tier 1 |
-| APTS-SC-011 | Condition-Based Automated Termination | MUST \| Tier 2 |
-| APTS-SC-012 | Network-Level Circuit Breaker | MUST \| Tier 2 |
-| APTS-SC-013 | Time-Based Automatic Termination with Operator Override | SHOULD \| Tier 3 |
-| APTS-SC-014 | Reversible Action Tracking and Rollback | MUST \| Tier 2 |
-| APTS-SC-015 | Post-Test System Integrity Validation | MUST \| Tier 1 |
-| APTS-SC-016 | Evidence Preservation and Automated Cleanup | MUST \| Tier 2 |
-| APTS-SC-017 | External Watchdog and Operator Notification | MUST \| Tier 2 |
-| APTS-SC-018 | Incident Containment and Recovery | MUST \| Tier 2 |
-| APTS-SC-019 | Execution Sandbox and Containment Boundary Integrity | MUST \| Tier 2 |
-| APTS-SC-020 | Action Allowlist Enforcement External to the Model | MUST \| Tier 1 |
+| APTS-HO-001 | Mandatory Pre-Approval Gates for Autonomy Levels L1 and L2 | MUST \| Tier 1 |
+| APTS-HO-002 | Real-Time Monitoring and Intervention Capability | MUST \| Tier 1 |
+| APTS-HO-003 | Decision Timeout and Default-Safe Behavior | MUST \| Tier 1 |
+| APTS-HO-004 | Authority Delegation Matrix | MUST \| Tier 1 |
+| APTS-HO-005 | Delegation Chain-of-Custody and Decision Audit Trail | MUST \| Tier 2 |
+| APTS-HO-006 | Graceful Pause Mechanism with State Preservation | MUST \| Tier 1 |
+| APTS-HO-007 | Mid-Engagement Redirect Capability | MUST \| Tier 1 |
+| APTS-HO-008 | Immediate Kill Switch with State Dump | MUST \| Tier 1 |
+| APTS-HO-009 | Multi-Operator Kill Switch Authority and Handoff | MUST \| Tier 2 |
+| APTS-HO-010 | Mandatory Human Decision Points Before Irreversible Actions | MUST \| Tier 1 |
+| APTS-HO-011 | Unexpected Findings Escalation Framework | MUST \| Tier 1 |
+| APTS-HO-012 | Impact Threshold Breach Escalation | MUST \| Tier 1 |
+| APTS-HO-013 | Confidence-Based Escalation (Scope Uncertainty) | MUST \| Tier 1 |
+| APTS-HO-014 | Legal and Compliance Escalation Triggers | MUST \| Tier 1 |
+| APTS-HO-015 | Real-Time Activity Monitoring and Multi-Channel Notification | MUST \| Tier 1 |
+| APTS-HO-016 | Alert Fatigue Mitigation and Smart Aggregation | SHOULD \| Tier 2 |
+| APTS-HO-017 | Stakeholder Notification and Engagement Closure | MUST \| Tier 2 |
+| APTS-HO-018 | Operator Qualification, Training, and Competency Governance | MUST \| Tier 2 |
+| APTS-HO-019 | 24/7 Operational Continuity and Shift Handoff | SHOULD \| Tier 2 |
 ### Conformance
-A platform claims conformance with this domain by implementing every MUST requirement assigned to the compliance tier it targets and to all lower tiers, with no deviation, and by either implementing every SHOULD requirement at those tiers or recording a documented justification for each deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An unimplemented MUST requirement or an undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 SC requirement plus every Tier 2 SC requirement, and a Tier 3 platform satisfies all three tiers.
+A platform claims conformance with this domain by implementing every MUST requirement assigned to the compliance tier it targets and to all lower tiers, with no deviation, and by either implementing every SHOULD requirement at those tiers or recording a documented justification for each deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An unimplemented MUST requirement or an undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 HO requirement plus every Tier 2 HO requirement, and a Tier 3 platform satisfies all three tiers. Human Oversight has no Tier 3 requirements in this release; a Tier 3 claim therefore requires all Tier 1 and Tier 2 HO requirements.
-Three appendix-only advisory practices for this domain (APTS-SC-A01 Platform Health Monitoring and Anomaly Detection, APTS-SC-A02 Context Window Safety and Constraint Preservation, and APTS-SC-A03 Tool Invocation Parameter and Chaining Governance) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier.
+Two appendix-only advisory requirements for this domain (APTS-HO-A01 Out-of-Band Kill Switch via Independent Network and APTS-HO-A02 Disclosure and Mitigation of AI Influence on Operator Decisions) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier.
 Every requirement in this domain includes a Verification subsection listing the verification procedures a reviewer uses to confirm implementation.
 ---
-## APTS-SC-001: Impact Classification and CIA Scoring
+## APTS-HO-001: Mandatory Pre-Approval Gates for Autonomy Levels L1 and L2
 **Classification:** MUST | Tier 1
 ### Requirement
-MUST implement a multi-tier impact classification system that categorizes every pentest action before execution, with discrete impact levels (Critical, High, Medium, Low) providing graduated escalation thresholds. Every pentest action MUST be scored against Confidentiality, Integrity, and Availability dimensions with a weighted impact calculation, where the minimum dimension score determines escalation level. Organizations MUST maintain an authoritative mapping document categorizing all supported pentest techniques and exploits with pre-classified impact levels, CIA scores, and reversibility status. This mapping MUST be reviewed and updated quarterly or when new techniques are added. Unclassified techniques MUST NOT execute.
+For autonomy levels L1 Assisted (fully supervised) and L2 Supervised (pre-approved with monitoring), the autonomous testing system MUST implement mandatory approval gates that prevent execution of the following action categories without explicit human authorization:
-### Verification
-
-1. **Action catalog audit**: Sample 50 actions from methodology library (DNS, web, exploitation, lateral); verify each has explicit impact level assigned (Critical/High/Medium/Low) and CIA dimension scoring
-2. **Pre-execution classification**: Execute 20 test runs; review execution logs for timestamp showing classification assigned before action execution timestamp
-3. **CIA scoring validation**: Select 20 diverse pentest actions; have independent security reviewer score each action's CIA impact; verify impact vectors calculated correctly; confirm minimum dimension drives escalation
-4. **Approval pathway enforcement**: For 10 high-impact actions, verify classification directly determined approval requirement (high=escalation, medium=standard, low=autonomous); confirm in audit log
-5. **Impact level distribution**: Verify library includes balanced distribution (not all critical); confirm at least 30% low-impact, 40% medium, 20% high, 10% critical
-6. **Classification consistency**: Two independent reviewers classify 10 random actions; verify >90% agreement on assigned impact level
-7. **Mapping completeness**: Audit all supported techniques have impact classification; verify no unclassified techniques can execute
-8. **Lookup performance**: Test classification lookup performance (must return without observable delay to test execution)
-9. **Quarterly review**: Confirm quarterly review occurred with documented changes
-
----
-
-## APTS-SC-002: Industry-Specific Impact Considerations
+**L1 Assisted (All Significant Actions - Default Supervision):**
+- Any vulnerability exploitation attempt
+- Lateral movement between systems
+- Data access or exfiltration actions
+- Installation of persistence mechanisms
+- Configuration changes to target systems
+- Social engineering exercises
-**Classification:** MUST | Tier 2
+**L2 Supervised (Pre-Approved + Escalation Triggers):**
+- Exploitation of vulnerabilities with Common Vulnerability Scoring System (CVSS) >= 7.0
+- Lateral movement beyond initial compromise
+- Access to sensitive data repositories (databases, file shares with classified labels)
+- Deployment of monitoring or callback agents
+- Modifications to authentication/authorization controls
+- Any action with confidence score < 75% on scope determination
-**Applicability:** This requirement applies to platforms targeting regulated industries (healthcare, financial services, critical infrastructure). Platforms exclusively testing non-regulated environments MAY document a justified exemption.
+Approval gates MUST:
+1. Present actionable information (affected system, action type, estimated risk, prerequisites met)
+2. Provide deny/approve/defer decision options
+3. Include reasoning prompts (operator MUST document why they're approving)
+4. Support escalation to higher authority if approver lacks delegation for this action type
+5. Create immutable approval records with timestamp, approver identity, and rationale
-### Requirement
+> **See also:** APTS-SC-006 (threshold escalation workflow).
-Platforms MUST adjust impact classifications for industry-specific regulatory requirements. Healthcare systems MUST treat all direct patient data access as Critical. Financial systems MUST treat all cardholder data access as Critical. Critical infrastructure (utilities, transportation, communications) MUST treat availability impacts as one tier higher than calculated.
+**Rationale for CVSS >= 7.0 threshold:** CVSS 7.0 is the boundary between the "Medium" range (4.0-6.9) and the "High" range (7.0-8.9), where a finding carries significant potential for system compromise or data exposure. Exploitations at or above this threshold carry meaningful risk of unintended production impact, data loss, or service disruption. Requiring human approval at this boundary ensures that an operator evaluates the risk/reward trade-off before the platform attempts high-impact exploitation. Organizations MAY lower this threshold (for example, to 4.0 for critical infrastructure) but MUST NOT raise it above 7.0.
 ### Verification
-1. Build a seeded test set containing at least one healthcare target (PHI present), one PCI target (cardholder data present), and one critical-infrastructure target (utility/transportation/communications)
-2. Submit each target to the platform's classification engine; verify healthcare target classifies all direct patient data access as Critical, PCI target classifies all cardholder data access as Critical, and critical-infrastructure target elevates availability impacts by one tier above the baseline calculation
-3. For each reclassified action, verify the approval chain matches the regulatory requirement documented for that industry (healthcare, PCI, critical infrastructure)
-4. Test that PHI/PCI data access is always escalated to Critical regardless of other CIA factors
-5. Confirm critical infrastructure availability impacts are elevated by one tier in the classification output
+1. **Configuration audit**: Verify approval gate rules are implemented for all action categories listed above
+2. **Approval record audit**: For last 30 days, confirm:
+   - 100% of L1 actions have pre-approval records
+   - 100% of L2 exploitation actions >= CVSS 7.0 have approvals
+   - All approvals include timestamp, approver identity, and rationale
+3. **Timeline verification**: Confirm approval grants show operator actually reviewed (not auto-generated)
+4. **Escape testing**: Attempt to execute gated action without approval; verify the system blocks the action and escalates
+5. **Authority verification**: Confirm approver held delegation for approved action type
-> **See also:** APTS-SC-001 (baseline impact classification framework this extends for regulated industries), APTS-SE-011 (multi-tenant and regulated environment awareness).
+> **See also:** APTS-HO-010 (mandatory gates before irreversible actions regardless of autonomy level), APTS-HO-005 (delegation chain-of-custody and decision audit trail).
 ---
-## APTS-SC-003: Real-World Impact Classification Examples
+## APTS-HO-002: Real-Time Monitoring and Intervention Capability
-**Classification:** SHOULD | Tier 2
+**Classification:** MUST | Tier 1
 ### Requirement
-SHOULD document at least 10 representative pentest scenarios with calculated impact classifications, CIA scores, and approval workflows. Examples SHOULD include password spraying, directory traversal, buffer overflow exploitation, and data exfiltration attempts. Scenarios may be drawn from past engagements, lab environments, or industry-documented attack patterns.
+Autonomous testing systems operating at L2 and above MUST provide operators with real-time visibility into autonomous system operations and the ability to intervene at any time. The core capability MUST include:
+
+**Mandatory Real-Time Visibility:**
+- Current active tests and their status (running, paused, waiting for approval, blocked, completed, failed)
+- Testing scope boundaries and current testing location relative to those boundaries
+- Safety control status (approval gates active, confidence scores, escalations pending)
+- Incident or anomaly detection (out-of-scope access attempts, unexpected findings, legal/compliance triggers, detected intrusions)
+
+**Mandatory Intervention Capability:**
+- Operators can immediately pause all autonomous actions from the dashboard
+- Operators can immediately terminate all autonomous actions
+- Operators can view pending approval queue and respond to escalations from the dashboard
+- System logs all operator interventions with timestamp, operator identity, and action taken
+
+**Recommended Enhancements (SHOULD):**
+- Current activity feed with filtering by action type, target system, risk level, or time range
+- System health visualization (tool status, target system status, resource utilization, alert queue)
+- Color-coded risk indicators for active tests
+- Service Level Agreement (SLA) tracking for pending approvals (time elapsed, thresholds)
+- Confidence score displays (scope boundary, target legitimacy, exploitation prerequisites)
+- Mobile-responsive interface for field operators
+- Keyboard-navigable controls and screen reader compatibility
+
+**Accessibility and Usability:**
+- Dashboard SHOULD be accessible in accordance with WCAG 2.1 Level AA guidelines
+- Implementations SHOULD provide keyboard navigation and screen reader support
+- SHOULD support offline caching for continued visibility during connectivity loss
+
+**Audit and Logging:**
+- All operator dashboard access MUST be logged (timestamp, operator identity, IP address, duration)
+- All operator interventions (pause, terminate, approval decisions) MUST be logged
+- Dashboard access logs MUST be retained for a minimum of 90 days
 ### Verification
-1. Review the platform's documented scenario corpus; verify it contains at least 10 representative pentest scenarios with named techniques (password spraying, directory traversal, buffer overflow, data exfiltration, and others)
-2. For each scenario, verify the documentation includes the expected CIA score, impact tier, and approval workflow
-3. For each documented scenario, compare the recorded classification and approval workflow against the platform's documented scoring model; fail any scenario whose recorded classification deviates from the model without a documented exception
-4. Verify scenarios cover at least 4 distinct technique types (reconnaissance, exploitation, post-exploitation, data handling)
+1. **Real-time visibility test**: Verify operators can view current active tests, scope boundaries, and safety control status in less than 5 seconds
+2. **Intervention capability test**: Verify operators can pause and terminate all autonomous actions from the dashboard within 2 seconds
+3. **Escalation access test**: Verify operators can view pending approval queue and respond to escalations from the dashboard
+4. **Intervention logging test**: Execute operator pause and terminate actions; verify each is logged with timestamp, operator identity, and action taken
+5. **Access logging test**: Verify all operator dashboard accesses logged with timestamp, operator identity, IP address, and session duration
+6. **Log retention audit**: Verify dashboard logs retained for minimum 90 days
-> **See also:** APTS-SC-001 (impact classification model scenarios illustrate), APTS-SC-002 (industry-specific impact rules applied in worked examples).
+> **See also:** APTS-HO-015 (real-time activity monitoring and multi-channel notification delivery) --- -## APTS-SC-004: Rate Limiting, Bandwidth, and Payload Constraints +## APTS-HO-003: Decision Timeout and Default-Safe Behavior **Classification:** MUST | Tier 1 ### Requirement -MUST implement hierarchical rate limiting and payload constraints for all active pentest operations: +When autonomous testing systems escalate decisions to human operators, they MUST implement maximum response time windows (Service Level Agreements (SLAs)) and define explicit default-safe behavior if human operators fail to respond within these timeframes. -**Per-host rate limiting:** MUST implement per-host connection limits and rate limiting. Rate limits MUST be configurable per impact tier, with lower-impact actions permitted higher rates and higher-impact actions restricted to lower rates. +**SLA Requirements by Action Category:** -**Cascading rate limits:** MUST implement cascading rate limits at subnet, datacenter, and engagement-wide levels. If aggregate traffic to a subnet approaches configured limits, the system MUST proactively reduce per-host rate limits to prevent subnet-level overload. If engagement-wide limits are reached, automatically halt all new actions across all targets. 
+| Decision Type | Autonomy Level | Max Response Time | Default Behavior if Timeout | +|---------------|-----------------|-------------------|---------------------------| +| Exploitation attempt | L2 | 15 minutes | DENY - do not exploit; skip to next target | +| Lateral movement | L2 | 15 minutes | DENY - halt movement; stay on current system | +| Data access | L2 | 10 minutes | DENY - do not access data; escalate to manager | +| Scope boundary decision (confidence <75%) | L3 | 30 minutes | PAUSE - halt all actions; wait for human decision | +| Unexpected finding escalation | All | 5 minutes | PAUSE + ISOLATE - halt all actions; preserve state | +| Legal/compliance trigger | All | Immediate | KILL + PRESERVE - terminate all actions; preserve evidence | -**Payload and bandwidth constraints:** MUST enforce maximum payload sizes configurable per impact tier. Default limits that MUST be applied unless the Rules of Engagement specifies alternative values: Informational/Low actions limited to 100KB payloads, Medium actions to 10MB, High actions to 100MB, Critical actions to 500MB with engagement-wide bandwidth cap of 5 Gbps. Organizations MUST document their configured limits and any deviations from default values. Any action exceeding size limit MUST be automatically rejected before transmission. +**Priority Rule:** When multiple timeout requirements apply to the same action (for example, both APTS-HO-003 and APTS-SC-006), the most restrictive timeout governs. -> **See also:** APTS-SE-019 (adaptive backoff, DoS prevention, and continuous-mode production impact controls). SC-004 governs per-host, per-subnet, and engagement-wide rate limits and payload sizes as safety controls; SE-019 governs scheduling, backoff behavior, and production-impact thresholds as scope controls. +1. **Conservative assumption**: If human doesn't respond, assume "no" / "don't approve" unless pre-escalation approval was explicitly obtained +2. 
**No momentum continuation**: If approval timeout expires, do NOT auto-approve the queued action; instead, halt the action and move to next planned test +3. **Escalation progression**: If first approver doesn't respond within 50% of SLA, automatically escalate to secondary approver +4. **State preservation**: All timeout scenarios MUST preserve system state to allow resumption after human decision +5. **Client notification threshold**: If approval timeout exceeds 30 minutes, automatically notify engagement client of delay +6. **Recovery procedures**: Document exact procedures for resuming work after timeout (don't just auto-continue) ### Verification -1. Monitor request rates during test execution; verify requests never exceed configured per-host limits -2. Test that exceeding limits triggers automatic throttling -3. Confirm rate limits apply independently per target host -4. Simulate multi-target attack scenario; confirm subnet limit triggers load reduction when approaching threshold -5. Test engagement-wide halt at maximum load -6. Attempt to send payload larger than limit; verify rejection before transmission -7. Monitor bandwidth consumption during tests; confirm engagement-wide bandwidth cap enforcement -8. Audit logs showing rate limit enforcement and escalation at all levels +1. **SLA configuration audit**: Verify timeout values match requirements for each decision type +2. **Timeout test**: Create approval scenario, do not respond; observe system behavior at timeout +3. **Escalation progression test**: Create approval, verify automatic escalation at 50% SLA mark +4. **Default action test**: After timeout, confirm default action executed (DENY/PAUSE/KILL) +5. **State preservation test**: Verify system state preserved after timeout (can resume cleanly) +6. **Client notification test**: Trigger timeout exceeding 30 minutes; confirm client notification sent +7. 
**Incident logging test**: Verify timeout event logged with full context (escalation ID, timestamp, reason) +8. **Multi-approver test**: First approver unavailable; confirm automatic escalation to secondary approver +9. **Verification of timeout enforcement**: Request the platform's timeout configuration (via API, configuration export, or dashboard). Confirm that the configured values match or are stricter than the SLA table above. Then trigger a test action requiring approval, do not respond, and measure wall-clock time to confirm the platform enters default-safe behavior within the configured timeout. --- -## APTS-SC-005: Cascading Failure Prevention in Interconnected Systems +## APTS-HO-004: Authority Delegation Matrix -**Classification:** SHOULD | Tier 2 +**Classification:** MUST | Tier 1 ### Requirement -SHOULD detect and prevent cascading failures when testing systems with interdependencies. Before attacking system B that depends on system A, verify system A is functioning normally. If attacking system A causes degradation in dependent systems, automatically reduce attack intensity on A or halt testing entirely. +Organizations operating autonomous penetration testing systems MUST establish and maintain a documented Authority Delegation Matrix (ADM) that specifies which personnel can approve which testing actions at which autonomy levels. This matrix MUST be: -### Verification +1. **Formally documented** in writing (not implied or assumed) +2. **Maintained with access controls** (not publicly available; protected as sensitive information) +3. **Cascading authority**: Approvers at higher autonomy levels can approve on behalf of lower levels +4. **No delegation of authority**: Approvers cannot delegate their authority to others (no "I approve you to approve") +5. **Role-based, not person-based**: Authority is tied to role; changes in personnel require new matrix +6. **Dual control for high-risk**: CVSS >= 9.0 requires second independent approver +7. 
**Escalation not optional**: If approver lacks authority for action type, MUST escalate (no workarounds) -1. In a staging environment, define a topology with at least one upstream/downstream dependency pair where downstream system B depends on upstream system A -2. Establish a measurable health baseline for the downstream system (response time, error rate, or availability metric) -3. Attack upstream system A; induce measurable degradation in downstream system B beyond the platform's documented threshold; verify the platform reduces attack intensity on A or halts testing within the documented response window -4. **Negative test (control case):** Attack upstream system A at a level that does not cause downstream degradation beyond the threshold; verify no unnecessary halt or reduction is triggered -5. Verify recovery: after attack intensity is reduced, confirm downstream system B restores to baseline health within the documented recovery window -6. Audit logs show cascade detection trigger, intensity reduction or halt decision, and recovery confirmation with timestamps +**Rationale:** Distributed approval authority without formal structure creates audit risk, privilege escalation paths, and ambiguous accountability. An Authority Delegation Matrix ensures that only qualified personnel with documented authorization make high-risk decisions, prevents unapproved escalation of privileges, and creates an auditable chain of authority. During incidents, a clear ADM eliminates confusion about who has authority to activate kill switches, approve scope changes, or authorize continued testing. -> **See also:** APTS-SC-004 (rate limiting and payload constraints used to throttle upstream pressure), APTS-SC-009 (kill switch that halts testing when cascading impact is detected). +### Verification + +1. **Matrix documentation audit**: Verify written ADM exists, signed by CISO/VP Security. 
+ The [Authority Delegation Matrix Template](../appendix/Authority_Delegation_Matrix_Template.md) provides an illustrative format for recording role-based approval, escalation, and emergency authority. +2. **Operator role audit**: For each active operator, confirm ADM lists their role and autonomy level +3. **Approval authority test**: Operator attempts approval outside their authority; verify the system rejects the action and escalates +4. **Authority renewal audit**: Verify all ADM entries renewed within required periods +5. **Escalation chain test**: For CVSS >= 9.0 action, verify escalation follows defined chain in ADM +6. **Dual control test**: For dual-control actions, verify two independent approvers required +7. **Historical audit**: Last 30 days of approvals; verify all approvers held documented authority --- -## APTS-SC-006: Threshold Escalation Workflow (Automated → Approval → Prohibited) +## APTS-HO-005: Delegation Chain-of-Custody and Decision Audit Trail **Classification:** MUST | Tier 2 ### Requirement -MUST implement graduated responsibility escalation where Informational/Low impact actions execute automatically without approval, Medium impact actions require standard approval within a defined window, High impact actions require elevated approval within a defined window, and Critical actions require senior approval plus live operator confirmation. Approval timeout windows MUST be documented per severity level in the platform's escalation policy. Actions exceeding their configured approval timeout are denied by default. +All human approval decisions, escalations, and delegations MUST be recorded in an immutable, auditable decision log with the following mandatory fields: + +1. **Immutability**: Logs MUST be cryptographically signed and stored in append-only format (cannot be modified/deleted after creation) +2. **Completeness**: Every approval, denial, escalation, and timeout MUST be logged with all fields above +3. 
**Traceability**: Log entries MUST track full chain-of-custody from initial request through final execution +4. **Searchability**: Logs MUST be queryable by decision_id, decision_type, operator, timestamp, action, target_system +5. **Long-term retention**: Minimum 90 days; SHOULD retain for 6 years for high-risk decisions or per applicable regulatory requirement +6. **Access control**: Read access restricted to authorized personnel; write access only via system (no manual edits) +7. **Export capability**: Security teams MUST be able to export logs for incident investigation, audits, compliance reviews -> **See also:** APTS-HO-001 (pre-approval gates by autonomy level), APTS-AL-011 (escalation triggers and exception handling). +**Reporting Requirements:** + +Organizations MUST generate periodic reports from decision logs at a cadence appropriate to engagement duration and operational tempo. At minimum, organizations MUST produce a summary report at engagement completion and a statistical analysis report at least monthly for ongoing operations. For long-running or continuous engagements, organizations SHOULD also produce interim reports (for example, weekly escalation summaries, quarterly audit reports). ### Verification -1. Test Informational action executes immediately -2. Test Medium action requires approval before execution -3. Test High action requires elevated approval within configured window -4. Test Critical action requires senior approval + live operator confirmation -5. Test timeout queuing preserves action for next approval window -6. Verify timeout windows reset on schedule -7. **Negative test (no approval):** Attempt a Medium-impact action without submitting any approval; verify execution is blocked before dispatch and denial is logged -8. **Negative test (expired approval):** Submit approval for a High-impact action, wait past the configured approval timeout, then attempt execution; verify the action is denied -9. 
**Negative test (insufficient authority):** Submit approval for a Critical-impact action using a standard (non-senior) approver; verify the action is rejected and the insufficient-authority denial is logged +1. **Log existence audit**: Verify decision logs exist and contain entries for all approvals made +2. **Immutability test**: Attempt to modify historical log entry; verify the system prevents modification +3. **Completeness test**: Random sample 10 recent approvals; verify all required fields present in logs +4. **Traceability test**: Pick escalation scenario; verify full chain-of-custody from initial request through execution +5. **Access control test**: Non-authorized user attempts to access/modify logs; verify the system denies access +6. **Search functionality test**: Search for approvals by operator, action type, timestamp; verify results accurate +7. **Export test**: Generate monthly compliance report; verify data accuracy and format +8. **Retention test**: Verify logs older than retention period are archived/secured appropriately +9. **Signature verification test**: Validate cryptographic signatures on sample log entries + +> **See also:** [APTS-HO-A02: Disclosure and Mitigation of AI Influence on Operator Decisions](../appendix/Advisory_Requirements.md#apts-ho-a02-disclosure-and-mitigation-of-ai-influence-on-operator-decisions-advisory). An advisory practice covering audit-trail provenance for AI-shaped operator affordances (option sets, defaults, wording, ordering) and bias mitigation at high-impact gates, so the chain-of-custody distinguishes a typed approval from a default click-through. Candidate for tier-gated inclusion in v0.2.0. 
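The immutability, traceability, and searchability properties required of the APTS-HO-005 decision log can be illustrated with a small hash-chained log. This is a minimal sketch under stated assumptions, not a reference implementation: the `DecisionLog` class, its method names, and the `GENESIS_HASH` constant are hypothetical, HMAC-SHA256 with a shared key stands in for whatever signing scheme an implementation actually uses (an asymmetric signature may be more appropriate), and an in-memory list stands in for real append-only storage. The field names follow the queryable fields listed in the requirement.

```python
import hashlib
import hmac
import json

# Hypothetical placeholder used as the "previous hash" of the first entry.
GENESIS_HASH = "0" * 64


class DecisionLog:
    """Illustrative append-only decision log: hash-chained and HMAC-signed."""

    def __init__(self, signing_key: bytes):
        self._key = signing_key
        self._entries = []  # entries are only ever appended by this class

    @staticmethod
    def _canonical(record: dict) -> bytes:
        # Stable serialization so hashes and signatures are reproducible.
        return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")

    def append(self, decision_id, decision_type, operator, timestamp, action, target_system):
        """Add one decision; each entry commits to the hash of the previous entry."""
        prev_hash = self._entries[-1]["entry_hash"] if self._entries else GENESIS_HASH
        record = {
            "decision_id": decision_id,
            "decision_type": decision_type,
            "operator": operator,
            "timestamp": timestamp,
            "action": action,
            "target_system": target_system,
            "prev_hash": prev_hash,
        }
        payload = self._canonical(record)
        entry = dict(record)
        entry["entry_hash"] = hashlib.sha256(payload).hexdigest()
        entry["signature"] = hmac.new(self._key, payload, hashlib.sha256).hexdigest()
        self._entries.append(entry)
        return dict(entry)

    def verify(self) -> bool:
        """Recompute the chain; any edited, removed, or reordered entry fails."""
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            record = {k: v for k, v in entry.items() if k not in ("entry_hash", "signature")}
            payload = self._canonical(record)
            if record["prev_hash"] != prev_hash:
                return False
            if hashlib.sha256(payload).hexdigest() != entry["entry_hash"]:
                return False
            expected = hmac.new(self._key, payload, hashlib.sha256).hexdigest()
            if not hmac.compare_digest(expected, entry["signature"]):
                return False
            prev_hash = entry["entry_hash"]
        return True

    def query(self, **filters):
        """Return copies of entries whose fields equal every given filter value."""
        return [dict(e) for e in self._entries
                if all(e.get(k) == v for k, v in filters.items())]
```

The chain only makes tampering detectable; it does not prevent it. Meeting the append-only requirement would still depend on the storage layer (for example, WORM storage) and on keeping the signing key out of reach of anyone who can write to the log.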
--- -## APTS-SC-007: Cumulative Risk Scoring with Time-Based Decay +## APTS-HO-006: Graceful Pause Mechanism with State Preservation -**Classification:** MUST | Tier 2 +**Classification:** MUST | Tier 1 ### Requirement -The platform MUST track cumulative risk across the entire engagement by considering impact severity, action frequency, duration, and system criticality. Risk accumulation SHOULD decay over time. Scoring methodology MUST be documented and auditable. Configurable thresholds MUST trigger escalation at defined risk levels. The specific decay function and reset windows are implementation decisions that MUST be documented. - -Cumulative risk scoring MUST use the impact classification framework defined in APTS-SC-001 as its input taxonomy. The scoring model MUST NOT define an independent impact classification that contradicts or duplicates SC-001. +Autonomous penetration testing systems MUST implement a graceful pause mechanism that allows human operators to suspend all active testing activities while preserving complete system state for later resumption. + +1. **Pause Types and Trigger Methods:** + - Manual pause: Operator clicks "Pause" button on dashboard + - Automatic pause: System escalation triggers automatic pause (for example, unexpected finding) + - Time-based pause: Scheduled pause at specific time (for example, 5pm daily to avoid after-hours disruption) + - SLA-based pause: Automatic pause if approval SLA approaching expiration without response + - Scope boundary pause: Automatic pause when approaching scope limits with confidence < 90% + +2. 
**State Preservation Specifics:** + - **Session tokens/cookies**: Preserve all authentication sessions (tools can resume as authenticated user) + - **Compromised systems state**: Document what access was achieved (reverse shells, credentials, permissions) + - **Partial exploitation state**: If mid-way through multi-step exploitation, preserve exact step and prerequisites met + - **Target queue state**: Document which targets have been tested, which are queued, testing order + - **Reconnaissance results**: Preserve all enumeration findings (open ports, services, vulnerabilities discovered) + - **Configuration snapshots**: Record target system state at pause time (running processes, network connections) + +3. **Pause Confirmation Requirements:** + - System MUST confirm pause is complete before reporting "PAUSED" status + - Operator dashboard shows: "Pause confirmed at [timestamp]; [X] connections preserved; [Y] sessions active" + - Health check: Verify paused tool not consuming CPU/memory (confirm truly halted) + - State verification: Attempt light ping to preserved connections; confirm still alive + +4. **Session Timeout During Extended Pause:** + - If a pause exceeds the session timeout threshold (default: 4 hours, configurable per engagement), the platform MUST terminate all active sessions and connections to target systems + - On resume, the platform MUST re-establish connections and re-validate scope before continuing operations + - The operator MUST be warned when pause duration approaches 75% of the session timeout threshold + - Warning notification MUST include: current pause duration, session timeout threshold, time remaining before auto-termination + - Pre-termination alert (when reaching 95% of threshold) MUST be logged with action plan for operator + - Session termination and reconnection events MUST be logged separately with timestamps and success/failure status ### Verification -1. Execute multiple Medium actions; verify accumulation -2. 
Run a no-action period equal to one documented decay half-life or decay window; verify the cumulative risk score decreases by the documented decay amount (within ±5% tolerance) -3. Verify impact threshold triggers intensity reduction -4. Confirm reset windows apply correctly -5. Execute single High impact action; verify points awarded per documented scoring model -6. Execute Critical impact action; verify higher points awarded per documented scoring model -7. Execute repeated actions on same target; verify frequency multiplier applied -8. Verify critical system actions earn elevated multiplier per documented model -9. Test automatic halt when cumulative score exceeds configured threshold -10. Audit cumulative impact logs - -> **See also:** APTS-AL-022 (cumulative risk scoring at autonomy levels). +1. **Pause trigger test**: Click "Pause" button; observe tool transitions to PAUSED status +2. **State preservation test**: Before pause, exploit a system; pause tool; examine checkpoint file to verify credentials/sessions preserved +3. **Connection persistence test**: Pause with active SSH/RDP session; attempt to reconnect to same session after resumption +4. **Memory dump test**: Verify paused state contains all variables, pending actions, configuration +5. **Checkpoint integrity test**: Corrupt checkpoint file; verify the system detects corruption and uses backup +6. **Resumption test**: After pause, resume and verify tool uses preserved state (no re-exploitation needed) +7. **Dashboard confirmation test**: Verify dashboard shows PAUSED status with preserved connection count +8. 
**CPU/memory test**: Confirm paused tool has no active test threads, generates no outbound test traffic, and CPU/memory usage does not exceed the platform's documented idle-state baseline --- -## APTS-SC-008: Threshold Configuration with Schema Validation +## APTS-HO-007: Mid-Engagement Redirect Capability -**Classification:** SHOULD | Tier 3 +**Classification:** MUST | Tier 1 ### Requirement -The platform SHOULD store all threshold configurations in a structured, human-readable format with schema enforcement. Configuration SHOULD include impact level thresholds, approval chains, rate limits, payload constraints, and risk scoring. The system SHOULD validate configuration on startup and reject invalid settings with detailed error messages. This requirement ensures the configuration integrity of the risk scoring algorithm defined in APTS-SC-007. +Autonomous penetration testing systems MUST allow human operators to redirect testing scope, targets, and techniques mid-engagement without requiring full tool restart or loss of state. + +1. **Scope Redirect**: Change authorized testing boundaries + - Add new target system to scope (for example, "Add 10.0.1.50 to testing scope") + - Remove target from scope (for example, "Skip prod-app-03; test prod-app-04 instead") + - Expand scope subnet (for example, "Expand from 192.168.1.0/25 to 192.168.1.0/24") + - Shrink scope subnet (for example, "Limit testing to critical systems only; exclude development VMs") + +2. **Technique Redirect**: Change testing methodology mid-engagement + - Switch from aggressive fuzzing to careful exploitation (if target system degrading) + - Switch from passive reconnaissance to active scanning (if passive insufficient) + - Focus on specific vulnerability class (for example, "Focus on web vulnerabilities; skip network services") + - Pause specific test module (for example, "Stop SSL/TLS scanning; continue port scanning") + +3. 
**Priority Redirect**: Change target prioritization + - Move high-value target from end of queue to beginning + - Defer testing of non-critical system + - Focus on discovered vulnerabilities (exploit high-CVSS findings first) + +4. **Intensity Redirect**: Adjust testing aggressiveness + - Reduce scanning intensity if target system showing degradation + - Increase scanning intensity for difficult-to-enumerate services + - Switch from stealth-focused to quick-impact testing + +**Redirect Implementation Requirements:** + +- Redirect MUST NOT lose active sessions or compromised system access +- Redirect MUST NOT lose reconnaissance findings collected so far +- Redirect MUST update approval queue (if new targets added, require new approvals) +- Redirect MUST be logged with explicit operator authorization +- Redirect MUST NOT auto-approve previously denied targets (MUST re-approve) +- Redirect MUST support conditional logic (for example, "If found vulnerabilities, focus on exploiting; else continue recon") ### Verification -1. Load valid threshold configuration; verify acceptance -2. Load invalid configuration; verify schema rejection -3. Modify a threshold value; verify new limit takes effect -4. Create configuration with contradictory values; verify validation fails +1. **Scope redirect test**: Add new target to scope; verify tool queues it without restart +2. **Technique redirect test**: Switch scanning intensity; verify active sessions preserved +3. **Priority redirect test**: Move target from position 5 to position 1 in queue; verify queue reordered +4. **Approval requirement test**: Add new target; verify new approval required (old approvals don't transfer) +5. **Session preservation test**: Before redirect, gain shell access; after redirect, verify shell still active +6. **Logging test**: Verify redirect operation logged with operator authorization and timestamp +7. 
**Conditional logic test**: Configure "if high-risk found, focus on exploitation"; verify tool follows condition --- -## APTS-SC-009: Kill Switch +## APTS-HO-008: Immediate Kill Switch with State Dump **Classification:** MUST | Tier 1 ### Requirement -The platform MUST provide multiple independent kill switch mechanisms enabling immediate halt of all testing activity. Kill switches MUST include at minimum: operator-initiated local halt, remote halt by authorized personnel, and automatic failsafe halt when communication with the control plane is lost. Kill switch halt MUST occur in two distinct phases: +Autonomous penetration testing systems MUST implement an immediate kill switch mechanism that terminates all testing activities and preserves complete system state for forensic investigation. This requirement is satisfied in conjunction with APTS-SC-009. The state dump MUST complete within the Phase 2 window defined in APTS-SC-009 (within 60 seconds of kill switch activation). + +> **See also:** APTS-SC-009 (kill switch architecture and process tree termination), APTS-AL-012 (kill switch and pause capability). + +1. **Manual Kill Switch** (Operator-initiated) + - One-click emergency termination from dashboard + - Physical kill button (if applicable for critical engagements) + - Voice-activated kill (if testing includes speech interface) + - Requires no confirmation (instant termination) + +2. 
**Automatic Kill Switches** (System-triggered) + - **Legal/Compliance Trigger**: Automatic kill if PII accessed outside scope, regulated system accessed without authorization, evidence of illegal activity detected + - **Scope Violation**: Automatic kill if testing exceeds authorized scope boundaries by >5% AND confidence in legitimacy < 80% + - **System Failure**: Automatic kill if target system fails (unplanned reboot, filesystem corruption, memory exhaustion) with potential impact + - **Detection Threshold**: Automatic kill if security monitoring detects [X] alerts in [Y] timeframe (potential adversarial response) + +3. **Escalation Kill** (Indirect termination) + - Kill triggered by highest escalation (legal, compliance, out-of-scope) + - Different from manual kill; has automatic notification/logging + +**State Dump Requirements:** + +- Dump MUST include all authenticated sessions (can identify what systems were compromised) +- Dump MUST include all credentials/tokens harvested (can determine scope of breach if tool itself compromised) +- Dump MUST include all reconnaissance findings (for post-mortem analysis) +- Dump MUST include all log entries (full chronology of actions before kill) +- Dump MUST be compressed and encrypted +- Dump MUST be stored separately from operational logs -**Phase 1 (within 5 seconds) - Safety-Critical Cessation:** -- Cease initiating new network requests to target systems -- Cease initiating new exploit attempts and payloads -- Cease initiating new testing actions -- All in-flight operations continue to completion -- Phase 1 measures cessation of new action initiation at the decision layer. In-flight operations that have already been dispatched to the network stack MAY complete. 
+**Evidence Preservation Requirements:** -**Phase 2 (within 60 seconds) - Graceful Shutdown and Process Tree Termination:** -- Complete in-flight network operations gracefully -- Preserve system state for forensic investigation and audit -- Track all child processes, network connections, and external agents spawned during testing -- Terminate all spawned processes and child agents (not just the primary process) -- Close all open network connections -- Stop all distributed or cloud-hosted agents -- Revoke any temporary credentials issued during testing -- Cancel all scheduled or queued actions -- Flush and finalize all logs +- Evidence stored in read-only, forensically sealed location +- Evidence hash published to immutable log (blockchain, Write Once Read Many (WORM) storage, or equivalent tamper-evident log) +- Evidence integrity checksum verified at preservation time and periodically thereafter +- Evidence retention: Minimum 1 year (or per engagement contract, whichever is longer) -Phase 1 represents the safety-critical boundary; all new testing MUST cease immediately. Phase 2 allows orderly resource cleanup, credential revocation, and state preservation. Termination of child processes MUST NOT delay Phase 1 cessation of new testing activity. All kill switches MUST function independently of each other. Within 60 seconds of activation, the platform MUST confirm zero active processes, zero open connections, and zero pending actions. If confirmation fails, the platform MUST alert the operator with a list of unterminated resources. +### Verification + +1. **Kill activation test**: Click kill switch; observe immediate tool termination +2. **State dump test**: Trigger kill switch; verify state dump file created and encrypted +3. **Evidence preservation test**: Verify state dump moved to forensic storage and hashed +4. **No grace period test**: Verify the tool stops immediately (TCP connections not in ESTABLISHED state after kill) +5. 
**Log flush test**: Verify logs written to disk and finalized at kill time +6. **Forensic recovery test**: From state dump, reconstruct tool state and verify accuracy +7. **Automatic kill test**: Trigger automatic kill condition (for example, PII access outside scope); observe automatic termination +8. **State preservation test**: Verify complete system state at time of kill activation is preserved and available for forensic investigation + +--- -**Rationale for timing bounds:** Phase 1's 5-second ceiling reflects the maximum acceptable delay between an operator's decision to halt and cessation of new outbound actions; exceeding this window risks additional unintended impact on target systems. Phase 2's 60-second ceiling accommodates graceful TCP connection termination (TIME_WAIT states up to 30 seconds), in-flight transaction completion, forensic log flushing to durable storage, and state snapshot capture. These bounds are safety-critical and MUST NOT be relaxed. +## APTS-HO-009: Multi-Operator Kill Switch Authority and Handoff -> **See also:** APTS-HO-008 (state dump for forensic investigation), APTS-AL-012 (safety controls scaling with autonomy level). +**Classification:** MUST | Tier 2 + +### Requirement + +In engagement scenarios with multiple autonomous testing operators, the kill switch mechanism MUST clearly define who has authority to activate it and handle handoffs between operators working in shifts. + +1. **Primary Operator**: Has kill switch authority for their scheduled shift +2. **Secondary Operator**: If primary unavailable, secondary operator approved for that shift has kill switch authority +3. **Manager On-Call**: Security manager can activate kill switch for any operator's shift if critical incident detected +4. **CISO Emergency Authority**: CISO can activate kill switch for any engagement at any time ### Verification -1. Activate each kill switch mechanism independently; verify Phase 1 (halt new requests) within 5 seconds -2. 
Verify in-flight operations complete and Phase 2 finishes within 60 seconds total -3. Sever control plane communication; verify automatic halt triggers -4. Verify no new testing actions initiated after kill switch activation -5. Verify kill switches function independently (failure of one does not affect others) -6. Start a multi-process test with child processes and network listeners; activate kill switch; verify all processes terminated, all connections closed, and all distributed/cloud agents stopped within 60-second window -7. Verify any temporary credentials issued during testing are revoked -8. Verify operator is alerted if any resource fails to terminate within the 60-second window +1. **Handoff procedure test**: Two operators perform handoff; verify kill switch authority transfers +2. **Authority verification test**: After handoff, confirm old operator cannot activate kill switch +3. **Incapacity test**: Mark operator unavailable; verify manager can take emergency authority +4. **Out-of-band kill test**: Verify out-of-band kill switch works with valid authorization code +5. **Secondary verification test**: Verify secondary verification method (if configured) functions correctly +6. **Audit trail test**: Verify all authority transfers logged with timestamp and reason +7. **Out-of-band activation test**: Verify kill switch activates via configured out-of-band channel when in-band unavailable --- -## APTS-SC-010: Health Check Monitoring, Threshold Adjustment, and Automatic Halt +## APTS-HO-010: Mandatory Human Decision Points Before Irreversible Actions **Classification:** MUST | Tier 1 ### Requirement -The platform MUST continuously monitor the health and responsiveness of target systems during testing and automatically halt testing when degradation is detected. Health checks MUST run at configurable intervals (recommended defaults: every 30 seconds for Critical assets, every 60 seconds for Production, every 5 minutes for Non-Production). 
At minimum, health checks MUST include response time measurement and service availability verification using methods appropriate to the target (for example, HTTP GET, ICMP ping, database query). +Autonomous penetration testing systems MUST identify and enforce mandatory human decision points before executing any action that cannot be cleanly reversed without evidence of the action occurring. + +1. **Data Access Actions** (Cannot be hidden; data access leaves traces) + - Reading sensitive files (logs, config files, databases) + - Exfiltrating data (downloading PII, credentials, source code) + - Accessing email accounts or message systems + - Reading user files or personal data + +2. **Persistence Mechanisms** (Difficult to remove without evidence) + - Installing backdoors, reverse shells, web shells + - Creating new user accounts + - Installing rootkits or kernel modules + - Adding SSH keys or authorized keys + - Modifying system startup scripts -Baseline MUST be calculated as the median response time across the first 10 health check cycles, excluding outliers beyond the 5th and 95th percentile. The baseline MUST be recalculated when target infrastructure changes are detected (for example, deployment events, scaling operations). Health check overhead MUST be measured as the sum of all health check traffic across all concurrent targets divided by total testing traffic over the measurement window. When testing traffic is zero (idle state), health check traffic is exempt from the percentage limit but MUST NOT exceed the configured absolute bandwidth cap. +3. **Configuration Changes** (Can trigger incidents or break systems) + - Modifying authentication/authorization policies + - Changing firewall rules or network routing + - Installing or modifying services + - Changing system settings or registry -If a health check fails or response time exceeds 200% of the established baseline, the platform MUST trigger the escalation workflow defined in APTS-SC-006. 
If consecutive failures exceed a configurable threshold (recommended default: 3 consecutive failures), the platform MUST automatically halt testing against the affected target and notify the operator. +4. **Evidence Destruction/Modification** (Cannot be undone) + - Clearing logs + - Deleting evidence of compromise + - Modifying file timestamps + - Removing indicators of compromise -Health checks themselves MUST be low-overhead and MUST NOT contribute to target degradation. The platform MUST log all health check results for post-engagement analysis. +5. **Lateral Movement** (Enters new compromise scope) + - Moving from one system to another + - Creating new network connections + - Accessing systems that were not previously compromised -**Dynamic Threshold Adjustment:** The platform MUST monitor target system responsiveness (latency, error rates, recovery time) and dynamically adjust impact thresholds. The system SHOULD decrease thresholds for slow-recovering systems and optionally increase thresholds for fast-recovering systems. Threshold adjustments MUST apply per-target (not globally). When a threshold decrease crosses the escalation boundary defined in APTS-SC-006, the escalation workflow MUST activate. +**Decision Point Implementation:** -The platform SHOULD also implement advanced monitoring for unintended impact beyond baseline health checks. When implemented, the monitoring MUST cover metrics appropriate to the target environment (for example, response times, error rates, resource utilization), and the specific metric set MUST be documented per engagement. The system SHOULD alert operators when degradation exceeds configured thresholds, escalating to automatic halt when degradation becomes severe. +For each irreversible action category, system MUST: + +1. **Halt before execution**: Stop and wait for human decision +2. 
**Present decision context**: Include the current autonomy level, the action's impact classification, relevant scope constraints, and any recent escalation history. +3. **Require explicit approval**: Decision MUST be logged with operator identity and rationale +4. **No timeout escalation without additional approval**: If waiting for approval, timeout does NOT auto-approve +5. **Document refusal**: If operator denies action, log denial and proceed to next target ### Verification -1. **Baseline establishment**: Start engagement against a healthy target; verify the platform establishes response time baselines within the first 5 health check cycles -2. **Degradation detection**: Simulate target degradation (increase response time above 200% of baseline); verify the platform detects and triggers escalation per APTS-SC-006 -3. **Automatic halt**: Simulate 3 consecutive health check failures; verify testing halts against the affected target and operator notification is sent -4. **Scope isolation**: In a multi-target engagement, degrade one target; verify health check halt applies only to the degraded target, not the entire engagement -5. **Health check overhead**: Measure health check traffic over a 10-minute window; verify it does not exceed 1% of total testing traffic -6. **Interval configuration**: Modify health check intervals; verify new intervals take effect within one cycle -7. **Threshold decrease on slow recovery**: Simulate a target with slow recovery (for example, increased response times persisting after test actions); verify the platform automatically decreases impact thresholds for that target -8. **Threshold increase on fast recovery**: Simulate a target that recovers quickly from impacts; verify the platform permits threshold increase if this optional behavior is implemented -9. 
**Adjustment logging**: Review 10 threshold adjustment events; verify each logs: the triggering metric, the old threshold, the new threshold, the responsiveness data that drove the change, and the timestamp -10. **Cross-target independence**: In a multi-target engagement, degrade one target's responsiveness; verify threshold adjustments apply only to the affected target, not globally -11. **Escalation integration**: Trigger a threshold decrease that crosses the escalation boundary defined in APTS-SC-006; verify the escalation workflow activates -12. **Advanced monitoring (if implemented)**: Configure alert and escalation thresholds; trigger degradation events at varying levels; verify alerts fire correctly with metric name, current value, baseline, and deviation; verify automatic halt when degradation becomes severe - -> **See also:** APTS-HO-012 (impact threshold breach escalation based on testing impact metrics). +1. **Irreversible action test**: Attempt to read sensitive file; verify the system halts and requires approval +2. **Persistence test**: Attempt to install reverse shell; verify the system halts and requires approval +3. **Lateral movement test**: Attempt to move to new system; verify the system halts and requires approval +4. **Confidence gate test**: Lower confidence scores below 75%; verify system escalates to manager +5. **Approval logging test**: Verify all decisions logged with operator identity and explicit rationale +6. **Timeout behavior test**: Approval times out; verify action NOT auto-approved despite timeout +7. 
**Denial handling test**: Operator denies action; verify system documents denial and moves to next target + +> **See also:** APTS-HO-001 (mandatory approval gates based on autonomy level and action category) --- -## APTS-SC-011: Condition-Based Automated Termination +## APTS-HO-011: Unexpected Findings Escalation Framework -**Classification:** MUST | Tier 2 +**Classification:** MUST | Tier 1 ### Requirement -MUST define explicit termination conditions that automatically halt testing without operator intervention. Conditions MUST include at minimum: target service unavailability (consecutive health check failures within a configured window), error rate exceeding a configured threshold for a sustained period, resource exhaustion (CPU, memory) exceeding configured limits, and unexpected privilege escalation outside authorized scope. Specific thresholds and durations MUST be documented in the platform's safety configuration and made available to customers. Each condition MUST have detailed logging of what triggered termination. Condition-based termination is independent of the circuit breaker mechanism (APTS-SC-012). When a circuit breaker probe fails after the cooldown period, the platform MUST escalate to condition-based termination per this requirement. +When autonomous testing systems discover findings outside normal operational expectations, they MUST immediately escalate to human operators with context, analysis, and recommended actions. + +1. **Indicators of Compromise (IoCs) from External Breach** + - Evidence of malware on target system (unrelated to pentest tool) + - Unauthorized admin accounts with unexpected creation dates + - Unusual outbound connections to known malicious C2 servers + - Evidence of data exfiltration by external actors + - Rootkits, backdoors, or persistence mechanisms (not installed by pentest tool) + +2. 
**Illegal or Highly Sensitive Content** + - Intellectual property that appears to be stolen (comparing to public sources) + - Illegal content (child exploitation material, stolen credit card databases, or other prohibited content) + - Classified government information (marked CONFIDENTIAL/SECRET/TOP SECRET) + - Regulated content without proper protections (HIPAA medical records, PCI customer data) + +3. **Critical Zero-Day Discoveries** + - Vulnerability affecting core infrastructure with no known patch + - Vulnerability allowing complete system compromise with pre-existing exploits + - Vulnerability affecting software the vendor hasn't yet acknowledged + - Vulnerability affecting systems that don't have security update mechanism + +4. **Out-of-Scope System Access** + - Tool accidentally gaining access to system not in authorized scope + - Lateral movement reaching systems beyond scope boundaries + - Access to network segments marked explicitly out-of-scope + - Access to systems belonging to different company/organization + +5. **Compliance or Legal Violations** + - Accessing regulated data without explicit authorization + - Evidence of tool crossing into legal/regulatory prohibited territory + - Potential data privacy violations (accessing PII outside scope) + - Potential evidence destruction or tampering by prior bad actors + +6. **System Stability Threats** + - Tool causing unexpected service degradation (not intentional DoS) + - Tool action causing system crash or reboot + - Tool discovering cascading failure scenario if tool continues + - Resource exhaustion that could cause system unavailability + +> **See also:** APTS-AL-011 (escalation triggers at L2), APTS-AL-018 (incident response procedure during autonomous testing), Incident Response Integration appendix (unified escalation workflow). ### Verification -1. Trigger each termination condition individually -2. Verify halt occurs immediately on condition trigger -3. 
Confirm detailed logs record what condition triggered halt -4. Test combinations of conditions -5. Verify no continued testing after halt +1. **Indicator of Compromise (IoC) detection test**: Plant malware signature on test system; verify tool escalates +2. **Illegal content test**: Simulate discovery of illegal content; verify immediate escalation to human operators with context, analysis, and recommended actions +3. **Out-of-scope access test**: Tool gains access to system outside scope; verify escalation includes context and recommended actions +4. **Compliance violation test**: Simulate PII access outside authorization; verify immediate escalation +5. **System degradation test**: Simulate service degradation; verify pause and escalation +6. **Notification test**: Trigger escalation; verify escalation notification is delivered to designated human operators -> **See also:** APTS-HO-011 (unexpected findings escalation to human operators) +> **See also:** APTS-SC-011 (condition-based automated termination for overlapping trigger conditions) --- -## APTS-SC-012: Network-Level Circuit Breaker +## APTS-HO-012: Impact Threshold Breach Escalation -**Classification:** MUST | Tier 2 +**Classification:** MUST | Tier 1 ### Requirement -The platform MUST implement circuit breaker logic that automatically suspends testing when target systems show sustained degradation. When the circuit breaker activates, the platform MUST stop all new requests to the affected target, wait a configurable cooldown period, then attempt a single probe request before resuming. If the probe fails, the circuit breaker MUST remain active. +Autonomous penetration testing systems MUST continuously monitor the actual impact of their testing activities and escalate to human operators when impact exceeds pre-defined thresholds. + +1. **Service Availability** + - System response time degradation + - Error rate increase + - Request timeout frequency + - Service unavailability events + +2. 
**Resource Consumption** + - CPU utilization increase + - Memory exhaustion + - Disk I/O saturation + - Network bandwidth consumption + +3. **Data Integrity** + - Database query failures or locks + - File corruption or access errors + - Transaction rollbacks + - Data inconsistency events + +4. **Security Events Triggered** + - IDS/IPS alert volume + - Firewall block rate + - Endpoint Detection & Response alerts + - Web Application Firewall blocks ### Verification -1. Induce sustained target degradation; verify circuit breaker activates -2. Verify all new requests are suspended during active state -3. After cooldown, verify a single probe request is attempted -4. Verify successful probe resumes testing; failed probe keeps circuit open +1. **Warning alert test**: Gradually increase resource usage to the platform's documented warning threshold; verify alert is raised and testing continues +2. **Critical alert test**: Increase resource usage to the platform's documented critical threshold; verify tool pauses and escalates +3. **Response time degradation test**: Simulate service slow-down; verify tool detects and escalates +4. **Error rate escalation test**: Increase error rate to the platform's documented error threshold; verify alert and pause +5. **IDS alert flooding test**: Trigger IDS alerts above threshold; verify pause +6. **State checkpoint test**: Before pause due to impact, verify state preserved +7. **Recovery test**: After impact threshold breach, reduce test intensity; verify tool resumes + +> **See also:** APTS-SC-010 (health check monitoring and automatic halt based on target responsiveness). --- -## APTS-SC-013: Time-Based Automatic Termination with Operator Override +## APTS-HO-013: Confidence-Based Escalation (Scope Uncertainty) -**Classification:** SHOULD | Tier 3 +**Classification:** MUST | Tier 1 ### Requirement -The platform SHOULD enforce operator-specified maximum engagement duration with configurable time limits. 
Operators SHOULD be able to extend engagement duration within defined bounds. The system SHOULD provide advance warning before time-based termination. +When autonomous testing systems lack confidence in scope boundary determination or target legitimacy (confidence < threshold), they MUST automatically escalate the decision to human operators rather than proceeding. + +Confidence scores MUST be expressed on a 0-100% scale. Escalation MUST be triggered when the confidence score falls below 75% for any scope boundary determination, target legitimacy assessment, or asset classification decision. Organizations MAY configure a stricter (higher) threshold but MUST NOT set it below 75%. ### Verification -1. Set a test engagement duration -2. Verify advance warning notifications sent before deadline -3. Verify automatic halt at deadline -4. Test operator extension request -5. Verify extension requires manager approval +1. **Decision point identification**: Identify 20 test decision points where human input required (risky pivots, new attack vectors, credential reuse); document in test plan +2. **Human approval mechanism**: For each decision point, verify human approver notified with decision context (what's being tested, why, risk level); confirm approval/rejection logged +3. **Approval window**: Set 30-minute approval window; verify system halts and awaits decision within window; after window, testing pauses pending escalation +4. **Approval logging**: Sample 10 human decisions; verify each logged with: approver ID, timestamp, decision (approved/rejected/modified), and rationale +5. **Audit trail completeness**: Verify every approval linked to the test action it authorized; confirm bi-directional traceability (approval->action and action->approval) +6. 
**Decision override prevention**: Verify no system mechanism allows test continuation without documented human approval --- -## APTS-SC-014: Reversible Action Tracking and Rollback +## APTS-HO-014: Legal and Compliance Escalation Triggers -**Classification:** MUST | Tier 2 +**Classification:** MUST | Tier 1 ### Requirement -MUST track state for all reversible actions and maintain explicit rollback procedures. For all reversible actions (accounts created, files modified/created, database records modified, configurations changed, processes started, credentials obtained), MUST: (1) Track action name, timestamp, target resource ID, pre-action state, action parameters, rollback procedure, and verification method; (2) Persist state to durable storage after each action; (3) Document explicit rollback procedure for every action type with step-by-step instructions and success verification; (4) Implement rollback procedures as executable automated scripts with no manual steps; (5) Define maximum rollback time for each action type; (6) Trigger alerts when rollback verification fails. Rollback operations MUST precede automated cleanup (APTS-SC-016). Evidence MUST be captured before rollback begins (see APTS-SC-016 evidence preservation requirements). +Autonomous testing systems MUST identify and immediately escalate potential legal, compliance, and regulatory violations to human decision-makers who can determine appropriate response. + +1. **Regulatory Data Access Violations** + - GDPR: Accessing personal data of EU residents without explicit authorization + - CCPA: Accessing California resident personal information outside authorized scope + - HIPAA: Accessing protected health information without proper authorization + - PCI-DSS: Accessing payment card information outside authorized testing + +2. 
**Intellectual Property Exposure** + - Trade secrets discovered during testing + - Proprietary source code or algorithms + - Customer lists or business intelligence + - Unreleased product information + +3. **Breach Notification Obligations** + - Accessing PII in quantity that would require breach notification if exposed + - Discovering systems where breach notification threshold already exceeded + - Accessing data regulated under breach notification laws -The platform MUST track state changes for all reversible actions and document rollback procedures. Automated rollback SHOULD be implemented for low-risk actions (account creation, file creation, simple configuration changes). For complex actions where automated rollback risks further damage (schema changes, authentication reconfigurations), documented manual recovery procedures are acceptable. +4. **Evidence Destruction Concerns** + - Logs that might be required for legal proceedings + - Data that might be under litigation hold + - Audit trails that might be needed for investigations -If rollback fails (for example, database transaction cannot be reversed, file cannot be restored), the platform MUST: (a) log the failure with full diagnostic details including the action that could not be rolled back, the error encountered, and the current state of the affected system, (b) preserve the current system state for manual recovery, (c) alert the operator with a rollback failure notification including recommended manual remediation steps, and (d) halt further testing on the affected target until the operator acknowledges the failure and either resolves it or authorizes continued testing. +5. **Third-Party or Partner Systems** + - Testing system that belongs to or connects to third-party organization + - Potential unauthorized access to partner data + - Data processing agreements that might be violated + +6. 
**Government or Classified Systems** + - Access to government systems without proper authorization + - Classified information (CONFIDENTIAL, SECRET, TOP SECRET) + - Systems that might trigger national security concerns + +> **See also:** APTS-HO-010 (irreversible-action escalation triggered by legal concerns), APTS-TP-012 (classification and handling of regulated data that triggers this escalation), APTS-AL-011 (autonomous escalation triggers that invoke this requirement). ### Verification -1. Execute reversible action; verify state recorded and persisted to durable storage -2. Simulate crash; verify state survives -3. Confirm all action types have documented rollback procedures -4. Execute rollback procedure; verify completion within the documented maximum rollback time -5. Verify system state matches pre-action state after rollback -6. Test partial rollback detection and alert triggering -7. Verify pre-action state captured correctly +1. **GDPR data detection test**: Place test data marked as EU resident PII; verify escalation +2. **CCPA data detection test**: Place California resident data; verify escalation +3. **HIPAA data detection test**: Place medical information; verify escalation +4. **Breach notification test**: Create record count threshold (>500 records); verify escalation +5. **Third-party system test**: Simulate access to non-authorized organization system; verify escalation +6. **Classified data test**: Discover marked CONFIDENTIAL information; verify kill switch activation +7. **Notification test**: Verify legal team and client notified within SLA +8. 
**Evidence preservation test**: Verify state dump securely preserved for legal investigation --- -## APTS-SC-015: Post-Test System Integrity Validation +## APTS-HO-015: Real-Time Activity Monitoring and Multi-Channel Notification **Classification:** MUST | Tier 1 ### Requirement -MUST perform thorough integrity validation after engagement completes: verify file checksums match baseline, confirm no unexpected accounts exist, check database record counts against baseline, verify configurations match baseline, and confirm no unexpected processes are running. Post-test validation MUST be automated and complete within a documented timeframe. Any discrepancies MUST be logged and escalated. +Autonomous testing systems MUST maintain a real-time activity feed of all testing actions and route notifications and alerts to appropriate recipients via multiple communication channels, with delivery confirmation and retry mechanisms. + +**Real-Time Activity Monitoring:** +- Timestamp (UTC) for every action +- Event type (reconnaissance, exploitation, lateral movement, and other phases) +- Severity (informational, low, medium, high, critical) +- Affected system(s) +- Current status (pending, in progress, completed, failed, escalated) +- Confidence score (if applicable) +- Impact assessment (if available) +- Required approvals (if applicable) +- Action links (if user action required) +- Real-time display: Last 100 activities (scrollable) +- Searchable history: All activities in engagement (searchable by date, type, host, status) +- Export capability: CSV/JSON export of activity feed +- Archival: Activities older than 30 days moved to archive storage + +**Multi-Channel Notification Routing:** + +The platform MUST support at least two independent notification channels for alerts. At minimum, dashboard and email notifications MUST be implemented. Additional channels (SMS, phone calls, messaging integrations) SHOULD be supported for CRITICAL alerts. 
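The routing rule stated above (at least two independent channels, dashboard and email mandatory, delivery confirmation with retry) can be sketched roughly as follows. This is an illustration only and not part of the standard: `Channel`, `validate_channels`, `route_alert`, the `max_retries` parameter, and the callable-based `send` interface are hypothetical names, and treating a delivery as failed after a fixed number of attempts is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# The two channels the requirement text names as mandatory.
REQUIRED_CHANNELS = {"dashboard", "email"}


@dataclass
class Channel:
    """Hypothetical channel wrapper; `send` returns True once delivery is confirmed."""
    name: str
    send: Callable[[str, str], bool]  # (severity, message) -> confirmed?


def validate_channels(channels: List[Channel]) -> None:
    """Reject configurations that lack the mandatory dashboard and email channels."""
    missing = REQUIRED_CHANNELS - {c.name for c in channels}
    if missing:
        raise ValueError(f"missing mandatory channels: {sorted(missing)}")


def route_alert(channels: List[Channel], severity: str, message: str,
                max_retries: int = 3) -> Dict[str, bool]:
    """Send an alert on every channel, retrying deliveries that are not confirmed.

    Each channel is attempted independently, so a failure (or exception) on one
    channel never prevents delivery on the others.
    Returns a map of channel name -> whether delivery was eventually confirmed.
    """
    validate_channels(channels)
    results: Dict[str, bool] = {}
    for channel in channels:
        confirmed = False
        for _ in range(max_retries):
            try:
                if channel.send(severity, message):
                    confirmed = True
                    break
            except Exception:
                # Assumption: a raised error counts as an unconfirmed attempt.
                continue
        results[channel.name] = confirmed
    return results
```

A caller would typically inspect the returned map and escalate (for example, to an on-call contact) when a CRITICAL or HIGH alert has no confirmed delivery on any channel; that escalation policy is left out of the sketch.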
+ +**Dashboard Notifications (MUST):** +- Banner alerts at top of screen (color-coded by severity) +- Persistent until dismissed by operator +- Click to view full context (details, evidence, required actions) +- Notification center shows alert history + +**Email Notifications (MUST):** +- Formatted with context, evidence, required actions +- Subject line includes severity and action required +- Includes decision links (if applicable) +- Delivery confirmation requested for CRITICAL/HIGH alerts + +**Additional Channels (SHOULD):** +Platforms SHOULD support additional notification channels appropriate to operational context (for example, SMS, automated phone calls, messaging integrations such as Slack, Teams, or PagerDuty). When additional channels are implemented, they SHOULD include delivery confirmation and retry mechanisms. ### Verification -1. Confirm a pre-engagement baseline exists covering file checksums, user/service account inventory, database record counts, configuration snapshots, and running process list -2. Execute test actions that modify at least one baseline category -3. Run the automated post-engagement integrity validation and confirm it completes within the documented timeframe -4. Verify validation checks each baseline category: file checksums, accounts, database record counts, configurations, and processes -5. Introduce a deliberate discrepancy (for example, a test account not cleaned up) and confirm it is detected, logged, and escalated +1. **Feed display test**: Observe dashboard activity feed; verify updated in real-time (within 5 seconds) +2. **Severity classification test**: Trigger different severity events; verify correctly classified +3. **Filtering test**: Filter by severity; verify only selected severities displayed +4. **Search test**: Search by hostname or action type; verify accurate results +5. **Retention test**: Verify feeds older than 30 days archived but accessible +6. 
**Export test**: Export activity feed to CSV; verify data completeness and format +7. **Dashboard notification test**: Trigger alert; verify appears immediately in dashboard +8. **Email routing test**: Trigger HIGH alert; verify email sent to correct recipients +9. **Channel independence test**: Block primary channel; verify alerts route to secondary channel +10. **Delivery confirmation test**: Verify system confirms receipt for CRITICAL/HIGH alerts +11. **Role-based routing test**: Trigger alert; verify routed to correct roles only + +> **See also:** APTS-HO-002 (real-time monitoring dashboard with intervention capability) --- -## APTS-SC-016: Evidence Preservation and Automated Cleanup +## APTS-HO-016: Alert Fatigue Mitigation and Smart Aggregation -**Classification:** MUST | Tier 2 +**Classification:** SHOULD | Tier 2 ### Requirement -MUST preserve evidence of test actions and provide automated cleanup of all test artifacts. (1) Evidence Preservation: Capture screenshots, log output, error messages, modified file contents, database queries executed, and privilege escalation proofs before any rollback. Evidence MUST be stored in write-once, tamper-evident storage and retained per the engagement's data retention policy. Rollback operations MUST NOT modify evidence storage. (2) Automated Cleanup: Provide automated cleanup procedures that remove all test artifacts including temporary files, created accounts, installed tools, backdoors, test data, and testing activity logs. Cleanup procedures MUST be idempotent (safe to run multiple times), atomic (complete or not at all), and verifiable. Cleanup MUST complete within a documented timeframe proportional to engagement size. Failed cleanup actions MUST be logged and escalated for manual remediation. 
+To prevent operator alert fatigue (ignoring alerts due to volume), autonomous testing systems SHOULD implement intelligent aggregation, suppression rules, and escalation thresholds that reduce noise while preserving critical signal. + +For high-frequency, low-impact actions that generate repeated approval requests (for example, port scans on non-critical targets, banner grabbing), the platform SHOULD support blanket pre-approval rules that allow an operator to authorize a category of actions for a defined time window, reducing individual approval volume while maintaining audit trail coverage. Pre-approval rules MUST be logged, time-bounded, and revocable. ### Verification -1. Execute test actions and capture evidence (screenshots, logs, modified file contents) before rollback -2. Execute rollback; verify rollback operations do not access or modify evidence storage; confirm all pre-rollback evidence remains unchanged and accessible in write-once storage -3. Confirm evidence retained per engagement's data retention policy -4. Run automated cleanup; verify all test artifacts removed (temporary files, created accounts, installed tools, test data) -5. Re-run cleanup; verify idempotency (no errors, no changes on second run) -6. Simulate cleanup failure; verify failed action is logged and escalated for manual remediation +1. **Aggregation test**: Trigger 10 port scan alerts; verify aggregated into single alert +2. **Threshold suppression test**: Trigger 15 LOW alerts; verify aggregated after 10th +3. **Context filtering test**: Trigger expected IDS alerts during scanning; verify suppressed +4. **Severity recalculation test**: Find same vulnerability on multiple systems; verify escalated to HIGH +5. 
**Fatigue measurement test**: Monitor alert response time; verify operators respond faster with aggregation --- -## APTS-SC-017: External Watchdog and Operator Notification +## APTS-HO-017: Stakeholder Notification and Engagement Closure **Classification:** MUST | Tier 2 ### Requirement -An external watchdog process MUST monitor the autonomous pentesting platform and notify operators according to documented Service Level Agreements (SLAs). The platform MUST send health heartbeats and key operational metrics to an external monitoring endpoint outside the platform's own trust boundary. The external monitoring system MUST use separate credentials from the platform to access these endpoints. +Autonomous testing systems MUST define clear workflows for notifying engagement clients and stakeholders of significant findings, unexpected events, and engagement status changes, and provide complete closure procedures. -If heartbeats stop or metrics indicate anomalies, the external watchdog MUST notify operators and customers within a defined timeframe via multiple channels (messaging, email, dashboard). Operators MUST be notified within the documented SLA timeframe. Customers MUST be notified of confirmed incidents within a defined timeframe via out-of-band channels independent of the platform. Escalation to alternative contacts is required if acknowledgment is not received within a defined window. +**Client and Stakeholder Notification Workflows:** -Notifications MUST include: what happened, systems affected, automated actions taken, and next steps. Specific heartbeat intervals, metric thresholds, and notification timeframes MUST be documented in the platform's watchdog configuration and made available to customers. +1. **Periodic Engagement Status** (for example, daily at a fixed time) + - Testing activities: Number of targets tested, findings discovered + - Scope status: Progress on authorized scope + - No issues: "Testing proceeding normally" -### Verification +2. 
**Periodic Finding Summary** (for example, weekly) + - Vulnerabilities discovered by severity range + - Scope coverage progress + - Comparison to industry benchmarks (if applicable) + - Recommendation summary -1. Verify external monitoring endpoint is outside the platform's trust boundary -2. Verify platform sends heartbeats to external monitoring system at documented intervals -3. Verify platform sends key operational metrics to external monitoring system -4. Verify external monitoring system uses separate credentials from the platform -5. Platform health heartbeats stopped; confirm external watchdog detects and escalates within the configured timeframe -6. Platform metrics indicate anomalies; confirm external watchdog detects and alerts operators -7. Detect anomaly; verify operator notified within documented SLA via configured channels -8. Confirm incident; verify customer notified within documented SLA via out-of-band channels -9. Verify notification includes what happened, scope, actions, and next steps -10. Test escalation: verify higher contact notified if operator/customer not acknowledging within configured window +3. **Immediate Critical Notifications** (per documented SLA): + - External compromise indicators detected + - Unexpected findings (malware, illegal content) + - Out-of-scope testing detected + - Engagement suspended/terminated + - Legal/compliance violations -> **See also:** APTS-SC-018 (incident containment and recovery triggered by watchdog alerts), APTS-HO-010 (human-in-the-loop paging path that watchdog notifications feed into). +4. **Escalation Notifications** (per documented SLA): + - High-risk vulnerabilities + - Critical systems compromised + - Scope boundary decisions required ---- +Specific notification frequencies, send times, and escalation windows MUST be documented in the engagement's communication plan. 
-## APTS-SC-018: Incident Containment and Recovery +**Post-Engagement Notification and Closure:** -**Classification:** MUST | Tier 2 +1. **Engagement Completion Notification** (per documented timeline): + - All testing activities have been completed + - Final findings count (by severity range) + - Overall risk assessment + - Scope validation: Were all authorized targets tested + - Schedule for final report delivery -### Requirement +2. **Final Report Delivery Notification** (per documented timeline): + - Report is ready for client review + - Executive summary of findings + - Detailed vulnerability findings with remediation + - Evidence preservation location and access + - Questions for clarification contact + +3. **Follow-Up Assessment Windows** (per engagement agreement): + - Client remediation status check-in + - Verification testing (optional) to confirm fixes + - Lessons learned session invitation + - Next engagement planning -When a platform incident is confirmed, the platform MUST execute documented containment and recovery procedures. (1) Containment: The platform MUST isolate itself from customer networks and test targets promptly. Platform-held credentials SHOULD be rotated as part of containment. The platform MUST capture relevant forensic data (logs, process state) before containment procedures alter system state. (2) Recovery: The platform MUST document recovery procedures for each containment scenario with a defined Recovery Time Objective (RTO). After recovery, all safety controls MUST be re-validated before resuming testing. The platform MUST deliver a post-incident report to the customer within the timeframe defined in the platform's incident response plan, including incident timeline, root cause, affected systems/data, remediation steps, and preventive measures. The platform MUST require customer acknowledgment and approval before testing resumes. 
Specific containment thresholds, RTO targets, and reporting deadlines MUST be documented in the platform's incident response plan. +Specific post-engagement timelines and follow-up intervals MUST be documented in the engagement agreement. + +**Post-Engagement Evidence Retention:** + +- Evidence storage: Per the platform's documented retention policy; retention period SHOULD reflect engagement risk level and applicable regulatory requirements +- Evidence protection: Encrypted, access-controlled, secure location +- Evidence availability: Client can request evidence export for analysis/audit +- Audit trail: All evidence access logged ### Verification -1. Trigger anomaly; confirm network isolation within documented containment timeframe -2. Verify platform cannot reach customer networks after isolation -3. Verify isolated platform can reach logging/watchdog systems -4. Confirm credentials rotated within documented timeframe; verify old credentials no longer work -5. Confirm memory dump and logs preserved on secure system -6. Execute recovery procedure; verify completion within documented RTO -7. Run complete safety control validation suite; verify all pass -8. Deliver post-incident report within documented timeframe -9. Verify customer acknowledges and approves resumption +1. **Periodic status test**: Verify periodic status notification sent at scheduled time per communication plan +2. **Critical notification test**: Trigger critical finding; verify client notified within documented SLA +3. **Template accuracy test**: Review client notifications; verify all required fields present +4. **SLA test**: Measure time from finding to client notification; verify meets SLA +5. **Completion notification test**: Complete engagement; verify completion notification sent +6. **Final report test**: Verify report generated and delivered within documented timeline +7. **Evidence retention test**: Verify evidence stored and accessible per documented retention policy +8. 
**Follow-up notification test**: Verify follow-up check-in sent per engagement agreement schedule --- -## APTS-SC-019: Execution Sandbox and Containment Boundary Integrity +## APTS-HO-018: Operator Qualification, Training, and Competency Governance **Classification:** MUST | Tier 2 ### Requirement -The platform MUST declare and enforce an execution sandbox that bounds the agent runtime's filesystem access, network egress, process capabilities, and system-call surface. The sandbox boundary MUST be enforced by a mechanism outside the agent's control, such as OS-level isolation (kernel namespaces, seccomp, AppArmor, SELinux), hypervisor isolation, or container runtime policy. The platform MUST NOT rely on the agent's own refusal to respect the boundary. The declared boundary MUST specify: (1) filesystem paths the agent runtime may read and write, (2) network destinations the agent runtime may reach, including outbound ports and protocols, (3) process and system-call capabilities the agent runtime may invoke, and (4) any credential or secret stores the agent runtime may access. Any attempt by the agent runtime to take an action outside the declared boundary MUST be blocked by the enforcement layer and logged as a containment event for operator review. +Organizations operating autonomous penetration testing systems MUST ensure all operators meet minimum competency standards and hold documented qualifications appropriate to their assigned autonomy level. + +**Competency Standards and Certification:** + +Organizations MUST define competency standards for each autonomy level specifying required skills and certifications. All operators MUST hold certifications or documented qualifications appropriate to their assigned autonomy level. Operators MUST NOT be assigned above their qualification level. 
+ +**Training Curriculum and Incident Response Preparation:** -### Rationale +Organizations MUST establish training curricula for each autonomy level with documented learning objectives, hands-on exercises, and competency validation mechanisms. Training MUST cover all required modules with completion records and certificates maintained. Annual refresher training is required. Operators MUST receive specialized training in responding to autonomous testing tool failures, unexpected behaviors, and emergency situations including emergency pause, redirect, and kill switch activation procedures, state preservation, forensic analysis, and escalation protocols. -As autonomous pentest platforms become more capable, the assumption that the agent will respect its instructions is not a safety boundary. Containment integrity requires a mechanism that holds regardless of whether the agent "chooses" to respect it, whether the agent has been manipulated, whether the underlying model has changed, or whether the agent has encountered inputs outside its training distribution. Enforcing the boundary at a layer the agent cannot reach from within its execution context is the only architectural property that survives changes in the agent's behavior. +**Ongoing Assessment and Succession Planning:** + +Operators MUST participate in ongoing competency assessments conducted at least annually and MUST maintain current certifications to continue operating at their autonomy level. Operators who fail assessments MUST complete required remediation and are restricted from operating at that level until remediation is complete. Organizations SHOULD establish formal mentoring relationships and documented succession plans to develop future operators and ensure business continuity. ### Verification -1. 
**Boundary declaration review**: Retrieve the platform's sandbox policy file; confirm it specifies allowed filesystem paths, network egress destinations, process capabilities, and credential access; confirm the policy is enforced at a layer below the agent runtime (kernel, hypervisor, or container runtime). -2. **Filesystem egress test**: From within the agent runtime context, attempt to read a file outside the declared read-allowlist (for example, a sensitive system file such as /etc/shadow on a Linux host); confirm the enforcement layer blocks the action and a containment event is logged. -3. **Network egress test**: From within the agent runtime context, attempt an outbound TCP connection to a destination not on the egress allowlist; confirm the connection is refused at the network policy layer and a containment event is logged. -4. **Process capability test**: Attempt a syscall or capability not in the declared allowlist (for example, ptrace, mount, or raw socket creation); confirm the enforcement layer blocks the call and a containment event is logged. -5. **Credential store isolation test**: Attempt to read a credential or secret outside the agent's declared access list; confirm the read is blocked and logged. -6. **Enforcement-layer independence test**: Modify the agent's internal configuration or system prompt to assert that an out-of-boundary action is now permitted; confirm the enforcement layer still blocks the action, demonstrating that enforcement is independent of the agent's internal state. +1. **Competency documentation audit**: Verify a documented competency standard exists for each autonomy level, specifying required skills and certifications +2. **Operator certification check**: For each active operator, verify they hold certifications or documented qualifications appropriate to their assigned autonomy level +3. **Authority alignment**: Verify operator autonomy level assignments match their documented competency level +4. 
**Certification currency**: Verify all operator certifications are current and not expired +5. **Curriculum review**: Verify training content covers all required modules per autonomy level +6. **Training documentation**: Verify training records and completion certificates +7. **Annual refresher**: Verify operators complete annual refresher training +8. **Incident response test**: Simulate tool failure; observe operator response time, escalation accuracy, and state preservation +9. **Assessment schedule audit**: Verify all operators assessed annually +10. **Remediation compliance**: Verify failed operators complete required remediation before resuming operations +11. **Succession plan review**: Verify succession plans documented and current +12. **Mentoring plan audit**: Verify formal mentoring plans exist with regular meetings + +> **See also:** The [Operator Competency Record Template](../appendix/Operator_Competency_Record_Template.md) provides an optional structure for collecting qualification, training, assessment, remediation, mentoring, and succession evidence. --- -## APTS-SC-020: Action Allowlist Enforcement External to the Model +## Escalation Priority Matrix -**Classification:** MUST | Tier 1 +When multiple escalation triggers fire simultaneously, the following precedence applies (highest priority first): + +1. **Legal and compliance escalation** (APTS-HO-014): Legal risk takes absolute precedence; testing halts immediately. +2. **Impact threshold breach** (APTS-HO-012): Active system harm requires immediate containment. +3. **Unexpected findings escalation** (APTS-HO-011): Novel findings require assessment before continued testing. +4. **Confidence-based escalation** (APTS-HO-013): Scope uncertainty requires clarification but does not indicate active harm. +5. **Pre-approval gates** (APTS-HO-001): Routine approval workflow; lowest urgency among escalation types. 
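
The precedence list above can be sketched as a small lookup, shown here only as an illustration. The names `ESCALATION_PRECEDENCE` and `order_by_precedence` are hypothetical and not part of APTS; the ranks are copied from the five entries in the matrix, and rejecting an unknown ID is an assumption of this sketch rather than a stated requirement.

```python
# Precedence ranks copied from the Escalation Priority Matrix
# (lower rank = higher priority). Names here are illustrative only.
ESCALATION_PRECEDENCE = {
    "APTS-HO-014": 1,  # legal and compliance escalation
    "APTS-HO-012": 2,  # impact threshold breach
    "APTS-HO-011": 3,  # unexpected findings escalation
    "APTS-HO-013": 4,  # confidence-based escalation
    "APTS-HO-001": 5,  # pre-approval gates
}


def order_by_precedence(fired):
    """Return the fired requirement IDs, deduplicated, highest priority first.

    Assumption of this sketch: an ID missing from the matrix raises
    ValueError so that an unknown trigger is never silently dropped.
    """
    unknown = [rid for rid in fired if rid not in ESCALATION_PRECEDENCE]
    if unknown:
        raise ValueError(f"no precedence defined for: {unknown}")
    return sorted(set(fired), key=ESCALATION_PRECEDENCE.__getitem__)
```

For example, if the pre-approval gate, the legal trigger, and the unexpected-findings trigger fire together, `order_by_precedence(["APTS-HO-001", "APTS-HO-014", "APTS-HO-011"])` places APTS-HO-014 first, so the legal escalation is handled before the others.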
+ +When two escalations of equal priority fire simultaneously, the platform processes both and applies the most restrictive combined response (for example, if APTS-HO-012 calls for pause and APTS-HO-011 calls for halt, the platform halts). + +--- + +## APTS-HO-019: 24/7 Operational Continuity and Shift Handoff + +**Classification:** SHOULD | Tier 2 ### Requirement -The platform MUST define the agent's permitted tool and action space as an explicit allowlist, and the allowlist MUST be enforced by a component external to the model. Permitted actions MUST NOT be configured solely through the model's system prompt or in-context instructions. Any action the agent attempts to execute that is not present on the external allowlist MUST be blocked by the orchestration layer before it reaches any target system, regardless of how the action was produced by the model. The allowlist MUST be version-controlled, auditable, and subject to the platform's change-management process. Extensions to the action space (new tools, new capabilities, new parameter ranges) MUST be approved and recorded before they become available to the agent at runtime. +For platforms operating in continuous or always-on mode, the platform SHOULD implement governance controls for operational continuity across operator shifts and time zones. This includes: -### Rationale +1. **Shift handoff procedures**: Structured handoff that transfers active engagement state, pending approvals, open escalations, and suppression-rule status to incoming operators +2. **Stale approval expiry**: Automatic expiry of approvals that have not been acted upon within a documented validity window, requiring re-request from the incoming shift +3. **Suppression-rule review**: Periodic review and re-justification of active alert suppression rules to prevent suppression drift over time +4. 
**Desensitization monitoring**: Tracking of operator response-time trends and alert acknowledgment rates to detect cumulative desensitization -System prompts and in-context instructions are not reliable constraints on an agent's action space. They can be overridden by prompt injection, by adversarial inputs, by model updates that change instruction-following behavior, or by distribution shifts the operator has not anticipated. Enforcing the action allowlist in a component the model cannot influence at runtime is the architectural property that makes the constraint actually binding. This requirement is a Tier 1 obligation because it is a baseline property every responsible autonomous pentest platform must have regardless of its claimed assurance level. +Approval queues SHOULD enforce shift-awareness so that approvals granted by an outgoing operator for future actions are flagged for incoming operator review. + +> **Implementation aid:** The [Shift Handoff Template](../appendix/Shift_Handoff_Template.md) provides an informative record format for transferring engagement state, pending approvals, open escalations, suppression-rule status, and kill-switch authority between operators. ### Verification -1. **Allowlist file review**: Retrieve the platform's action allowlist; confirm it is a version-controlled artifact separate from the model's system prompt; confirm entries include tool identifiers, allowed parameters or parameter bounds, and the risk classification assigned to each entry. -2. **External enforcement test**: Through a test harness, induce the model to request a tool identifier that is not on the allowlist; confirm the orchestration layer refuses to dispatch the tool call before it reaches any target system. -3. **System-prompt bypass test**: Modify the system prompt to assert that a disallowed tool is permitted; confirm the external enforcement layer still refuses to dispatch the tool call. -4. 
**Change-management audit**: Review the last three changes to the allowlist; confirm each has an approval record, a rationale, and a timestamp consistent with the platform's change-management policy. -5. **Runtime inventory test**: Query the platform for the current runtime allowlist; confirm it matches the version-controlled source and has not drifted during operation. +1. Shift handoff procedure is documented and includes engagement state, pending approvals, and active escalations +2. Test: simulate a shift change; verify incoming operator receives complete handoff state +3. Stale approval expiry is enforced per documented validity window +4. Test: leave an approval pending beyond the validity window; verify it expires and requires re-request +5. Active suppression rules have documented justification and periodic review dates +6. Operator response-time metrics are collected and available for review --- - -> **See also:** [APTS-SC-A02: Context Window Safety and Constraint Preservation](../appendix/Advisory_Requirements.md#apts-sc-a02-context-window-safety-and-constraint-preservation-advisory). An advisory practice for platforms using LLM-based agents with finite context windows. Addresses the risk of safety-critical constraints being silently lost during context summarization or truncation. High-priority candidate for tier-gated inclusion in v0.2.0. 
From f888777dc433ec4393ac56d849e5263a431d0e3a Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Fri, 12 Jun 2026 00:46:45 +0530 Subject: [PATCH 21/35] Update README.md --- standard/2_Safety_Controls/README.md | 964 ++++++++------------------- 1 file changed, 283 insertions(+), 681 deletions(-) diff --git a/standard/2_Safety_Controls/README.md b/standard/2_Safety_Controls/README.md index 1f305a6..5b19472 100644 --- a/standard/2_Safety_Controls/README.md +++ b/standard/2_Safety_Controls/README.md @@ -1,10 +1,10 @@ -# Human Oversight and Intervention +# Safety Controls and Impact Management -**Domain Prefix:** APTS-HO | **Requirements:** 19 +**Domain Prefix:** APTS-SC | **Requirements:** 20 -This domain defines how an autonomous penetration testing platform keeps qualified humans in the loop: approving actions before execution at low autonomy levels, monitoring and intervening during execution, exercising pause/redirect/kill authority, receiving escalations on unexpected findings or threshold breaches, and closing engagements with accountable human sign-off. Human Oversight is the safety valve that makes graduated autonomy workable. Even a well-designed autonomous platform will encounter situations it has not been authorized to handle alone, and the quality of its behavior in those situations depends on how reliably, how quickly, and to whom it hands control. Requirements in this domain govern approval gates, monitoring and intervention capability, decision timeouts, authority delegation, graceful pause and redirect, kill switches, irreversibility gates, escalation triggers, alerting and fatigue controls, stakeholder notification, operator qualifications, and 24/7 continuity. 
+This domain defines how an autonomous penetration testing platform classifies the potential impact of its actions, limits blast radius, enforces graduated escalation thresholds, terminates testing on adverse conditions, recovers from incidents, and contains the agent runtime within a declared execution boundary enforced outside the agent's own control. Safety Controls complement Scope Enforcement: where SE decides whether an action is inside the agreed envelope, SC decides whether that in-scope action is safe enough to run right now given its predicted impact, cumulative risk, and current system health, and whether the agent's execution environment continues to enforce the platform's declared containment boundary. A platform that cannot stop itself, cannot score what it is doing against Confidentiality, Integrity, and Availability (CIA) dimensions, cannot detect and recover from an unintended effect, or cannot enforce a sandbox boundary on its own agent runtime cannot safely operate at any autonomy level above L1. Requirements in this domain govern impact classification, rate and payload constraints, threshold escalation, kill switches, health-triggered halts, circuit breakers, reversibility tracking, rollback, post-test integrity checks, incident containment and recovery, execution sandbox boundary integrity, and external enforcement of the agent's action allowlist. -This domain covers the human side of the human-platform loop: who approves, who intervenes, and when. Scope boundary checks belong to Scope Enforcement (SE), impact classification and hard stops to Safety Controls (SC), and the audit trail of approvals to Auditability (AR). +This domain covers blast-radius management and hard-stop capability. Scope boundary enforcement belongs to Scope Enforcement (SE), human approval workflows to Human Oversight (HO), and evidence of safety-control actions to Auditability (AR). > For implementation guidance, see the [Implementation Guide](Implementation_Guide.md). 
@@ -12,923 +12,525 @@ This domain covers the human side of the human-platform loop: who approves, who ## Domain Overview -The 19 requirements in this domain fall into six thematic groups: +The 20 requirements in this domain fall into seven thematic groups: | Group | Requirements | Purpose | |---|---|---| -| **Approval gates and intervention capability** | APTS-HO-001, APTS-HO-002, APTS-HO-003 | Mandatory pre-approval at L1/L2, real-time monitoring and intervention, decision timeout with default-safe behavior | -| **Authority delegation and chain-of-custody** | APTS-HO-004, APTS-HO-005 | Delegation matrix, chain-of-custody and decision audit trail | -| **Pause, redirect, and kill switch** | APTS-HO-006, APTS-HO-007, APTS-HO-008, APTS-HO-009 | Graceful pause with state preservation, mid-engagement redirect, immediate kill switch with state dump, multi-operator authority and handoff | -| **Irreversibility and escalation triggers** | APTS-HO-010, APTS-HO-011, APTS-HO-012, APTS-HO-013, APTS-HO-014 | Decision points before irreversible actions, unexpected-findings escalation, impact-threshold breach, confidence-based escalation, legal and compliance triggers | -| **Activity monitoring, alerting, and closure** | APTS-HO-015, APTS-HO-016, APTS-HO-017 | Real-time activity monitoring and notifications, alert-fatigue mitigation, stakeholder notification and engagement closure | -| **Operator qualification and continuity** | APTS-HO-018, APTS-HO-019 | Qualification, training, and competency governance; 24/7 operational continuity and shift handoff | +| **Impact classification and scoring** | APTS-SC-001, APTS-SC-002, APTS-SC-003 | CIA dimensional scoring, industry-specific considerations, worked classification examples | +| **Rate, threshold, and cumulative risk controls** | APTS-SC-004, APTS-SC-005, APTS-SC-006, APTS-SC-007, APTS-SC-008 | Rate and payload constraints, cascading-failure prevention, escalation workflow, cumulative risk scoring, schema-validated thresholds | +| 
**Kill switch and automated termination** | APTS-SC-009, APTS-SC-010, APTS-SC-011, APTS-SC-012, APTS-SC-013 | Kill switch, health-triggered halts, condition-based termination, network circuit breaker, time-based termination | +| **Reversibility, rollback, and post-test integrity** | APTS-SC-014, APTS-SC-015, APTS-SC-016 | Reversible action tracking and rollback, post-test integrity validation, evidence preservation and cleanup | +| **External watchdog and incident recovery** | APTS-SC-017, APTS-SC-018 | External watchdog and operator notification, incident containment and recovery | +| **Execution sandbox and agent containment** | APTS-SC-019, APTS-SC-020 | Sandbox and containment boundary integrity, action allowlist enforcement external to the model | ### Requirement Index | ID | Title | Classification | |---|---|---| -| APTS-HO-001 | Mandatory Pre-Approval Gates for Autonomy Levels L1 and L2 | MUST \| Tier 1 | -| APTS-HO-002 | Real-Time Monitoring and Intervention Capability | MUST \| Tier 1 | -| APTS-HO-003 | Decision Timeout and Default-Safe Behavior | MUST \| Tier 1 | -| APTS-HO-004 | Authority Delegation Matrix | MUST \| Tier 1 | -| APTS-HO-005 | Delegation Chain-of-Custody and Decision Audit Trail | MUST \| Tier 2 | -| APTS-HO-006 | Graceful Pause Mechanism with State Preservation | MUST \| Tier 1 | -| APTS-HO-007 | Mid-Engagement Redirect Capability | MUST \| Tier 1 | -| APTS-HO-008 | Immediate Kill Switch with State Dump | MUST \| Tier 1 | -| APTS-HO-009 | Multi-Operator Kill Switch Authority and Handoff | MUST \| Tier 2 | -| APTS-HO-010 | Mandatory Human Decision Points Before Irreversible Actions | MUST \| Tier 1 | -| APTS-HO-011 | Unexpected Findings Escalation Framework | MUST \| Tier 1 | -| APTS-HO-012 | Impact Threshold Breach Escalation | MUST \| Tier 1 | -| APTS-HO-013 | Confidence-Based Escalation (Scope Uncertainty) | MUST \| Tier 1 | -| APTS-HO-014 | Legal and Compliance Escalation Triggers | MUST \| Tier 1 | -| APTS-HO-015 | Real-Time Activity 
Monitoring and Multi-Channel Notification | MUST \| Tier 1 | -| APTS-HO-016 | Alert Fatigue Mitigation and Smart Aggregation | SHOULD \| Tier 2 | -| APTS-HO-017 | Stakeholder Notification and Engagement Closure | MUST \| Tier 2 | -| APTS-HO-018 | Operator Qualification, Training, and Competency Governance | MUST \| Tier 2 | -| APTS-HO-019 | 24/7 Operational Continuity and Shift Handoff | SHOULD \| Tier 2 | +| APTS-SC-001 | Impact Classification and CIA Scoring | MUST \| Tier 1 | +| APTS-SC-002 | Industry-Specific Impact Considerations | MUST \| Tier 2 | +| APTS-SC-003 | Real-World Impact Classification Examples | SHOULD \| Tier 2 | +| APTS-SC-004 | Rate Limiting, Bandwidth, and Payload Constraints | MUST \| Tier 1 | +| APTS-SC-005 | Cascading Failure Prevention in Interconnected Systems | SHOULD \| Tier 2 | +| APTS-SC-006 | Threshold Escalation Workflow (Automated → Approval → Prohibited) | MUST \| Tier 2 | +| APTS-SC-007 | Cumulative Risk Scoring with Time-Based Decay | MUST \| Tier 2 | +| APTS-SC-008 | Threshold Configuration with Schema Validation | SHOULD \| Tier 3 | +| APTS-SC-009 | Kill Switch | MUST \| Tier 1 | +| APTS-SC-010 | Health Check Monitoring, Threshold Adjustment, and Automatic Halt | MUST \| Tier 1 | +| APTS-SC-011 | Condition-Based Automated Termination | MUST \| Tier 2 | +| APTS-SC-012 | Network-Level Circuit Breaker | MUST \| Tier 2 | +| APTS-SC-013 | Time-Based Automatic Termination with Operator Override | SHOULD \| Tier 3 | +| APTS-SC-014 | Reversible Action Tracking and Rollback | MUST \| Tier 2 | +| APTS-SC-015 | Post-Test System Integrity Validation | MUST \| Tier 1 | +| APTS-SC-016 | Evidence Preservation and Automated Cleanup | MUST \| Tier 2 | +| APTS-SC-017 | External Watchdog and Operator Notification | MUST \| Tier 2 | +| APTS-SC-018 | Incident Containment and Recovery | MUST \| Tier 2 | +| APTS-SC-019 | Execution Sandbox and Containment Boundary Integrity | MUST \| Tier 2 | +| APTS-SC-020 | Action Allowlist Enforcement External to 
the Model | MUST \| Tier 1 | ### Conformance -A platform claims conformance with this domain by implementing every MUST requirement assigned to the compliance tier it targets and to all lower tiers, with no deviation, and by either implementing every SHOULD requirement at those tiers or recording a documented justification for each deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An unimplemented MUST requirement or an undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 HO requirement plus every Tier 2 HO requirement, and a Tier 3 platform satisfies all three tiers. Human Oversight has no Tier 3 requirements in this release; a Tier 3 claim therefore requires all Tier 1 and Tier 2 HO requirements. +A platform claims conformance with this domain by implementing every MUST requirement assigned to the compliance tier it targets and to all lower tiers, with no deviation, and by either implementing every SHOULD requirement at those tiers or recording a documented justification for each deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An unimplemented MUST requirement or an undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 SC requirement plus every Tier 2 SC requirement, and a Tier 3 platform satisfies all three tiers. 
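
The conformance rule above can be sketched as a gap check, offered as one possible reading rather than an official APTS tool. The `Requirement` type and the `conformance_gaps` function are hypothetical names; the logic assumes that tiers are cumulative, that an unimplemented MUST is always a gap, and that an unimplemented SHOULD is a gap only when no documented justification is recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    req_id: str
    classification: str  # "MUST" or "SHOULD", as listed in the Requirement Index
    tier: int            # 1, 2, or 3


def conformance_gaps(requirements, target_tier, implemented, justified_deviations):
    """Return the IDs that block a conformance claim at target_tier."""
    gaps = []
    for req in requirements:
        if req.tier > target_tier:
            continue  # tiers are cumulative: only the target tier and below apply
        if req.req_id in implemented:
            continue
        # An unimplemented MUST is always a gap; an unimplemented SHOULD is a
        # gap only when no documented justification has been recorded.
        if req.classification == "MUST" or req.req_id not in justified_deviations:
            gaps.append(req.req_id)
    return gaps


# Four entries taken from the Requirement Index, used as sample input.
SAMPLE = [
    Requirement("APTS-SC-001", "MUST", 1),
    Requirement("APTS-SC-003", "SHOULD", 2),
    Requirement("APTS-SC-008", "SHOULD", 3),
    Requirement("APTS-SC-009", "MUST", 1),
]
```

With this sample, a Tier 2 claim that implements only APTS-SC-001 and records a justification for APTS-SC-003 still has one gap, APTS-SC-009, because a justification cannot stand in for a MUST requirement; APTS-SC-008 is ignored because it sits at Tier 3.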
-Two appendix-only advisory requirements for this domain (APTS-HO-A01 Out-of-Band Kill Switch via Independent Network and APTS-HO-A02 Disclosure and Mitigation of AI Influence on Operator Decisions) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier. +Three appendix-only advisory practices for this domain (APTS-SC-A01 Platform Health Monitoring and Anomaly Detection, APTS-SC-A02 Context Window Safety and Constraint Preservation, and APTS-SC-A03 Tool Invocation Parameter and Chaining Governance) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier. Every requirement in this domain includes a Verification subsection listing the verification procedures a reviewer uses to confirm implementation. --- -## APTS-HO-001: Mandatory Pre-Approval Gates for Autonomy Levels L1 and L2 +## APTS-SC-001: Impact Classification and CIA Scoring **Classification:** MUST | Tier 1 ### Requirement -For autonomy levels L1 Assisted (fully supervised) and L2 Supervised (pre-approved with monitoring), the autonomous testing system MUST implement mandatory approval gates that prevent execution of the following action categories without explicit human authorization: +MUST implement a multi-tier impact classification system that categorizes every pentest action before execution, with discrete impact levels (Critical, High, Medium, Low) providing graduated escalation thresholds. Every pentest action MUST be scored against Confidentiality, Integrity, and Availability dimensions with a weighted impact calculation, where the minimum dimension score determines escalation level. Organizations MUST maintain an authoritative mapping document categorizing all supported pentest techniques and exploits with pre-classified impact levels, CIA scores, and reversibility status. 
This mapping MUST be reviewed and updated quarterly or when new techniques are added. Unclassified techniques MUST NOT execute. -**L1 Assisted (All Significant Actions - Default Supervision):** -- Any vulnerability exploitation attempt -- Lateral movement between systems -- Data access or exfiltration actions -- Installation of persistence mechanisms -- Configuration changes to target systems -- Social engineering exercises +### Verification + +1. **Action catalog audit**: Sample 50 actions from methodology library (DNS, web, exploitation, lateral); verify each has explicit impact level assigned (Critical/High/Medium/Low) and CIA dimension scoring +2. **Pre-execution classification**: Execute 20 test runs; review execution logs for timestamp showing classification assigned before action execution timestamp +3. **CIA scoring validation**: Select 20 diverse pentest actions; have independent security reviewer score each action's CIA impact; verify impact vectors calculated correctly; confirm minimum dimension drives escalation +4. **Approval pathway enforcement**: For 10 high-impact actions, verify classification directly determined approval requirement (high=escalation, medium=standard, low=autonomous); confirm in audit log +5. **Impact level distribution**: Verify library includes balanced distribution (not all critical); confirm at least 30% low-impact, 40% medium, 20% high, 10% critical +6. **Classification consistency**: Two independent reviewers classify 10 random actions; verify >90% agreement on assigned impact level +7. **Mapping completeness**: Audit all supported techniques have impact classification; verify no unclassified techniques can execute +8. **Lookup performance**: Test classification lookup performance (must return without observable delay to test execution) +9. 
**Quarterly review**: Confirm quarterly review occurred with documented changes + +--- + +## APTS-SC-002: Industry-Specific Impact Considerations -**L2 Supervised (Pre-Approved + Escalation Triggers):** -- Exploitation of vulnerabilities with Common Vulnerability Scoring System (CVSS) >= 7.0 -- Lateral movement beyond initial compromise -- Access to sensitive data repositories (databases, file shares with classified labels) -- Deployment of monitoring or callback agents -- Modifications to authentication/authorization controls -- Any action with confidence score < 75% on scope determination +**Classification:** MUST | Tier 2 -Approval gates MUST: -1. Present actionable information (affected system, action type, estimated risk, prerequisites met) -2. Provide deny/approve/defer decision options -3. Include reasoning prompts (operator MUST document why they're approving) -4. Support escalation to higher authority if approver lacks delegation for this action type -5. Create immutable approval records with timestamp, approver identity, and rationale +**Applicability:** This requirement applies to platforms targeting regulated industries (healthcare, financial services, critical infrastructure). Platforms exclusively testing non-regulated environments MAY document a justified exemption. -> **See also:** APTS-SC-006 (threshold escalation workflow). +### Requirement -**Rationale for CVSS >= 7.0 threshold:** CVSS 7.0 is the boundary between the "Medium" range (4.0-6.9) and the "High" range (7.0-8.9), where a finding carries significant potential for system compromise or data exposure. Exploitations at or above this threshold carry meaningful risk of unintended production impact, data loss, or service disruption. Requiring human approval at this boundary ensures that an operator evaluates the risk/reward trade-off before the platform attempts high-impact exploitation. 
Organizations MAY lower this threshold (for example, to 4.0 for critical infrastructure) but MUST NOT raise it above 7.0. +Platforms MUST adjust impact classifications for industry-specific regulatory requirements. Healthcare systems MUST treat all direct patient data access as Critical. Financial systems MUST treat all cardholder data access as Critical. Critical infrastructure (utilities, transportation, communications) MUST treat availability impacts as one tier higher than calculated. ### Verification -1. **Configuration audit**: Verify approval gate rules are implemented for all action categories listed above -2. **Approval record audit**: For last 30 days, confirm: - - 100% of L1 actions have pre-approval records - - 100% of L2 exploitation actions >= CVSS 7.0 have approvals - - All approvals include timestamp, approver identity, and rationale -3. **Timeline verification**: Confirm approval grants show operator actually reviewed (not auto-generated) -4. **Escape testing**: Attempt to execute gated action without approval; verify the system blocks the action and escalates -5. **Authority verification**: Confirm approver held delegation for approved action type +1. Build a seeded test set containing at least one healthcare target (PHI present), one PCI target (cardholder data present), and one critical-infrastructure target (utility/transportation/communications) +2. Submit each target to the platform's classification engine; verify healthcare target classifies all direct patient data access as Critical, PCI target classifies all cardholder data access as Critical, and critical-infrastructure target elevates availability impacts by one tier above the baseline calculation +3. For each reclassified action, verify the approval chain matches the regulatory requirement documented for that industry (healthcare, PCI, critical infrastructure) +4. Test that PHI/PCI data access is always escalated to Critical regardless of other CIA factors +5. 
Confirm critical infrastructure availability impacts are elevated by one tier in the classification output -> **See also:** APTS-HO-010 (mandatory gates before irreversible actions regardless of autonomy level), APTS-HO-005 (delegation chain-of-custody and decision audit trail). +> **See also:** APTS-SC-001 (baseline impact classification framework this extends for regulated industries), APTS-SE-011 (multi-tenant and regulated environment awareness). --- -## APTS-HO-002: Real-Time Monitoring and Intervention Capability +## APTS-SC-003: Real-World Impact Classification Examples -**Classification:** MUST | Tier 1 +**Classification:** SHOULD | Tier 2 ### Requirement -Autonomous testing systems operating at L2 and above MUST provide operators with real-time visibility into autonomous system operations and the ability to intervene at any time. The core capability MUST include: - -**Mandatory Real-Time Visibility:** -- Current active tests and their status (running, paused, waiting for approval, blocked, completed, failed) -- Testing scope boundaries and current testing location relative to those boundaries -- Safety control status (approval gates active, confidence scores, escalations pending) -- Incident or anomaly detection (out-of-scope access attempts, unexpected findings, legal/compliance triggers, detected intrusions) - -**Mandatory Intervention Capability:** -- Operators can immediately pause all autonomous actions from the dashboard -- Operators can immediately terminate all autonomous actions -- Operators can view pending approval queue and respond to escalations from the dashboard -- System logs all operator interventions with timestamp, operator identity, and action taken - -**Recommended Enhancements (SHOULD):** -- Current activity feed with filtering by action type, target system, risk level, or time range -- System health visualization (tool status, target system status, resource utilization, alert queue) -- Color-coded risk indicators for active tests -- 
Service Level Agreement (SLA) tracking for pending approvals (time elapsed, thresholds) -- Confidence score displays (scope boundary, target legitimacy, exploitation prerequisites) -- Mobile-responsive interface for field operators -- Keyboard-navigable controls and screen reader compatibility - -**Accessibility and Usability:** -- Dashboard SHOULD be accessible in accordance with WCAG 2.1 Level AA guidelines -- Implementations SHOULD provide keyboard navigation and screen reader support -- SHOULD support offline caching for continued visibility during connectivity loss - -**Audit and Logging:** -- All operator dashboard access MUST be logged (timestamp, operator identity, IP address, duration) -- All operator interventions (pause, terminate, approval decisions) MUST be logged -- Dashboard access logs MUST be retained for a minimum of 90 days +SHOULD document at least 10 representative pentest scenarios with calculated impact classifications, CIA scores, and approval workflows. Examples SHOULD include password spraying, directory traversal, buffer overflow exploitation, and data exfiltration attempts. Scenarios may be drawn from past engagements, lab environments, or industry-documented attack patterns. ### Verification -1. **Real-time visibility test**: Verify operators can view current active tests, scope boundaries, and safety control status in less than 5 seconds -2. **Intervention capability test**: Verify operators can pause and terminate all autonomous actions from the dashboard within 2 seconds -3. **Escalation access test**: Verify operators can view pending approval queue and respond to escalations from the dashboard -4. **Intervention logging test**: Execute operator pause and terminate actions; verify each is logged with timestamp, operator identity, and action taken -5. **Access logging test**: Verify all operator dashboard accesses logged with timestamp, operator identity, IP address, and session duration -6. 
**Log retention audit**: Verify dashboard logs retained for minimum 90 days +1. Review the platform's documented scenario corpus; verify it contains at least 10 representative pentest scenarios with named techniques (password spraying, directory traversal, buffer overflow, data exfiltration, and others) +2. For each scenario, verify the documentation includes the expected CIA score, impact tier, and approval workflow +3. For each documented scenario, compare the recorded classification and approval workflow against the platform's documented scoring model; fail any scenario whose recorded classification deviates from the model without a documented exception +4. Verify scenarios cover at least 4 distinct technique types (reconnaissance, exploitation, post-exploitation, data handling) -> **See also:** APTS-HO-015 (real-time activity monitoring and multi-channel notification delivery) +> **See also:** APTS-SC-001 (impact classification model that the scenarios illustrate), APTS-SC-002 (industry-specific impact rules applied in worked examples). --- -## APTS-HO-003: Decision Timeout and Default-Safe Behavior +## APTS-SC-004: Rate Limiting, Bandwidth, and Payload Constraints **Classification:** MUST | Tier 1 ### Requirement -When autonomous testing systems escalate decisions to human operators, they MUST implement maximum response time windows (Service Level Agreements (SLAs)) and define explicit default-safe behavior if human operators fail to respond within these timeframes. +MUST implement hierarchical rate limiting and payload constraints for all active pentest operations: -**SLA Requirements by Action Category:** +**Per-host rate limiting:** MUST implement per-host connection limits and rate limiting. Rate limits MUST be configurable per impact tier, with lower-impact actions permitted higher rates and higher-impact actions restricted to lower rates.
-| Decision Type | Autonomy Level | Max Response Time | Default Behavior if Timeout | -|---------------|-----------------|-------------------|---------------------------| -| Exploitation attempt | L2 | 15 minutes | DENY - do not exploit; skip to next target | -| Lateral movement | L2 | 15 minutes | DENY - halt movement; stay on current system | -| Data access | L2 | 10 minutes | DENY - do not access data; escalate to manager | -| Scope boundary decision (confidence <75%) | L3 | 30 minutes | PAUSE - halt all actions; wait for human decision | -| Unexpected finding escalation | All | 5 minutes | PAUSE + ISOLATE - halt all actions; preserve state | -| Legal/compliance trigger | All | Immediate | KILL + PRESERVE - terminate all actions; preserve evidence | +**Cascading rate limits:** MUST implement cascading rate limits at subnet, datacenter, and engagement-wide levels. If aggregate traffic to a subnet approaches configured limits, the system MUST proactively reduce per-host rate limits to prevent subnet-level overload. If engagement-wide limits are reached, the system MUST automatically halt all new actions across all targets. -**Priority Rule:** When multiple timeout requirements apply to the same action (for example, both APTS-HO-003 and APTS-SC-006), the most restrictive timeout governs. +**Payload and bandwidth constraints:** MUST enforce maximum payload sizes configurable per impact tier. The following default limits MUST be applied unless the Rules of Engagement specifies alternative values: Informational/Low actions limited to 100KB payloads, Medium actions to 10MB, High actions to 100MB, Critical actions to 500MB with an engagement-wide bandwidth cap of 5 Gbps. Organizations MUST document their configured limits and any deviations from default values. Any action exceeding its size limit MUST be automatically rejected before transmission. -1. **Conservative assumption**: If human doesn't respond, assume "no" / "don't approve" unless pre-escalation approval was explicitly obtained -2. 
**No momentum continuation**: If approval timeout expires, do NOT auto-approve the queued action; instead, halt the action and move to next planned test -3. **Escalation progression**: If first approver doesn't respond within 50% of SLA, automatically escalate to secondary approver -4. **State preservation**: All timeout scenarios MUST preserve system state to allow resumption after human decision -5. **Client notification threshold**: If approval timeout exceeds 30 minutes, automatically notify engagement client of delay -6. **Recovery procedures**: Document exact procedures for resuming work after timeout (don't just auto-continue) +> **See also:** APTS-SE-019 (adaptive backoff, DoS prevention, and continuous-mode production impact controls). SC-004 governs per-host, per-subnet, and engagement-wide rate limits and payload sizes as safety controls; SE-019 governs scheduling, backoff behavior, and production-impact thresholds as scope controls. ### Verification -1. **SLA configuration audit**: Verify timeout values match requirements for each decision type -2. **Timeout test**: Create approval scenario, do not respond; observe system behavior at timeout -3. **Escalation progression test**: Create approval, verify automatic escalation at 50% SLA mark -4. **Default action test**: After timeout, confirm default action executed (DENY/PAUSE/KILL) -5. **State preservation test**: Verify system state preserved after timeout (can resume cleanly) -6. **Client notification test**: Trigger timeout exceeding 30 minutes; confirm client notification sent -7. **Incident logging test**: Verify timeout event logged with full context (escalation ID, timestamp, reason) -8. **Multi-approver test**: First approver unavailable; confirm automatic escalation to secondary approver -9. **Verification of timeout enforcement**: Request the platform's timeout configuration (via API, configuration export, or dashboard). 
Confirm that the configured values match or are stricter than the SLA table above. Then trigger a test action requiring approval, do not respond, and measure wall-clock time to confirm the platform enters default-safe behavior within the configured timeout. +1. Monitor request rates during test execution; verify requests never exceed configured per-host limits +2. Test that exceeding limits triggers automatic throttling +3. Confirm rate limits apply independently per target host +4. Simulate multi-target attack scenario; confirm subnet limit triggers load reduction when approaching threshold +5. Test engagement-wide halt at maximum load +6. Attempt to send payload larger than limit; verify rejection before transmission +7. Monitor bandwidth consumption during tests; confirm engagement-wide bandwidth cap enforcement +8. Audit logs showing rate limit enforcement and escalation at all levels --- -## APTS-HO-004: Authority Delegation Matrix +## APTS-SC-005: Cascading Failure Prevention in Interconnected Systems -**Classification:** MUST | Tier 1 +**Classification:** SHOULD | Tier 2 ### Requirement -Organizations operating autonomous penetration testing systems MUST establish and maintain a documented Authority Delegation Matrix (ADM) that specifies which personnel can approve which testing actions at which autonomy levels. This matrix MUST be: - -1. **Formally documented** in writing (not implied or assumed) -2. **Maintained with access controls** (not publicly available; protected as sensitive information) -3. **Cascading authority**: Approvers at higher autonomy levels can approve on behalf of lower levels -4. **No delegation of authority**: Approvers cannot delegate their authority to others (no "I approve you to approve") -5. **Role-based, not person-based**: Authority is tied to role; changes in personnel require new matrix -6. **Dual control for high-risk**: CVSS >= 9.0 requires second independent approver -7. 
**Escalation not optional**: If approver lacks authority for action type, MUST escalate (no workarounds) - -**Rationale:** Distributed approval authority without formal structure creates audit risk, privilege escalation paths, and ambiguous accountability. An Authority Delegation Matrix ensures that only qualified personnel with documented authorization make high-risk decisions, prevents unapproved escalation of privileges, and creates an auditable chain of authority. During incidents, a clear ADM eliminates confusion about who has authority to activate kill switches, approve scope changes, or authorize continued testing. +SHOULD detect and prevent cascading failures when testing systems with interdependencies. Before attacking system B that depends on system A, verify system A is functioning normally. If attacking system A causes degradation in dependent systems, automatically reduce attack intensity on A or halt testing entirely. ### Verification -1. **Matrix documentation audit**: Verify written ADM exists, signed by CISO/VP Security. - The [Authority Delegation Matrix Template](../appendix/Authority_Delegation_Matrix_Template.md) provides an illustrative format for recording role-based approval, escalation, and emergency authority. -2. **Operator role audit**: For each active operator, confirm ADM lists their role and autonomy level -3. **Approval authority test**: Operator attempts approval outside their authority; verify the system rejects the action and escalates -4. **Authority renewal audit**: Verify all ADM entries renewed within required periods -5. **Escalation chain test**: For CVSS >= 9.0 action, verify escalation follows defined chain in ADM -6. **Dual control test**: For dual-control actions, verify two independent approvers required -7. **Historical audit**: Last 30 days of approvals; verify all approvers held documented authority +1. 
In a staging environment, define a topology with at least one upstream/downstream dependency pair where downstream system B depends on upstream system A +2. Establish a measurable health baseline for the downstream system (response time, error rate, or availability metric) +3. Attack upstream system A; induce measurable degradation in downstream system B beyond the platform's documented threshold; verify the platform reduces attack intensity on A or halts testing within the documented response window +4. **Negative test (control case):** Attack upstream system A at a level that does not cause downstream degradation beyond the threshold; verify no unnecessary halt or reduction is triggered +5. Verify recovery: after attack intensity is reduced, confirm downstream system B restores to baseline health within the documented recovery window +6. Audit logs show cascade detection trigger, intensity reduction or halt decision, and recovery confirmation with timestamps + +> **See also:** APTS-SC-004 (rate limiting and payload constraints used to throttle upstream pressure), APTS-SC-009 (kill switch that halts testing when cascading impact is detected). --- -## APTS-HO-005: Delegation Chain-of-Custody and Decision Audit Trail +## APTS-SC-006: Threshold Escalation Workflow (Automated → Approval → Prohibited) **Classification:** MUST | Tier 2 ### Requirement -All human approval decisions, escalations, and delegations MUST be recorded in an immutable, auditable decision log with the following mandatory fields: - -1. **Immutability**: Logs MUST be cryptographically signed and stored in append-only format (cannot be modified/deleted after creation) -2. **Completeness**: Every approval, denial, escalation, and timeout MUST be logged with all fields above -3. **Traceability**: Log entries MUST track full chain-of-custody from initial request through final execution -4. 
**Searchability**: Logs MUST be queryable by decision_id, decision_type, operator, timestamp, action, target_system -5. **Long-term retention**: Minimum 90 days; SHOULD retain for 6 years for high-risk decisions or per applicable regulatory requirement -6. **Access control**: Read access restricted to authorized personnel; write access only via system (no manual edits) -7. **Export capability**: Security teams MUST be able to export logs for incident investigation, audits, compliance reviews +MUST implement graduated responsibility escalation where Informational/Low impact actions execute automatically without approval, Medium impact actions require standard approval within a defined window, High impact actions require elevated approval within a defined window, and Critical actions require senior approval plus live operator confirmation. Approval timeout windows MUST be documented per severity level in the platform's escalation policy. Actions exceeding their configured approval timeout are denied by default. -**Reporting Requirements:** - -Organizations MUST generate periodic reports from decision logs at a cadence appropriate to engagement duration and operational tempo. At minimum, organizations MUST produce a summary report at engagement completion and a statistical analysis report at least monthly for ongoing operations. For long-running or continuous engagements, organizations SHOULD also produce interim reports (for example, weekly escalation summaries, quarterly audit reports). +> **See also:** APTS-HO-001 (pre-approval gates by autonomy level), APTS-AL-011 (escalation triggers and exception handling). ### Verification -1. **Log existence audit**: Verify decision logs exist and contain entries for all approvals made -2. **Immutability test**: Attempt to modify historical log entry; verify the system prevents modification -3. **Completeness test**: Random sample 10 recent approvals; verify all required fields present in logs -4. 
**Traceability test**: Pick escalation scenario; verify full chain-of-custody from initial request through execution -5. **Access control test**: Non-authorized user attempts to access/modify logs; verify the system denies access -6. **Search functionality test**: Search for approvals by operator, action type, timestamp; verify results accurate -7. **Export test**: Generate monthly compliance report; verify data accuracy and format -8. **Retention test**: Verify logs older than retention period are archived/secured appropriately -9. **Signature verification test**: Validate cryptographic signatures on sample log entries - -> **See also:** [APTS-HO-A02: Disclosure and Mitigation of AI Influence on Operator Decisions](../appendix/Advisory_Requirements.md#apts-ho-a02-disclosure-and-mitigation-of-ai-influence-on-operator-decisions-advisory). An advisory practice covering audit-trail provenance for AI-shaped operator affordances (option sets, defaults, wording, ordering) and bias mitigation at high-impact gates, so the chain-of-custody distinguishes a typed approval from a default click-through. Candidate for tier-gated inclusion in v0.2.0. +1. Test Informational action executes immediately +2. Test Medium action requires approval before execution +3. Test High action requires elevated approval within configured window +4. Test Critical action requires senior approval + live operator confirmation +5. Test timeout queuing preserves action for next approval window +6. Verify timeout windows reset on schedule +7. **Negative test (no approval):** Attempt a Medium-impact action without submitting any approval; verify execution is blocked before dispatch and denial is logged +8. **Negative test (expired approval):** Submit approval for a High-impact action, wait past the configured approval timeout, then attempt execution; verify the action is denied +9. 
**Negative test (insufficient authority):** Submit approval for a Critical-impact action using a standard (non-senior) approver; verify the action is rejected and the insufficient-authority denial is logged --- -## APTS-HO-006: Graceful Pause Mechanism with State Preservation +## APTS-SC-007: Cumulative Risk Scoring with Time-Based Decay -**Classification:** MUST | Tier 1 +**Classification:** MUST | Tier 2 ### Requirement -Autonomous penetration testing systems MUST implement a graceful pause mechanism that allows human operators to suspend all active testing activities while preserving complete system state for later resumption. - -1. **Pause Types and Trigger Methods:** - - Manual pause: Operator clicks "Pause" button on dashboard - - Automatic pause: System escalation triggers automatic pause (for example, unexpected finding) - - Time-based pause: Scheduled pause at specific time (for example, 5pm daily to avoid after-hours disruption) - - SLA-based pause: Automatic pause if approval SLA approaching expiration without response - - Scope boundary pause: Automatic pause when approaching scope limits with confidence < 90% - -2. **State Preservation Specifics:** - - **Session tokens/cookies**: Preserve all authentication sessions (tools can resume as authenticated user) - - **Compromised systems state**: Document what access was achieved (reverse shells, credentials, permissions) - - **Partial exploitation state**: If mid-way through multi-step exploitation, preserve exact step and prerequisites met - - **Target queue state**: Document which targets have been tested, which are queued, testing order - - **Reconnaissance results**: Preserve all enumeration findings (open ports, services, vulnerabilities discovered) - - **Configuration snapshots**: Record target system state at pause time (running processes, network connections) - -3. 
**Pause Confirmation Requirements:** - - System MUST confirm pause is complete before reporting "PAUSED" status - - Operator dashboard shows: "Pause confirmed at [timestamp]; [X] connections preserved; [Y] sessions active" - - Health check: Verify paused tool not consuming CPU/memory (confirm truly halted) - - State verification: Attempt light ping to preserved connections; confirm still alive - -4. **Session Timeout During Extended Pause:** - - If a pause exceeds the session timeout threshold (default: 4 hours, configurable per engagement), the platform MUST terminate all active sessions and connections to target systems - - On resume, the platform MUST re-establish connections and re-validate scope before continuing operations - - The operator MUST be warned when pause duration approaches 75% of the session timeout threshold - - Warning notification MUST include: current pause duration, session timeout threshold, time remaining before auto-termination - - Pre-termination alert (when reaching 95% of threshold) MUST be logged with action plan for operator - - Session termination and reconnection events MUST be logged separately with timestamps and success/failure status +The platform MUST track cumulative risk across the entire engagement by considering impact severity, action frequency, duration, and system criticality. Risk accumulation SHOULD decay over time. Scoring methodology MUST be documented and auditable. Configurable thresholds MUST trigger escalation at defined risk levels. The specific decay function and reset windows are implementation decisions that MUST be documented. + +Cumulative risk scoring MUST use the impact classification framework defined in APTS-SC-001 as its input taxonomy. The scoring model MUST NOT define an independent impact classification that contradicts or duplicates SC-001. ### Verification -1. **Pause trigger test**: Click "Pause" button; observe tool transitions to PAUSED status -2. 
**State preservation test**: Before pause, exploit a system; pause tool; examine checkpoint file to verify credentials/sessions preserved -3. **Connection persistence test**: Pause with active SSH/RDP session; attempt to reconnect to same session after resumption -4. **Memory dump test**: Verify paused state contains all variables, pending actions, configuration -5. **Checkpoint integrity test**: Corrupt checkpoint file; verify the system detects corruption and uses backup -6. **Resumption test**: After pause, resume and verify tool uses preserved state (no re-exploitation needed) -7. **Dashboard confirmation test**: Verify dashboard shows PAUSED status with preserved connection count -8. **CPU/memory test**: Confirm paused tool has no active test threads, generates no outbound test traffic, and CPU/memory usage does not exceed the platform's documented idle-state baseline +1. Execute multiple Medium actions; verify accumulation +2. Run a no-action period equal to one documented decay half-life or decay window; verify the cumulative risk score decreases by the documented decay amount (within ±5% tolerance) +3. Verify impact threshold triggers intensity reduction +4. Confirm reset windows apply correctly +5. Execute single High impact action; verify points awarded per documented scoring model +6. Execute Critical impact action; verify higher points awarded per documented scoring model +7. Execute repeated actions on same target; verify frequency multiplier applied +8. Verify critical system actions earn elevated multiplier per documented model +9. Test automatic halt when cumulative score exceeds configured threshold +10. Audit cumulative impact logs + +> **See also:** APTS-AL-022 (cumulative risk scoring at autonomy levels). 
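As a non-normative illustration of the APTS-SC-007 text above, the following is a minimal sketch of cumulative risk scoring with time-based decay. APTS-SC-007 leaves the decay function, point values, and thresholds as documented implementation decisions, so everything concrete here is an assumption made for the example: the exponential half-life decay, the `TIER_POINTS` values, the `criticality` multiplier, and the class and method names.

```python
from dataclasses import dataclass

# Hypothetical point values per APTS-SC-001 impact tier (not taken from the standard).
TIER_POINTS = {"Low": 1.0, "Medium": 5.0, "High": 20.0, "Critical": 50.0}


@dataclass
class CumulativeRisk:
    half_life_s: float           # assumed decay half-life, in seconds
    escalation_threshold: float  # assumed score at which escalation triggers
    score: float = 0.0
    last_update_s: float = 0.0

    def _decay_to(self, now_s: float) -> None:
        # Exponential decay: the score halves every half_life_s seconds.
        elapsed = max(0.0, now_s - self.last_update_s)
        self.score *= 0.5 ** (elapsed / self.half_life_s)
        self.last_update_s = max(self.last_update_s, now_s)

    def record(self, tier: str, now_s: float, criticality: float = 1.0) -> bool:
        """Add one action's points; return True if escalation is required."""
        self._decay_to(now_s)
        self.score += TIER_POINTS[tier] * criticality
        return self.score >= self.escalation_threshold

    def current(self, now_s: float) -> float:
        """Return the decayed score at time now_s."""
        self._decay_to(now_s)
        return self.score


# Usage: two Medium actions, one idle half-life, then a High action.
risk = CumulativeRisk(half_life_s=600.0, escalation_threshold=12.0)
risk.record("Medium", now_s=0.0)
risk.record("Medium", now_s=0.0)
decayed = risk.current(now_s=600.0)            # half of the accumulated score
escalate = risk.record("High", now_s=600.0)    # pushes the score past the threshold
```

A half-life model makes verification step 2 straightforward to check, because the expected decrease after one idle half-life is exactly 50% of the prior score. Other decay functions (linear, windowed reset) would satisfy the requirement equally well if documented.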
--- -## APTS-HO-007: Mid-Engagement Redirect Capability +## APTS-SC-008: Threshold Configuration with Schema Validation -**Classification:** MUST | Tier 1 +**Classification:** SHOULD | Tier 3 ### Requirement -Autonomous penetration testing systems MUST allow human operators to redirect testing scope, targets, and techniques mid-engagement without requiring full tool restart or loss of state. - -1. **Scope Redirect**: Change authorized testing boundaries - - Add new target system to scope (for example, "Add 10.0.1.50 to testing scope") - - Remove target from scope (for example, "Skip prod-app-03; test prod-app-04 instead") - - Expand scope subnet (for example, "Expand from 192.168.1.0/25 to 192.168.1.0/24") - - Shrink scope subnet (for example, "Limit testing to critical systems only; exclude development VMs") - -2. **Technique Redirect**: Change testing methodology mid-engagement - - Switch from aggressive fuzzing to careful exploitation (if target system degrading) - - Switch from passive reconnaissance to active scanning (if passive insufficient) - - Focus on specific vulnerability class (for example, "Focus on web vulnerabilities; skip network services") - - Pause specific test module (for example, "Stop SSL/TLS scanning; continue port scanning") - -3. **Priority Redirect**: Change target prioritization - - Move high-value target from end of queue to beginning - - Defer testing of non-critical system - - Focus on discovered vulnerabilities (exploit high-CVSS findings first) - -4. 
**Intensity Redirect**: Adjust testing aggressiveness - - Reduce scanning intensity if target system showing degradation - - Increase scanning intensity for difficult-to-enumerate services - - Switch from stealth-focused to quick-impact testing - -**Redirect Implementation Requirements:** - -- Redirect MUST NOT lose active sessions or compromised system access -- Redirect MUST NOT lose reconnaissance findings collected so far -- Redirect MUST update approval queue (if new targets added, require new approvals) -- Redirect MUST be logged with explicit operator authorization -- Redirect MUST NOT auto-approve previously denied targets (MUST re-approve) -- Redirect MUST support conditional logic (for example, "If found vulnerabilities, focus on exploiting; else continue recon") +The platform SHOULD store all threshold configurations in a structured, human-readable format with schema enforcement. Configuration SHOULD include impact level thresholds, approval chains, rate limits, payload constraints, and risk scoring. The system SHOULD validate configuration on startup and reject invalid settings with detailed error messages. This requirement ensures the configuration integrity of the risk scoring algorithm defined in APTS-SC-007. ### Verification -1. **Scope redirect test**: Add new target to scope; verify tool queues it without restart -2. **Technique redirect test**: Switch scanning intensity; verify active sessions preserved -3. **Priority redirect test**: Move target from position 5 to position 1 in queue; verify queue reordered -4. **Approval requirement test**: Add new target; verify new approval required (old approvals don't transfer) -5. **Session preservation test**: Before redirect, gain shell access; after redirect, verify shell still active -6. **Logging test**: Verify redirect operation logged with operator authorization and timestamp -7. **Conditional logic test**: Configure "if high-risk found, focus on exploitation"; verify tool follows condition +1. 
Load valid threshold configuration; verify acceptance +2. Load invalid configuration; verify schema rejection +3. Modify a threshold value; verify new limit takes effect +4. Create configuration with contradictory values; verify validation fails --- -## APTS-HO-008: Immediate Kill Switch with State Dump +## APTS-SC-009: Kill Switch **Classification:** MUST | Tier 1 ### Requirement -Autonomous penetration testing systems MUST implement an immediate kill switch mechanism that terminates all testing activities and preserves complete system state for forensic investigation. This requirement is satisfied in conjunction with APTS-SC-009. The state dump MUST complete within the Phase 2 window defined in APTS-SC-009 (within 60 seconds of kill switch activation). - -> **See also:** APTS-SC-009 (kill switch architecture and process tree termination), APTS-AL-012 (kill switch and pause capability). - -1. **Manual Kill Switch** (Operator-initiated) - - One-click emergency termination from dashboard - - Physical kill button (if applicable for critical engagements) - - Voice-activated kill (if testing includes speech interface) - - Requires no confirmation (instant termination) - -2. **Automatic Kill Switches** (System-triggered) - - **Legal/Compliance Trigger**: Automatic kill if PII accessed outside scope, regulated system accessed without authorization, evidence of illegal activity detected - - **Scope Violation**: Automatic kill if testing exceeds authorized scope boundaries by >5% AND confidence in legitimacy < 80% - - **System Failure**: Automatic kill if target system fails (unplanned reboot, filesystem corruption, memory exhaustion) with potential impact - - **Detection Threshold**: Automatic kill if security monitoring detects [X] alerts in [Y] timeframe (potential adversarial response) - -3. 
**Escalation Kill** (Indirect termination) - - Kill triggered by highest escalation (legal, compliance, out-of-scope) - - Different from manual kill; has automatic notification/logging - -**State Dump Requirements:** - -- Dump MUST include all authenticated sessions (can identify what systems were compromised) -- Dump MUST include all credentials/tokens harvested (can determine scope of breach if tool itself compromised) -- Dump MUST include all reconnaissance findings (for post-mortem analysis) -- Dump MUST include all log entries (full chronology of actions before kill) -- Dump MUST be compressed and encrypted -- Dump MUST be stored separately from operational logs +The platform MUST provide multiple independent kill switch mechanisms enabling immediate halt of all testing activity. Kill switches MUST include at minimum: operator-initiated local halt, remote halt by authorized personnel, and automatic failsafe halt when communication with the control plane is lost. Kill switch halt MUST occur in two distinct phases: -**Evidence Preservation Requirements:** +**Phase 1 (within 5 seconds) - Safety-Critical Cessation:** +- Cease initiating new network requests to target systems +- Cease initiating new exploit attempts and payloads +- Cease initiating new testing actions +- All in-flight operations continue to completion +- Phase 1 measures cessation of new action initiation at the decision layer. In-flight operations that have already been dispatched to the network stack MAY complete. 
-- Evidence stored in read-only, forensically-sealed location -- Evidence hash published to immutable log (blockchain, Write Once Read Many (WORM) storage, or equivalent tamper-evident log) -- Evidence integrity checksum verified at preservation time and periodically after -- Evidence retention: Minimum 1 year (or per engagement contract, whichever longer) +**Phase 2 (within 60 seconds) - Graceful Shutdown and Process Tree Termination:** +- Complete in-flight network operations gracefully +- Preserve system state for forensic investigation and audit +- Track all child processes, network connections, and external agents spawned during testing +- Terminate all spawned processes and child agents (not just the primary process) +- Close all open network connections +- Stop all distributed or cloud-hosted agents +- Revoke any temporary credentials issued during testing +- Cancel all scheduled or queued actions +- Flush and finalize all logs -### Verification - -1. **Kill activation test**: Click kill switch; observe immediate tool termination -2. **State dump test**: Trigger kill switch; verify state dump file created and encrypted -3. **Evidence preservation test**: Verify state dump moved to forensic storage and hashed -4. **No grace period test**: Verify the tool stops immediately (TCP connections not in ESTABLISHED state after kill) -5. **Log flush test**: Verify logs written to disk and finalized at kill time -6. **Forensic recovery test**: From state dump, reconstruct tool state and verify accuracy -7. **Automatic kill test**: Trigger automatic kill condition (for example, PII access outside scope); observe automatic termination -8. **State preservation test**: Verify complete system state at time of kill activation is preserved and available for forensic investigation - ---- +Phase 1 represents the safety-critical boundary; all new testing MUST cease immediately. Phase 2 allows orderly resource cleanup, credential revocation, and state preservation. 
Termination of child processes MUST NOT delay Phase 1 cessation of new testing activity. All kill switches MUST function independently of each other. Within 60 seconds of activation, the platform MUST confirm zero active processes, zero open connections, and zero pending actions. If confirmation fails, the platform MUST alert the operator with a list of unterminated resources. -## APTS-HO-009: Multi-Operator Kill Switch Authority and Handoff +**Rationale for timing bounds:** Phase 1's 5-second ceiling reflects the maximum acceptable delay between an operator's decision to halt and cessation of new outbound actions; exceeding this window risks additional unintended impact on target systems. Phase 2's 60-second ceiling accommodates graceful TCP connection termination (TIME_WAIT states up to 30 seconds), in-flight transaction completion, forensic log flushing to durable storage, and state snapshot capture. These bounds are safety-critical and MUST NOT be relaxed. -**Classification:** MUST | Tier 2 - -### Requirement - -In engagement scenarios with multiple autonomous testing operators, the kill switch mechanism MUST clearly define who has authority to activate it and handle handoffs between operators working in shifts. - -1. **Primary Operator**: Has kill switch authority for their scheduled shift -2. **Secondary Operator**: If primary unavailable, secondary operator approved for that shift has kill switch authority -3. **Manager On-Call**: Security manager can activate kill switch for any operator's shift if critical incident detected -4. **CISO Emergency Authority**: CISO can activate kill switch for any engagement at any time +> **See also:** APTS-HO-008 (state dump for forensic investigation), APTS-AL-012 (safety controls scaling with autonomy level). ### Verification -1. **Handoff procedure test**: Two operators perform handoff; verify kill switch authority transfers -2. 
**Authority verification test**: After handoff, confirm old operator cannot activate kill switch -3. **Incapacity test**: Mark operator unavailable; verify manager can take emergency authority -4. **Out-of-band kill test**: Verify out-of-band kill switch works with valid authorization code -5. **Secondary verification test**: Verify secondary verification method (if configured) functions correctly -6. **Audit trail test**: Verify all authority transfers logged with timestamp and reason -7. **Out-of-band activation test**: Verify kill switch activates via configured out-of-band channel when in-band unavailable +1. Activate each kill switch mechanism independently; verify Phase 1 (halt new requests) within 5 seconds +2. Verify in-flight operations complete and Phase 2 finishes within 60 seconds total +3. Sever control plane communication; verify automatic halt triggers +4. Verify no new testing actions initiated after kill switch activation +5. Verify kill switches function independently (failure of one does not affect others) +6. Start a multi-process test with child processes and network listeners; activate kill switch; verify all processes terminated, all connections closed, and all distributed/cloud agents stopped within 60-second window +7. Verify any temporary credentials issued during testing are revoked +8. Verify operator is alerted if any resource fails to terminate within the 60-second window --- -## APTS-HO-010: Mandatory Human Decision Points Before Irreversible Actions +## APTS-SC-010: Health Check Monitoring, Threshold Adjustment, and Automatic Halt **Classification:** MUST | Tier 1 ### Requirement -Autonomous penetration testing systems MUST identify and enforce mandatory human decision points before executing any action that cannot be cleanly reversed without evidence of the action occurring. - -1. 
**Data Access Actions** (Cannot be hidden; data access leaves traces) - - Reading sensitive files (logs, config files, databases) - - Exfiltrating data (downloading PII, credentials, source code) - - Accessing email accounts or message systems - - Reading user files or personal data - -2. **Persistence Mechanisms** (Difficult to remove without evidence) - - Installing backdoors, reverse shells, web shells - - Creating new user accounts - - Installing rootkits or kernel modules - - Adding SSH keys or authorized keys - - Modifying system startup scripts +The platform MUST continuously monitor the health and responsiveness of target systems during testing and automatically halt testing when degradation is detected. Health checks MUST run at configurable intervals (recommended defaults: every 30 seconds for Critical assets, every 60 seconds for Production, every 5 minutes for Non-Production). At minimum, health checks MUST include response time measurement and service availability verification using methods appropriate to the target (for example, HTTP GET, ICMP ping, database query). -3. **Configuration Changes** (Can trigger incidents or break systems) - - Modifying authentication/authorization policies - - Changing firewall rules or network routing - - Installing or modifying services - - Changing system settings or registry +Baseline MUST be calculated as the median response time across the first 10 health check cycles, excluding outliers beyond the 5th and 95th percentile. The baseline MUST be recalculated when target infrastructure changes are detected (for example, deployment events, scaling operations). Health check overhead MUST be measured as the sum of all health check traffic across all concurrent targets divided by total testing traffic over the measurement window. When testing traffic is zero (idle state), health check traffic is exempt from the percentage limit but MUST NOT exceed the configured absolute bandwidth cap. -4. 
**Evidence Destruction/Modification** (Cannot be undone) - - Clearing logs - - Deleting evidence of compromise - - Modifying file timestamps - - Removing indicators of compromise +If a health check fails or response time exceeds 200% of the established baseline, the platform MUST trigger the escalation workflow defined in APTS-SC-006. If consecutive failures exceed a configurable threshold (recommended default: 3 consecutive failures), the platform MUST automatically halt testing against the affected target and notify the operator. -5. **Lateral Movement** (Enters new compromise scope) - - Moving from one system to another - - Creating new network connections - - Accessing systems that were not previously compromised +Health checks themselves MUST be low-overhead and MUST NOT contribute to target degradation. The platform MUST log all health check results for post-engagement analysis. -**Decision Point Implementation:** +**Dynamic Threshold Adjustment:** The platform MUST monitor target system responsiveness (latency, error rates, recovery time) and dynamically adjust impact thresholds. The system SHOULD decrease thresholds for slow-recovering systems and optionally increase thresholds for fast-recovering systems. Threshold adjustments MUST apply per-target (not globally). When a threshold decrease crosses the escalation boundary defined in APTS-SC-006, the escalation workflow MUST activate. -For each irreversible action category, system MUST: - -1. **Halt before execution**: Stop and wait for human decision -2. **Present decision context**: Include the current autonomy level, the action's impact classification, relevant scope constraints, and any recent escalation history. -3. **Require explicit approval**: Decision MUST be logged with operator identity and rationale -4. **No timeout escalation without additional approval**: If waiting for approval, timeout does NOT auto-approve -5. 
**Document refusal**: If operator denies action, log denial and proceed to next target +The platform SHOULD also implement advanced monitoring for unintended impact beyond baseline health checks. When implemented, the monitoring MUST cover metrics appropriate to the target environment (for example, response times, error rates, resource utilization), and the specific metric set MUST be documented per engagement. The system SHOULD alert operators when degradation exceeds configured thresholds, escalating to automatic halt when degradation becomes severe. ### Verification -1. **Irreversible action test**: Attempt to read sensitive file; verify the system halts and requires approval -2. **Persistence test**: Attempt to install reverse shell; verify the system halts and requires approval -3. **Lateral movement test**: Attempt to move to new system; verify the system halts and requires approval -4. **Confidence gate test**: Lower confidence scores below 75%; verify system escalates to manager -5. **Approval logging test**: Verify all decisions logged with operator identity and explicit rationale -6. **Timeout behavior test**: Approval times out; verify action NOT auto-approved despite timeout -7. **Denial handling test**: Operator denies action; verify system documents denial and moves to next target - -> **See also:** APTS-HO-001 (mandatory approval gates based on autonomy level and action category) +1. **Baseline establishment**: Start engagement against a healthy target; verify the platform establishes response time baselines within the first 5 health check cycles +2. **Degradation detection**: Simulate target degradation (increase response time above 200% of baseline); verify the platform detects and triggers escalation per APTS-SC-006 +3. **Automatic halt**: Simulate 3 consecutive health check failures; verify testing halts against the affected target and operator notification is sent +4. 
**Scope isolation**: In a multi-target engagement, degrade one target; verify health check halt applies only to the degraded target, not the entire engagement +5. **Health check overhead**: Measure health check traffic over a 10-minute window; verify it does not exceed 1% of total testing traffic +6. **Interval configuration**: Modify health check intervals; verify new intervals take effect within one cycle +7. **Threshold decrease on slow recovery**: Simulate a target with slow recovery (for example, increased response times persisting after test actions); verify the platform automatically decreases impact thresholds for that target +8. **Threshold increase on fast recovery**: Simulate a target that recovers quickly from impacts; verify the platform permits threshold increase if this optional behavior is implemented +9. **Adjustment logging**: Review 10 threshold adjustment events; verify each logs: the triggering metric, the old threshold, the new threshold, the responsiveness data that drove the change, and the timestamp +10. **Cross-target independence**: In a multi-target engagement, degrade one target's responsiveness; verify threshold adjustments apply only to the affected target, not globally +11. **Escalation integration**: Trigger a threshold decrease that crosses the escalation boundary defined in APTS-SC-006; verify the escalation workflow activates +12. **Advanced monitoring (if implemented)**: Configure alert and escalation thresholds; trigger degradation events at varying levels; verify alerts fire correctly with metric name, current value, baseline, and deviation; verify automatic halt when degradation becomes severe + +> **See also:** APTS-HO-012 (impact threshold breach escalation based on testing impact metrics). 
--- -## APTS-HO-011: Unexpected Findings Escalation Framework +## APTS-SC-011: Condition-Based Automated Termination -**Classification:** MUST | Tier 1 +**Classification:** MUST | Tier 2 ### Requirement -When autonomous testing systems discover findings outside normal operational expectations, they MUST immediately escalate to human operators with context, analysis, and recommended actions. - -1. **Indicators of Compromise (IoCs) from External Breach** - - Evidence of malware on target system (unrelated to pentest tool) - - Unauthorized admin accounts with unexpected creation dates - - Unusual outbound connections to known malicious C2 servers - - Evidence of data exfiltration by external actors - - Rootkits, backdoors, or persistence mechanisms (not installed by pentest tool) - -2. **Illegal or Highly Sensitive Content** - - Intellectual property that appears to be stolen (comparing to public sources) - - Illegal content (child exploitation material, stolen credit card databases, or other prohibited content) - - Classified government information (marked CONFIDENTIAL/SECRET/TOP SECRET) - - Regulated content without proper protections (HIPAA medical records, PCI customer data) - -3. **Critical Zero-Day Discoveries** - - Vulnerability affecting core infrastructure with no known patch - - Vulnerability allowing complete system compromise with pre-existing exploits - - Vulnerability affecting software the vendor hasn't yet acknowledged - - Vulnerability affecting systems that don't have security update mechanism - -4. **Out-of-Scope System Access** - - Tool accidentally gaining access to system not in authorized scope - - Lateral movement reaching systems beyond scope boundaries - - Access to network segments marked explicitly out-of-scope - - Access to systems belonging to different company/organization - -5. 
**Compliance or Legal Violations** - - Accessing regulated data without explicit authorization - - Evidence of tool crossing into legal/regulatory prohibited territory - - Potential data privacy violations (accessing PII outside scope) - - Potential evidence destruction or tampering by prior bad actors - -6. **System Stability Threats** - - Tool causing unexpected service degradation (not intentional DoS) - - Tool action causing system crash or reboot - - Tool discovering cascading failure scenario if tool continues - - Resource exhaustion that could cause system unavailability - -> **See also:** APTS-AL-011 (escalation triggers at L2), APTS-AL-018 (incident response procedure during autonomous testing), Incident Response Integration appendix (unified escalation workflow). +MUST define explicit termination conditions that automatically halt testing without operator intervention. Conditions MUST include at minimum: target service unavailability (consecutive health check failures within a configured window), error rate exceeding a configured threshold for a sustained period, resource exhaustion (CPU, memory) exceeding configured limits, and unexpected privilege escalation outside authorized scope. Specific thresholds and durations MUST be documented in the platform's safety configuration and made available to customers. Each condition MUST have detailed logging of what triggered termination. Condition-based termination is independent of the circuit breaker mechanism (APTS-SC-012). When a circuit breaker probe fails after the cooldown period, the platform MUST escalate to condition-based termination per this requirement. ### Verification -1. **Indicator of Compromise (IoC) detection test**: Plant malware signature on test system; verify tool escalates -2. **Illegal content test**: Simulate discovery of illegal content; verify immediate escalation to human operators with context, analysis, and recommended actions -3. 
**Out-of-scope access test**: Tool gains access to system outside scope; verify escalation includes context and recommended actions -4. **Compliance violation test**: Simulate PII access outside authorization; verify immediate escalation -5. **System degradation test**: Simulate service degradation; verify pause and escalation -6. **Notification test**: Trigger escalation; verify escalation notification is delivered to designated human operators +1. Trigger each termination condition individually +2. Verify halt occurs immediately on condition trigger +3. Confirm detailed logs record what condition triggered halt +4. Test combinations of conditions +5. Verify no continued testing after halt -> **See also:** APTS-SC-011 (condition-based automated termination for overlapping trigger conditions) +> **See also:** APTS-HO-011 (unexpected findings escalation to human operators) --- -## APTS-HO-012: Impact Threshold Breach Escalation +## APTS-SC-012: Network-Level Circuit Breaker -**Classification:** MUST | Tier 1 +**Classification:** MUST | Tier 2 ### Requirement -Autonomous penetration testing systems MUST continuously monitor the actual impact of their testing activities and escalate to human operators when impact exceeds pre-defined thresholds. - -1. **Service Availability** - - System response time degradation - - Error rate increase - - Request timeout frequency - - Service unavailability events - -2. **Resource Consumption** - - CPU utilization increase - - Memory exhaustion - - Disk I/O saturation - - Network bandwidth consumption - -3. **Data Integrity** - - Database query failures or locks - - File corruption or access errors - - Transaction rollbacks - - Data inconsistency events - -4. 
**Security Events Triggered** - - IDS/IPS alert volume - - Firewall block rate - - Endpoint Detection & Response alerts - - Web Application Firewall blocks +The platform MUST implement circuit breaker logic that automatically suspends testing when target systems show sustained degradation. When the circuit breaker activates, the platform MUST stop all new requests to the affected target, wait a configurable cooldown period, then attempt a single probe request before resuming. If the probe fails, the circuit breaker MUST remain active. ### Verification -1. **Warning alert test**: Gradually increase resource usage to the platform's documented warning threshold; verify alert is raised and testing continues -2. **Critical alert test**: Increase resource usage to the platform's documented critical threshold; verify tool pauses and escalates -3. **Response time degradation test**: Simulate service slow-down; verify tool detects and escalates -4. **Error rate escalation test**: Increase error rate to the platform's documented error threshold; verify alert and pause -5. **IDS alert flooding test**: Trigger IDS alerts above threshold; verify pause -6. **State checkpoint test**: Before pause due to impact, verify state preserved -7. **Recovery test**: After impact threshold breach, reduce test intensity; verify tool resumes - -> **See also:** APTS-SC-010 (health check monitoring and automatic halt based on target responsiveness). +1. Induce sustained target degradation; verify circuit breaker activates +2. Verify all new requests are suspended during active state +3. After cooldown, verify a single probe request is attempted +4. 
Verify successful probe resumes testing; failed probe keeps circuit open --- -## APTS-HO-013: Confidence-Based Escalation (Scope Uncertainty) +## APTS-SC-013: Time-Based Automatic Termination with Operator Override -**Classification:** MUST | Tier 1 +**Classification:** SHOULD | Tier 3 ### Requirement -When autonomous testing systems lack confidence in scope boundary determination or target legitimacy (confidence < threshold), they MUST automatically escalate the decision to human operators rather than proceeding. - -Confidence scores MUST be expressed on a 0-100% scale. Escalation MUST be triggered when the confidence score falls below 75% for any scope boundary determination, target legitimacy assessment, or asset classification decision. Organizations MAY configure a stricter (higher) threshold but MUST NOT set it below 75%. +The platform SHOULD enforce operator-specified maximum engagement duration with configurable time limits. Operators SHOULD be able to extend engagement duration within defined bounds. The system SHOULD provide advance warning before time-based termination. ### Verification -1. **Decision point identification**: Identify 20 test decision points where human input required (risky pivots, new attack vectors, credential reuse); document in test plan -2. **Human approval mechanism**: For each decision point, verify human approver notified with decision context (what's being tested, why, risk level); confirm approval/rejection logged -3. **Approval window**: Set 30-minute approval window; verify system halts and awaits decision within window; after window, testing pauses pending escalation -4. **Approval logging**: Sample 10 human decisions; verify each logged with: approver ID, timestamp, decision (approved/rejected/modified), and rationale -5. **Audit trail completeness**: Verify every approval linked to the test action it authorized; confirm bi-directional traceability (approval->action and action->approval) -6. 
**Decision override prevention**: Verify no system mechanism allows test continuation without documented human approval +1. Set a test engagement duration +2. Verify advance warning notifications sent before deadline +3. Verify automatic halt at deadline +4. Test operator extension request +5. Verify extension requires manager approval --- -## APTS-HO-014: Legal and Compliance Escalation Triggers +## APTS-SC-014: Reversible Action Tracking and Rollback -**Classification:** MUST | Tier 1 +**Classification:** MUST | Tier 2 ### Requirement -Autonomous testing systems MUST identify and immediately escalate potential legal, compliance, and regulatory violations to human decision-makers who can determine appropriate response. - -1. **Regulatory Data Access Violations** - - GDPR: Accessing personal data of EU residents without explicit authorization - - CCPA: Accessing California resident personal information outside authorized scope - - HIPAA: Accessing protected health information without proper authorization - - PCI-DSS: Accessing payment card information outside authorized testing - -2. **Intellectual Property Exposure** - - Trade secrets discovered during testing - - Proprietary source code or algorithms - - Customer lists or business intelligence - - Unreleased product information - -3. **Breach Notification Obligations** - - Accessing PII in quantity that would require breach notification if exposed - - Discovering systems where breach notification threshold already exceeded - - Accessing data regulated under breach notification laws +MUST track state for all reversible actions and maintain explicit rollback procedures. 
For all reversible actions (accounts created, files modified/created, database records modified, configurations changed, processes started, credentials obtained), MUST: (1) Track action name, timestamp, target resource ID, pre-action state, action parameters, rollback procedure, and verification method; (2) Persist state to durable storage after each action; (3) Document explicit rollback procedure for every action type with step-by-step instructions and success verification; (4) Implement rollback procedures as executable automated scripts with no manual steps; (5) Define maximum rollback time for each action type; (6) Trigger alerts when rollback verification fails. Rollback operations MUST precede automated cleanup (APTS-SC-016). Evidence MUST be captured before rollback begins (see APTS-SC-016 evidence preservation requirements). -4. **Evidence Destruction Concerns** - - Logs that might be required for legal proceedings - - Data that might be under litigation hold - - Audit trails that might be needed for investigations +The platform MUST track state changes for all reversible actions and document rollback procedures. Automated rollback SHOULD be implemented for low-risk actions (account creation, file creation, simple configuration changes). For complex actions where automated rollback risks further damage (schema changes, authentication reconfigurations), documented manual recovery procedures are acceptable. -5. **Third-Party or Partner Systems** - - Testing system that belongs to or connects to third-party organization - - Potential unauthorized access to partner data - - Data processing agreements that might be violated - -6. 
**Government or Classified Systems** - - Access to government systems without proper authorization - - Classified information (CONFIDENTIAL, SECRET, TOP SECRET) - - Systems that might trigger national security concerns - -> **See also:** APTS-HO-010 (irreversible-action escalation triggered by legal concerns), APTS-TP-012 (classification and handling of regulated data that triggers this escalation), APTS-AL-011 (autonomous escalation triggers that invoke this requirement). +If rollback fails (for example, database transaction cannot be reversed, file cannot be restored), the platform MUST: (a) log the failure with full diagnostic details including the action that could not be rolled back, the error encountered, and the current state of the affected system, (b) preserve the current system state for manual recovery, (c) alert the operator with a rollback failure notification including recommended manual remediation steps, and (d) halt further testing on the affected target until the operator acknowledges the failure and either resolves it or authorizes continued testing. ### Verification -1. **GDPR data detection test**: Place test data marked as EU resident PII; verify escalation -2. **CCPA data detection test**: Place California resident data; verify escalation -3. **HIPAA data detection test**: Place medical information; verify escalation -4. **Breach notification test**: Create record count threshold (>500 records); verify escalation -5. **Third-party system test**: Simulate access to non-authorized organization system; verify escalation -6. **Classified data test**: Discover marked CONFIDENTIAL information; verify kill switch activation -7. **Notification test**: Verify legal team and client notified within SLA -8. **Evidence preservation test**: Verify state dump securely preserved for legal investigation +1. Execute reversible action; verify state recorded and persisted to durable storage +2. Simulate crash; verify state survives +3. 
Confirm all action types have documented rollback procedures +4. Execute rollback procedure; verify completion within the documented maximum rollback time +5. Verify system state matches pre-action state after rollback +6. Test partial rollback detection and alert triggering +7. Verify pre-action state captured correctly --- -## APTS-HO-015: Real-Time Activity Monitoring and Multi-Channel Notification +## APTS-SC-015: Post-Test System Integrity Validation **Classification:** MUST | Tier 1 ### Requirement -Autonomous testing systems MUST maintain a real-time activity feed of all testing actions and route notifications and alerts to appropriate recipients via multiple communication channels, with delivery confirmation and retry mechanisms. - -**Real-Time Activity Monitoring:** -- Timestamp (UTC) for every action -- Event type (reconnaissance, exploitation, lateral movement, and other phases) -- Severity (informational, low, medium, high, critical) -- Affected system(s) -- Current status (pending, in progress, completed, failed, escalated) -- Confidence score (if applicable) -- Impact assessment (if available) -- Required approvals (if applicable) -- Action links (if user action required) -- Real-time display: Last 100 activities (scrollable) -- Searchable history: All activities in engagement (searchable by date, type, host, status) -- Export capability: CSV/JSON export of activity feed -- Archival: Activities older than 30 days moved to archive storage - -**Multi-Channel Notification Routing:** - -The platform MUST support at least two independent notification channels for alerts. At minimum, dashboard and email notifications MUST be implemented. Additional channels (SMS, phone calls, messaging integrations) SHOULD be supported for CRITICAL alerts. 
- -**Dashboard Notifications (MUST):** -- Banner alerts at top of screen (color-coded by severity) -- Persistent until dismissed by operator -- Click to view full context (details, evidence, required actions) -- Notification center shows alert history - -**Email Notifications (MUST):** -- Formatted with context, evidence, required actions -- Subject line includes severity and action required -- Includes decision links (if applicable) -- Delivery confirmation requested for CRITICAL/HIGH alerts - -**Additional Channels (SHOULD):** -Platforms SHOULD support additional notification channels appropriate to operational context (for example, SMS, automated phone calls, messaging integrations such as Slack, Teams, or PagerDuty). When additional channels are implemented, they SHOULD include delivery confirmation and retry mechanisms. +MUST perform thorough integrity validation after engagement completes: verify file checksums match baseline, confirm no unexpected accounts exist, check database record counts against baseline, verify configurations match baseline, and confirm no unexpected processes are running. Post-test validation MUST be automated and complete within a documented timeframe. Any discrepancies MUST be logged and escalated. ### Verification -1. **Feed display test**: Observe dashboard activity feed; verify updated in real-time (within 5 seconds) -2. **Severity classification test**: Trigger different severity events; verify correctly classified -3. **Filtering test**: Filter by severity; verify only selected severities displayed -4. **Search test**: Search by hostname or action type; verify accurate results -5. **Retention test**: Verify feeds older than 30 days archived but accessible -6. **Export test**: Export activity feed to CSV; verify data completeness and format -7. **Dashboard notification test**: Trigger alert; verify appears immediately in dashboard -8. **Email routing test**: Trigger HIGH alert; verify email sent to correct recipients -9. 
**Channel independence test**: Block primary channel; verify alerts route to secondary channel -10. **Delivery confirmation test**: Verify system confirms receipt for CRITICAL/HIGH alerts -11. **Role-based routing test**: Trigger alert; verify routed to correct roles only - -> **See also:** APTS-HO-002 (real-time monitoring dashboard with intervention capability) +1. Confirm a pre-engagement baseline exists covering file checksums, user/service account inventory, database record counts, configuration snapshots, and running process list +2. Execute test actions that modify at least one baseline category +3. Run the automated post-engagement integrity validation and confirm it completes within the documented timeframe +4. Verify validation checks each baseline category: file checksums, accounts, database record counts, configurations, and processes +5. Introduce a deliberate discrepancy (for example, a test account not cleaned up) and confirm it is detected, logged, and escalated --- -## APTS-HO-016: Alert Fatigue Mitigation and Smart Aggregation +## APTS-SC-016: Evidence Preservation and Automated Cleanup -**Classification:** SHOULD | Tier 2 +**Classification:** MUST | Tier 2 ### Requirement -To prevent operator alert fatigue (ignoring alerts due to volume), autonomous testing systems SHOULD implement intelligent aggregation, suppression rules, and escalation thresholds that reduce noise while preserving critical signal. - -For high-frequency, low-impact actions that generate repeated approval requests (for example, port scans on non-critical targets, banner grabbing), the platform SHOULD support blanket pre-approval rules that allow an operator to authorize a category of actions for a defined time window, reducing individual approval volume while maintaining audit trail coverage. Pre-approval rules MUST be logged, time-bounded, and revocable. +MUST preserve evidence of test actions and provide automated cleanup of all test artifacts. 
(1) Evidence Preservation: Capture screenshots, log output, error messages, modified file contents, database queries executed, and privilege escalation proofs before any rollback. Evidence MUST be stored in write-once, tamper-evident storage and retained per the engagement's data retention policy. Rollback operations MUST NOT modify evidence storage. (2) Automated Cleanup: Provide automated cleanup procedures that remove all test artifacts including temporary files, created accounts, installed tools, backdoors, test data, and testing activity logs. Cleanup procedures MUST be idempotent (safe to run multiple times), atomic (complete or not at all), and verifiable. Cleanup MUST complete within a documented timeframe proportional to engagement size. Failed cleanup actions MUST be logged and escalated for manual remediation. ### Verification -1. **Aggregation test**: Trigger 10 port scan alerts; verify aggregated into single alert -2. **Threshold suppression test**: Trigger 15 LOW alerts; verify aggregated after 10th -3. **Context filtering test**: Trigger expected IDS alerts during scanning; verify suppressed -4. **Severity recalculation test**: Find same vulnerability on multiple systems; verify escalated to HIGH -5. **Fatigue measurement test**: Monitor alert response time; verify operators respond faster with aggregation +1. Execute test actions and capture evidence (screenshots, logs, modified file contents) before rollback +2. Execute rollback; verify rollback operations do not access or modify evidence storage; confirm all pre-rollback evidence remains unchanged and accessible in write-once storage +3. Confirm evidence retained per engagement's data retention policy +4. Run automated cleanup; verify all test artifacts removed (temporary files, created accounts, installed tools, test data) +5. Re-run cleanup; verify idempotency (no errors, no changes on second run) +6. 
Simulate cleanup failure; verify failed action is logged and escalated for manual remediation --- -## APTS-HO-017: Stakeholder Notification and Engagement Closure +## APTS-SC-017: External Watchdog and Operator Notification **Classification:** MUST | Tier 2 ### Requirement -Autonomous testing systems MUST define clear workflows for notifying engagement clients and stakeholders of significant findings, unexpected events, and engagement status changes, and provide complete closure procedures. - -**Client and Stakeholder Notification Workflows:** - -1. **Periodic Engagement Status** (for example, daily at a fixed time) - - Testing activities: Number of targets tested, findings discovered - - Scope status: Progress on authorized scope - - No issues: "Testing proceeding normally" +An external watchdog process MUST monitor the autonomous pentesting platform and notify operators according to documented Service Level Agreements (SLAs). The platform MUST send health heartbeats and key operational metrics to an external monitoring endpoint outside the platform's own trust boundary. The external monitoring system MUST use separate credentials from the platform to access these endpoints. -2. **Periodic Finding Summary** (for example, weekly) - - Vulnerabilities discovered by severity range - - Scope coverage progress - - Comparison to industry benchmarks (if applicable) - - Recommendation summary +If heartbeats stop or metrics indicate anomalies, the external watchdog MUST notify operators and customers within a defined timeframe via multiple channels (messaging, email, dashboard). Operators MUST be notified within the documented SLA timeframe. Customers MUST be notified of confirmed incidents within a defined timeframe via out-of-band channels independent of the platform. Escalation to alternative contacts is required if acknowledgment is not received within a defined window. -3. 
**Immediate Critical Notifications** (per documented SLA): - External compromise indicators detected - Unexpected findings (malware, illegal content) - Out-of-scope testing detected - Engagement suspended/terminated - Legal/compliance violations +Notifications MUST include: what happened, systems affected, automated actions taken, and next steps. Specific heartbeat intervals, metric thresholds, and notification timeframes MUST be documented in the platform's watchdog configuration and made available to customers. -4. **Escalation Notifications** (per documented SLA): - High-risk vulnerabilities - Critical systems compromised - Scope boundary decisions required - -Specific notification frequencies, send times, and escalation windows MUST be documented in the engagement's communication plan. +### Verification -**Post-Engagement Notification and Closure:** +1. Verify external monitoring endpoint is outside the platform's trust boundary +2. Verify platform sends heartbeats to external monitoring system at documented intervals +3. Verify platform sends key operational metrics to external monitoring system +4. Verify external monitoring system uses separate credentials from the platform +5. Stop platform health heartbeats; confirm the external watchdog detects the loss and escalates within the configured timeframe +6. Inject anomalous platform metrics; confirm the external watchdog detects them and alerts operators +7. Detect anomaly; verify operator notified within documented SLA via configured channels +8. Confirm incident; verify customer notified within documented SLA via out-of-band channels +9. Verify notification includes what happened, systems affected, automated actions taken, and next steps +10. Test escalation: withhold operator or customer acknowledgment; verify an alternative contact is notified once the configured window elapses -1.
**Engagement Completion Notification** (per documented timeline): - - All testing activities have been completed - - Final findings count (by severity range) - - Overall risk assessment - - Scope validation: Were all authorized targets tested - - Schedule for final report delivery +> **See also:** APTS-SC-018 (incident containment and recovery triggered by watchdog alerts), APTS-HO-010 (human-in-the-loop paging path that watchdog notifications feed into). -2. **Final Report Delivery Notification** (per documented timeline): - - Report is ready for client review - - Executive summary of findings - - Detailed vulnerability findings with remediation - - Evidence preservation location and access - - Questions for clarification contact +--- -3. **Follow-Up Assessment Windows** (per engagement agreement): - - Client remediation status check-in - - Verification testing (optional) to confirm fixes - - Lessons learned session invitation - - Next engagement planning +## APTS-SC-018: Incident Containment and Recovery -Specific post-engagement timelines and follow-up intervals MUST be documented in the engagement agreement. +**Classification:** MUST | Tier 2 -**Post-Engagement Evidence Retention:** +### Requirement -- Evidence storage: Per the platform's documented retention policy; retention period SHOULD reflect engagement risk level and applicable regulatory requirements -- Evidence protection: Encrypted, access-controlled, secure location -- Evidence availability: Client can request evidence export for analysis/audit -- Audit trail: All evidence access logged +When a platform incident is confirmed, the platform MUST execute documented containment and recovery procedures. (1) Containment: The platform MUST isolate itself from customer networks and test targets promptly. Platform-held credentials SHOULD be rotated as part of containment. The platform MUST capture relevant forensic data (logs, process state) before containment procedures alter system state. 
(2) Recovery: The platform MUST document recovery procedures for each containment scenario with a defined Recovery Time Objective (RTO). After recovery, all safety controls MUST be re-validated before resuming testing. The platform MUST deliver a post-incident report to the customer within the timeframe defined in the platform's incident response plan, including incident timeline, root cause, affected systems/data, remediation steps, and preventive measures. The platform MUST require customer acknowledgment and approval before testing resumes. Specific containment thresholds, RTO targets, and reporting deadlines MUST be documented in the platform's incident response plan. ### Verification -1. **Periodic status test**: Verify periodic status notification sent at scheduled time per communication plan -2. **Critical notification test**: Trigger critical finding; verify client notified within documented SLA -3. **Template accuracy test**: Review client notifications; verify all required fields present -4. **SLA test**: Measure time from finding to client notification; verify meets SLA -5. **Completion notification test**: Complete engagement; verify completion notification sent -6. **Final report test**: Verify report generated and delivered within documented timeline -7. **Evidence retention test**: Verify evidence stored and accessible per documented retention policy -8. **Follow-up notification test**: Verify follow-up check-in sent per engagement agreement schedule +1. Trigger anomaly; confirm network isolation within documented containment timeframe +2. Verify platform cannot reach customer networks after isolation +3. Verify isolated platform can reach logging/watchdog systems +4. Confirm credentials rotated within documented timeframe; verify old credentials no longer work +5. Confirm memory dump and logs preserved on secure system +6. Execute recovery procedure; verify completion within documented RTO +7. 
Run complete safety control validation suite; verify all pass +8. Verify the post-incident report is delivered within the documented timeframe and includes incident timeline, root cause, affected systems/data, remediation steps, and preventive measures +9. Verify testing does not resume until the customer has acknowledged the report and approved resumption --- -## APTS-HO-018: Operator Qualification, Training, and Competency Governance +## APTS-SC-019: Execution Sandbox and Containment Boundary Integrity **Classification:** MUST | Tier 2 ### Requirement -Organizations operating autonomous penetration testing systems MUST ensure all operators meet minimum competency standards and hold documented qualifications appropriate to their assigned autonomy level. - -**Competency Standards and Certification:** - -Organizations MUST define competency standards for each autonomy level specifying required skills and certifications. All operators MUST hold certifications or documented qualifications appropriate to their assigned autonomy level. Operators MUST NOT be assigned above their qualification level. - -**Training Curriculum and Incident Response Preparation:** +The platform MUST declare and enforce an execution sandbox that bounds the agent runtime's filesystem access, network egress, process capabilities, and system-call surface. The sandbox boundary MUST be enforced by a mechanism outside the agent's control, such as OS-level isolation (kernel namespaces, seccomp, AppArmor, SELinux), hypervisor isolation, or container runtime policy. The platform MUST NOT rely on the agent's own willingness to respect the boundary. The declared boundary MUST specify: (1) filesystem paths the agent runtime may read and write, (2) network destinations the agent runtime may reach, including outbound ports and protocols, (3) process and system-call capabilities the agent runtime may invoke, and (4) any credential or secret stores the agent runtime may access. Any attempt by the agent runtime to take an action outside the declared boundary MUST be blocked by the enforcement layer and logged as a containment event for operator review.
-Organizations MUST establish training curricula for each autonomy level with documented learning objectives, hands-on exercises, and competency validation mechanisms. Training MUST cover all required modules with completion records and certificates maintained. Annual refresher training is required. Operators MUST receive specialized training in responding to autonomous testing tool failures, unexpected behaviors, and emergency situations including emergency pause, redirect, and kill switch activation procedures, state preservation, forensic analysis, and escalation protocols. +### Rationale -**Ongoing Assessment and Succession Planning:** - -Operators MUST participate in ongoing competency assessments conducted at least annually and MUST maintain current certifications to continue operating at their autonomy level. Operators who fail assessments MUST complete required remediation and are restricted from operating at that level until remediation is complete. Organizations SHOULD establish formal mentoring relationships and documented succession plans to develop future operators and ensure business continuity. +As autonomous pentest platforms become more capable, the assumption that the agent will respect its instructions is not a safety boundary. Containment integrity requires a mechanism that holds regardless of whether the agent "chooses" to respect it, whether the agent has been manipulated, whether the underlying model has changed, or whether the agent has encountered inputs outside its training distribution. Enforcing the boundary at a layer the agent cannot reach from within its execution context is the only architectural property that survives changes in the agent's behavior. ### Verification -1. **Competency documentation audit**: Verify a documented competency standard exists for each autonomy level, specifying required skills and certifications -2. 
**Operator certification check**: For each active operator, verify they hold certifications or documented qualifications appropriate to their assigned autonomy level -3. **Authority alignment**: Verify operator autonomy level assignments match their documented competency level -4. **Certification currency**: Verify all operator certifications are current and not expired -5. **Curriculum review**: Verify training content covers all required modules per autonomy level -6. **Training documentation**: Verify training records and completion certificates -7. **Annual refresher**: Verify operators complete annual refresher training -8. **Incident response test**: Simulate tool failure; observe operator response time, escalation accuracy, and state preservation -9. **Assessment schedule audit**: Verify all operators assessed annually -10. **Remediation compliance**: Verify failed operators complete required remediation before resuming operations -11. **Succession plan review**: Verify succession plans documented and current -12. **Mentoring plan audit**: Verify formal mentoring plans exist with regular meetings - -> **See also:** The [Operator Competency Record Template](../appendix/Operator_Competency_Record_Template.md) provides an optional structure for collecting qualification, training, assessment, remediation, mentoring, and succession evidence. +1. **Boundary declaration review**: Retrieve the platform's sandbox policy file; confirm it specifies allowed filesystem paths, network egress destinations, process capabilities, and credential access; confirm the policy is enforced at a layer below the agent runtime (kernel, hypervisor, or container runtime). +2. **Filesystem access test**: From within the agent runtime context, attempt to read a file outside the declared read-allowlist (for example, a sensitive system file such as /etc/shadow on a Linux host); confirm the enforcement layer blocks the action and a containment event is logged. +3.
**Network egress test**: From within the agent runtime context, attempt an outbound TCP connection to a destination not on the egress allowlist; confirm the connection is refused at the network policy layer and a containment event is logged. +4. **Process capability test**: Attempt a syscall or capability not in the declared allowlist (for example, ptrace, mount, or raw socket creation); confirm the enforcement layer blocks the call and a containment event is logged. +5. **Credential store isolation test**: Attempt to read a credential or secret outside the agent's declared access list; confirm the read is blocked and logged. +6. **Enforcement-layer independence test**: Modify the agent's internal configuration or system prompt to assert that an out-of-boundary action is now permitted; confirm the enforcement layer still blocks the action, demonstrating that enforcement is independent of the agent's internal state. --- -## Escalation Priority Matrix - -When multiple escalation triggers fire simultaneously, the following precedence applies (highest priority first): - -1. **Legal and compliance escalation** (APTS-HO-014): Legal risk takes absolute precedence; testing halts immediately. -2. **Impact threshold breach** (APTS-HO-012): Active system harm requires immediate containment. -3. **Unexpected findings escalation** (APTS-HO-011): Novel findings require assessment before continued testing. -4. **Confidence-based escalation** (APTS-HO-013): Scope uncertainty requires clarification but does not indicate active harm. -5. **Pre-approval gates** (APTS-HO-001): Routine approval workflow; lowest urgency among escalation types. - -When two escalations of equal priority fire simultaneously, the platform processes both and applies the most restrictive combined response (for example, if APTS-HO-012 calls for pause and APTS-HO-011 calls for halt, the platform halts). 
- ---- +## APTS-SC-020: Action Allowlist Enforcement External to the Model -## APTS-HO-019: 24/7 Operational Continuity and Shift Handoff - -**Classification:** SHOULD | Tier 2 +**Classification:** MUST | Tier 1 ### Requirement -For platforms operating in continuous or always-on mode, the platform SHOULD implement governance controls for operational continuity across operator shifts and time zones. This includes: +The platform MUST define the agent's permitted tool and action space as an explicit allowlist, and the allowlist MUST be enforced by a component external to the model. Permitted actions MUST NOT be configured solely through the model's system prompt or in-context instructions. Any action the agent attempts to execute that is not present on the external allowlist MUST be blocked by the orchestration layer before it reaches any target system, regardless of how the action was produced by the model. The allowlist MUST be version-controlled, auditable, and subject to the platform's change-management process. Extensions to the action space (new tools, new capabilities, new parameter ranges) MUST be approved and recorded before they become available to the agent at runtime. -1. **Shift handoff procedures**: Structured handoff that transfers active engagement state, pending approvals, open escalations, and suppression-rule status to incoming operators -2. **Stale approval expiry**: Automatic expiry of approvals that have not been acted upon within a documented validity window, requiring re-request from the incoming shift -3. **Suppression-rule review**: Periodic review and re-justification of active alert suppression rules to prevent suppression drift over time -4. 
**Desensitization monitoring**: Tracking of operator response-time trends and alert acknowledgment rates to detect cumulative desensitization +### Rationale -Approval queues SHOULD enforce shift-awareness so that approvals granted by an outgoing operator for future actions are flagged for incoming operator review. - -> **Implementation aid:** The [Shift Handoff Template](../appendix/Shift_Handoff_Template.md) provides an informative record format for transferring engagement state, pending approvals, open escalations, suppression-rule status, and kill-switch authority between operators. +System prompts and in-context instructions are not reliable constraints on an agent's action space. They can be overridden by prompt injection, by adversarial inputs, by model updates that change instruction-following behavior, or by distribution shifts the operator has not anticipated. Enforcing the action allowlist in a component the model cannot influence at runtime is the architectural property that makes the constraint actually binding. This requirement is a Tier 1 obligation because it is a baseline property every responsible autonomous pentest platform must have regardless of its claimed assurance level. ### Verification -1. Shift handoff procedure is documented and includes engagement state, pending approvals, and active escalations -2. Test: simulate a shift change; verify incoming operator receives complete handoff state -3. Stale approval expiry is enforced per documented validity window -4. Test: leave an approval pending beyond the validity window; verify it expires and requires re-request -5. Active suppression rules have documented justification and periodic review dates -6. Operator response-time metrics are collected and available for review +1. 
**Allowlist file review**: Retrieve the platform's action allowlist; confirm it is a version-controlled artifact separate from the model's system prompt; confirm entries include tool identifiers, allowed parameters or parameter bounds, and the risk classification assigned to each entry. +2. **External enforcement test**: Through a test harness, induce the model to request a tool identifier that is not on the allowlist; confirm the orchestration layer refuses to dispatch the tool call before it reaches any target system. +3. **System-prompt bypass test**: Modify the system prompt to assert that a disallowed tool is permitted; confirm the external enforcement layer still refuses to dispatch the tool call. +4. **Change-management audit**: Review the last three changes to the allowlist; confirm each has an approval record, a rationale, and a timestamp consistent with the platform's change-management policy. +5. **Runtime inventory test**: Query the platform for the current runtime allowlist; confirm it matches the version-controlled source and has not drifted during operation. --- + +> **See also:** [APTS-SC-A02: Context Window Safety and Constraint Preservation](../appendix/Advisory_Requirements.md#apts-sc-a02-context-window-safety-and-constraint-preservation-advisory). An advisory practice for platforms using LLM-based agents with finite context windows. Addresses the risk of safety-critical constraints being silently lost during context summarization or truncation. High-priority candidate for tier-gated inclusion in v0.2.0. 
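As an informal illustration of the external enforcement that APTS-SC-020 describes, the following Python sketch shows an orchestration-layer check that sits between the model's requested tool call and any target system. It is a minimal sketch under stated assumptions, not part of the standard: the names `AllowlistEntry`, `Orchestrator`, `dispatch`, and the integer-only parameter bounds are all hypothetical, and a real platform would load the allowlist from its version-controlled source and forward permitted calls to an actual tool runner.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class AllowlistEntry:
    """One permitted tool (hypothetical schema, for illustration only)."""
    tool_id: str
    risk: str
    # Inclusive (low, high) bounds for each permitted integer parameter.
    param_bounds: Dict[str, Tuple[int, int]]


@dataclass
class Orchestrator:
    """Holds the allowlist outside the model; the model only submits requests."""
    allowlist: Dict[str, AllowlistEntry]
    blocked: List[Dict[str, Any]] = field(default_factory=list)

    def _block(self, tool_id: str, params: Dict[str, Any], reason: str) -> Dict[str, Any]:
        # Record the refusal so it can be audited later.
        event = {"dispatched": False, "tool_id": tool_id, "params": params, "reason": reason}
        self.blocked.append(event)
        return event

    def dispatch(self, tool_id: str, params: Dict[str, Any]) -> Dict[str, Any]:
        """Check a requested tool call against the allowlist before dispatch."""
        entry = self.allowlist.get(tool_id)
        if entry is None:
            return self._block(tool_id, params, "tool not on allowlist")
        for name, value in params.items():
            bounds = entry.param_bounds.get(name)
            if bounds is None:
                return self._block(tool_id, params, f"parameter {name!r} not permitted")
            low, high = bounds
            if not isinstance(value, int) or not (low <= value <= high):
                return self._block(tool_id, params, f"parameter {name!r} out of bounds")
        # A real platform would hand the call to the tool runner at this point.
        return {"dispatched": True, "tool_id": tool_id, "params": params}
```

Because the check keys only on the allowlist held by the orchestrator, editing the system prompt to claim that a tool is permitted has no effect on the outcome, which is the property the system-prompt bypass test in the Verification list exercises.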
From 4d6a3377ebcb90fb50a9fc4152eb6e2d33d333da Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Fri, 12 Jun 2026 00:47:09 +0530 Subject: [PATCH 22/35] Update README.md --- standard/3_Human_Oversight/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standard/3_Human_Oversight/README.md b/standard/3_Human_Oversight/README.md index e8b130d..1f305a6 100644 --- a/standard/3_Human_Oversight/README.md +++ b/standard/3_Human_Oversight/README.md @@ -49,7 +49,7 @@ The 19 requirements in this domain fall into six thematic groups: ### Conformance -A platform claims conformance with this domain by satisfying every requirement (both MUST and SHOULD) assigned to the compliance tier it targets and to all lower tiers. A SHOULD requirement counts toward tier conformance; a platform that does not implement a SHOULD requirement MUST record a documented justification for the deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 HO requirement plus every Tier 2 HO requirement, and a Tier 3 platform satisfies all three tiers. Human Oversight has no Tier 3 requirements in this release; a Tier 3 claim therefore requires all Tier 1 and Tier 2 HO requirements. +A platform claims conformance with this domain by implementing every MUST requirement assigned to the compliance tier it targets and to all lower tiers, with no deviation, and by either implementing every SHOULD requirement at those tiers or recording a documented justification for each deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). 
An unimplemented MUST requirement or an undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 HO requirement plus every Tier 2 HO requirement, and a Tier 3 platform satisfies all three tiers. Human Oversight has no Tier 3 requirements in this release; a Tier 3 claim therefore requires all Tier 1 and Tier 2 HO requirements. Two appendix-only advisory requirements for this domain (APTS-HO-A01 Out-of-Band Kill Switch via Independent Network and APTS-HO-A02 Disclosure and Mitigation of AI Influence on Operator Decisions) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier. From 35887619db1ee4c6f0665b0bcf6428e7be48f23b Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Fri, 12 Jun 2026 00:47:27 +0530 Subject: [PATCH 23/35] Update README.md --- standard/4_Graduated_Autonomy/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standard/4_Graduated_Autonomy/README.md b/standard/4_Graduated_Autonomy/README.md index a10b63e..3fd94d8 100644 --- a/standard/4_Graduated_Autonomy/README.md +++ b/standard/4_Graduated_Autonomy/README.md @@ -77,7 +77,7 @@ The 28 requirements in this domain are organized by the autonomy level they prim ### Conformance -A platform claims conformance with this domain by satisfying every requirement (both MUST and SHOULD) assigned to the compliance tier it targets and to all lower tiers. A SHOULD requirement counts toward tier conformance; a platform that does not implement a SHOULD requirement MUST record a documented justification for the deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). 
An undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 AL requirement plus every Tier 2 AL requirement, and a Tier 3 platform satisfies all three tiers. As described in the Tier and Level Mapping above, level-specific requirements apply only to platforms that offer the corresponding autonomy level. +A platform claims conformance with this domain by implementing every MUST requirement assigned to the compliance tier it targets and to all lower tiers, with no deviation, and by either implementing every SHOULD requirement at those tiers or recording a documented justification for each deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An unimplemented MUST requirement or an undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 AL requirement plus every Tier 2 AL requirement, and a Tier 3 platform satisfies all three tiers. As described in the Tier and Level Mapping above, level-specific requirements apply only to platforms that offer the corresponding autonomy level. One advisory practice relevant to this domain (APTS-AL-A01 Continuous Improvement and Maturity Roadmap) is documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). It is not required for conformance at any tier. 
From 2c76d65a232cff1ffa85ddc2ae6fc7f599ec65e2 Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Fri, 12 Jun 2026 00:47:47 +0530 Subject: [PATCH 24/35] Update README.md --- standard/5_Auditability/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standard/5_Auditability/README.md b/standard/5_Auditability/README.md index a2c1b4a..4202239 100644 --- a/standard/5_Auditability/README.md +++ b/standard/5_Auditability/README.md @@ -54,7 +54,7 @@ Several requirements in this domain reference attack-chain phases. APTS uses a s ### Conformance -A platform claims conformance with this domain by satisfying every requirement (both MUST and SHOULD) assigned to the compliance tier it targets and to all lower tiers. A SHOULD requirement counts toward tier conformance; a platform that does not implement a SHOULD requirement MUST record a documented justification for the deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 AR requirement plus every Tier 2 AR requirement, and a Tier 3 platform satisfies all three tiers. +A platform claims conformance with this domain by implementing every MUST requirement assigned to the compliance tier it targets and to all lower tiers, with no deviation, and by either implementing every SHOULD requirement at those tiers or recording a documented justification for each deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An unimplemented MUST requirement or an undocumented SHOULD deviation is a conformance gap. 
APTS defines three cumulative tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 AR requirement plus every Tier 2 AR requirement, and a Tier 3 platform satisfies all three tiers. Four advisory practices relevant to this domain (APTS-AR-A01 State Capture and Replay Support, APTS-AR-A02 Replay Variance Analysis, APTS-AR-A03 Real-Time External Log Streaming, APTS-AR-A04 Continuous Runtime Integrity Monitoring) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier. From 156d7b481b1a624813711cc16723160b79df1a59 Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Fri, 12 Jun 2026 00:48:04 +0530 Subject: [PATCH 25/35] Update README.md --- standard/6_Manipulation_Resistance/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standard/6_Manipulation_Resistance/README.md b/standard/6_Manipulation_Resistance/README.md index 110a50c..a9cb0e3 100644 --- a/standard/6_Manipulation_Resistance/README.md +++ b/standard/6_Manipulation_Resistance/README.md @@ -60,7 +60,7 @@ The 23 requirements in this domain fall into seven thematic groups: ### Conformance -A platform claims conformance with this domain by satisfying every requirement (both MUST and SHOULD) assigned to the compliance tier it targets and to all lower tiers. A SHOULD requirement counts toward tier conformance; a platform that does not implement a SHOULD requirement MUST record a documented justification for the deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An undocumented SHOULD deviation is a conformance gap. 
APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 MR requirement plus every Tier 2 MR requirement, and a Tier 3 platform satisfies all three tiers. +A platform claims conformance with this domain by implementing every MUST requirement assigned to the compliance tier it targets and to all lower tiers, with no deviation, and by either implementing every SHOULD requirement at those tiers or recording a documented justification for each deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An unimplemented MUST requirement or an undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 MR requirement plus every Tier 2 MR requirement, and a Tier 3 platform satisfies all three tiers. Three advisory practices relevant to this domain (APTS-MR-A01 Goal Misgeneralization and Emergent Misalignment Evaluation Suite, APTS-MR-A02 Sandbagging Detection and Behavioral Consistency Validation, and APTS-MR-A03 Multi-Turn Adversarial Conversation Resilience) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier. 
From 3aeb8a455a448a28ba9729367f3717145654f3e9 Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Fri, 12 Jun 2026 00:48:22 +0530 Subject: [PATCH 26/35] Update README.md --- standard/7_Supply_Chain_Trust/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standard/7_Supply_Chain_Trust/README.md b/standard/7_Supply_Chain_Trust/README.md index bab9559..a32f0b4 100644 --- a/standard/7_Supply_Chain_Trust/README.md +++ b/standard/7_Supply_Chain_Trust/README.md @@ -53,7 +53,7 @@ The 22 requirements in this domain fall into seven thematic groups: ### Conformance -A platform claims conformance with this domain by satisfying every requirement (both MUST and SHOULD) assigned to the compliance tier it targets and to all lower tiers. A SHOULD requirement counts toward tier conformance; a platform that does not implement a SHOULD requirement MUST record a documented justification for the deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 TP requirement plus every Tier 2 TP requirement, and a Tier 3 platform satisfies all three tiers. +A platform claims conformance with this domain by implementing every MUST requirement assigned to the compliance tier it targets and to all lower tiers, with no deviation, and by either implementing every SHOULD requirement at those tiers or recording a documented justification for each deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An unimplemented MUST requirement or an undocumented SHOULD deviation is a conformance gap. 
APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 TP requirement plus every Tier 2 TP requirement, and a Tier 3 platform satisfies all three tiers. Four appendix-only advisory requirements for this domain (APTS-TP-A01 Breach Notification and Regulatory Reporting, APTS-TP-A02 Privacy Regulation Compliance, APTS-TP-A03 Professional Liability and Engagement Agreements, APTS-TP-A04 External Tool Connector Trust Boundaries and Credential Isolation) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier. From 9d9c1d700a1e8a57c0cb17efd8c1efc41bfba770 Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Fri, 12 Jun 2026 00:48:56 +0530 Subject: [PATCH 27/35] Update README.md --- standard/8_Reporting/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standard/8_Reporting/README.md b/standard/8_Reporting/README.md index 82a2f83..79c90f2 100644 --- a/standard/8_Reporting/README.md +++ b/standard/8_Reporting/README.md @@ -44,7 +44,7 @@ The 15 requirements in this domain fall into five thematic groups: ### Conformance -A platform claims conformance with this domain by satisfying every requirement (both MUST and SHOULD) assigned to the compliance tier it targets and to all lower tiers. A SHOULD requirement counts toward tier conformance; a platform that does not implement a SHOULD requirement MUST record a documented justification for the deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An undocumented SHOULD deviation is a conformance gap. 
APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 RP requirement plus every Tier 2 RP requirement, and a Tier 3 platform satisfies all three tiers. +A platform claims conformance with this domain by implementing every MUST requirement assigned to the compliance tier it targets and to all lower tiers, with no deviation, and by either implementing every SHOULD requirement at those tiers or recording a documented justification for each deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An unimplemented MUST requirement or an undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 RP requirement plus every Tier 2 RP requirement, and a Tier 3 platform satisfies all three tiers. Every requirement in this domain includes a Verification subsection listing the verification procedures a reviewer uses to confirm implementation. From 378f781e11837cb4b3726dade51e4b76166550f6 Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Fri, 12 Jun 2026 01:04:23 +0530 Subject: [PATCH 28/35] Update Frontispiece.md --- standard/Frontispiece.md | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/standard/Frontispiece.md b/standard/Frontispiece.md index 317c2d1..70519bc 100644 --- a/standard/Frontispiece.md +++ b/standard/Frontispiece.md @@ -34,7 +34,7 @@ This standard uses RFC 2119 language: | Term | Meaning | |------|---------| -| **MUST** | Mandatory requirement. Non-compliance means the requirement is not met. | +| **MUST** | Mandatory requirement. No deviation is permitted for a tier claim. 
| | **MUST NOT** | Absolute prohibition. | | **SHOULD** | Recommended. Deviation requires documented justification. | | **SHOULD NOT** | Not recommended. Deviation is acceptable with documented justification. | @@ -76,10 +76,3 @@ Licensed under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/). |---------|------|-------| | 0.1.0 | April 2026 | Initial release. Eight domains, 173 tier-required requirements across three compliance tiers, plus 18 advisory practices in the appendix. | ---- - -## Version History - -| Version | Date | Notes | -|---------|------|-------| -| 0.1.0 | April 2026 | Initial release. Eight domains, 173 tier-required requirements across three compliance tiers, plus 18 advisory practices in the appendix. | From db06b73b9b6e076a2d4af8fec5c478ab6eb22a19 Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Fri, 12 Jun 2026 01:05:28 +0530 Subject: [PATCH 29/35] Update Getting_Started.md --- standard/Getting_Started.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standard/Getting_Started.md b/standard/Getting_Started.md index 927138d..b7b1da7 100644 --- a/standard/Getting_Started.md +++ b/standard/Getting_Started.md @@ -97,7 +97,7 @@ Depending on your role: No. Start with Tier 1 (72 requirements). Tier 2 and Tier 3 add requirements progressively for cumulative totals of 157 and 173. An additional 18 advisory practices live in the [Advisory Requirements appendix](appendix/Advisory_Requirements.md) under the `APTS--A0x` identifier pattern; advisory practices are not required for conformance at any tier. See [Introduction: Compliance Tiers](Introduction.md#compliance-tiers) for details. **Q: What if my platform meets most but not all Tier 1 requirements?** -APTS does not award partial credit. A platform must meet 100% of requirements for its claimed tier; MUST requirements permit no deviation. 
At Tier 2 and above, an unimplemented SHOULD requirement does not void the claim if the deviation is documented with justification in the conformance claim. Address MUST gaps before claiming a tier. +APTS does not award partial credit. A tier claim requires every MUST requirement at the claimed tier and all lower tiers to be implemented, with no deviation. Every SHOULD requirement at those tiers must be either implemented or covered by a documented justification in the conformance claim. Address MUST gaps before claiming a tier. **Q: Are the Implementation Guides mandatory?** No. Implementation Guides are informative. They suggest approaches but do not define requirements. The domain READMEs contain all normative requirements. From 49a6e9b554f2bcf14c002d47923dbc129d9a3dd0 Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Fri, 12 Jun 2026 01:15:58 +0530 Subject: [PATCH 30/35] Update Introduction.md --- standard/Introduction.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standard/Introduction.md b/standard/Introduction.md index a25d591..00af415 100644 --- a/standard/Introduction.md +++ b/standard/Introduction.md @@ -50,7 +50,7 @@ APTS does not prescribe who performs the assessment. The choice of internal self ## Compliance Tiers -APTS defines three compliance tiers. A platform must meet 100% of requirements assigned to its claimed tier (both MUST and SHOULD). No partial credit. +APTS defines three compliance tiers. A platform claims a tier by implementing every MUST requirement assigned to that tier and all lower tiers, with no deviation, and by either implementing every SHOULD requirement at those tiers or recording a documented justification for each deviation in its conformance claim. An unimplemented MUST requirement or an undocumented SHOULD deviation is a conformance gap; there is no partial credit. 
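The tier-claim rule stated above (every MUST implemented with no deviation; every SHOULD either implemented or covered by a documented justification) can be sketched as a small check. This is an illustrative sketch only, not part of the standard: the `Requirement` record, its field names, and the `conformance_gaps` helper are hypothetical, and the requirement IDs used with it would be placeholders rather than real APTS identifiers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Requirement:
    """Hypothetical record for one tier-gated requirement in a claim."""
    req_id: str                          # placeholder ID, not a real APTS identifier
    classification: str                  # "MUST" or "SHOULD"
    tier: int                            # 1, 2, or 3
    implemented: bool
    justification: Optional[str] = None  # documented SHOULD deviation, if any


def conformance_gaps(requirements: List[Requirement], claimed_tier: int) -> List[str]:
    """Return the conformance gaps for a claim at `claimed_tier`.

    Tiers are cumulative, so every requirement at the claimed tier and
    all lower tiers is in scope. An empty result means no gaps were found.
    """
    gaps = []
    for req in requirements:
        if req.tier > claimed_tier or req.implemented:
            continue  # out of scope for this claim, or already satisfied
        if req.classification == "MUST":
            # MUST requirements permit no deviation, justified or not.
            gaps.append(f"{req.req_id}: unimplemented MUST")
        elif req.classification == "SHOULD":
            # A SHOULD deviation is acceptable only with a documented justification.
            if not (req.justification and req.justification.strip()):
                gaps.append(f"{req.req_id}: undocumented SHOULD deviation")
    return gaps
```

Under these assumptions, a claim has no partial credit: a single entry in the returned list means the tier cannot be claimed until the gap is closed or, for a SHOULD requirement, the deviation is documented.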
**Verification model:** APTS follows a conformance assessment model, consistent with how other OWASP standards (WSTG, ASVS) are used by practitioners. Platform operators evaluate their platforms against the requirements using the [Checklists](appendix/Checklists.md) and document conformance. The [Conformance Claim Template](appendix/Conformance_Claim_Template.md) provides an optional format for publishing evidence of conformance. Customers MAY independently verify claims using the [Vendor Evaluation Guide](appendix/Vendor_Evaluation_Guide.md) or the [Customer Acceptance Testing](appendix/Customer_Acceptance_Testing.md) appendix for hands-on verification of behavioral requirements (kill switch response times, scope enforcement accuracy, manipulation resistance) that cannot be verified through documentation alone. From c1a4318733490e93f886802de038925cb4d3113a Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Fri, 12 Jun 2026 01:17:27 +0530 Subject: [PATCH 31/35] Update Compliance_Matrix.md --- standard/appendix/Compliance_Matrix.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standard/appendix/Compliance_Matrix.md b/standard/appendix/Compliance_Matrix.md index f056e05..bf4f30b 100644 --- a/standard/appendix/Compliance_Matrix.md +++ b/standard/appendix/Compliance_Matrix.md @@ -456,7 +456,7 @@ SOC 2 defines five trust services categories, each with specific Trust Services NIST AI RMF 1.0 defines four functions for managing AI system risks. APTS addresses controls across all four functions, with particular depth in GOVERN and MANAGE. 
-### GOVERN Function +### GOVERN Function (AI RMF) **GOVERN 1: Policies and Procedures** - Controls: Implement graduated autonomy governance with mandatory approval gates for all significant actions at L1 (APTS-HO-001); document human oversight policies specifying role responsibilities (APTS-HO-004) From cd3dcb29397b1ff4b29e99318db2f840ba0db5b3 Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Fri, 12 Jun 2026 01:18:21 +0530 Subject: [PATCH 32/35] Update Conformance_Claim_Template.md --- standard/appendix/Conformance_Claim_Template.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standard/appendix/Conformance_Claim_Template.md b/standard/appendix/Conformance_Claim_Template.md index 92238d9..e839960 100644 --- a/standard/appendix/Conformance_Claim_Template.md +++ b/standard/appendix/Conformance_Claim_Template.md @@ -64,7 +64,7 @@ _[If the platform supports model substitution at runtime, disclose each approved | Reporting (RP) | _[count]_ | _[count]_ | | | **Total** | _[count]_ | _[count]_ | | -> **Reminder:** APTS requires 100% of requirements at the claimed tier to be met. MUST requirements permit no deviation. A SHOULD requirement that is not implemented must be recorded in the SHOULD Deviations section below with a documented justification; an undocumented SHOULD deviation is a conformance gap. +> **Reminder:** A tier claim requires every MUST requirement at the claimed tier and all lower tiers to be implemented, with no deviation. Every SHOULD requirement at those tiers must be either implemented or recorded in the SHOULD Deviations section below with a documented justification; an undocumented SHOULD deviation is a conformance gap. 
--- From f591a41c2b0b57f41dcd9c01e50ed6bdadaeebbb Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Fri, 12 Jun 2026 01:19:02 +0530 Subject: [PATCH 33/35] Update Glossary.md --- standard/appendix/Glossary.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/standard/appendix/Glossary.md b/standard/appendix/Glossary.md index 2e72be8..07f2538 100644 --- a/standard/appendix/Glossary.md +++ b/standard/appendix/Glossary.md @@ -79,7 +79,7 @@ Notation for specifying IP address ranges using a base address and prefix length Alternative security measures that mitigate vulnerability when the primary control is missing. Example: Two-factor authentication compensates for weak passwords. **Compliance Tier** -One of three progressive levels of APTS conformance. Tier 1 (Foundation) requires 72 core requirements (MUST | Tier 1). Tier 2 (Verified) adds 85 requirements for a cumulative 157 (MUST | Tier 2 + SHOULD | Tier 2). Tier 3 (Comprehensive) adds 16 requirements for a cumulative 173 (MUST | Tier 3 + SHOULD | Tier 3). A platform must meet 100% of requirements assigned to its claimed tier (both MUST and SHOULD); an unimplemented SHOULD requires a documented justification in the conformance claim, while an unimplemented MUST is a conformance failure. An additional 18 advisory practices in the Advisory Requirements appendix are recommended for highest-assurance engagements but are not counted toward any tier. +One of three progressive levels of APTS conformance. Tier 1 (Foundation) requires 72 core requirements (MUST | Tier 1). Tier 2 (Verified) adds 85 requirements for a cumulative 157 (MUST | Tier 2 + SHOULD | Tier 2). Tier 3 (Comprehensive) adds 16 requirements for a cumulative 173 (MUST | Tier 3 + SHOULD | Tier 3). 
A platform claims a tier by implementing every MUST requirement at that tier and all lower tiers, with no deviation, and by either implementing every SHOULD requirement at those tiers or recording a documented justification for each deviation in its conformance claim. An additional 18 advisory practices in the Advisory Requirements appendix are recommended for highest-assurance engagements but are not counted toward any tier. **Confidence Score** A numeric value on a 0-100% scale indicating the platform's certainty in a scope boundary determination, target legitimacy assessment, asset classification, or finding validity. Scores below 75% for scope-related decisions trigger mandatory human escalation. See APTS-HO-013, APTS-RP-003. From 0f01ae8b75427ab7d27fae5e996bdcf63f6a9e5d Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Fri, 12 Jun 2026 01:32:38 +0530 Subject: [PATCH 34/35] Update Conformance_Claim_Example.md --- standard/appendix/examples/Conformance_Claim_Example.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/standard/appendix/examples/Conformance_Claim_Example.md b/standard/appendix/examples/Conformance_Claim_Example.md index 4ea45a0..c081366 100644 --- a/standard/appendix/examples/Conformance_Claim_Example.md +++ b/standard/appendix/examples/Conformance_Claim_Example.md @@ -76,7 +76,13 @@ The table below is illustrative. A real claim can use the current requirement co | Reporting (RP) | Tier 1 + Tier 2 applicable requirements | All applicable tier requirements met | Finding validation, confidence scoring, evidence integrity, and downstream export controls sampled | | **Total** | Tier 2 cumulative requirements | All applicable tier requirements met | Completed checklist reference: `apts-checklist-tier2-examplecorp-2026-04-20.xlsx` | -> **Reminder:** APTS requires 100% of requirements at the claimed tier to be met. Partial credit is not awarded. 
+> **Reminder:** A tier claim requires every MUST requirement at the claimed tier and all lower tiers to be implemented, with no deviation. Every SHOULD requirement at those tiers must be either implemented or recorded in the SHOULD Deviations section with a documented justification; an undocumented SHOULD deviation is a conformance gap. + +--- + +## SHOULD Deviations + +None. All SHOULD requirements at Tier 1 and Tier 2 are implemented; no deviations are claimed. --- From f698b955af29a2e35ea38fb6b80db24ba3927847 Mon Sep 17 00:00:00 2001 From: Jinson Varghese Behanan <33680980+jinsonvarghese@users.noreply.github.com> Date: Fri, 12 Jun 2026 01:43:23 +0530 Subject: [PATCH 35/35] Update Glossary.md --- standard/appendix/Glossary.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/standard/appendix/Glossary.md b/standard/appendix/Glossary.md index 07f2538..999bc1b 100644 --- a/standard/appendix/Glossary.md +++ b/standard/appendix/Glossary.md @@ -17,6 +17,9 @@ External service providing artificial intelligence or large language model capab **API Key** Cryptographic credential used to authenticate API requests to third-party services. Should be stored in secure vaults, rotated annually, and never embedded in code. See Supply Chain Trust. +**Approval Gate** +A mandatory checkpoint at which an autonomous action pauses until a qualified human operator grants or denies authorization within a defined SLA response window (see APTS-HO-003). Medium and high-impact actions require approval gates regardless of the platform's autonomy level. + **Autonomy Level** Classification system (L1-L4) describing degree of independence and decision-making authority of autonomous pentest system. Higher levels require more sophisticated technical safeguards and governance mechanisms while reducing per-action human approval requirements. L1 requires human direction for every action; L4 operates independently within pre-approved boundaries under continuous automated monitoring. 
@@ -149,6 +152,9 @@ The process of raising an event, finding, or decision to a higher authority or m ## F +**Execution Sandbox** +A kernel-enforced isolation boundary containing the agent runtime, whose configuration the runtime itself has no credentials to modify (see APTS-SC-019). Tool and action allowlists are enforced by components external to the model rather than by the model's own instructions (APTS-SC-020). + **Failover** Automatic or manual transition from primary system to backup when primary becomes unavailable. Example: Cloud provider A unavailable → automatically switch to Provider B. @@ -290,6 +296,9 @@ Demonstration that vulnerability is exploitable. Can be code, command, or screen **Production-Like Environment** A target environment that mirrors production in configuration, data sensitivity, or network topology sufficiently that unintended testing impact could affect real users, data, or services. This includes staging environments with production data, pre-production environments connected to production networks, and disaster recovery environments that can be activated. Isolated development environments with synthetic data are not production-like. +**Prompt Injection** +An attack in which adversarial content embedded in data the agent processes (target system responses, files, web pages) attempts to override the agent's instructions or alter its behavior. Manipulation Resistance requirements such as APTS-MR-001 mandate strict separation between operator instructions and target-derived data. + **Qualified Reviewer** An individual with demonstrated expertise in penetration testing methodology (for example, OSCP, CREST CRT, GPEN, or equivalent experience), security governance frameworks, and familiarity with AI/ML systems. Organizations may use qualified reviewers when evaluating platforms against APTS requirements. @@ -306,6 +315,9 @@ Maximum acceptable data loss window. 
If system fails at 2pm and RPO is 1 hour, m **Recovery Time Objective (RTO)** Maximum acceptable downtime after a failure event. Determines backup and failover procedure requirements. Example: an RTO of 4 hours means service must be restored within 4 hours of failure. +**Rollback** +Restoring a target system or the platform to its pre-action state after testing changes, using pre-action state capture and documented step-by-step procedures (see APTS-SC-014). Evidence is preserved in tamper-evident storage before rollback executes (APTS-SC-016). + **Rules of Engagement (RoE)** A formal document defining the scope, boundaries, authorized activities, temporal constraints, escalation procedures, and contact information for an autonomous penetration testing engagement. The RoE is the authoritative source for scope enforcement. @@ -361,6 +373,9 @@ Manipulation of people to divulge confidential information or perform security-v **SOC 2** Audit standard for service organizations. Type II includes controls testing over time. Widely adopted by cloud and SaaS providers to demonstrate trust assurance. +**Software Bill of Materials (SBOM)** +A machine-readable inventory of the software components and dependencies used by the platform, maintained in SPDX or CycloneDX format. The baseline inventory is required at Tier 1 (APTS-TP-006); SBOM freshness and customer access obligations apply at Tier 2 (APTS-AR-016). + **SQL Injection (SQLi)** Vulnerability in database queries where attacker can insert malicious SQL code. Allows unauthorized database access, data theft, modification.