feat(proxy): retire DRY_RUN, honest EDR action mapping, simulated flag, durable nonce #26

Merged
keirsalterego merged 9 commits into main from feat/retire-dry-run-honest-edr on Jun 14, 2026

@keirsalterego

Summary

Lands the proxy half of the per-tenant connector model (workspace Rule #5) that was committed to local main but never pushed, rebased cleanly onto current origin/main (which had added cargo CI). No conflicts.

  • Retire the global DRY_RUN kill-switch. The proxy always dispatches to the tenant's configured EDR; safety comes from human approval + connector config, not a global switch.
  • simulated honesty flag round-trips through the signed request → response → audit, set from the tenant's is_demo. A demo tenant runs the real execute/rollback path against the mock EDR, tagged simulated=true.
  • Honest EDR action mapping, no silent downgrade. PROCESS_KILL has no faithful CrowdStrike/SentinelOne mapping, so it fails loudly with 501 Not Implemented rather than silently substituting host isolation. Supportability is checked before any EDR call.
  • Canonical request hash + fail-closed nonce store.
  • In-memory nonce eviction now honors the configured ttl_seconds (was hardcoded to DEFAULT_RETENTION_SECONDS), and the runtime /audit log artifact is gitignored.
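As a rough illustration of the `simulated` round trip described above, the sketch below carries the flag from the tenant's `is_demo` through the request, the audit entry, and the response. The struct names, the `EXECUTE_` event prefix, and the `build_request`/`execute` helpers are hypothetical and not taken from the proxy's code; only `simulated` and `is_demo` come from this PR.

```rust
// Hypothetical types: only the `simulated` field name comes from the PR.
#[derive(Debug, Clone, PartialEq)]
struct SignedRequest {
    action: String,
    simulated: bool,
}

#[derive(Debug, PartialEq)]
struct ProxyResponse {
    status: u16,
    simulated: bool,
}

#[derive(Debug, PartialEq)]
struct AuditEntry {
    event: String,
    simulated: bool,
}

/// The caller sets `simulated` from the tenant's `is_demo`.
fn build_request(action: &str, tenant_is_demo: bool) -> SignedRequest {
    SignedRequest {
        action: action.to_owned(),
        simulated: tenant_is_demo,
    }
}

/// The proxy reads the flag, records it in the audit trail, and echoes it.
/// Dispatch itself is not shown; it runs the same path either way.
fn execute(req: &SignedRequest, audit: &mut Vec<AuditEntry>) -> ProxyResponse {
    audit.push(AuditEntry {
        event: format!("EXECUTE_{}", req.action), // illustrative event name
        simulated: req.simulated,
    });
    ProxyResponse {
        status: 200,
        simulated: req.simulated,
    }
}
```

The point of the design is that the flag is a label, not a branch: a demo tenant exercises the same execute path and is only tagged differently in the response and audit.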

Verification

  • cargo fmt --check: clean
  • cargo clippy --all-targets -- -D warnings: clean
  • cargo test: 68 passed, 0 failed (incl. process_kill_unsupported_returns_501, simulated_flag_round_trips, nonce bound/replay tests)

The new cargo CI workflow (added on origin via #25) runs the same gates on this PR.

🤖 Generated with Claude Code

PROCESS_KILL on CrowdStrike used to quietly map to host containment with
only a log line: the operator approved a surgical kill and the proxy
quarantined the whole host. SentinelOne ignored the action type entirely
and disconnected the host for every action. Both are gone.

- crowdstrike_action_name and sentinelone_action_path now map each
  (ActionType, ActionDirection) pair to exactly one vendor call or fail
  loudly with EdrError::Unsupported. A true CrowdStrike kill needs an RTR
  session plus a process id, and an S1 kill via threat mitigation needs
  an S1 threat id; the signed request carries neither, so the proxy
  refuses instead of substituting a different action.
- HOST_ISOLATION and NETWORK_QUARANTINE map to contain/lift_containment
  (CS) and disconnect/connect (S1) because those ARE the vendors'
  network containment primitives, not stand-ins.
- The unsupported check runs before any network traffic, so zero EDR
  calls happen for a refused action. DRY_RUN still short-circuits before
  dispatch, unchanged.
- /execute and /rollback return 501 Not Implemented for Unsupported
  (502 stays for transport/EDR failures) and append a FAILED_<action>
  audit entry next to the intent entry, so the trail records that the
  approved action did NOT happen.
- Tests: mapping units for both providers, no-call-attempted dispatch
  tests, an S1 loopback mock asserting the exact endpoint per action and
  zero hits for PROCESS_KILL, and HTTP-level 501 + failure-audit tests.
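
The mapping rules in this commit can be sketched roughly as follows. This is a minimal sketch, not the proxy's actual code: the function names, `EdrError::Unsupported`, the vendor action names, and the 501/502 split come from the commit message, while the enum shapes, the `Transport` variant, the return types, and the `status_for` helper are assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ActionType {
    HostIsolation,
    NetworkQuarantine,
    ProcessKill,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ActionDirection {
    Execute,
    Rollback,
}

#[derive(Debug, PartialEq, Eq)]
enum EdrError {
    /// No faithful vendor mapping exists; nothing was sent to the EDR.
    Unsupported { action: ActionType, provider: &'static str },
    /// Assumed variant standing in for transport/EDR failures.
    Transport(String),
}

/// Each (action, direction) pair maps to exactly one vendor call or an error.
fn crowdstrike_action_name(
    action: ActionType,
    dir: ActionDirection,
) -> Result<&'static str, EdrError> {
    use ActionDirection::*;
    use ActionType::*;
    match (action, dir) {
        (HostIsolation | NetworkQuarantine, Execute) => Ok("contain"),
        (HostIsolation | NetworkQuarantine, Rollback) => Ok("lift_containment"),
        (ProcessKill, _) => Err(EdrError::Unsupported { action, provider: "crowdstrike" }),
    }
}

fn sentinelone_action_path(
    action: ActionType,
    dir: ActionDirection,
) -> Result<&'static str, EdrError> {
    use ActionDirection::*;
    use ActionType::*;
    match (action, dir) {
        (HostIsolation | NetworkQuarantine, Execute) => Ok("disconnect"),
        (HostIsolation | NetworkQuarantine, Rollback) => Ok("connect"),
        (ProcessKill, _) => Err(EdrError::Unsupported { action, provider: "sentinelone" }),
    }
}

/// Hypothetical helper: 501 for a refused action, 502 for a failed call.
fn status_for(err: &EdrError) -> u16 {
    match err {
        EdrError::Unsupported { .. } => 501,
        EdrError::Transport(_) => 502,
    }
}
```

Because the mapping is a pure function evaluated before dispatch, a refused action can be rejected without opening a connection, which is what lets the tests assert zero EDR hits for PROCESS_KILL.
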
PROCESS_KILL no longer silently downgrades to host isolation; it
returns 501 with a FAILED_ audit entry, in dry-run and live alike.
SentinelOne gets its own action path mapping. 63 tests, clippy clean.
…-closed nonce

Proxy always dispatches now; the simulated honesty flag is read, audited, and echoed (audit field renamed dry_run -> simulated).

- PRX-01: missing Redis is a hard prod boot error (VYROX_PROXY_ALLOW_EPHEMERAL_NONCE opt-in for dev).
- PRX-02: canonical BTreeMap serializer, cross-verified with Python (shared sha256 fixture).
- PRX-03: poison-safe lock.
- PRX-04: streaming audit export.
- PRX-05: EDR-body scrubber.

Verify side mirrors VYROX_PROXY_SECRET (fallback VYROX_HMAC_SECRET).
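
To show why a BTreeMap helps with a canonical request hash (PRX-02), here is a minimal sketch of a sorted-key serializer. It assumes a flat map of string fields and a compact JSON-like output; the actual field set and wire format are not given in this PR, so `canonical_bytes`, `escape`, and the example field names are hypothetical. The SHA-256 step is left out because it needs a crate outside the standard library.

```rust
use std::collections::BTreeMap;

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serialize fields in key order with no insignificant whitespace.
/// BTreeMap iterates in sorted key order, so insertion order cannot
/// change the bytes that get hashed.
fn canonical_bytes(fields: &BTreeMap<String, String>) -> Vec<u8> {
    let body = fields
        .iter()
        .map(|(k, v)| format!("\"{}\":\"{}\"", escape(k), escape(v)))
        .collect::<Vec<_>>()
        .join(",");
    format!("{{{}}}", body).into_bytes()
}
```

Two maps built in different insertion orders produce identical bytes, which is the property a shared fixture between two implementations would be checking.
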
claim_memory/evict_expired used the hardcoded DEFAULT_RETENTION_SECONDS instead
of the store's configured ttl_seconds, so the memory backend's replay-retention
window ignored config. Thread ttl_seconds through. Also gitignore the runtime
/audit log artifact.

The two in-test callers used the old 2-arg signature; pass
DEFAULT_RETENTION_SECONDS to preserve their original behavior.
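
The fix amounts to passing the configured TTL into eviction instead of reading a constant. The sketch below is illustrative only: `MemoryNonceStore`, the map layout, and the exact signatures are assumptions (the PR only says the helpers gained a third argument), and the lock handling shows one common way to make a mutex poison-safe, not necessarily the one used for PRX-03.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, PoisonError};

/// Drop nonces older than the caller-supplied TTL (hypothetical signature).
fn evict_expired(seen: &mut HashMap<String, u64>, now: u64, ttl_seconds: u64) {
    seen.retain(|_, first_seen| now.saturating_sub(*first_seen) < ttl_seconds);
}

/// Hypothetical in-memory backend; `ttl_seconds` is the configured window.
struct MemoryNonceStore {
    ttl_seconds: u64,
    seen: Mutex<HashMap<String, u64>>, // nonce -> unix seconds when claimed
}

impl MemoryNonceStore {
    fn new(ttl_seconds: u64) -> Self {
        Self { ttl_seconds, seen: Mutex::new(HashMap::new()) }
    }

    /// Returns true if the nonce is fresh, false if it is a replay.
    fn claim(&self, nonce: &str, now: u64) -> bool {
        // Recover the guard even if another thread panicked while holding it.
        let mut seen = self.seen.lock().unwrap_or_else(PoisonError::into_inner);
        // Use the store's own TTL, not a hardcoded default.
        evict_expired(&mut seen, now, self.ttl_seconds);
        if seen.contains_key(nonce) {
            return false;
        }
        seen.insert(nonce.to_owned(), now);
        true
    }
}
```

With a 10-second TTL, a nonce claimed at t=100 is rejected at t=105 and accepted again at t=110, once its retention window has elapsed.
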
sentry 0.34 pulled rustls 0.22 -> rustls-webpki 0.102.8, which carries 4 RUSTSEC
TLS advisories (RUSTSEC-2026-0049/0098/0099/0104). sentry 0.48 uses rustls 0.23
-> rustls-webpki 0.103.13 (patched). The rest of the TLS stack (axum-server,
reqwest, redis) was already on 0.103. No code changes needed (sentry::init,
ClientOptions, release_name!, ClientInitGuard, protocol::Event are unchanged).

cargo audit: 4 vulnerabilities -> 0. cargo fmt/clippy/test all pass (68 tests).

Cargo.lock was tracked but also listed in .gitignore, so the sentry 0.48 bump
never reached the committed lock and CI kept auditing the stale tree (webpki
0.102.8). Commit the regenerated lock (only rustls-webpki 0.103.13 now) and drop
the dead .gitignore entry: a binary should commit its lockfile so audit + builds
are reproducible.
@keirsalterego keirsalterego merged commit e0cbbab into main Jun 14, 2026
2 checks passed
@keirsalterego keirsalterego deleted the feat/retire-dry-run-honest-edr branch June 14, 2026 14:41