websentry-ai · vigneshsubbiah16 · Jun 13, 2026
diff --git a/tooling/edr-rehearsal/.gitignore b/tooling/edr-rehearsal/.gitignore
@@ -0,0 +1,2 @@
+results/*
+!results/.gitkeep
diff --git a/tooling/edr-rehearsal/README.md b/tooling/edr-rehearsal/README.md
@@ -0,0 +1,31 @@
+# tooling/edr-rehearsal — push-button EDR (SentinelOne) rehearsal harness
+
+Rehearse the signed+notarized macOS runtime against Salesloft's EDR
+(SentinelOne) on a throwaway EC2 Mac fixture, pick the surviving artifact by
+evidence, and re-image. Authoring/operator tooling — distinct from the build
+steps in `packaging/`. Ticket: **WEB-4805**.
+
+Start with **[RUNBOOK.md](RUNBOOK.md)**.
+
+| File | What |
+|---|---|
+| `RUNBOOK.md` | Operator-facing, step-by-step end-to-end procedure |
+| `matrix.md` | 2 artifacts × 2 allowlist states × 5 stages, with a results column |
+| `lib.sh` | Shared config + the AWS-profile guard + dry-run/confirm gate (sourced, not run) |
+| `provision-fixture.sh` | Allocate a fresh dedicated host + instance per chip (`mac2.metal` arm64, `mac1.metal` intel), us-west-2, default profile |
+| `install-s1.sh` | Install the SentinelOne agent on a fixture (site token via `S1_SITE_TOKEN`) |
+| `run-rehearsal.sh` | Drive one artifact (`--artifact pyinstaller\|nuitka`) through install → onboard → all hook events → discovery daemon → `--clear` |
+| `capture-telemetry.sh` | Collect S1 detections + Storyline + our logs, tagged `{artifact, allowlist, run-id}` |
+| `teardown.sh` | Terminate the instance + release the dedicated host (the re-image) |
+| `results/` | Per-cell evidence dirs (gitignored) |
+
+## Safety
+
+Every live-action script **defaults to dry-run** (prints the exact commands,
+touches nothing) and requires `--execute` to do anything real. AWS-touching
+scripts also warn about the **24-hour dedicated-host minimum** and require an
+interactive `yes` (or `--yes`). The **benchling AWS profile is hard-refused**. Nothing here
+can block a developer's daily machine — fail-open is sacred.
+
+All scripts pass `shellcheck` and follow the `set -euo pipefail` / `HERE` /
+heredoc conventions used in `packaging/scripts/`.
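
The Safety section of the README describes two guards that live in `lib.sh`, which is not part of the lines shown here: the dry-run gate and the AWS-profile refusal. A minimal sketch of how such guards might look follows. The names `run_or_print` and `guard_profile` are hypothetical, chosen for illustration only; the real helpers may differ in name and behavior.

```shell
#!/bin/bash
# Hypothetical sketch; run_or_print and guard_profile are illustrative names,
# not the helpers defined in lib.sh.
set -euo pipefail

EXECUTE="${EXECUTE:-0}"   # 0 = dry-run (the default), 1 = act for real

# Print the command; run it only when EXECUTE=1.
run_or_print() {
  printf '$ %s\n' "$*"
  if [[ "$EXECUTE" -eq 1 ]]; then
    "$@"
  fi
}

# Refuse any AWS profile whose name contains "benchling" (case-sensitive here).
guard_profile() {
  local profile="$1"
  case "$profile" in
    *benchling*) echo "refusing AWS profile: $profile" >&2; return 1 ;;
  esac
}

guard_profile "default"
run_or_print echo "would allocate a dedicated host here"
```

With `EXECUTE` left unset, the last line only prints the command. Making the dry-run path the default means that forgetting a flag costs nothing, which matters when the live path starts a 24-hour billing minimum.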
diff --git a/tooling/edr-rehearsal/RUNBOOK.md b/tooling/edr-rehearsal/RUNBOOK.md
@@ -0,0 +1,197 @@
+# EDR (SentinelOne) rehearsal — operator runbook (WEB-4805)
+
+> One shot. We rehearse the signed+notarized macOS runtime against Salesloft's
+> actual EDR (SentinelOne) on a throwaway EC2 Mac fixture **before** the
+> customer ever sees it, pick the artifact that survives by evidence, and
+> re-image so the rehearsal never pollutes the real fixtures.
+
+Ticket: https://linear.app/unboundsec/issue/WEB-4805
+Project: Non-Python MDM Rollout (Mac fleet) — customer Salesloft (~1,150-Mac
+Jamf fleet).
+
+## What we are deciding
+
+Two signed artifacts ship from the same pipeline; we run BOTH end-to-end
+against S1 and keep the one that stays clean:
+
+| Artifact | Source | pkg |
+|---|---|---|
+| PyInstaller (default) | WEB-4786 / WEB-4787 | `unbound-runtime-0.1.0.pkg` |
+| Nuitka | WEB-4804 (PR #132) | `unbound-runtime-0.1.0-nuitka.pkg` (`-nuitka` suffix; `workflow_dispatch builder=nuitka`) |
+
+**Winner = clears S1 AND notarizes AND passes the bare-Mac universal2 gate**
+(`packaging/scripts/lipo-gate.sh`). Pre-allowlisting with Mike (WEB-4784) may
+settle it for either artifact even at `allowlist=none`.
+
+## Coordinates (source of truth — do not re-derive)
+
+| Thing | Value |
+|---|---|
+| Apple Team ID (signer/cert allowlist) | `ZMA55FTA8W` ("Websentry Inc") |
+| Released pkg | `https://unbound-release-artifacts.s3.us-west-2.amazonaws.com/macos/0.1.0/unbound-runtime-0.1.0.pkg` |
+| onboard.sh | `https://unbound-release-artifacts.s3.us-west-2.amazonaws.com/macos/0.1.0/onboard.sh` |
+| Install layout | `/opt/unbound/current/{unbound-hook,unbound-discovery}/` ; LaunchDaemon `ai.getunbound.discovery` |
+| AWS | `us-west-2`, **default** profile only (NEVER the benchling profile) |
+
+## Open dependency — DO NOT BLOCK on it
+
+The S1 **tenant + site token** are pending the WEB-4805 sourcing decision
+(Salesloft's EDR is confirmed SentinelOne; the tenant we rehearse against —
+Salesloft-supplied vs an Unbound S1 trial — is the open question, tracked
+against WEB-4784). The harness is fully parameterized: when the token + agent
+pkg land, drop them into the env vars below — no script edits required.
+
+This runbook and all scripts can be exercised in **dry-run today** (they print
+every command and touch nothing).
+
+## Allowlist strategy under test
+
+| State | S1 console configuration |
+|---|---|
+| `none` | No exclusions. Baseline. |
+| `team-id` | Signer/cert exclusion on **ZMA55FTA8W**, scope **Suppress Alerts** (NOT Interop) **+** path exclusion `/opt/unbound/*` for the LaunchDaemon. |
+
+Set the console policy to the matching state **before** each run; the scripts
+only tag captures with the state, they do not configure S1's console.
+
+---
+
+## Prerequisites
+
+- `aws` CLI authenticated to the **default** profile (the org payer / dev
+  account — see `project_unbound_aws_org`). The harness hard-refuses any
+  profile name containing `benchling`.
+- An SSH keypair registered in EC2 (`EC2_KEY_NAME`), plus a subnet + security
+  group that allows SSH from your egress IP. Export them so the scripts emit
+  concrete commands:
+  ```
+  export EC2_KEY_NAME=...           EC2_SUBNET_ID=subnet-...
+  export EC2_SECURITY_GROUP_ID=sg-...
+  ```
+- `shellcheck` (CI runs it; scripts are clean).
+- The S1 agent pkg URL + site token + console API token (pending; see above):
+  ```
+  export S1_SITE_TOKEN=...          # registration token, never on argv
+  export S1_API_TOKEN=...           # console read token (capture only)
+  export S1_CONSOLE_URL=https://<tenant>.sentinelone.net
+  ```
+- Rehearsal onboarding keys (a scoped, throwaway tenant — never a prod admin
+  key):
+  ```
+  export ONBOARD_API_KEY=...        ONBOARD_DISCOVERY_KEY=...
+  ```
+
+> **Every live-action script defaults to DRY-RUN.** Run it once without
+> `--execute` to read the exact commands, then add `--execute` when you mean
+> it. AWS-touching scripts also print a cost warning and require typing `yes`
+> (or `--yes`). Mac dedicated hosts bill a **24-hour minimum** per host.
+
+---
+
+## Step 1 — Provision fresh fixtures (both chips)
+
+```
+./provision-fixture.sh --chip both            # dry-run: prints every aws call
+./provision-fixture.sh --chip both --execute  # allocates host + instance per chip
+```
+
+- arm64 → `mac2.metal`, intel → `mac1.metal` (the Intel slice must be proven on
+  real x86_64 hardware, not just present in `lipo` output).
+- The script resolves the newest Apple macOS AMI per chip, launches one
+  instance onto a dedicated host, waits for `instance-status-ok`, and tells you
+  to record `HOST_ID`/`INSTANCE_ID` into `results/fixtures-<chip>.env` (consumed
+  by `teardown.sh`).
+- Note each fixture's public IP for the next steps.
+
+## Step 2 — Install the SentinelOne agent
+
+```
+S1_SITE_TOKEN=... ./install-s1.sh --host <fixture-ip> --pkg <s1-agent.pkg|url>            # dry-run
+S1_SITE_TOKEN=... ./install-s1.sh --host <fixture-ip> --pkg <s1-agent.pkg|url> --execute  # installs
+```
+
+Repeat per fixture. Confirm in the S1 console that each fixture is registered
+and online before rehearsing. The site token is read from env, never argv (it
+would otherwise leak via `ps`).
+
+## Step 3 — Run the full matrix
+
+For each cell — **2 artifacts × 2 allowlist states** — set the S1 console
+allowlist to the matching state, then drive the lifecycle. Use a stable
+`--run-id` so the capture pairs with the run.
+
+```
+# allowlist = none
+./run-rehearsal.sh --host <ip> --artifact pyinstaller --allowlist none    --run-id r1 --execute
+./run-rehearsal.sh --host <ip> --artifact nuitka      --allowlist none    --run-id r1 --execute
+# allowlist = team-id (set ZMA55FTA8W suppress + /opt/unbound/* path excl. in S1 first)
+./run-rehearsal.sh --host <ip> --artifact pyinstaller --allowlist team-id --run-id r1 --execute
+./run-rehearsal.sh --host <ip> --artifact nuitka      --allowlist team-id --run-id r1 --execute
+```
+
+Each run drives, in order: **pkg install → onboard.sh → all 5 hook events
+(PreToolUse, PostToolUse, UserPromptSubmit, Stop, SessionStart) → discovery
+daemon scheduled run → `--clear`**, and captures our own per-stage logs to
+`results/<artifact>_<allowlist>_<run-id>/`.
+
+> A non-zero stage does **not** abort the run — it is logged and the matrix
+> continues. We are measuring what S1 does, and fail-open is sacred: a hook
+> that fails open is expected behavior, not a stop condition.
+
+**Re-image between cells that share a fixture.** The cleanest re-image is
+terminate + release the host (Step 5) and re-provision (Step 1) — a fresh host
+boots a clean AMI with no S1/runtime residue. At minimum, run the artifact's
+`--clear` (the rehearsal does this as Stage 5) and confirm `/opt/unbound` is
+gone before the next install.
+
+## Step 4 — Capture telemetry per cell
+
+```
+S1_API_TOKEN=... S1_CONSOLE_URL=... \
+  ./capture-telemetry.sh --host <ip> --artifact pyinstaller --allowlist none --run-id r1            # dry-run
+S1_API_TOKEN=... S1_CONSOLE_URL=... \
+  ./capture-telemetry.sh --host <ip> --artifact pyinstaller --allowlist none --run-id r1 --execute  # collects
+```
+
+Collects, into `results/<artifact>_<allowlist>_<run-id>/`:
+- S1 console: agent record, threats/detections, activities (Storyline-adjacent)
+  scoped to the fixture. For each threat id, also export the full Storyline
+  (process tree) from the console — see the note the script prints.
+- Our-side logs pulled off the fixture (`/var/log/unbound/discovery*.log`) plus
+  the per-stage logs `run-rehearsal.sh` captured.
+- `metadata.txt` provenance stamp (artifact, allowlist, run-id, host, team-id,
+  release-version, captured-at).
+
+Pass `--since <iso8601>` to scope S1 queries to the rehearsal window.
+
+## Step 5 — Pick the winner, then teardown
+
+1. Fill `matrix.md` from the evidence dirs.
+2. Apply the decision rule: **clears S1 (ideally even at `allowlist=none`, and
+   certainly at `team-id`) AND notarizes AND passes the bare-Mac lipo gate**.
+   0.1.0 already notarizes; the lipo gate is `packaging/scripts/lipo-gate.sh`
+   over each artifact's `dist/`.
+3. Release the fixtures so billing stops:
+   ```
+   ./teardown.sh --chip both            # dry-run
+   ./teardown.sh --chip both --execute  # terminate instances + release hosts
+   ```
+   `teardown.sh` reads ids from `results/fixtures-<chip>.env`, or pass
+   `--instance-id`/`--host-id`. If you lost the ids, the script prints the
+   `describe-instances` query that finds them by the `unbound:purpose` tag.
+
+> **Re-imaging is the teardown.** The rehearsal must not pollute the Stream V
+> fixtures: terminate + release, and provision fresh for any further runs.
+
+---
+
+## Safety invariants (do not weaken)
+
+- Nothing here can block a developer's daily machine. Every live action targets
+  a throwaway EC2 Mac fixture and is `--execute`-gated; the runtime fails open
+  by design.
+- The benchling AWS profile is hard-refused (`lib.sh`).
+- Secrets (S1 site/API tokens, onboarding keys) come from env, never argv, and
+  are never written to the results dir except as the literal name in echoed
+  commands.
+- Captured results may contain endpoint/host data — `results/` is gitignored.
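
The runbook's invariant that secrets come from env and never argv can be sketched in shell as follows. All three function names are hypothetical and are not taken from `lib.sh`. The sketch also assumes that a header passed to `curl` with `-H` becomes part of curl's argument list, and shows one way to avoid that by feeding the header to `curl --config -` on stdin.

```shell
#!/bin/bash
# Hypothetical sketch; require_env, printable_query and auth_config are
# illustrative names, not helpers from lib.sh.
set -euo pipefail

# Fail unless the named environment variable is set and non-empty.
require_env() {
  local name="$1"
  [[ -n "${!name:-}" ]] || {
    echo "$name is unset; export it, do not pass it on argv" >&2
    return 1
  }
}

# Loggable form of a query: shows the variable NAME, never its value.
printable_query() {
  local url="$1"
  printf "curl -fsS -H 'Authorization: ApiToken \$S1_API_TOKEN' '%s'\n" "$url"
}

# One curl config line carrying the auth header. printf is a shell builtin,
# so the token does not appear in any process's argument list.
auth_config() {
  require_env S1_API_TOKEN
  printf 'header = "Authorization: ApiToken %s"\n' "$S1_API_TOKEN"
}

# Usage (not executed here):
#   auth_config | curl -fsS --config - "$S1_CONSOLE_URL/web/api/v2.1/threats"
```

A token containing a double quote or backslash would need escaping before it is written into the config line; the sketch does not handle that case.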
diff --git a/tooling/edr-rehearsal/capture-telemetry.sh b/tooling/edr-rehearsal/capture-telemetry.sh
@@ -0,0 +1,147 @@
+#!/bin/bash
+# Collect the evidence for one rehearsal cell (WEB-4805): S1 agent record,
+# threats/detections, and activities from the S1 console API (the per-threat
+# Storyline export stays manual), plus our discovery logs pulled off the
+# fixture. Everything lands in a results dir tagged {artifact, allowlist-state,
+# run-id} so matrix.md can be filled from files, not memory.
+#
+# The run-id is a PARAMETER (--run-id) so a capture is reproducible and matches
+# the run-rehearsal.sh tag exactly. If omitted, it defaults to a UTC timestamp
+# (fine for an interactive one-off; pass --run-id to pair with a specific run).
+#
+# DRY-RUN BY DEFAULT. --execute performs the S1 API queries + SSH log pulls.
+# Reads only — capture never changes the fixture or the S1 tenant.
+#
+# Usage:
+#   S1_API_TOKEN=... capture-telemetry.sh --host <ip> --artifact pyinstaller|nuitka \
+#       --allowlist none|team-id [--run-id <id>] [--since <iso8601>] [--execute]
+set -euo pipefail
+
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+# shellcheck source=./lib.sh
+source "$HERE/lib.sh"
+
+TARGET_HOST=""
+ARTIFACT=""
+ALLOWLIST=""
+RUN_ID=""
+SINCE=""
+EXECUTE=0
+SSH_USER="${SSH_USER:-ec2-user}"
+
+# S1 console (mgmt) base URL + read API token. Pending WEB-4805 vendor decision.
+S1_CONSOLE_URL="${S1_CONSOLE_URL:-<S1_CONSOLE_URL>}"
+
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --host)      TARGET_HOST="${2:-}"; shift 2 ;;
+    --artifact)  ARTIFACT="${2:-}"; shift 2 ;;
+    --allowlist) ALLOWLIST="${2:-}"; shift 2 ;;
+    --run-id)    RUN_ID="${2:-}"; shift 2 ;;
+    --since)     SINCE="${2:-}"; shift 2 ;;
+    --execute)   EXECUTE=1; shift ;;
+    -h|--help)   sed -n '2,/^set -euo pipefail/p' "$0" | grep '^#' | sed 's/^# \{0,1\}//'; exit 0 ;;
+    *) die "unknown argument: $1 (see --help)" ;;
+  esac
+done
+
+[[ -n "$TARGET_HOST" ]] || die "--host <fixture-ip> is required"
+case "$ARTIFACT" in
+  pyinstaller|nuitka) ;;
+  *) die "--artifact must be pyinstaller or nuitka" ;;
+esac
+case "$ALLOWLIST" in
+  none|team-id) ;;
+  *) die "--allowlist must be none or team-id" ;;
+esac
+# Default run-id is a UTC timestamp; pass --run-id to pair with a specific
+# run-rehearsal.sh cell. It is resolved once here, not inline in each command,
+# so the value stays stable for the whole invocation.
+RUN_ID="${RUN_ID:-$(date -u +%Y%m%dT%H%M%SZ)}"
+
+TAG="${ARTIFACT}_${ALLOWLIST}_${RUN_ID}"
+OUT="$RESULTS_DIR/$TAG"
+
+s1_get() { # desc path outfile -> prints the curl command; on --execute, runs it into $OUT/outfile
+  local desc="$1" path="$2" outfile="$3"
+  log "$desc"
+  emit_cmd "curl -fsS -H 'Authorization: ApiToken \$S1_API_TOKEN' '$S1_CONSOLE_URL$path' > $OUT/$outfile"
+  if [[ $EXECUTE -eq 1 ]]; then
+    require_tool curl
+    [[ -n "${S1_API_TOKEN:-}" ]] || die "S1_API_TOKEN unset (read token from the S1 console; do not pass on argv)"
+    curl -fsS -H "Authorization: ApiToken ${S1_API_TOKEN}" "$S1_CONSOLE_URL$path" > "$OUT/$outfile" \
+      || warn "S1 query failed: $desc (left $OUT/$outfile possibly empty)"
+  fi
+}
+
+pull_log() { # remote-path local-name
+  local remote="$1" local_name="$2"
+  log "pull $remote"
+  emit_cmd "scp $SSH_USER@$TARGET_HOST:$remote $OUT/$local_name"
+  if [[ $EXECUTE -eq 1 ]]; then
+    require_tool scp
+    scp -o StrictHostKeyChecking=accept-new "$SSH_USER@$TARGET_HOST:$remote" "$OUT/$local_name" 2>/dev/null \
+      || warn "could not pull $remote (may not exist for this stage — ok)"
+  fi
+}
+
+main() {
+  section "Capture telemetry — $TAG"
+  log "Results dir: $OUT"
+  if [[ $EXECUTE -eq 1 ]]; then
+    mkdir -p "$OUT"
+  else
+    section "DRY RUN — no API/SSH calls. Re-run with --execute."
+  fi
+
+  section "1) SentinelOne console: threats + Storyline for this fixture"
+  # The agent record is fetched first for reference. All three queries filter on
+  # computerName__contains=$FIXTURE_TAG (expected from lib.sh), not on
+  # $TARGET_HOST or the agent UUID, so the fixture's name must contain that tag.
+  s1_get "agent record for fixture" \
+    "/web/api/v2.1/agents?computerName__contains=${FIXTURE_TAG}" \
+    "s1_agents.json"
+  local since_q=""
+  [[ -n "$SINCE" ]] && since_q="&createdAt__gte=${SINCE}"
+  s1_get "threats/detections for fixture" \
+    "/web/api/v2.1/threats?computerName__contains=${FIXTURE_TAG}${since_q}" \
+    "s1_threats.json"
+  s1_get "activities (Storyline-adjacent events)" \
+    "/web/api/v2.1/activities?computerName__contains=${FIXTURE_TAG}${since_q}" \
+    "s1_activities.json"
+  log "NOTE: full Storyline (process-tree) export is per-threat — for each threat id in"
+  log "      s1_threats.json, also fetch /web/api/v2.1/threats/<id>/explore/* or export"
+  log "      the Deep Visibility query from the console UI into $OUT/storyline/."
+
+  section "2) Our-side logs off the fixture"
+  pull_log "/var/log/unbound/discovery.log"     "unbound-discovery.log"
+  pull_log "/var/log/unbound/discovery.err.log" "unbound-discovery.err.log"
+  # run-rehearsal.sh writes its per-stage logs under the same tag, so they are
+  # already in $OUT; the copy below only runs if OUT ever points elsewhere.
+  if [[ $EXECUTE -eq 1 && -d "$RESULTS_DIR/$TAG" && "$RESULTS_DIR/$TAG" != "$OUT" ]]; then
+    cp "$RESULTS_DIR/$TAG"/*.log "$OUT/" 2>/dev/null || true
+  fi
+
+  section "3) Provenance stamp"
+  emit_cmd "write $OUT/metadata.txt (artifact, allowlist, run-id, host, team-id, release-version, captured-at)"
+  if [[ $EXECUTE -eq 1 ]]; then
+    {
+      printf 'artifact=%s\n'   "$ARTIFACT"
+      printf 'allowlist=%s\n'  "$ALLOWLIST"
+      printf 'run_id=%s\n'     "$RUN_ID"
+      printf 'fixture_host=%s\n' "$TARGET_HOST"
+      printf 'team_id=%s\n'    "$TEAM_ID"
+      printf 'release_version=%s\n' "$RELEASE_VERSION"
+      printf 'captured_at=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
+    } > "$OUT/metadata.txt"
+    log "wrote $OUT/metadata.txt"
+  fi
+
+  section "Done"
+  log "Fill the matching row in matrix.md from the files in $OUT."
+  if [[ $EXECUTE -eq 0 ]]; then
+    section "DRY RUN complete. Nothing was captured."
+  fi
+}
+
+main
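
Because `s1_get` only warns when a query fails and can leave an empty file behind, it may help to check a cell's evidence directory before filling `matrix.md`. The sketch below assumes the four file names written by `capture-telemetry.sh` above. The function `check_cell` is a hypothetical helper, not part of this change.

```shell
#!/bin/bash
# Hypothetical completeness check for one results/<tag>/ directory.
set -euo pipefail

# Files capture-telemetry.sh writes for every cell.
EXPECTED_FILES="s1_agents.json s1_threats.json s1_activities.json metadata.txt"

# Print each expected file that is missing or empty; return 1 if any is.
check_cell() {
  local dir="$1" f rc=0
  for f in $EXPECTED_FILES; do
    if [[ ! -s "$dir/$f" ]]; then
      echo "missing or empty: $f"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (not executed here):
#   check_cell results/pyinstaller_none_r1 || echo "re-run capture for this cell"
```

The discovery logs are left out of the expected list on purpose, since `pull_log` treats a missing log as acceptable for some stages.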