2 changes: 2 additions & 0 deletions tooling/edr-rehearsal/.gitignore
@@ -0,0 +1,2 @@
results/*
!results/.gitkeep
31 changes: 31 additions & 0 deletions tooling/edr-rehearsal/README.md
@@ -0,0 +1,31 @@
# tooling/edr-rehearsal — push-button EDR (SentinelOne) rehearsal harness

Rehearse the signed+notarized macOS runtime against Salesloft's EDR
(SentinelOne) on a throwaway EC2 Mac fixture, pick the surviving artifact by
evidence, and re-image. Authoring/operator tooling — distinct from the build
steps in `packaging/`. Ticket: **WEB-4805**.

Start with **[RUNBOOK.md](RUNBOOK.md)**.

| File | What |
|---|---|
| `RUNBOOK.md` | Operator-facing, step-by-step end-to-end procedure |
| `matrix.md` | 2 artifacts × 2 allowlist states × 5 stages, with a results column |
| `lib.sh` | Shared config + the AWS-profile guard + dry-run/confirm gate (sourced, not run) |
| `provision-fixture.sh` | Allocate a fresh dedicated host + instance per chip (`mac2.metal` arm64, `mac1.metal` intel), us-west-2, default profile |
| `install-s1.sh` | Install the SentinelOne agent on a fixture (site token via `S1_SITE_TOKEN`) |
| `run-rehearsal.sh` | Drive one artifact (`--artifact pyinstaller\|nuitka`) through install → onboard → all hook events → discovery daemon → `--clear` |
| `capture-telemetry.sh` | Collect S1 detections + Storyline + our logs, tagged `{artifact, allowlist, run-id}` |
| `teardown.sh` | Terminate the instance + release the dedicated host (the re-image) |
| `results/` | Per-cell evidence dirs (gitignored) |

## Safety

Every live-action script **defaults to dry-run** (prints the exact commands,
touches nothing) and requires `--execute` to do anything real. AWS-touching
scripts also warn about the **24-hour dedicated-host minimum** and require
typing `yes` (or passing `--yes`). The **benchling AWS profile is
hard-refused**. Nothing here
can block a developer's daily machine — fail-open is sacred.
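
The dry-run default can be pictured with a minimal sketch like the one below. `run_or_print`, `parse_execute_flag`, and the `EXECUTE` variable are hypothetical names used for illustration; the real gate lives in `lib.sh` and may be structured differently.

```bash
# Hypothetical dry-run gate; names are illustrative, not lib.sh's actual API.
EXECUTE=0   # flipped to 1 only when --execute is passed

parse_execute_flag() {
  # Scan the arguments for --execute; every other flag is ignored here.
  for arg in "$@"; do
    case "$arg" in
      --execute) EXECUTE=1 ;;
    esac
  done
}

run_or_print() {
  # Always print the exact command; run it only when EXECUTE=1.
  printf '+ %s\n' "$*"
  if [ "$EXECUTE" -eq 1 ]; then
    "$@"
  fi
}
```

With this shape, `run_or_print aws ec2 describe-hosts` only prints the command until the caller has passed `--execute`, which matches the "read first, then mean it" workflow described above.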

All scripts pass `shellcheck` and follow the `set -euo pipefail` / `HERE` /
heredoc conventions used in `packaging/scripts/`.
197 changes: 197 additions & 0 deletions tooling/edr-rehearsal/RUNBOOK.md
@@ -0,0 +1,197 @@
# EDR (SentinelOne) rehearsal — operator runbook (WEB-4805)

> One shot. We rehearse the signed+notarized macOS runtime against Salesloft's
> actual EDR (SentinelOne) on a throwaway EC2 Mac fixture **before** the
> customer ever sees it, pick the artifact that survives by evidence, and
> re-image so the rehearsal never pollutes the real fixtures.

Ticket: https://linear.app/unboundsec/issue/WEB-4805
Project: Non-Python MDM Rollout (Mac fleet) — customer Salesloft (~1,150-Mac
Jamf fleet).

## What we are deciding

Two signed artifacts ship from the same pipeline; we run BOTH end-to-end
against S1 and keep the one that stays clean:

| Artifact | Source | pkg |
|---|---|---|
| PyInstaller (default) | WEB-4786 / WEB-4787 | `unbound-runtime-0.1.0.pkg` |
| Nuitka | WEB-4804 (PR #132) | `unbound-runtime-0.1.0-nuitka.pkg` (`-nuitka` suffix; `workflow_dispatch builder=nuitka`) |

**Winner = clears S1 AND notarizes AND passes the bare-Mac universal2 gate**
(`packaging/scripts/lipo-gate.sh`). Pre-allowlisting with Mike (WEB-4784) may
settle it for either artifact even at `allowlist=none`.

## Coordinates (source of truth — do not re-derive)

| Thing | Value |
|---|---|
| Apple Team ID (signer/cert allowlist) | `ZMA55FTA8W` ("Websentry Inc") |
| Released pkg | `https://unbound-release-artifacts.s3.us-west-2.amazonaws.com/macos/0.1.0/unbound-runtime-0.1.0.pkg` |
| onboard.sh | `https://unbound-release-artifacts.s3.us-west-2.amazonaws.com/macos/0.1.0/onboard.sh` |
| Install layout | `/opt/unbound/current/{unbound-hook,unbound-discovery}/` ; LaunchDaemon `ai.getunbound.discovery` |
| AWS | `us-west-2`, **default** profile only (NEVER the benchling profile) |
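
As a rough illustration of the profile rule in the last row, a guard could look like the sketch below. `guard_aws_profile` is a hypothetical name, and the case-insensitive match is an assumption; the actual check lives in `lib.sh`. `AWS_PROFILE` is the standard AWS CLI environment variable.

```bash
# Hypothetical guard; the real check is in lib.sh and may differ.
guard_aws_profile() {
  # An unset AWS_PROFILE means the AWS CLI falls back to "default".
  local profile="${AWS_PROFILE:-default}"
  case "$profile" in
    # Assumption: match "benchling" anywhere in the name, in any letter case.
    *[Bb][Ee][Nn][Cc][Hh][Ll][Ii][Nn][Gg]*)
      echo "refusing to run with AWS profile '$profile'" >&2
      return 1
      ;;
  esac
  return 0
}
```

Calling the guard before any `aws` command means a mis-set `AWS_PROFILE` stops the script before it can touch the wrong account.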

## Open dependency — DO NOT BLOCK on it

The S1 **tenant + site token** are pending the WEB-4805 sourcing decision
(Salesloft's EDR is confirmed SentinelOne; the tenant we rehearse against —
Salesloft-supplied vs an Unbound S1 trial — is the open question, tracked
against WEB-4784). The harness is fully parameterized: when the token + agent
pkg land, drop them into the env vars below — no script edits required.

This runbook and all scripts can be exercised in **dry-run today** (they print
every command and touch nothing).

## Allowlist strategy under test

| State | S1 console configuration |
|---|---|
| `none` | No exclusions. Baseline. |
| `team-id` | Signer/cert exclusion on **ZMA55FTA8W**, scope **Suppress Alerts** (NOT Interop) **+** path exclusion `/opt/unbound/*` for the LaunchDaemon. |

Set the console policy to the matching state **before** each run; the scripts
only tag captures with the state, they do not configure S1's console.

---

## Prerequisites

- `aws` CLI authenticated to the **default** profile (the org payer / dev
  account; see `project_unbound_aws_org`). The harness hard-refuses any
  profile name containing `benchling`.
- An SSH keypair registered in EC2 (`EC2_KEY_NAME`), plus a subnet + security
  group that allows SSH from your egress IP. Export them so the scripts emit
  concrete commands:
  ```
  export EC2_KEY_NAME=... EC2_SUBNET_ID=subnet-...
  export EC2_SECURITY_GROUP_ID=sg-...
  ```
- `shellcheck` (CI runs it; scripts are clean).
- The S1 agent pkg URL + site token + console API token (pending; see above):
  ```
  export S1_SITE_TOKEN=... # registration token, never on argv
  export S1_API_TOKEN=... # console read token (capture only)
  export S1_CONSOLE_URL=https://<tenant>.sentinelone.net
  ```
- Rehearsal onboarding keys (a scoped, throwaway tenant; never a prod admin
  key):
  ```
  export ONBOARD_API_KEY=... ONBOARD_DISCOVERY_KEY=...
  ```

> **Every live-action script defaults to DRY-RUN.** Run it once without
> `--execute` to read the exact commands, then add `--execute` when you mean
> it. AWS-touching scripts also print a cost warning and require typing `yes`
> (or `--yes`). Mac dedicated hosts bill a **24-hour minimum** per host.
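
The cost confirmation can be sketched as follows. `confirm_cost` is a hypothetical helper name and the prompt wording is illustrative; only the `yes` / `--yes` behaviour and the 24-hour minimum come from this runbook.

```bash
# Hypothetical confirm gate; the real one is part of lib.sh.
confirm_cost() {
  # "--yes" as the first argument skips the prompt; otherwise the operator
  # must type the literal word "yes" on stdin.
  echo "WARNING: EC2 Mac dedicated hosts bill a 24-hour minimum per host." >&2
  if [ "${1:-}" = "--yes" ]; then
    return 0
  fi
  printf 'Type yes to continue: ' >&2
  local answer=""
  read -r answer || true   # EOF leaves answer empty, which is a refusal
  [ "$answer" = yes ]
}
```

Anything other than the exact word `yes` (including an empty line or a closed stdin) counts as a refusal, so an unattended run cannot allocate a host by accident.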

---

## Step 1 — Provision fresh fixtures (both chips)

```
./provision-fixture.sh --chip both # dry-run: prints every aws call
./provision-fixture.sh --chip both --execute # allocates host + instance per chip
```

- arm64 → `mac2.metal`, intel → `mac1.metal` (the Intel slice must be proven on
real x86_64 hardware, not just present in `lipo` output).
- The script resolves the newest Apple macOS AMI per chip, launches one
instance onto a dedicated host, waits for `instance-status-ok`, and tells you
to record `HOST_ID`/`INSTANCE_ID` into `results/fixtures-<chip>.env` (consumed
by `teardown.sh`).
- Note each fixture's public IP for the next steps.
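
The chip-to-instance-type mapping and the shape of the AWS calls can be sketched as below. The helper names are hypothetical and the availability zone is a caller-supplied assumption; `provision-fixture.sh` may build its commands differently. The sketch only prints commands, in the spirit of the dry-run default.

```bash
# Hypothetical helpers; provision-fixture.sh may be organized differently.
instance_type_for_chip() {
  case "$1" in
    arm64) echo mac2.metal ;;
    intel) echo mac1.metal ;;
    *) echo "unknown chip: $1" >&2; return 1 ;;
  esac
}

print_provision_plan() {
  # print_provision_plan CHIP AZ: print (never run) the host allocation and
  # the readiness wait for one chip. <INSTANCE_ID> stays a placeholder because
  # it is only known after the instance has been launched.
  local chip="$1" az="$2" itype
  itype="$(instance_type_for_chip "$chip")" || return 1
  echo "aws ec2 allocate-hosts --region us-west-2 --availability-zone $az --instance-type $itype --quantity 1"
  echo "aws ec2 wait instance-status-ok --region us-west-2 --instance-ids <INSTANCE_ID>"
}
```

For example, `print_provision_plan intel us-west-2a` shows the `mac1.metal` allocation without contacting AWS.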

## Step 2 — Install the SentinelOne agent

```
S1_SITE_TOKEN=... ./install-s1.sh --host <fixture-ip> --pkg <s1-agent.pkg|url> # dry-run
S1_SITE_TOKEN=... ./install-s1.sh --host <fixture-ip> --pkg <s1-agent.pkg|url> --execute # installs
```

Repeat per fixture. Confirm in the S1 console that each fixture is registered
and online before rehearsing. The site token is read from env, never argv (it
would otherwise leak via `ps`).
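
One way to keep a token off argv is to feed it through stdin, as in the sketch below. `require_env` and `send_token_over_stdin` are hypothetical names, and how `install-s1.sh` actually hands the token to the fixture is not shown in this runbook, so treat this as an illustration of the idea only.

```bash
# Hypothetical helpers illustrating "env, never argv".
require_env() {
  # require_env NAME: fail if the variable is unset or empty, without ever
  # printing its value. NAME must be a trusted literal because it goes
  # through eval.
  local name="$1" value
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "$name is unset (export it; do not pass it on argv)" >&2
    return 1
  fi
}

send_token_over_stdin() {
  # Pipe the token into the given command's stdin. printf is a shell builtin,
  # so the token does not appear in any process's argument list.
  require_env S1_SITE_TOKEN || return 1
  printf '%s\n' "$S1_SITE_TOKEN" | "$@"
}
```

With `ssh` as the command, the remote side would read the token from its stdin instead of receiving it as an argument.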

## Step 3 — Run the full matrix

For each cell — **2 artifacts × 2 allowlist states** — set the S1 console
allowlist to the matching state, then drive the lifecycle. Use a stable
`--run-id` so the capture pairs with the run.

```
# allowlist = none
./run-rehearsal.sh --host <ip> --artifact pyinstaller --allowlist none --run-id r1 --execute
./run-rehearsal.sh --host <ip> --artifact nuitka --allowlist none --run-id r1 --execute
# allowlist = team-id (set ZMA55FTA8W suppress + /opt/unbound/* path excl. in S1 first)
./run-rehearsal.sh --host <ip> --artifact pyinstaller --allowlist team-id --run-id r1 --execute
./run-rehearsal.sh --host <ip> --artifact nuitka --allowlist team-id --run-id r1 --execute
```

Each run drives, in order: **pkg install → onboard.sh → all 5 hook events
(PreToolUse, PostToolUse, UserPromptSubmit, Stop, SessionStart) → discovery
daemon scheduled run → `--clear`**, and captures our own per-stage logs to
`results/<artifact>_<allowlist>_<run-id>/`.

> A non-zero stage does **not** abort the run — it is logged and the matrix
> continues. We are measuring what S1 does, and fail-open is sacred: a hook
> that fails open is expected behavior, not a stop condition.
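
The "log it and keep going" behaviour can be sketched with a small stage wrapper. `run_stage` is a hypothetical name and the log format is illustrative; `run-rehearsal.sh` may record stages differently.

```bash
# Hypothetical stage wrapper; run-rehearsal.sh's real logging may differ.
run_stage() {
  # run_stage NAME LOGFILE CMD...: run one stage, capture its output and exit
  # code in LOGFILE, and always return 0 so the next stage still runs.
  local name="$1" logfile="$2" rc=0
  shift 2
  "$@" > "$logfile" 2>&1 || rc=$?
  printf 'stage=%s exit=%s\n' "$name" "$rc" >> "$logfile"
  if [ "$rc" -ne 0 ]; then
    echo "WARN: stage $name exited $rc (logged, continuing)" >&2
  fi
  return 0
}
```

Because the wrapper swallows the exit code after recording it, it stays compatible with `set -euo pipefail` in the calling script: a stage that S1 kills shows up in the evidence instead of ending the run.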

**Re-image between cells that share a fixture.** The cleanest re-image is
terminate + release the host (Step 5) and re-provision (Step 1) — a fresh host
boots a clean AMI with no S1/runtime residue. At minimum, run the artifact's
`--clear` (the rehearsal does this as Stage 5) and confirm `/opt/unbound` is
gone before the next install.

## Step 4 — Capture telemetry per cell

```
S1_API_TOKEN=... S1_CONSOLE_URL=... \
./capture-telemetry.sh --host <ip> --artifact pyinstaller --allowlist none --run-id r1 # dry-run
S1_API_TOKEN=... S1_CONSOLE_URL=... \
./capture-telemetry.sh --host <ip> --artifact pyinstaller --allowlist none --run-id r1 --execute # collects
```

Collects, into `results/<artifact>_<allowlist>_<run-id>/`:
- S1 console: agent record, threats/detections, activities (Storyline-adjacent)
scoped to the fixture. For each threat id, also export the full Storyline
(process tree) from the console — see the note the script prints.
- Our-side logs pulled off the fixture (`/var/log/unbound/discovery*.log`) plus
the per-stage logs `run-rehearsal.sh` captured.
- `metadata.txt` provenance stamp (artifact, allowlist, run-id, host, team-id,
captured-at).

Pass `--since <iso8601>` to scope S1 queries to the rehearsal window.
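
A simple way to get a usable `--since` value is to record the window start just before the run. The sketch below assumes the console accepts a UTC timestamp in the same format `capture-telemetry.sh` uses for `captured_at`; `window_start` is a hypothetical helper name.

```bash
# Hypothetical helper: UTC ISO 8601 timestamp for the start of the window.
window_start() {
  date -u +%Y-%m-%dT%H:%M:%SZ
}

SINCE="$(window_start)"
# Then, with the real flags filled in:
#   ./run-rehearsal.sh ... --execute
#   ./capture-telemetry.sh ... --since "$SINCE" --execute
```

Capturing the timestamp before `run-rehearsal.sh` starts keeps every detection from the run inside the query window.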

## Step 5 — Pick the winner, then teardown

1. Fill `matrix.md` from the evidence dirs.
2. Apply the decision rule: **clears S1 (ideally even at `allowlist=none`, and
   certainly at `team-id`) AND notarizes AND passes the bare-Mac lipo gate**.
   0.1.0 already notarizes; the lipo gate is `packaging/scripts/lipo-gate.sh`
   over each artifact's `dist/`.
3. Release the fixtures so billing stops:
   ```
   ./teardown.sh --chip both # dry-run
   ./teardown.sh --chip both --execute # terminate instances + release hosts
   ```
   `teardown.sh` reads ids from `results/fixtures-<chip>.env`, or pass
   `--instance-id`/`--host-id`. If you lost the ids, the script prints the
   `describe-instances` query that finds them by the `unbound:purpose` tag.
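
Reading the recorded ids back can be sketched as below. The file format is an assumption (plain `HOST_ID=...` and `INSTANCE_ID=...` lines, based on the names used in Step 1), and `read_fixture_id` is a hypothetical helper; `teardown.sh` may parse the file differently.

```bash
# Hypothetical parser for results/fixtures-<chip>.env (assumed KEY=value lines).
read_fixture_id() {
  # read_fixture_id FILE KEY: print KEY's value without sourcing the file, so
  # a stray line in it can never execute as code.
  local file="$1" key="$2" line
  if [ ! -r "$file" ]; then
    echo "missing $file" >&2
    return 1
  fi
  # Take the last matching line so a re-recorded id wins over an older one.
  line="$(grep -E "^${key}=" "$file" | tail -n 1 || true)"
  if [ -z "$line" ]; then
    echo "$key not found in $file" >&2
    return 1
  fi
  printf '%s\n' "${line#*=}"
}
```

A missing file or key is reported as an error rather than yielding an empty id, which avoids a terminate or release call with no target.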

> **Re-imaging is the teardown.** The rehearsal must not pollute the Stream V
> fixtures: terminate + release, and provision fresh for any further runs.

---

## Safety invariants (do not weaken)

- Nothing here can block a developer's daily machine. Every live action targets
a throwaway EC2 Mac fixture and is `--execute`-gated; the runtime fails open
by design.
- The benchling AWS profile is hard-refused (`lib.sh`).
- Secrets (S1 site/API tokens, onboarding keys) come from env, never argv, and
are never written to the results dir except as the literal name in echoed
commands.
- Captured results may contain endpoint/host data — `results/` is gitignored.
147 changes: 147 additions & 0 deletions tooling/edr-rehearsal/capture-telemetry.sh
@@ -0,0 +1,147 @@
#!/bin/bash
# Collect the evidence for one rehearsal cell (WEB-4805): the S1 agent record,
# detections/threats, and activities from the S1 console API (the full
# Storyline export is a manual per-threat step; see the note this script
# prints), plus our own install/hook/discovery logs pulled off the fixture.
# Everything lands in a results dir tagged {artifact, allowlist-state, run-id}
# so matrix.md can be filled from files, not memory.
#
# The run-id is a PARAMETER (--run-id) so a capture is reproducible and matches
# the run-rehearsal.sh tag exactly. If omitted, it defaults to a UTC timestamp
# (fine for an interactive one-off; pass --run-id to pair with a specific run).
#
# DRY-RUN BY DEFAULT. --execute performs the S1 API queries + SSH log pulls.
# Reads only — capture never changes the fixture or the S1 tenant.
#
# Usage:
# S1_API_TOKEN=... capture-telemetry.sh --host <ip> --artifact pyinstaller|nuitka \
# --allowlist none|team-id [--run-id <id>] [--since <iso8601>] [--execute]
set -euo pipefail

HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# shellcheck source=./lib.sh
source "$HERE/lib.sh"

TARGET_HOST=""
ARTIFACT=""
ALLOWLIST=""
RUN_ID=""
SINCE=""
EXECUTE=0
SSH_USER="${SSH_USER:-ec2-user}"

# S1 console (mgmt) base URL + read API token. Pending WEB-4805 vendor decision.
S1_CONSOLE_URL="${S1_CONSOLE_URL:-<S1_CONSOLE_URL>}"

while [[ $# -gt 0 ]]; do
  case "$1" in
    --host) TARGET_HOST="${2:-}"; shift 2 ;;
    --artifact) ARTIFACT="${2:-}"; shift 2 ;;
    --allowlist) ALLOWLIST="${2:-}"; shift 2 ;;
    --run-id) RUN_ID="${2:-}"; shift 2 ;;
    --since) SINCE="${2:-}"; shift 2 ;;
    --execute) EXECUTE=1; shift ;;
    -h|--help) grep '^#' "$0" | sed 's/^# \{0,1\}//'; exit 0 ;;
    *) die "unknown argument: $1 (see --help)" ;;
  esac
done

[[ -n "$TARGET_HOST" ]] || die "--host <fixture-ip> is required"
case "$ARTIFACT" in
  pyinstaller|nuitka) ;;
  *) die "--artifact must be pyinstaller or nuitka" ;;
esac
case "$ALLOWLIST" in
  none|team-id) ;;
  *) die "--allowlist must be none or team-id" ;;
esac
# Default run-id is a timestamp; pass --run-id to pair with a specific
# run-rehearsal.sh cell. (Default kept out of the inline command paths so the
# value is stable for the whole invocation.)
RUN_ID="${RUN_ID:-$(date -u +%Y%m%dT%H%M%SZ)}"

TAG="${ARTIFACT}_${ALLOWLIST}_${RUN_ID}"
OUT="$RESULTS_DIR/$TAG"

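s1_get() { # desc path outfile -> prints the curl command; on --execute, runs it into $OUT/outfile
  local desc="$1" path="$2" outfile="$3"
  log "$desc"
  emit_cmd "printf 'Authorization: ApiToken %s\\n' \"\$S1_API_TOKEN\" | curl -fsS -H @- '$S1_CONSOLE_URL$path' > $OUT/$outfile"
  if [[ $EXECUTE -eq 1 ]]; then
    require_tool curl
    [[ -n "${S1_API_TOKEN:-}" ]] || die "S1_API_TOKEN unset (read token from the S1 console; do not pass on argv)"
    # Refuse the unexpanded placeholder default rather than curl a bogus URL.
    [[ "$S1_CONSOLE_URL" != "<S1_CONSOLE_URL>" ]] || die "S1_CONSOLE_URL unset (export the tenant console base URL)"
    # Feed the auth header via stdin (-H @-, curl >= 7.55.0) so the token never
    # lands on curl's argv, where it would be visible in ps.
    printf 'Authorization: ApiToken %s\n' "$S1_API_TOKEN" \
      | curl -fsS -H @- "$S1_CONSOLE_URL$path" > "$OUT/$outfile" \
      || warn "S1 query failed: $desc (left $OUT/$outfile possibly empty)"
  fi
}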

pull_log() { # remote-path local-name
  local remote="$1" local_name="$2"
  log "pull $remote"
  emit_cmd "scp $SSH_USER@$TARGET_HOST:$remote $OUT/$local_name"
  if [[ $EXECUTE -eq 1 ]]; then
    require_tool scp
    scp -o StrictHostKeyChecking=accept-new "$SSH_USER@$TARGET_HOST:$remote" "$OUT/$local_name" 2>/dev/null \
      || warn "could not pull $remote (may not exist for this stage; ok)"
  fi
}

main() {
  section "Capture telemetry: $TAG"
  log "Results dir: $OUT"
  if [[ $EXECUTE -eq 1 ]]; then
    mkdir -p "$OUT"
  else
    section "DRY RUN: no API/SSH calls. Re-run with --execute."
  fi

  section "1) SentinelOne console: threats + Storyline for this fixture"
  # All three queries filter on computerName containing $FIXTURE_TAG (expected
  # from lib.sh), which keeps them scoped to rehearsal fixtures rather than the
  # whole tenant. The agent record is fetched first so its id/endpoint name
  # sits next to the threats. Note that $TARGET_HOST is not part of the filter.
  s1_get "agent record for fixture" \
    "/web/api/v2.1/agents?computerName__contains=${FIXTURE_TAG}" \
    "s1_agents.json"
  local since_q=""
  [[ -n "$SINCE" ]] && since_q="&createdAt__gte=${SINCE}"
  s1_get "threats/detections for fixture" \
    "/web/api/v2.1/threats?computerName__contains=${FIXTURE_TAG}${since_q}" \
    "s1_threats.json"
  s1_get "activities (Storyline-adjacent events)" \
    "/web/api/v2.1/activities?computerName__contains=${FIXTURE_TAG}${since_q}" \
    "s1_activities.json"
  log "NOTE: full Storyline (process-tree) export is per-threat; for each threat id in"
  log " s1_threats.json, also fetch /web/api/v2.1/threats/<id>/explore/* or export"
  log " the Deep Visibility query from the console UI into $OUT/storyline/."

  section "2) Our-side logs off the fixture"
  pull_log "/var/log/unbound/discovery.log" "unbound-discovery.log"
  pull_log "/var/log/unbound/discovery.err.log" "unbound-discovery.err.log"
  # run-rehearsal.sh writes its per-stage logs to $RESULTS_DIR/$TAG, which is
  # this script's $OUT whenever --run-id matches the run, so they already sit
  # alongside the capture and no copy is needed. (The previous copy step was
  # guarded by "$RESULTS_DIR/$TAG" != "$OUT", which can never be true.)
  log "per-stage logs from run-rehearsal.sh are expected in $OUT (same tag)"

  section "3) Provenance stamp"
  emit_cmd "write $OUT/metadata.txt (artifact, allowlist, run-id, host, team-id, captured-at)"
  if [[ $EXECUTE -eq 1 ]]; then
    {
      printf 'artifact=%s\n' "$ARTIFACT"
      printf 'allowlist=%s\n' "$ALLOWLIST"
      printf 'run_id=%s\n' "$RUN_ID"
      printf 'fixture_host=%s\n' "$TARGET_HOST"
      printf 'team_id=%s\n' "$TEAM_ID"
      printf 'release_version=%s\n' "$RELEASE_VERSION"
      printf 'captured_at=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    } > "$OUT/metadata.txt"
    log "wrote $OUT/metadata.txt"
  fi

  section "Done"
  log "Fill the matching row in matrix.md from the files in $OUT."
  if [[ $EXECUTE -eq 0 ]]; then
    section "DRY RUN complete. Nothing was captured."
  fi
}

main