diff --git a/CHANGELOG.md b/CHANGELOG.md
index f06fd58..ca90670 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,6 +8,13 @@ Every version listed here must correspond to a slice in [`PLAN.md`](./PLAN.md) w
 
 ---
 
+## [0.9.6] — 2026-05-28
+
+### Added
+- **Load-test harness** (`backend/loadtest/`) — a reusable open-loop load tester for the backend warm `/analyze` path, reporting latency percentiles, error rate, and achieved throughput against pass/fail thresholds. Includes a runbook for a local 100 RPS warm-cache run and for pointing at a deployed target.
+
+---
+
 ## [0.9.5] — 2026-05-28
 
 ### Security
diff --git a/PLAN.md b/PLAN.md
index e36138a..e903498 100644
--- a/PLAN.md
+++ b/PLAN.md
@@ -44,7 +44,7 @@
 | **v0.9.3** | Deletable `/me` history + back-nav loading fix + creator flair | ✅ shipped |
 | **v0.9.4** | DB pool size env-tunable + real back-nav spinner fix | ✅ shipped |
 | **v0.9.5** | Security review + hardening (OAuth scope ↓ `read:user`, HTTP security headers) | ✅ shipped |
-| **v0.9.6** | Load test to 100 RPS | pending |
+| **v0.9.6** | Load-test harness (warm `/analyze`; full 100 RPS run = operator step) | ✅ shipped |
 | **v0.9.7** | Privacy policy + terms (legal docs) | pending |
 | **v1.0.0** | Public launch | pending |
 
@@ -670,11 +670,21 @@ The narrative-mode CHECK constraint was a third drift in the same family — the
 
 ---
 
-## v0.9.6 — Load test to 100 RPS (deferred)
+## v0.9.6 — Load-test harness (shipped 2026-05-28)
 
-**Goal:** Load-test to 100 RPS sustained and verify the error budget holds. Needs a deliberate design: target (prod vs preview vs local), cost ceiling (Vercel Active-CPU pricing), and how to handle the v0.9.2 rate limits (a naive test from one IP just measures 429s — raise limits for the window, test `/health` + a warm-cached path, or use a bypass).
+**Goal:** Reusable Python/httpx open-loop load harness for the backend warm `/analyze` path; the full 100 RPS validation run is an operator step (hardware-gated).
 
-**Exit criteria:** TBD when the slice begins.
+**Delivered:** `backend/loadtest/run.py` (open-loop dispatcher, p50/p95/p99, error rate, achieved RPS, pass/fail thresholds, ramp), unit-tested stats helpers, and `backend/loadtest/README.md` runbook (local SRH warm-cache setup + deploy target). Local warm-cache runs use SRH (an Upstash-compatible REST front end over a real Redis, both in Docker), because real Upstash's free tier (~10k commands/day) can't absorb a 100 RPS run. Anonymous load + unset `INTERNAL_PROXY_SECRET` means the analyze limiter skips enforcement, so no bypass is needed.
+
+**Design spec:** [`docs/superpowers/specs/2026-05-28-v0.9.6-load-test-harness-design.md`](./docs/superpowers/specs/2026-05-28-v0.9.6-load-test-harness-design.md).
+**Sub-plan:** [`docs/superpowers/plans/2026-05-28-v0.9.6-load-test-harness.md`](./docs/superpowers/plans/2026-05-28-v0.9.6-load-test-harness.md).
+
+**Exit criteria:**
+- [x] `loadtest/run.py` + unit-tested stats helpers; ruff clean; backend suite green.
+- [x] Runbook complete (local SRH + deploy target).
+- [x] Light `/health`-class sanity run passes (ran against `/openapi.json`: 10 RPS × 5 s, 0 errors, p95 6.2 ms, PASS).
+- [x] Docs ritual + version bump to 0.9.6; tag + release.
+- [ ] Full 100 RPS warm-`/analyze` result recorded — operator step, filled in when run.
 
 ---
 
diff --git a/README.md b/README.md
index af75a36..e1aae9c 100644
--- a/README.md
+++ b/README.md
@@ -43,7 +43,7 @@ Engineering insight first. AI flavor second. Scoring is deterministic and explai
 
 ## Status
 
-Pre-alpha. Latest shipped release is **v0.9.5** (a full pre-launch security audit — no high/critical findings — that tightened the GitHub OAuth scope to read-only and added HTTP security headers). v0.9.4 before it made the DB connection pool size env-tunable and genuinely fixed the back-nav search spinner; v0.9.3 added deletable `/me` history with undo, a golden "creator" scorecard for the project's creator account, and a first (incomplete) attempt at the back-nav spinner fix. Live at https://skill-issue-tau.vercel.app — GitHub OAuth sign-in, Neon Postgres persistence, `/me` history, opt-in `/share/[slug]` public links. The AI narrative layer (Roast + Mentor) runs on **Groq** (`llama-3.3-70b-versatile`). v0.7.0 added Upstash Redis caching (warm `/analyze` ≤ 200 ms); v0.7.2 prod-certified the perf budget (CLS 0.080 → **0** structurally, perf 90 → 94, LCP 2,804 → 2,773 ms); v0.8.0 shipped Sentry (FE+BE), PostHog (events + web vitals), structlog JSON logging, on-voice 404, and a full axe a11y pass; v0.8.1 ships the nightly cron with bearer auth; v0.8.2 pairs it with the manual force-refresh button on `/me`; v0.8.3 hotfixes the empty-repo crash; v0.8.4 fixes the silent narrative misattribution; v0.8.5 closes the post-deploy-Sentry loop with a pre-merge CI gate; v0.8.6 closes v0.7.1's deferred share-page caching; v0.8.7 modernizes project config; v0.9.0 opens Beta hardening with bounded GH fan-out; v0.9.1 closes the /me N+1 + adds per-namespace Report cache versioning; v0.9.2 adds rate limiting (per-IP for anonymous, higher per-user caps for signed-in) on `/analyze` and `/narrative`; v0.9.3 adds deletable `/me` history with undo, attempts the back-nav search-spinner fix, and gilds the creator's scorecard. 
v0.9.4 makes the DB connection pool size env-tunable (defaults unchanged — RUM showed no pool exhaustion) and lands the real back-nav spinner fix (the v0.9.3 attempt addressed the wrong mechanism); v0.9.5 runs a full pre-launch security audit (no high/critical findings), tightens the OAuth scope to `read:user`, and adds HTTP security headers. **v0.9.6 — load test to 100 RPS** is next. See [`CHANGELOG.md`](./CHANGELOG.md) for shipped slices, [`PLAN.md`](./PLAN.md) for the full roadmap, and [`docs/PROGRESS_LOG.md`](./docs/PROGRESS_LOG.md) for the most recent session handoff.
+Pre-alpha. Latest shipped release is **v0.9.6** (a reusable load-test harness for the warm `/analyze` path; the full 100 RPS run is an operator step). v0.9.5 before it ran a full pre-launch security audit — no high/critical findings — tightening the GitHub OAuth scope to read-only and adding HTTP security headers; v0.9.4 made the DB connection pool size env-tunable and genuinely fixed the back-nav search spinner; v0.9.3 added deletable `/me` history with undo, a golden "creator" scorecard for the project's creator account, and a first (incomplete) attempt at the back-nav spinner fix. Live at https://skill-issue-tau.vercel.app — GitHub OAuth sign-in, Neon Postgres persistence, `/me` history, opt-in `/share/[slug]` public links. The AI narrative layer (Roast + Mentor) runs on **Groq** (`llama-3.3-70b-versatile`). v0.7.0 added Upstash Redis caching (warm `/analyze` ≤ 200 ms); v0.7.2 prod-certified the perf budget (CLS 0.080 → **0** structurally, perf 90 → 94, LCP 2,804 → 2,773 ms); v0.8.0 shipped Sentry (FE+BE), PostHog (events + web vitals), structlog JSON logging, on-voice 404, and a full axe a11y pass; v0.8.1 ships the nightly cron with bearer auth; v0.8.2 pairs it with the manual force-refresh button on `/me`; v0.8.3 hotfixes the empty-repo crash; v0.8.4 fixes the silent narrative misattribution; v0.8.5 closes the post-deploy-Sentry loop with a pre-merge CI gate; v0.8.6 closes v0.7.1's deferred share-page caching; v0.8.7 modernizes project config; v0.9.0 opens Beta hardening with bounded GH fan-out; v0.9.1 closes the /me N+1 + adds per-namespace Report cache versioning; v0.9.2 adds rate limiting (per-IP for anonymous, higher per-user caps for signed-in) on `/analyze` and `/narrative`; v0.9.3 adds deletable `/me` history with undo, attempts the back-nav search-spinner fix, and gilds the creator's scorecard. 
v0.9.4 makes the DB connection pool size env-tunable (defaults unchanged — RUM showed no pool exhaustion) and lands the real back-nav spinner fix (the v0.9.3 attempt addressed the wrong mechanism); v0.9.5 runs a full pre-launch security audit (no high/critical findings), tightens the OAuth scope to `read:user`, and adds HTTP security headers; v0.9.6 adds a reusable load-test harness for the warm `/analyze` path (the full 100 RPS run is an operator step). **v0.9.7 — privacy policy + terms** is next. See [`CHANGELOG.md`](./CHANGELOG.md) for shipped slices, [`PLAN.md`](./PLAN.md) for the full roadmap, and [`docs/PROGRESS_LOG.md`](./docs/PROGRESS_LOG.md) for the most recent session handoff.
 
 ---
 
@@ -76,7 +76,7 @@ cp .env.example .env        # then edit .env and add your GITHUB_TOKEN and OPENA
 uv run uvicorn app.main:app --reload --port 8000
 ```
 
-Verify: `curl http://localhost:8000/health` → `{"status":"ok","version":"0.9.5","db":"up"|"down","cache":"up"|"down"|"unconfigured"}`. The `db` field reports DB reachability when `DATABASE_URL` is configured; the `cache` field reports Upstash reachability (`unconfigured` when `UPSTASH_REDIS_REST_URL` isn't set — perfectly fine for local dev, the in-process fallback covers it).
+Verify: `curl http://localhost:8000/health` → `{"status":"ok","version":"0.9.6","db":"up"|"down","cache":"up"|"down"|"unconfigured"}`. The `db` field reports DB reachability when `DATABASE_URL` is configured; the `cache` field reports Upstash reachability (`unconfigured` when `UPSTASH_REDIS_REST_URL` isn't set — perfectly fine for local dev, the in-process fallback covers it).
 Hit the analyzer: `curl http://localhost:8000/analyze/octocat`.
 
 ### Frontend (`:3000`)
diff --git a/backend/app/settings.py b/backend/app/settings.py
index 25b9af7..d758622 100644
--- a/backend/app/settings.py
+++ b/backend/app/settings.py
@@ -2,7 +2,7 @@
 
 from pydantic_settings import BaseSettings, SettingsConfigDict
 
-VERSION = "0.9.5"
+VERSION = "0.9.6"
 
 
 class Settings(BaseSettings):
diff --git a/backend/loadtest/README.md b/backend/loadtest/README.md
new file mode 100644
index 0000000..0ba90b8
--- /dev/null
+++ b/backend/loadtest/README.md
@@ -0,0 +1,92 @@
+# Load test harness
+
+Open-loop load tester for the backend. Drives a target endpoint at a fixed RPS
+(sending on schedule whether or not earlier requests have returned) and reports
+p50/p95/p99 latency, error rate, achieved throughput, and PASS/FAIL against thresholds. Run from `backend/`:
+
+```bash
+uv run python loadtest/run.py --help
+```
+
+> **Windows / Git Bash note:** a bare `--path /health` argument gets mangled by
+> MSYS into a Windows path (corrupting the URL). Prefix the command with
+> `MSYS_NO_PATHCONV=1`, or run it from PowerShell, or use `--path=//health`.
+
+## Quick sanity check (no Docker, no cache)
+
+Start the backend, then hit a cheap endpoint to confirm the harness works.
+Locally there's usually no `DATABASE_URL`, so `/health` blocks ~20 s per request
+on a doomed DB ping — use `/openapi.json` instead for a clean check:
+
+```bash
+uv run uvicorn app.main:app --port 8000   # terminal 1
+uv run python loadtest/run.py --target http://localhost:8000 --path /openapi.json \
+  --rps 10 --duration 5 --warmup 1 --p95-ms 1000   # terminal 2
+```
+
+Expect `errors=0` and `RESULT: PASS`.
+
+## Full warm-`/analyze` 100 RPS run (local)
+
+The warm path needs a populated Report cache. `get_cache()` returns `None`
+without Upstash, and real Upstash's free tier (~10k commands/day) can't absorb a
+100 RPS run — so use a **local** Upstash-compatible Redis via SRH.
+
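+1. **Start a local Upstash-compatible Redis (SRH over Redis).** On Linux without Docker Desktop, also pass `--add-host=host.docker.internal:host-gateway` to the SRH `docker run` so `host.docker.internal` resolves: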
+
+   ```bash
+   docker run -d --name si-redis -p 6379:6379 redis:7
+   docker run -d --name si-srh -p 8079:80 \
+     -e SRH_MODE=env -e SRH_TOKEN=local-token \
+     -e SRH_CONNECTION_STRING="redis://host.docker.internal:6379" \
+     hiett/serverless-redis-http:latest
+   ```
+
+2. **Start the backend pointed at SRH, with a real GitHub token, and the proxy
+   secret UNSET** (so the analyze limiter skips anonymous enforcement):
+
+   ```bash
+   UPSTASH_REDIS_REST_URL=http://localhost:8079 \
+   UPSTASH_REDIS_REST_TOKEN=local-token \
+   GITHUB_TOKEN=<your_token> \
+   uv run uvicorn app.main:app --port 8000
+   ```
+   (Ensure `INTERNAL_PROXY_SECRET` is **not** set in the environment. The harness
+   sends no session cookie or auth header, so every request is anonymous.)
+
+3. **Run the load test** (the `--warmup` request cold-ingests once to prime the
+   cache; the timed run is then pure cache hits):
+
+   ```bash
+   MSYS_NO_PATHCONV=1 uv run python loadtest/run.py \
+     --target http://localhost:8000 --path /analyze/octocat \
+     --rps 100 --duration 60 --warmup 1
+   ```
+
+4. **Find the knee** with a ramp:
+
+   ```bash
+   MSYS_NO_PATHCONV=1 uv run python loadtest/run.py --path /analyze/octocat \
+     --ramp 50:100:200:400 --duration 30 --warmup 1
+   ```
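+   Each stage runs for `--duration` seconds. Record the highest stage that still
+   PASSes (error rate ≤ `--max-error-rate`, achieved RPS ≥ 95% of target, p95 ≤ `--p95-ms`). Stages past the knee FAIL, so a non-zero exit code is expected here.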
+
+5. **Tear down:** `docker rm -f si-srh si-redis`.
+
+## Pointing at a deployed target
+
+```bash
+uv run python loadtest/run.py --target https://<host>/_/backend --path /analyze/octocat --rps 100 --duration 30
+```
+Mind the cost (Vercel Active-CPU) and the rate limits: a deployed backend with
+`INTERNAL_PROXY_SECRET` set WILL rate-limit anonymous `/analyze`, and the harness
+has no way to sign in, so raise the limits for the test window. Keep deployed runs short.
+
+## Thresholds (PASS/FAIL)
+
+- error rate `<= --max-error-rate` (a fraction; default 0.01, i.e. 1%)
+- achieved RPS `>= 95%` of `--rps`
+- p95 latency `<= --p95-ms` (default 250 ms; tune from the warm baseline)
+
+Exit code is 0 on PASS, non-zero on FAIL.
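The runbook's knee rule (highest ramp stage that still passes all three thresholds) can be checked by hand from the printed stage summaries. What follows is a minimal sketch and is not part of the harness. `StageResult`, `stage_passes`, and `find_knee` are hypothetical names. The stage numbers in the usage lines are made-up inputs, not measurements. The comparisons mirror `evaluate_thresholds()` in `run.py`, which fails a stage only when a metric is strictly worse than its limit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StageResult:
    """One ramp stage's printed numbers (hypothetical container, not in run.py)."""

    target_rps: float
    achieved_rps: float
    error_rate: float  # fraction: 0.004 means 0.4%
    p95_ms: float


def stage_passes(
    s: StageResult, *, max_error_rate: float = 0.01, p95_budget_ms: float = 250.0
) -> bool:
    # Same comparisons as evaluate_thresholds(): a stage fails only when a metric
    # is strictly worse than its limit, so landing exactly on a limit still passes.
    return (
        s.error_rate <= max_error_rate
        and s.achieved_rps >= s.target_rps * 0.95
        and s.p95_ms <= p95_budget_ms
    )


def find_knee(stages: list[StageResult], **limits: float) -> float | None:
    """Highest target RPS among passing stages, or None if every stage failed."""
    passing = [s.target_rps for s in stages if stage_passes(s, **limits)]
    return max(passing) if passing else None


# Made-up stage numbers for illustration only, not measured results.
stages = [
    StageResult(target_rps=50, achieved_rps=50.2, error_rate=0.0, p95_ms=12.0),
    StageResult(target_rps=100, achieved_rps=99.1, error_rate=0.0, p95_ms=18.0),
    StageResult(target_rps=200, achieved_rps=171.0, error_rate=0.02, p95_ms=640.0),
]
print(find_knee(stages))  # 100
```

`find_knee` follows the runbook literally and takes the highest passing stage. If a noisy run passes a higher stage after failing a lower one, that result deserves a rerun rather than a recorded knee.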
diff --git a/backend/loadtest/__init__.py b/backend/loadtest/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/backend/loadtest/run.py b/backend/loadtest/run.py
new file mode 100644
index 0000000..77ee4a8
--- /dev/null
+++ b/backend/loadtest/run.py
@@ -0,0 +1,212 @@
+"""Open-loop load-test harness for the Skill Issue backend.
+
+Drives a target endpoint at a fixed request rate and reports latency
+percentiles, error rate, and achieved throughput against pass/fail
+thresholds. See backend/loadtest/README.md for the local warm-/analyze
+runbook. Run: uv run python loadtest/run.py --help
+"""
+
+from __future__ import annotations
+
+import argparse
+import asyncio
+import contextlib
+import sys
+import time
+from dataclasses import dataclass
+
+import httpx
+
+
+@dataclass
+class Result:
+    latency_ms: float
+    status: int | None  # None = connection error / timeout
+
+
+@dataclass
+class Summary:
+    sent: int
+    completed: int
+    dropped: int
+    error_count: int
+    error_rate: float
+    achieved_rps: float
+    p50_ms: float
+    p95_ms: float
+    p99_ms: float
+    errors_by_status: dict[str, int]
+    duration_s: float
+
+
+def percentile(values: list[float], p: float) -> float:
+    """Linear-interpolated p-th percentile (p in [0, 100]). 0.0 for empty input."""
+    if not values:
+        return 0.0
+    s = sorted(values)
+    if len(s) == 1:
+        return s[0]
+    k = (len(s) - 1) * (p / 100.0)
+    lo = int(k)
+    hi = min(lo + 1, len(s) - 1)
+    frac = k - lo
+    return round(s[lo] * (1 - frac) + s[hi] * frac, 10)
+
+
+def summarize(results: list[Result], *, dropped: int, wall_seconds: float) -> Summary:
+    completed = len(results)
+    latencies = [r.latency_ms for r in results]
+    errors_by_status: dict[str, int] = {}
+    error_count = 0
+    for r in results:
+        if r.status is None or r.status >= 400:
+            key = "connection_error" if r.status is None else str(r.status)
+            errors_by_status[key] = errors_by_status.get(key, 0) + 1
+            error_count += 1
+    error_rate = (error_count / completed) if completed else 1.0
+    achieved_rps = (completed / wall_seconds) if wall_seconds > 0 else 0.0
+    return Summary(
+        sent=completed + dropped,
+        completed=completed,
+        dropped=dropped,
+        error_count=error_count,
+        error_rate=error_rate,
+        achieved_rps=achieved_rps,
+        p50_ms=percentile(latencies, 50),
+        p95_ms=percentile(latencies, 95),
+        p99_ms=percentile(latencies, 99),
+        errors_by_status=errors_by_status,
+        duration_s=wall_seconds,
+    )
+
+
+def evaluate_thresholds(
+    summary: Summary, *, target_rps: float, max_error_rate: float, p95_ms: float
+) -> tuple[bool, list[str]]:
+    failures: list[str] = []
+    if summary.error_rate > max_error_rate:
+        failures.append(f"error rate {summary.error_rate:.3%} > {max_error_rate:.3%}")
+    if summary.achieved_rps < target_rps * 0.95:
+        failures.append(f"achieved RPS {summary.achieved_rps:.1f} < 95% of target {target_rps:.0f}")
+    if summary.p95_ms > p95_ms:
+        failures.append(f"p95 {summary.p95_ms:.1f}ms > {p95_ms:.1f}ms")
+    return (not failures, failures)
+
+
+async def _one_request(client: httpx.AsyncClient, url: str, results: list[Result]) -> None:
+    t0 = time.perf_counter()
+    try:
+        resp = await client.get(url)
+        status: int | None = resp.status_code
+    except (httpx.HTTPError, OSError):
+        status = None
+    results.append(Result(latency_ms=(time.perf_counter() - t0) * 1000.0, status=status))
+
+
+async def run_stage(
+    client: httpx.AsyncClient,
+    url: str,
+    *,
+    rps: float,
+    duration: float,
+    max_inflight: int,
+) -> tuple[list[Result], int]:
+    """Open-loop: schedule requests at a fixed rate for `duration` seconds.
+
+    Returns (results, dropped). `dropped` counts ticks skipped because
+    `max_inflight` was saturated — a "server can't keep up" signal. The
+    scheduler never blocks on in-flight requests, so a slow server shows up as
+    dropped ticks + latency growth rather than self-throttled load.
+    """
+    results: list[Result] = []
+    tasks: set[asyncio.Task[None]] = set()
+    inflight = 0
+    dropped = 0
+    interval = 1.0 / rps
+    loop = asyncio.get_running_loop()
+
+    def _done(t: asyncio.Task[None]) -> None:
+        nonlocal inflight
+        inflight -= 1
+        tasks.discard(t)
+
+    start = loop.time()
+    i = 0
+    while loop.time() - start < duration:
+        delay = (start + i * interval) - loop.time()
+        if delay > 0:
+            await asyncio.sleep(delay)
+        i += 1
+        if inflight >= max_inflight:
+            dropped += 1
+            continue
+        inflight += 1
+        task = asyncio.create_task(_one_request(client, url, results))
+        task.add_done_callback(_done)
+        tasks.add(task)
+
+    if tasks:
+        await asyncio.gather(*tasks, return_exceptions=True)
+    return results, dropped
+
+
+def _parse_ramp(ramp: str) -> list[float]:
+    """'10:50:100' -> [10.0, 50.0, 100.0]."""
+    return [float(part) for part in ramp.split(":") if part]
+
+
+def _print_summary(url: str, rps: float, s: Summary, ok: bool, failures: list[str]) -> None:
+    print(f"\n=== {url} @ target {rps:.0f} RPS for {s.duration_s:.1f}s ===")
+    print(f"  sent={s.sent} completed={s.completed} dropped={s.dropped}")
+    print(f"  achieved_rps={s.achieved_rps:.1f}")
+    print(f"  errors={s.error_count} ({s.error_rate:.3%}) {s.errors_by_status or ''}")
+    print(f"  latency p50={s.p50_ms:.1f}ms p95={s.p95_ms:.1f}ms p99={s.p99_ms:.1f}ms")
+    print(f"  RESULT: {'PASS' if ok else 'FAIL'}")
+    for f in failures:
+        print(f"    - {f}")
+
+
+async def _amain(args: argparse.Namespace) -> int:
+    url = args.target.rstrip("/") + args.path
+    cap = args.max_inflight + 50
+    limits = httpx.Limits(max_connections=cap, max_keepalive_connections=cap)
+    overall_ok = True
+    async with httpx.AsyncClient(timeout=args.timeout, limits=limits) as client:
+        for _ in range(args.warmup):
+            with contextlib.suppress(httpx.HTTPError, OSError):
+                await client.get(url)
+        stages = _parse_ramp(args.ramp) if args.ramp else [args.rps]
+        for rps in stages:
+            wall0 = time.perf_counter()
+            results, dropped = await run_stage(
+                client, url, rps=rps, duration=args.duration, max_inflight=args.max_inflight
+            )
+            summary = summarize(results, dropped=dropped, wall_seconds=time.perf_counter() - wall0)
+            ok, failures = evaluate_thresholds(
+                summary,
+                target_rps=rps,
+                max_error_rate=args.max_error_rate,
+                p95_ms=args.p95_ms,
+            )
+            _print_summary(url, rps, summary, ok, failures)
+            overall_ok = overall_ok and ok
+    return 0 if overall_ok else 1
+
+
+def main() -> None:
+    p = argparse.ArgumentParser(description="Open-loop load tester for the Skill Issue backend.")
+    p.add_argument("--target", default="http://localhost:8000")
+    p.add_argument("--path", default="/analyze/octocat")
+    p.add_argument("--rps", type=float, default=100.0)
+    p.add_argument("--duration", type=float, default=60.0)
+    p.add_argument("--warmup", type=int, default=1)
+    p.add_argument("--ramp", default=None, help="colon-separated RPS stages, e.g. 10:50:100")
+    p.add_argument("--max-inflight", type=int, default=500, dest="max_inflight")
+    p.add_argument("--timeout", type=float, default=20.0)
+    p.add_argument("--p95-ms", type=float, default=250.0, dest="p95_ms")
+    p.add_argument("--max-error-rate", type=float, default=0.01, dest="max_error_rate")
+    sys.exit(asyncio.run(_amain(p.parse_args())))
+
+
+if __name__ == "__main__":
+    main()
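`run_stage()` never waits on in-flight requests. When `max_inflight` requests are outstanding, it skips the tick and counts it as `dropped`, so saturation shows up in the report instead of as a quietly lower send rate. The deterministic, virtual-time sketch below illustrates that difference against a closed-loop (await-then-send) generator. It is not the harness's code, and both `simulate_*` functions are hypothetical. It assumes a fixed per-request service time regardless of load, with no network or client overhead.

```python
from __future__ import annotations


def simulate_open_loop(
    rps: int, duration_ms: int, service_ms: int, max_inflight: int
) -> tuple[int, int]:
    """Fixed-rate ticks with a concurrency cap; returns (dispatched, dropped)."""
    finishes: list[int] = []  # completion times (ms) of requests still in flight
    dispatched = dropped = 0
    for i in range(duration_ms * rps // 1000):
        t = i * 1000 // rps  # scheduled send time of tick i, in ms
        finishes = [f for f in finishes if f > t]  # retire requests done by t
        if len(finishes) >= max_inflight:
            dropped += 1  # saturated: skip this tick rather than wait
            continue
        finishes.append(t + service_ms)
        dispatched += 1
    return dispatched, dropped


def simulate_closed_loop(workers: int, duration_ms: int, service_ms: int) -> int:
    """Each worker waits for its response before sending again; returns requests sent."""
    # A worker sends at 0, service_ms, 2 * service_ms, ... while send time < duration_ms,
    # i.e. ceil(duration_ms / service_ms) times.
    return workers * -(-duration_ms // service_ms)


print(simulate_open_loop(100, 1000, 250, 10))  # (40, 60)
print(simulate_closed_loop(10, 1000, 250))  # 40
```

Against the same 250 ms server, both models send 40 requests in one second. Only the open-loop run reports the 60 ticks it had to skip, while the closed-loop run simply looks like a slower test. With a 50 ms service time, `simulate_open_loop(100, 1000, 50, 10)` dispatches all 100 ticks with no drops.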
diff --git a/backend/pyproject.toml b/backend/pyproject.toml
index 8418b8c..7e2f000 100644
--- a/backend/pyproject.toml
+++ b/backend/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "skill-issue-backend"
-version = "0.9.5"
+version = "0.9.6"
 description = "Skill Issue backend — FastAPI service that ingests a GitHub profile and returns a deterministic engineering report."
 readme = "README.md"
 authors = [
diff --git a/backend/tests/loadtest/__init__.py b/backend/tests/loadtest/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/backend/tests/loadtest/test_stats.py b/backend/tests/loadtest/test_stats.py
new file mode 100644
index 0000000..3840a26
--- /dev/null
+++ b/backend/tests/loadtest/test_stats.py
@@ -0,0 +1,49 @@
+from loadtest.run import Result, evaluate_thresholds, percentile, summarize
+
+
+def test_percentile_linear_interpolation():
+    vals = [float(x) for x in range(1, 11)]  # 1..10
+    assert percentile(vals, 50) == 5.5
+    assert percentile(vals, 95) == 9.55
+    assert percentile(vals, 99) == 9.91
+
+
+def test_percentile_empty_is_zero():
+    assert percentile([], 95) == 0.0
+
+
+def test_percentile_single_value():
+    assert percentile([7.0], 95) == 7.0
+
+
+def test_summarize_counts_errors_and_rate():
+    results = [
+        Result(10.0, 200),
+        Result(20.0, 200),
+        Result(30.0, 500),
+        Result(40.0, None),
+    ]
+    s = summarize(results, dropped=1, wall_seconds=2.0)
+    assert s.completed == 4
+    assert s.sent == 5
+    assert s.dropped == 1
+    assert s.error_count == 2
+    assert s.error_rate == 0.5
+    assert s.errors_by_status == {"500": 1, "connection_error": 1}
+    assert s.achieved_rps == 2.0  # 4 completed / 2.0s
+
+
+def test_evaluate_thresholds_all_pass():
+    s = summarize([Result(10.0, 200)] * 100, dropped=0, wall_seconds=1.0)
+    ok, failures = evaluate_thresholds(s, target_rps=100, max_error_rate=0.01, p95_ms=250)
+    assert ok
+    assert failures == []
+
+
+def test_evaluate_thresholds_flags_all_three():
+    s = summarize(
+        [Result(500.0, 500)] * 50, dropped=0, wall_seconds=5.0
+    )  # 10 rps, all errors, slow
+    ok, failures = evaluate_thresholds(s, target_rps=100, max_error_rate=0.01, p95_ms=250)
+    assert not ok
+    assert len(failures) == 3  # error rate + achieved rps + p95
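`test_percentile_linear_interpolation` compares against exact decimals such as `9.55`, but the interpolation in `percentile()` runs in binary floating point. The `round(..., 10)` on its last line is what makes those equality checks hold. The standalone illustration below uses a hypothetical `interp_percentile` helper, written to mirror `percentile()` but with an `ndigits` switch so the unrounded value can be inspected.

```python
from __future__ import annotations


def interp_percentile(values: list[float], p: float, ndigits: int | None = 10) -> float:
    """Linear-interpolated p-th percentile, optionally rounded like percentile()."""
    s = sorted(values)
    if not s:
        return 0.0
    if len(s) == 1:
        return s[0]
    k = (len(s) - 1) * (p / 100.0)  # fractional rank into the sorted list
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    frac = k - lo
    raw = s[lo] * (1 - frac) + s[hi] * frac
    return raw if ndigits is None else round(raw, ndigits)


vals = [float(x) for x in range(1, 11)]  # 1.0 .. 10.0
print(interp_percentile(vals, 95, ndigits=None) == 9.55)  # False: lands just below 9.55
print(interp_percentile(vals, 95) == 9.55)  # True
```

Here `9 * 0.95` rounds down to a double slightly under 8.55, and that error carries into the interpolated result. Rounding to 10 decimal places absorbs it without affecting millisecond-scale latencies.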
diff --git a/backend/uv.lock b/backend/uv.lock
index 86305b7..dde96b9 100644
--- a/backend/uv.lock
+++ b/backend/uv.lock
@@ -906,7 +906,7 @@ fastapi = [
 
 [[package]]
 name = "skill-issue-backend"
-version = "0.9.5"
+version = "0.9.6"
 source = { virtual = "." }
 dependencies = [
     { name = "alembic" },
diff --git a/docs/PROGRESS_LOG.md b/docs/PROGRESS_LOG.md
index eb0a9a1..61eb2eb 100644
--- a/docs/PROGRESS_LOG.md
+++ b/docs/PROGRESS_LOG.md
@@ -19,6 +19,36 @@ Format:
 
 ---
 
+## 2026-05-28 — Claude (Opus 4.7) — v0.9.6 shipped (load-test harness)
+
+**Slice:** v0.9.6. Reusable backend load-test harness + runbook; the full 100 RPS validation run is an operator step (hardware-gated). Split from the original v0.9.5 "security review + load test"; legal docs are now v0.9.7.
+
+**Done:**
+- **`backend/loadtest/run.py`** — open-loop (fixed-rate) async load generator (httpx, already a dep). Pure helpers `percentile`/`summarize`/`evaluate_thresholds` + `Result`/`Summary` dataclasses; async `run_stage` dispatcher (rate-paced, bounded-concurrency with a `dropped` saturation counter); `_parse_ramp`, `_print_summary`, argparse CLI (`--target/--path/--rps/--duration/--warmup/--ramp/--max-inflight/--timeout/--p95-ms/--max-error-rate`). Exit 0 PASS / non-zero FAIL.
+- **6 unit tests** for the stats helpers (`backend/tests/loadtest/test_stats.py`) — deterministic, no network. Backend non-DB suite 284 → 290.
+- **`backend/loadtest/README.md`** runbook — local SRH (Docker) warm-cache setup, prime-then-measure, ramp-to-find-knee, point-at-deploy, and the Windows/Git-Bash `MSYS_NO_PATHCONV=1` gotcha.
+- Docs ritual + version bump to 0.9.6.
+
+**Decisions:**
+- **Open-loop over closed-loop** — a closed-loop (await-then-send) generator self-throttles and masks saturation; open-loop dispatches at a fixed rate so a slow server shows as latency/error/`dropped` growth.
+- **Local SRH for the warm cache** — `get_cache()` has no in-process Report-cache fallback, so the warm path needs an Upstash-compatible endpoint; real Upstash's ~10k/day free tier can't absorb a 100 RPS run, so SRH (real Redis over Docker) is the only viable local option.
+- **No rate-limit bypass needed** — anonymous load + unset `INTERNAL_PROXY_SECRET` makes the analyze limiter skip enforcement (existing behavior), so the warm test needs no limit-raising or bypass code. Zero application-code change in this slice.
+- **Build + sanity now, full run deferred** — per the user's hardware constraint (localhost previously overheated the laptop). The harness is the durable deliverable; the headline 100 RPS number is the operator's to record.
+
+**Learned / surprises:**
+- **Locally `/health` blocks ~20 s/request** when `DATABASE_URL` is unset (the startup-placeholder DB ping times out per request). Used `/openapi.json` for the clean sanity run instead. Not a prod issue (prod has a DB).
+- **Git-Bash mangles a bare `--path /health`** into a Windows path via MSYS, corrupting the URL (`Invalid port: '8000C:'`). `MSYS_NO_PATHCONV=1` fixes it — noted in the runbook. (Surfaced a real edge: a malformed URL raises `httpx.InvalidURL`, which is *not* an `HTTPError`, so it isn't caught per-request — acceptable, since a bad `--target/--path` is operator error that should fail loudly.)
+
+**Verified:**
+- Backend `ruff` clean; `pytest` (stats tests 6/6; full non-DB suite 290 expected). Frontend unchanged (54 vitest).
+- Harness sanity run (controller, backend-only): `/openapi.json` 10 RPS × 5 s → 51 completed, **0 errors**, p50 3.0 ms / p95 6.2 ms, achieved 10.2 RPS, **PASS**, exit 0.
+
+**Blocked / open:** full 100 RPS warm-`/analyze` run is the operator's (Docker/SRH + GITHUB_TOKEN); result to be appended when run.
+
+**Next:** v0.9.7 — privacy policy + terms (legal docs).
+
+---
+
 ## 2026-05-28 — Claude (Opus 4.7) — v0.9.5 shipped (pre-launch security audit + hardening)
 
 **Slice:** v0.9.5. Full pre-launch security audit of the whole app + two Medium hardening fixes. The load test originally bundled here was split to v0.9.6 (needs target/cost/rate-limit design); legal docs shifted to v0.9.7.
diff --git a/docs/superpowers/plans/2026-05-28-v0.9.6-load-test-harness.md b/docs/superpowers/plans/2026-05-28-v0.9.6-load-test-harness.md
new file mode 100644
index 0000000..f57f686
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-28-v0.9.6-load-test-harness.md
@@ -0,0 +1,589 @@
+# v0.9.6 — Load-test harness Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Ship a reusable, parameterized Python/httpx open-loop load-test harness for the backend warm `/analyze` path, with unit-tested stats, a runbook, and a light sanity run — the full 100 RPS validation is an operator step.
+
+**Architecture:** A standalone async script `backend/loadtest/run.py` with I/O-free stats helpers (unit-tested) and an open-loop dispatcher that fires at a fixed RPS, collects per-request latency/status, and reports percentiles + error rate + achieved throughput against pass/fail thresholds. No application code changes.
+
+**Tech Stack:** Python 3.12, asyncio, httpx (already a backend dep), pytest, argparse.
+
+**Spec:** [`docs/superpowers/specs/2026-05-28-v0.9.6-load-test-harness-design.md`](../specs/2026-05-28-v0.9.6-load-test-harness-design.md).
+
+---
+
+## File structure
+
+| File | Responsibility | Action |
+| --- | --- | --- |
+| `backend/loadtest/__init__.py` | Make `loadtest` an importable package | Create (empty) |
+| `backend/loadtest/run.py` | Harness: stats helpers + async dispatcher + CLI | Create |
+| `backend/loadtest/README.md` | Runbook (local SRH warm-cache run + point-at-deploy) | Create |
+| `backend/tests/loadtest/__init__.py` | Test package marker | Create (empty) |
+| `backend/tests/loadtest/test_stats.py` | Unit tests for the pure stats helpers | Create |
+| Version literals + CHANGELOG + PLAN + PROGRESS_LOG + `uv.lock` | Release ritual (0.9.5 → 0.9.6) | Modify |
+
+All commands run from `backend/` (the uv project root) unless noted.
+
+---
+
+### Task 1: Pure stats helpers + unit tests (TDD)
+
+**Files:**
+- Create: `backend/loadtest/__init__.py`, `backend/tests/loadtest/__init__.py`
+- Create: `backend/loadtest/run.py` (helpers portion only this task)
+- Test: `backend/tests/loadtest/test_stats.py`
+
+- [ ] **Step 1: Create the package markers**
+
+Create `backend/loadtest/__init__.py` and `backend/tests/loadtest/__init__.py` both as **empty files**.
+
+- [ ] **Step 2: Write the failing tests**
+
+Create `backend/tests/loadtest/test_stats.py`:
+
+```python
+from loadtest.run import Result, evaluate_thresholds, percentile, summarize
+
+
+def test_percentile_linear_interpolation():
+    vals = [float(x) for x in range(1, 11)]  # 1..10
+    assert percentile(vals, 50) == 5.5
+    assert percentile(vals, 95) == 9.55
+    assert percentile(vals, 99) == 9.91
+
+
+def test_percentile_empty_is_zero():
+    assert percentile([], 95) == 0.0
+
+
+def test_percentile_single_value():
+    assert percentile([7.0], 95) == 7.0
+
+
+def test_summarize_counts_errors_and_rate():
+    results = [
+        Result(10.0, 200),
+        Result(20.0, 200),
+        Result(30.0, 500),
+        Result(40.0, None),
+    ]
+    s = summarize(results, dropped=1, wall_seconds=2.0)
+    assert s.completed == 4
+    assert s.sent == 5
+    assert s.dropped == 1
+    assert s.error_count == 2
+    assert s.error_rate == 0.5
+    assert s.errors_by_status == {"500": 1, "connection_error": 1}
+    assert s.achieved_rps == 2.0  # 4 completed / 2.0s
+
+
+def test_evaluate_thresholds_all_pass():
+    s = summarize([Result(10.0, 200)] * 100, dropped=0, wall_seconds=1.0)
+    ok, failures = evaluate_thresholds(s, target_rps=100, max_error_rate=0.01, p95_ms=250)
+    assert ok
+    assert failures == []
+
+
+def test_evaluate_thresholds_flags_all_three():
+    s = summarize([Result(500.0, 500)] * 50, dropped=0, wall_seconds=5.0)  # 10 rps, all errors, slow
+    ok, failures = evaluate_thresholds(s, target_rps=100, max_error_rate=0.01, p95_ms=250)
+    assert not ok
+    assert len(failures) == 3  # error rate + achieved rps + p95
+```
+
+- [ ] **Step 3: Run the tests, verify they fail**
+
+Run: `uv run pytest tests/loadtest/test_stats.py -q`
+Expected: FAIL — `ModuleNotFoundError: No module named 'loadtest'` (or import error for the helpers).
+
+- [ ] **Step 4: Write the helpers**
+
+Create `backend/loadtest/run.py` with exactly this content (the async/CLI portion is added in Task 2):
+
+```python
+"""Open-loop load-test harness for the Skill Issue backend.
+
+Drives a target endpoint at a fixed request rate and reports latency
+percentiles, error rate, and achieved throughput against pass/fail
+thresholds. See backend/loadtest/README.md for the local warm-/analyze
+runbook. Run: uv run python loadtest/run.py --help
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+
+@dataclass
+class Result:
+    latency_ms: float
+    status: int | None  # None = connection error / timeout
+
+
+@dataclass
+class Summary:
+    sent: int
+    completed: int
+    dropped: int
+    error_count: int
+    error_rate: float
+    achieved_rps: float
+    p50_ms: float
+    p95_ms: float
+    p99_ms: float
+    errors_by_status: dict[str, int]
+    duration_s: float
+
+
+def percentile(values: list[float], p: float) -> float:
+    """Linear-interpolated p-th percentile (p in [0, 100]). 0.0 for empty input."""
+    if not values:
+        return 0.0
+    s = sorted(values)
+    if len(s) == 1:
+        return s[0]
+    k = (len(s) - 1) * (p / 100.0)
+    lo = int(k)
+    hi = min(lo + 1, len(s) - 1)
+    frac = k - lo
+    return s[lo] * (1 - frac) + s[hi] * frac
+
+
+def summarize(results: list[Result], *, dropped: int, wall_seconds: float) -> Summary:
+    completed = len(results)
+    latencies = [r.latency_ms for r in results]
+    errors_by_status: dict[str, int] = {}
+    error_count = 0
+    for r in results:
+        if r.status is None or r.status >= 400:
+            key = "connection_error" if r.status is None else str(r.status)
+            errors_by_status[key] = errors_by_status.get(key, 0) + 1
+            error_count += 1
+    error_rate = (error_count / completed) if completed else 1.0
+    achieved_rps = (completed / wall_seconds) if wall_seconds > 0 else 0.0
+    return Summary(
+        sent=completed + dropped,
+        completed=completed,
+        dropped=dropped,
+        error_count=error_count,
+        error_rate=error_rate,
+        achieved_rps=achieved_rps,
+        p50_ms=percentile(latencies, 50),
+        p95_ms=percentile(latencies, 95),
+        p99_ms=percentile(latencies, 99),
+        errors_by_status=errors_by_status,
+        duration_s=wall_seconds,
+    )
+
+
+def evaluate_thresholds(
+    summary: Summary, *, target_rps: float, max_error_rate: float, p95_ms: float
+) -> tuple[bool, list[str]]:
+    failures: list[str] = []
+    if summary.error_rate > max_error_rate:
+        failures.append(f"error rate {summary.error_rate:.3%} > {max_error_rate:.3%}")
+    if summary.achieved_rps < target_rps * 0.95:
+        failures.append(
+            f"achieved RPS {summary.achieved_rps:.1f} < 95% of target {target_rps:.0f}"
+        )
+    if summary.p95_ms > p95_ms:
+        failures.append(f"p95 {summary.p95_ms:.1f}ms > {p95_ms:.1f}ms")
+    return (not failures, failures)
+```
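
The helpers are pure, so they are easy to cross-check against the standard library. A minimal sketch, assuming Python's `statistics.quantiles(..., n=100, method="inclusive")`, which interpolates at position (n - 1) * p / 100 just like `percentile`. The function is restated inline so the snippet runs standalone, and `check_against_stdlib` is a hypothetical helper that could be folded into `test_stats.py` if wanted. Agreement is checked with a tolerance because both compute the same point with different float rounding.

```python
import math
import random
import statistics


def percentile(values: list[float], p: float) -> float:
    """Copy of the helper above: linear-interpolated p-th percentile."""
    if not values:
        return 0.0
    s = sorted(values)
    if len(s) == 1:
        return s[0]
    k = (len(s) - 1) * (p / 100.0)
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    frac = k - lo
    return s[lo] * (1 - frac) + s[hi] * frac


def check_against_stdlib(values: list[float]) -> None:
    # quantiles(n=100) returns the 99 cut points for p = 1..99 (needs >= 2 values).
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    for p, expected in enumerate(cuts, start=1):
        got = percentile(values, p)
        # Same interpolation rule; only the float rounding path differs.
        assert math.isclose(got, expected, rel_tol=1e-9, abs_tol=1e-9), (p, got, expected)


rng = random.Random(42)  # fixed seed so the check is reproducible
for size in (2, 3, 10, 257):
    check_against_stdlib([rng.uniform(1.0, 500.0) for _ in range(size)])
```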
+
+- [ ] **Step 5: Run the tests, verify they pass**
+
+Run: `uv run pytest tests/loadtest/test_stats.py -q`
+Expected: `6 passed`.
+
+- [ ] **Step 6: Lint**
+
+Run: `uv run ruff check loadtest tests/loadtest && uv run ruff format --check loadtest tests/loadtest`
+Expected: clean.
+
+- [ ] **Step 7: Commit**
+
+```bash
+git add backend/loadtest/__init__.py backend/loadtest/run.py backend/tests/loadtest/__init__.py backend/tests/loadtest/test_stats.py
+git commit -m "feat(v0.9.6): load-test stats helpers (percentile/summarize/thresholds) + tests"
+```
+
+---
+
+### Task 2: Async open-loop dispatcher + CLI
+
+**Files:**
+- Modify: `backend/loadtest/run.py` (append the I/O + CLI section)
+
+- [ ] **Step 1: Append the async dispatcher, ramp parser, printer, and CLI**
+
+Replace the import block at the top of `backend/loadtest/run.py` (just below the module docstring) so it reads:
+
+```python
+from __future__ import annotations
+
+import argparse
+import asyncio
+import sys
+import time
+from dataclasses import dataclass
+
+import httpx
+```
+
+Then **append** the following to the end of `backend/loadtest/run.py`:
+
+```python
+async def _one_request(client: httpx.AsyncClient, url: str, results: list[Result]) -> None:
+    t0 = time.perf_counter()
+    try:
+        resp = await client.get(url)
+        status: int | None = resp.status_code
+    except (httpx.HTTPError, OSError):
+        status = None
+    results.append(Result(latency_ms=(time.perf_counter() - t0) * 1000.0, status=status))
+
+
+async def run_stage(
+    client: httpx.AsyncClient,
+    url: str,
+    *,
+    rps: float,
+    duration: float,
+    max_inflight: int,
+) -> tuple[list[Result], int]:
+    """Open-loop: schedule requests at a fixed rate for `duration` seconds.
+
+    Returns (results, dropped). `dropped` counts ticks skipped because
+    `max_inflight` was saturated — a "server can't keep up" signal. The
+    scheduler never blocks on in-flight requests, so a slow server shows up as
+    dropped ticks + latency growth rather than self-throttled load.
+    """
+    results: list[Result] = []
+    tasks: set[asyncio.Task[None]] = set()
+    inflight = 0
+    dropped = 0
+    interval = 1.0 / rps
+    loop = asyncio.get_running_loop()
+
+    def _done(t: asyncio.Task[None]) -> None:
+        nonlocal inflight
+        inflight -= 1
+        tasks.discard(t)
+
+    start = loop.time()
+    i = 0
+    while loop.time() - start < duration:
+        delay = (start + i * interval) - loop.time()
+        if delay > 0:
+            await asyncio.sleep(delay)
+        i += 1
+        if inflight >= max_inflight:
+            dropped += 1
+            continue
+        inflight += 1
+        task = asyncio.create_task(_one_request(client, url, results))
+        task.add_done_callback(_done)
+        tasks.add(task)
+
+    if tasks:
+        await asyncio.gather(*tasks, return_exceptions=True)
+    return results, dropped
+
+
+def _parse_ramp(ramp: str) -> list[float]:
+    """'10:50:100' -> [10.0, 50.0, 100.0]."""
+    return [float(part) for part in ramp.split(":") if part]
+
+
+def _print_summary(url: str, rps: float, s: Summary, ok: bool, failures: list[str]) -> None:
+    print(f"\n=== {url} @ target {rps:.0f} RPS for {s.duration_s:.1f}s ===")
+    print(f"  sent={s.sent} completed={s.completed} dropped={s.dropped}")
+    print(f"  achieved_rps={s.achieved_rps:.1f}")
+    print(f"  errors={s.error_count} ({s.error_rate:.3%}) {s.errors_by_status or ''}")
+    print(f"  latency p50={s.p50_ms:.1f}ms p95={s.p95_ms:.1f}ms p99={s.p99_ms:.1f}ms")
+    print(f"  RESULT: {'PASS' if ok else 'FAIL'}")
+    for f in failures:
+        print(f"    - {f}")
+
+
+async def _amain(args: argparse.Namespace) -> int:
+    url = args.target.rstrip("/") + args.path
+    cap = args.max_inflight + 50
+    limits = httpx.Limits(max_connections=cap, max_keepalive_connections=cap)
+    overall_ok = True
+    async with httpx.AsyncClient(timeout=args.timeout, limits=limits) as client:
+        for _ in range(args.warmup):
+            try:
+                await client.get(url)
+            except (httpx.HTTPError, OSError):
+                pass
+        stages = _parse_ramp(args.ramp) if args.ramp else [args.rps]
+        for rps in stages:
+            wall0 = time.perf_counter()
+            results, dropped = await run_stage(
+                client, url, rps=rps, duration=args.duration, max_inflight=args.max_inflight
+            )
+            summary = summarize(results, dropped=dropped, wall_seconds=time.perf_counter() - wall0)
+            ok, failures = evaluate_thresholds(
+                summary,
+                target_rps=rps,
+                max_error_rate=args.max_error_rate,
+                p95_ms=args.p95_ms,
+            )
+            _print_summary(url, rps, summary, ok, failures)
+            overall_ok = overall_ok and ok
+    return 0 if overall_ok else 1
+
+
+def main() -> None:
+    p = argparse.ArgumentParser(description="Open-loop load tester for the Skill Issue backend.")
+    p.add_argument("--target", default="http://localhost:8000")
+    p.add_argument("--path", default="/analyze/octocat")
+    p.add_argument("--rps", type=float, default=100.0)
+    p.add_argument("--duration", type=float, default=60.0)
+    p.add_argument("--warmup", type=int, default=1)
+    p.add_argument("--ramp", default=None, help="colon-separated RPS stages, e.g. 10:50:100")
+    p.add_argument("--max-inflight", type=int, default=500, dest="max_inflight")
+    p.add_argument("--timeout", type=float, default=20.0)
+    p.add_argument("--p95-ms", type=float, default=250.0, dest="p95_ms")
+    p.add_argument("--max-error-rate", type=float, default=0.01, dest="max_error_rate")
+    sys.exit(asyncio.run(_amain(p.parse_args())))
+
+
+if __name__ == "__main__":
+    main()
+```
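
To see why the dispatcher drops ticks instead of waiting, here is a self-contained sketch of the same open-loop scheduling loop with the HTTP call swapped for a hypothetical `fake_request` that just sleeps for a fixed `service_time`. It is an illustration, not part of `run.py`. Every tick is either dispatched or counted as dropped, so `ticks == completed + dropped` always holds, and a server slower than `max_inflight / service_time` requests per second shows up as dropped ticks rather than as a quietly reduced send rate.

```python
import asyncio


async def open_loop(
    rps: float, duration: float, max_inflight: int, service_time: float
) -> tuple[int, int, int]:
    """Return (ticks, completed, dropped) for a simulated open-loop run."""
    loop = asyncio.get_running_loop()
    interval = 1.0 / rps
    tasks: set[asyncio.Task[None]] = set()
    inflight = completed = dropped = 0

    async def fake_request() -> None:
        # Stand-in for client.get(url): a "server" with a fixed service time.
        nonlocal inflight, completed
        try:
            await asyncio.sleep(service_time)
            completed += 1
        finally:
            inflight -= 1

    start = loop.time()
    ticks = 0
    while loop.time() - start < duration:
        # Sleep until the next scheduled tick; never wait on in-flight work.
        delay = (start + ticks * interval) - loop.time()
        if delay > 0:
            await asyncio.sleep(delay)
        ticks += 1
        if inflight >= max_inflight:
            dropped += 1  # saturated: skip this tick instead of queueing it
            continue
        inflight += 1
        task = asyncio.create_task(fake_request())
        tasks.add(task)
        task.add_done_callback(tasks.discard)

    if tasks:
        await asyncio.gather(*tasks)  # drain in-flight work before reporting
    return ticks, completed, dropped


# Slow "server": 5 slots x 0.1 s is ~50 req/s of capacity, offered 200 req/s.
print(asyncio.run(open_loop(rps=200, duration=0.5, max_inflight=5, service_time=0.1)))
```

With fast parameters (`service_time=0.001` and a large `max_inflight`) the same loop drops nothing, which is the shape a healthy warm-cache run should have.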
+
+- [ ] **Step 2: Verify the CLI loads and the stats tests still pass**
+
+Run: `uv run python loadtest/run.py --help`
+Expected: argparse help text listing `--target`, `--rps`, `--ramp`, etc. (exit 0).
+
+Run: `uv run pytest tests/loadtest/test_stats.py -q`
+Expected: `6 passed` (the appended I/O code didn't break the helpers).
+
+- [ ] **Step 3: Lint**
+
+Run: `uv run ruff check loadtest && uv run ruff format --check loadtest`
+Expected: clean. (If ruff reformats, run `uv run ruff format loadtest` and re-check.)
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add backend/loadtest/run.py
+git commit -m "feat(v0.9.6): open-loop async dispatcher + CLI for the load harness"
+```
+
+---
+
+### Task 3: Runbook
+
+**Files:**
+- Create: `backend/loadtest/README.md`
+
+- [ ] **Step 1: Write the runbook**
+
+Create `backend/loadtest/README.md`:
+
+````markdown
+# Load test harness
+
+Open-loop load tester for the backend. Drives a target endpoint at a fixed RPS
+and reports p50/p95/p99 latency, error rate, achieved throughput, and PASS/FAIL
+against thresholds. Run from `backend/`:
+
+```bash
+uv run python loadtest/run.py --help
+```
+
+## Quick sanity check (no Docker, no cache)
+
+Start the backend, then hit the cheap `/health` endpoint to confirm the harness
+works:
+
+```bash
+uv run uvicorn app.main:app --port 8000   # terminal 1
+uv run python loadtest/run.py --target http://localhost:8000 --path /health \
+  --rps 10 --duration 5 --warmup 0 --p95-ms 1000   # terminal 2
+```
+
+## Full warm-`/analyze` 100 RPS run (local)
+
+The warm path needs a populated Report cache. `get_cache()` returns `None`
+unless `UPSTASH_REDIS_REST_URL` and `UPSTASH_REDIS_REST_TOKEN` are set, and real
+Upstash's free tier (~10k commands/day) can't absorb a 100 RPS run (60 s at
+100 RPS is already ~6k requests). Use a **local** Upstash-compatible Redis via
+SRH instead.
+
+1. **Start a local Upstash-compatible Redis (SRH over Redis):**
+
+   ```bash
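+   docker run -d --name si-redis -p 6379:6379 redis:7
+   # --add-host lets host.docker.internal resolve on Linux (Docker 20.10+);
+   # Docker Desktop (macOS/Windows) already provides it.
+   docker run -d --name si-srh -p 8079:80 \
+     --add-host=host.docker.internal:host-gateway \
+     -e SRH_MODE=env -e SRH_TOKEN=local-token \
+     -e SRH_CONNECTION_STRING="redis://host.docker.internal:6379" \
+     hiett/serverless-redis-http:latest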
+   ```
+
+2. **Start the backend pointed at SRH, with a real GitHub token, and the proxy
+   secret UNSET** (so the analyze limiter skips anonymous enforcement):
+
+   ```bash
+   UPSTASH_REDIS_REST_URL=http://localhost:8079 \
+   UPSTASH_REDIS_REST_TOKEN=local-token \
+   GITHUB_TOKEN=<your_token> \
+   uv run uvicorn app.main:app --port 8000
+   ```
+   (Ensure `INTERNAL_PROXY_SECRET` is **not** set in the environment.)
+
+3. **Run the load test** (the `--warmup` request cold-ingests once to prime the
+   cache; the timed run is then pure cache hits):
+
+   ```bash
+   uv run python loadtest/run.py \
+     --target http://localhost:8000 --path /analyze/octocat \
+     --rps 100 --duration 60 --warmup 1
+   ```
+
+4. **Find the knee** with a ramp:
+
+   ```bash
+   uv run python loadtest/run.py --path /analyze/octocat \
+     --ramp 50:100:200:400 --duration 30 --warmup 1
+   ```
+   Record the highest RPS stage that still PASSes (error rate ≤ `--max-error-rate`,
+   achieved RPS ≥ 95% of target, p95 ≤ `--p95-ms`).
+
+5. **Tear down:** `docker rm -f si-srh si-redis`.
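
If you script a ramp, the "highest stage that still PASSes" rule can be made explicit. A minimal sketch, assuming you have collected each stage's target RPS and PASS/FAIL result in ramp order; `max_sustainable_rps` is a hypothetical helper, not part of `run.py`. It treats the first FAIL as the knee, so a lucky PASS at a higher stage is not counted.

```python
from __future__ import annotations


def max_sustainable_rps(stages: list[tuple[float, bool]]) -> float | None:
    """Return the last passing RPS before the first failing stage, or None.

    `stages` is [(target_rps, passed), ...] in ramp order, e.g. transcribed
    from the RESULT lines of a `--ramp 50:100:200:400` run.
    """
    best: float | None = None
    for rps, passed in stages:
        if not passed:
            break  # first FAIL marks the knee; ignore any later lucky PASS
        best = rps
    return best


print(max_sustainable_rps([(50, True), (100, True), (200, False), (400, True)]))  # 100
```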
+
+## Pointing at a deployed target
+
+```bash
+uv run python loadtest/run.py --target https://<host>/_/backend --path /analyze/octocat --rps 100 --duration 30
+```
+Mind the cost (Vercel Active-CPU) and rate limits: a deployed backend with
+`INTERNAL_PROXY_SECRET` set WILL rate-limit anonymous `/analyze`, and the harness
+only sends anonymous requests, so raise the limits for the test window or the run
+will mostly measure 429s. Keep deployed runs short.
+
+## Thresholds (PASS/FAIL)
+
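+- error rate `<= --max-error-rate` (default 0.01, i.e. 1%); HTTP status ≥ 400 and connection errors/timeouts both count as errors
+- achieved RPS `>= 95%` of `--rps` (or of each `--ramp` stage)
+- p95 latency `<= --p95-ms` (default 250 ms; tune from the warm baseline)
+
+Exit code is 0 only if every stage PASSes, 1 otherwise.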
+````
+
+- [ ] **Step 2: Commit**
+
+```bash
+git add backend/loadtest/README.md
+git commit -m "docs(v0.9.6): load-test runbook (local SRH warm-cache + deploy target)"
+```
+
+---
+
+### Task 4: Sanity run + docs ritual + ship
+
+**Files:**
+- Modify: `backend/pyproject.toml:3`, `backend/app/settings.py:5`, `frontend/package.json:3`, `frontend/src/app/page.tsx:26`, `frontend/src/components/results-view.tsx:355`, `README.md` (status lead + running list + health curl), `CHANGELOG.md`, `PLAN.md` (v0.9.6 row + section), `docs/PROGRESS_LOG.md`, `backend/uv.lock`
+
+- [ ] **Step 1: Light sanity run against `/health`**
+
+In terminal 1: `cd backend && uv run uvicorn app.main:app --port 8000`
+In terminal 2:
+```bash
+cd backend && uv run python loadtest/run.py --target http://localhost:8000 --path /health --rps 10 --duration 5 --warmup 0 --p95-ms 1000
+```
+Expected: a summary block with `completed≈50`, `errors=0`, `RESULT: PASS`. Stop the server afterward.
+(If the execution environment cannot run uvicorn, skip this step and note it. The stats unit tests and `--help` still cover the helpers and CLI parsing, though not the dispatcher end-to-end; the full run is the operator's per the spec.)
+
+- [ ] **Step 2: Bump backend version literals**
+
+`backend/pyproject.toml` line 3: `version = "0.9.5"` → `"0.9.6"`.
+`backend/app/settings.py` line 5: `VERSION = "0.9.5"` → `"0.9.6"`.
+
+- [ ] **Step 3: Bump frontend version literals**
+
+`frontend/package.json` line 3: `"version": "0.9.5",` → `"0.9.6",`.
+`frontend/src/app/page.tsx` line 26: `... · v0.9.5` → `v0.9.6`.
+`frontend/src/components/results-view.tsx` line 355: `... Protocol v0.9.5` → `v0.9.6`.
+
+- [ ] **Step 4: Update README**
+
+In `README.md` line 46 (status paragraph): change the lead `Latest shipped release is **v0.9.5** (...)` to name **v0.9.6** with a one-line description (a reusable backend load-test harness), demote the v0.9.5 clause to "before it," and move the trailing "next" pointer from `**v0.9.6 — load test to 100 RPS** is next.` to `v0.9.6 adds a reusable load-test harness for the warm /analyze path (the full 100 RPS run is an operator step). **v0.9.7 — privacy policy + terms** is next.` Also bump the health-curl `"version":"0.9.5"` → `"0.9.6"` (line ~79).
+
+- [ ] **Step 5: Add the CHANGELOG entry**
+
+In `CHANGELOG.md`, insert directly below the header preamble `---` and above `## [0.9.5]`:
+
+```markdown
+## [0.9.6] — 2026-05-28
+
+### Added
+- **Load-test harness** (`backend/loadtest/`) — a reusable open-loop load tester for the backend warm `/analyze` path, reporting latency percentiles, error rate, and achieved throughput against pass/fail thresholds. Includes a runbook for a local 100 RPS warm-cache run and for pointing at a deployed target.
+
+---
+```
+
+- [ ] **Step 6: Update PLAN.md**
+
+Version-map row: `| **v0.9.6** | Load test to 100 RPS | pending |` → `| **v0.9.6** | Load-test harness (warm /analyze; full 100 RPS run = operator step) | ✅ shipped |`.
+
+Replace the `## v0.9.6 — Load test to 100 RPS (deferred)` section body with:
+
+```markdown
+## v0.9.6 — Load-test harness (shipped 2026-05-28)
+
+**Goal:** Reusable Python/httpx open-loop load harness for the backend warm `/analyze` path; the full 100 RPS validation run is an operator step (hardware-gated).
+
+**Delivered:** `backend/loadtest/run.py` (open-loop dispatcher, p50/p95/p99, error rate, achieved RPS, pass/fail thresholds, ramp), unit-tested stats helpers, and `backend/loadtest/README.md` runbook (local SRH warm-cache setup + deploy target). Local warm-cache uses SRH (Upstash-compatible Redis over Docker) — real Upstash's ~10k/day free tier can't absorb a 100 RPS run. Anonymous load + unset `INTERNAL_PROXY_SECRET` means the analyze limiter skips enforcement, so no bypass is needed.
+
+**Design spec:** [`docs/superpowers/specs/2026-05-28-v0.9.6-load-test-harness-design.md`](./docs/superpowers/specs/2026-05-28-v0.9.6-load-test-harness-design.md).
+**Sub-plan:** [`docs/superpowers/plans/2026-05-28-v0.9.6-load-test-harness.md`](./docs/superpowers/plans/2026-05-28-v0.9.6-load-test-harness.md).
+
+**Exit criteria:**
+- [x] `loadtest/run.py` + unit-tested stats helpers; ruff clean; backend suite green.
+- [x] Runbook complete (local SRH + deploy target).
+- [x] Light `/health` sanity run passes.
+- [x] Docs ritual + version bump to 0.9.6; tag + release.
+- [ ] Full 100 RPS warm-`/analyze` result recorded — operator step, filled in when run.
+```
+
+- [ ] **Step 7: Re-sync `uv.lock`**
+
+Run: `cd backend && uv lock`
+Expected: only `skill-issue-backend 0.9.5 → 0.9.6` changes.
+
+- [ ] **Step 8: Add the PROGRESS_LOG entry**
+
+In `docs/PROGRESS_LOG.md`, add a new top entry following the file's format: header `## 2026-05-28 — Claude (Opus 4.7) — v0.9.6 shipped (load-test harness)`; Slice v0.9.6; Done (harness + stats tests + runbook + sanity run; result of the sanity run); Decisions (open-loop over closed-loop; local SRH for warm cache since real Upstash free tier too small; anon+unset-secret skips the analyze limiter so no bypass; build+sanity now, full 100 RPS run deferred to operator per the laptop constraint); Verified (ruff + pytest count, `--help`, sanity run PASS); Blocked/open (full 100 RPS run is the operator's); Next (v0.9.7 legal docs).
+
+- [ ] **Step 9: Full verification**
+
+Run: `cd backend && uv run pytest -q --no-header && uv run ruff check . && uv run ruff format --check .`
+Expected: suite passes (284 + 6 new = 290 non-DB pass), ruff clean.
+
+Run: `cd frontend && npm run lint && npx tsc --noEmit && npm run test:run && npm run build`
+Expected: lint/tsc clean, vitest 54 passed, build succeeds.
+
+- [ ] **Step 10: Commit**
+
+```bash
+git add backend/pyproject.toml backend/app/settings.py frontend/package.json frontend/src/app/page.tsx frontend/src/components/results-view.tsx README.md CHANGELOG.md PLAN.md docs/PROGRESS_LOG.md backend/uv.lock
+git commit -m "chore(v0.9.6): bump version + docs ritual (load-test harness)"
+```
+
+- [ ] **Step 11: Push, PR, CI, merge, tag (confirm before tag)**
+
+Push the branch, open a PR, wait for CI green, merge to `main`. Prod smoke `/health` → `version: 0.9.6`. Then **pause for user confirmation** before `git tag v0.9.6 && git push origin v0.9.6`.
+
+---
+
+## Self-review notes
+
+- **Spec coverage:** harness §4/§5.1 → Tasks 1–2; thresholds §7 → `evaluate_thresholds` (Task 1) + CLI flags (Task 2); runbook §5.1 → Task 3; sanity run + deferral §8 → Task 4 Step 1; exit criteria §9 → Task 4. All covered.
+- **Placeholder scan:** none — all code is complete, including the full `run.py` and `test_stats.py`.
+- **Type consistency:** `Result(latency_ms, status)`, `Summary` fields, and `percentile`/`summarize(*, dropped, wall_seconds)`/`evaluate_thresholds(*, target_rps, max_error_rate, p95_ms)` signatures are identical across Tasks 1, 2, and the tests.
+- **No app code change:** the harness is a black-box HTTP client; it relies on the existing rate-limiter-skip behavior, doesn't modify it.
+- **Test count:** backend non-DB suite 284 → 290 (+6 stats tests). Frontend unchanged at 54.
diff --git a/docs/superpowers/specs/2026-05-28-v0.9.6-load-test-harness-design.md b/docs/superpowers/specs/2026-05-28-v0.9.6-load-test-harness-design.md
new file mode 100644
index 0000000..285f434
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-28-v0.9.6-load-test-harness-design.md
@@ -0,0 +1,119 @@
+# v0.9.6 — Local warm-`/analyze` load-test harness design spec
+
+**Status:** Designed. Implementation plan to follow under `docs/superpowers/plans/`.
+**Date:** 2026-05-28.
+**Author:** Claude (Opus 4.7) with Shaan.
+
+---
+
+## 1. Goal
+
+Deliver a reusable, parameterized load-test harness that drives the backend warm `/analyze/<user>` path at a target RPS (default 100) and reports whether the system sustains it within an error/latency budget. The harness is the durable deliverable; the actual 100 RPS validation run is an operator step (the user runs it when hardware allows, or points the harness at a deployed target).
+
+This is the load-test slice of the v0.9.x Beta-hardening family. It is independently shippable. (The slice was split out of the original v0.9.5 "security review + load test" on 2026-05-28 because it needs its own target/cost/rate-limit design.)
+
+## 2. Locked scope decisions (2026-05-28)
+
+| Decision | Choice | Why |
+| --- | --- | --- |
+| Target / cost | **Local, free** | No Vercel Active-CPU spend, no GitHub-budget burn, no real-user risk. Harness is target-URL-parameterized so it can later hit a preview/prod deploy. |
+| Scenario | **Backend warm `/analyze/<user>`** | The realistic viral hot path: cache hit → serialize → respond, no GitHub calls. Backend-only (no Next dev server, which is heavy). |
+| Tool | **Python + httpx/asyncio**, committed to repo | Zero new installs (httpx is already a backend dep); runs via `uv run`; reproducible; in the backend's own toolchain (ruff/pytest/CI). |
+| Warm cache locally | **SRH (`hiett/serverless-redis-http`) over Docker** | The Report cache requires an Upstash-compatible REST endpoint (`get_cache()` has no in-process fallback). Real Upstash can't be used: a 60 s × 100 RPS run is ~6–12k Redis GETs, over the ~10k/day free tier. SRH = real local Redis, no command limits. |
+| Rate limiting during test | **Anonymous + `INTERNAL_PROXY_SECRET` unset → no enforcement** | `analyze_rate_limiter` has `via_trusted_proxy=True`; with the secret unset it *skips* anonymous enforcement (verified in `app/ratelimit.py`). So anonymous load needs no limit-raising and no bypass code. |
+| Execution now | **Build + sanity-verify only; full 100 RPS run is the operator's** | Respects the user's hardware (localhost previously overheated the laptop). Ships the durable harness + runbook; the heavy run is documented, not run by the agent. |
+| Load model | **Open-loop (fixed dispatch rate)** with a bounded-concurrency safety cap | A closed-loop (await-then-send) generator masks saturation by self-throttling. Open-loop reveals queue buildup as latency/error growth — the thing we want to measure. |
+
+## 3. Why warm-`/analyze` needs a cache (operating context)
+
+`app/dependencies.py::get_report_for_user` serves from the Report cache when `get_cache()` is non-None, else falls through to `_live_ingest` (a cold GitHub ingest) on **every** request. `get_cache()` returns `None` unless `UPSTASH_REDIS_REST_URL` + `_TOKEN` are set; there is no in-process Report-cache fallback. Therefore a faithful warm-path test must run against a populated Upstash-compatible cache. SRH provides that locally without the free-tier command ceiling that rules out real Upstash.
+
+A single warmup request (cold ingest, one GitHub round-trip, needs the local `GITHUB_TOKEN`) primes the cache for the chosen user; the timed run is then pure cache hits.
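
The operating difference can be sketched in a few lines. This is a simplified model, not the real `get_report_for_user`: `get_report`, `count_ingests`, the injected `live_ingest`, and the dict standing in for the Upstash-backed Report cache are all hypothetical. It shows why a `None` cache turns every request into a cold ingest, and why one warmup request is enough to make the timed run pure cache hits.

```python
from __future__ import annotations

from collections.abc import Callable


def get_report(
    user: str,
    cache: dict[str, str] | None,
    live_ingest: Callable[[str], str],
) -> str:
    # No cache configured: every request falls through to a cold ingest.
    if cache is None:
        return live_ingest(user)
    # Cache-aside: serve a hit, otherwise ingest once and store the result.
    if user in cache:
        return cache[user]
    report = live_ingest(user)
    cache[user] = report
    return report


def count_ingests(cache: dict[str, str] | None, requests: int) -> int:
    calls = 0

    def fake_ingest(user: str) -> str:
        nonlocal calls
        calls += 1  # stands in for a GitHub round-trip
        return f"report:{user}"

    get_report("octocat", cache, fake_ingest)  # warmup request
    for _ in range(requests):
        get_report("octocat", cache, fake_ingest)
    return calls


print(count_ingests(None, 100))  # 101: no cache, every request ingests
print(count_ingests({}, 100))  # 1: only the warmup ingests
```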
+
+## 4. Architecture (one paragraph)
+
+`backend/loadtest/run.py` is a standalone async script. It builds one `httpx.AsyncClient` (high connection-pool limits), optionally fires `--warmup` priming requests (awaited), then runs an open-loop dispatcher: every `1/rps` seconds it schedules a request task, bounded by an in-flight cap (`--max-inflight`) so a stalled server can't spawn unbounded tasks; a tick that finds the cap saturated is counted as dropped rather than queued. Each request's wall-clock latency and HTTP status are recorded. After `--duration` seconds it stops scheduling, drains in-flight tasks, and prints a summary: total sent/completed/dropped, errors grouped by status, achieved RPS, p50/p95/p99 latency, and PASS/FAIL against thresholds. Pure functions (percentile computation, summary building, threshold checks) live separately from the I/O loop so they're unit-testable without a server.
+
+## 5. Surface area
+
+### 5.1 New files
+
+| File | Responsibility |
+| --- | --- |
+| `backend/loadtest/run.py` | The harness: CLI parsing, open-loop dispatcher, httpx client, result collection, summary + threshold check. I/O-free helpers (`percentile`, `summarize`, `evaluate_thresholds`) importable for tests. |
+| `backend/loadtest/README.md` | Runbook: local SRH (Docker) setup, env vars, prime-then-measure procedure, how to interpret output, how to ramp to find the knee, and how to point `--target` at a deployed URL. |
+| `backend/tests/loadtest/test_stats.py` | Unit tests for the pure stats helpers (percentiles on known inputs, error-rate computation, threshold pass/fail) — deterministic, no network. |
+
+### 5.2 Modified files
+
+| File | Change |
+| --- | --- |
+| Version literals + CHANGELOG + PLAN + PROGRESS_LOG + `uv.lock` | Release ritual (v0.9.5 → 0.9.6). |
+
+### 5.3 Untouched (intentionally)
+
+- Application code — the harness is a black-box HTTP client; no product code changes. (The "rate limiter skips anonymous when the secret is unset" behavior already exists; we rely on it, we don't add to it.)
+- No new backend dependency — httpx is already locked.
+- Frontend — out of scope (backend capacity test).
+
+## 6. CLI contract
+
+```
+uv run python loadtest/run.py \
+  --target http://localhost:8000 \
+  --path /analyze/octocat \
+  --rps 100 \
+  --duration 60 \
+  --warmup 1 \
+  [--ramp 10:50:100]        # optional: stepped RPS stages to find the knee
+  [--max-inflight 500]      # concurrency safety cap
+  [--p95-ms 250]            # latency threshold for PASS/FAIL
+  [--max-error-rate 0.01]
+```
+
+Exit code 0 on PASS, non-zero on FAIL (so it can gate CI or scripts later).
+
+## 7. Exit thresholds (what "error budget holds" means)
+
+- **Error rate ≤ 1%:** at most 1% of completed requests fail (HTTP status ≥ 400, or a connection error/timeout); ideally zero 5xx.
+- **Achieved RPS ≥ 95% of target** — the dispatcher kept up; the server didn't force the generator to fall behind.
+- **p95 latency ≤ `--p95-ms`** (default 250 ms for a local warm hit; tune from the observed baseline). Local absolute numbers aren't comparable to prod; the load-bearing signals are *rate held* + *errors near zero* + *p95 not diverging from the single-request baseline*.
+
+The runbook documents using `--ramp` to find the knee (the RPS where p95 spikes or errors appear) and recording the max sustainable RPS.
+
+## 8. What gets verified in this slice vs deferred
+
+**Verified now (no laptop stress, in CI where possible):**
+- `backend/tests/loadtest/test_stats.py` passes (percentiles, error rate, threshold logic).
+- A light sanity run of the harness against the backend `/health` endpoint (backend only, no Docker, e.g. 10 RPS × 5 s) confirms the dispatcher, client, and summary work end-to-end.
+
+**Deferred to the operator (documented in the runbook):**
+- The full local warm-`/analyze` 100 RPS run (requires Docker/SRH + backend + `GITHUB_TOKEN`).
+- Recording the result (max sustainable RPS, p95, error rate) — lands in `docs/PROGRESS_LOG.md` when run.
+
+## 9. Exit criteria
+
+- [ ] `backend/loadtest/run.py` exists: open-loop dispatcher, warmup, parameterized CLI, summary + threshold check, correct exit code.
+- [ ] Pure helpers (`percentile`, `summarize`, `evaluate_thresholds`) unit-tested in `backend/tests/loadtest/test_stats.py`; all pass.
+- [ ] `backend/loadtest/README.md` runbook complete: SRH setup, prime-then-measure, ramp/interpret, point-at-deploy.
+- [ ] Light `/health` sanity run succeeds (recorded in PROGRESS_LOG).
+- [ ] `ruff check` + `ruff format --check` clean; backend suite green (existing + new stats tests).
+- [ ] Docs ritual + version bump to 0.9.6; PLAN v0.9.6 row flipped ✅; tag `v0.9.6` + release.
+
+## 10. Out of scope
+
+- Running the full 100 RPS validation (operator step; hardware-gated).
+- Frontend / `/u/[username]` page load testing (conflates Next render with API capacity).
+- Cold-ingest (GitHub-hitting) capacity testing (would need GitHub mocking; different test).
+- Distributed/multi-machine load generation (single-box open-loop is enough at 100 RPS).
+- A CI-gating load test (the harness *can* gate via exit code, but wiring it into CI needs a hosted target + cache — out of scope here).
+- Any application code change (cache shims, test-only modes). The existing rate-limiter-skip behavior is relied upon as-is.
+
+## 11. Implementation ordering
+
+1. **Pure stats helpers + their tests** (`percentile`, `summarize`, `evaluate_thresholds`) — TDD, no network. Commit.
+2. **Open-loop dispatcher + CLI + httpx client** wiring around the helpers. Commit.
+3. **Runbook** (`backend/loadtest/README.md`). Commit.
+4. **Sanity run** against `/health`; record in PROGRESS_LOG. **Docs ritual + version bump + ship.**
+
+**Reversibility:** the harness is additive (new `loadtest/` dir + tests); it changes no product code, so reverting the slice removes only the tooling.
diff --git a/frontend/package.json b/frontend/package.json
index c5d8efb..a261c64 100644
--- a/frontend/package.json
+++ b/frontend/package.json
@@ -1,6 +1,6 @@
 {
   "name": "frontend",
-  "version": "0.9.5",
+  "version": "0.9.6",
   "private": true,
   "scripts": {
     "dev": "next dev",
diff --git a/frontend/src/app/page.tsx b/frontend/src/app/page.tsx
index 1d47093..efe5837 100644
--- a/frontend/src/app/page.tsx
+++ b/frontend/src/app/page.tsx
@@ -23,7 +23,7 @@ export default function Home() {
             transition={{ delay: 0.2, duration: 0.5 }}
             className="rounded-full border border-white/10 bg-white/5 px-3 py-1 text-xs font-medium uppercase tracking-wider text-muted-foreground"
           >
-            Deterministic engineering reports · v0.9.5
+            Deterministic engineering reports · v0.9.6
           </m.span>
 
           <h1 className="text-4xl sm:text-5xl md:text-7xl font-bold tracking-tight text-gradient leading-tight">
diff --git a/frontend/src/components/results-view.tsx b/frontend/src/components/results-view.tsx
index e491277..e92b6cb 100644
--- a/frontend/src/components/results-view.tsx
+++ b/frontend/src/components/results-view.tsx
@@ -352,7 +352,7 @@ export function ResultsView({
         </section>
 
         <footer className="space-y-2 pt-8 text-center text-xs uppercase tracking-widest text-muted-foreground sm:pt-12">
-          <p>Skill Issue — GitHub Reputation Protocol v0.9.5</p>
+          <p>Skill Issue — GitHub Reputation Protocol v0.9.6</p>
         </footer>
       </main>