Shaan-alpha · May 28, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,6 +8,13 @@ Every version listed here must correspond to a slice in [`PLAN.md`](./PLAN.md) w
 
 ---
 
+## [0.9.6] — 2026-05-28
+
+### Added
+- **Load-test harness** (`backend/loadtest/`) — a reusable open-loop load tester for the backend warm `/analyze` path, reporting latency percentiles, error rate, and achieved throughput against pass/fail thresholds. Includes a runbook for a local 100 RPS warm-cache run and for pointing at a deployed target.
+
+---
+
 ## [0.9.5] — 2026-05-28
 
 ### Security

diff --git a/PLAN.md b/PLAN.md
@@ -44,7 +44,7 @@
 | **v0.9.3** | Deletable `/me` history + back-nav loading fix + creator flair | ✅ shipped |
 | **v0.9.4** | DB pool size env-tunable + real back-nav spinner fix | ✅ shipped |
 | **v0.9.5** | Security review + hardening (OAuth scope ↓ `read:user`, HTTP security headers) | ✅ shipped |
-| **v0.9.6** | Load test to 100 RPS | pending |
+| **v0.9.6** | Load-test harness (warm /analyze; full 100 RPS run = operator step) | ✅ shipped |
 | **v0.9.7** | Privacy policy + terms (legal docs) | pending |
 | **v1.0.0** | Public launch | pending |
 
@@ -670,11 +670,21 @@ The narrative-mode CHECK constraint was a third drift in the same family — the
 
 ---
 
-## v0.9.6 — Load test to 100 RPS (deferred)
+## v0.9.6 — Load-test harness (shipped 2026-05-28)
 
-**Goal:** Load-test to 100 RPS sustained and verify the error budget holds. Needs a deliberate design: target (prod vs preview vs local), cost ceiling (Vercel Active-CPU pricing), and how to handle the v0.9.2 rate limits (a naive test from one IP just measures 429s — raise limits for the window, test `/health` + a warm-cached path, or use a bypass).
+**Goal:** Reusable Python/httpx open-loop load harness for the backend warm `/analyze` path; the full 100 RPS validation run is an operator step (hardware-gated).
 
-**Exit criteria:** TBD when the slice begins.
+**Delivered:** `backend/loadtest/run.py` (open-loop dispatcher, p50/p95/p99, error rate, achieved RPS, pass/fail thresholds, ramp), unit-tested stats helpers, and `backend/loadtest/README.md` runbook (local SRH warm-cache setup + deploy target). The local warm-cache run uses SRH (an Upstash-compatible HTTP front end for a local Redis, run in Docker) because real Upstash's free tier (~10k commands/day) can't absorb a 100 RPS run. With anonymous load and `INTERNAL_PROXY_SECRET` unset, the analyze limiter skips enforcement, so no bypass is needed.
+
+**Design spec:** [`docs/superpowers/specs/2026-05-28-v0.9.6-load-test-harness-design.md`](./docs/superpowers/specs/2026-05-28-v0.9.6-load-test-harness-design.md).
+**Sub-plan:** [`docs/superpowers/plans/2026-05-28-v0.9.6-load-test-harness.md`](./docs/superpowers/plans/2026-05-28-v0.9.6-load-test-harness.md).
+
+**Exit criteria:**
+- [x] `loadtest/run.py` + unit-tested stats helpers; ruff clean; backend suite green.
+- [x] Runbook complete (local SRH + deploy target).
+- [x] Light `/health`-class sanity run passes (ran against `/openapi.json`: 10 RPS × 5 s, 0 errors, p95 6.2 ms, PASS).
+- [x] Docs ritual + version bump to 0.9.6; tag + release.
+- [ ] Full 100 RPS warm-`/analyze` result recorded — operator step, filled in when run.
 
 ---
 

diff --git a/README.md b/README.md
@@ -43,7 +43,7 @@ Engineering insight first. AI flavor second. Scoring is deterministic and explai
 
 ## Status
 
-Pre-alpha. Latest shipped release is **v0.9.5** (a full pre-launch security audit — no high/critical findings — that tightened the GitHub OAuth scope to read-only and added HTTP security headers). v0.9.4 before it made the DB connection pool size env-tunable and genuinely fixed the back-nav search spinner; v0.9.3 added deletable `/me` history with undo, a golden "creator" scorecard for the project's creator account, and a first (incomplete) attempt at the back-nav spinner fix. Live at https://skill-issue-tau.vercel.app — GitHub OAuth sign-in, Neon Postgres persistence, `/me` history, opt-in `/share/[slug]` public links. The AI narrative layer (Roast + Mentor) runs on **Groq** (`llama-3.3-70b-versatile`). v0.7.0 added Upstash Redis caching (warm `/analyze` ≤ 200 ms); v0.7.2 prod-certified the perf budget (CLS 0.080 → **0** structurally, perf 90 → 94, LCP 2,804 → 2,773 ms); v0.8.0 shipped Sentry (FE+BE), PostHog (events + web vitals), structlog JSON logging, on-voice 404, and a full axe a11y pass; v0.8.1 ships the nightly cron with bearer auth; v0.8.2 pairs it with the manual force-refresh button on `/me`; v0.8.3 hotfixes the empty-repo crash; v0.8.4 fixes the silent narrative misattribution; v0.8.5 closes the post-deploy-Sentry loop with a pre-merge CI gate; v0.8.6 closes v0.7.1's deferred share-page caching; v0.8.7 modernizes project config; v0.9.0 opens Beta hardening with bounded GH fan-out; v0.9.1 closes the /me N+1 + adds per-namespace Report cache versioning; v0.9.2 adds rate limiting (per-IP for anonymous, higher per-user caps for signed-in) on `/analyze` and `/narrative`; v0.9.3 adds deletable `/me` history with undo, attempts the back-nav search-spinner fix, and gilds the creator's scorecard. 
v0.9.4 makes the DB connection pool size env-tunable (defaults unchanged — RUM showed no pool exhaustion) and lands the real back-nav spinner fix (the v0.9.3 attempt addressed the wrong mechanism); v0.9.5 runs a full pre-launch security audit (no high/critical findings), tightens the OAuth scope to `read:user`, and adds HTTP security headers. **v0.9.6 — load test to 100 RPS** is next. See [`CHANGELOG.md`](./CHANGELOG.md) for shipped slices, [`PLAN.md`](./PLAN.md) for the full roadmap, and [`docs/PROGRESS_LOG.md`](./docs/PROGRESS_LOG.md) for the most recent session handoff.
+Pre-alpha. Latest shipped release is **v0.9.6** (a reusable load-test harness for the warm `/analyze` path; the full 100 RPS run is an operator step). v0.9.5 before it ran a full pre-launch security audit — no high/critical findings — tightening the GitHub OAuth scope to read-only and adding HTTP security headers; v0.9.4 made the DB connection pool size env-tunable and genuinely fixed the back-nav search spinner; v0.9.3 added deletable `/me` history with undo, a golden "creator" scorecard for the project's creator account, and a first (incomplete) attempt at the back-nav spinner fix. Live at https://skill-issue-tau.vercel.app — GitHub OAuth sign-in, Neon Postgres persistence, `/me` history, opt-in `/share/[slug]` public links. The AI narrative layer (Roast + Mentor) runs on **Groq** (`llama-3.3-70b-versatile`). v0.7.0 added Upstash Redis caching (warm `/analyze` ≤ 200 ms); v0.7.2 prod-certified the perf budget (CLS 0.080 → **0** structurally, perf 90 → 94, LCP 2,804 → 2,773 ms); v0.8.0 shipped Sentry (FE+BE), PostHog (events + web vitals), structlog JSON logging, on-voice 404, and a full axe a11y pass; v0.8.1 ships the nightly cron with bearer auth; v0.8.2 pairs it with the manual force-refresh button on `/me`; v0.8.3 hotfixes the empty-repo crash; v0.8.4 fixes the silent narrative misattribution; v0.8.5 closes the post-deploy-Sentry loop with a pre-merge CI gate; v0.8.6 closes v0.7.1's deferred share-page caching; v0.8.7 modernizes project config; v0.9.0 opens Beta hardening with bounded GH fan-out; v0.9.1 closes the /me N+1 + adds per-namespace Report cache versioning; v0.9.2 adds rate limiting (per-IP for anonymous, higher per-user caps for signed-in) on `/analyze` and `/narrative`; v0.9.3 adds deletable `/me` history with undo, attempts the back-nav search-spinner fix, and gilds the creator's scorecard. 
v0.9.4 makes the DB connection pool size env-tunable (defaults unchanged — RUM showed no pool exhaustion) and lands the real back-nav spinner fix (the v0.9.3 attempt addressed the wrong mechanism); v0.9.5 runs a full pre-launch security audit (no high/critical findings), tightens the OAuth scope to `read:user`, and adds HTTP security headers; v0.9.6 adds a reusable load-test harness for the warm `/analyze` path (the full 100 RPS run is an operator step). **v0.9.7 — privacy policy + terms** is next. See [`CHANGELOG.md`](./CHANGELOG.md) for shipped slices, [`PLAN.md`](./PLAN.md) for the full roadmap, and [`docs/PROGRESS_LOG.md`](./docs/PROGRESS_LOG.md) for the most recent session handoff.
 
 ---
 
@@ -76,7 +76,7 @@ cp .env.example .env        # then edit .env and add your GITHUB_TOKEN and OPENA
 uv run uvicorn app.main:app --reload --port 8000
 ```
 
-Verify: `curl http://localhost:8000/health` → `{"status":"ok","version":"0.9.5","db":"up"|"down","cache":"up"|"down"|"unconfigured"}`. The `db` field reports DB reachability when `DATABASE_URL` is configured; the `cache` field reports Upstash reachability (`unconfigured` when `UPSTASH_REDIS_REST_URL` isn't set — perfectly fine for local dev, the in-process fallback covers it).
+Verify: `curl http://localhost:8000/health` → `{"status":"ok","version":"0.9.6","db":"up"|"down","cache":"up"|"down"|"unconfigured"}`. The `db` field reports DB reachability when `DATABASE_URL` is configured; the `cache` field reports Upstash reachability (`unconfigured` when `UPSTASH_REDIS_REST_URL` isn't set — perfectly fine for local dev, the in-process fallback covers it).
 Hit the analyzer: `curl http://localhost:8000/analyze/octocat`.
 
 ### Frontend (`:3000`)

diff --git a/backend/app/settings.py b/backend/app/settings.py
@@ -2,7 +2,7 @@
 
 from pydantic_settings import BaseSettings, SettingsConfigDict
 
-VERSION = "0.9.5"
+VERSION = "0.9.6"
 
 
 class Settings(BaseSettings):

diff --git a/backend/loadtest/README.md b/backend/loadtest/README.md
@@ -0,0 +1,92 @@
+# Load test harness
+
+Open-loop load tester for the backend. Drives a target endpoint at a fixed RPS
+and reports p50/p95/p99 latency, error rate, achieved throughput, and PASS/FAIL
+against thresholds. Run from `backend/`:
+
+```bash
+uv run python loadtest/run.py --help
+```
+
+> **Windows / Git Bash note:** a bare `--path /health` argument gets mangled by
+> MSYS into a Windows path (corrupting the URL). Prefix the command with
+> `MSYS_NO_PATHCONV=1`, or run it from PowerShell, or use `--path=//health`.
+
+## Quick sanity check (no Docker, no cache)
+
+Start the backend, then hit a cheap endpoint to confirm the harness works.
+Locally there's usually no `DATABASE_URL`, so `/health` blocks ~20 s per request
+on a doomed DB ping — use `/openapi.json` instead for a clean check:
+
+```bash
+uv run uvicorn app.main:app --port 8000   # terminal 1
+uv run python loadtest/run.py --target http://localhost:8000 --path /openapi.json \
+  --rps 10 --duration 5 --warmup 1 --p95-ms 1000   # terminal 2
+```
+
+Expect `errors=0` and `RESULT: PASS`.
+
+## Full warm-`/analyze` 100 RPS run (local)
+
+The warm path needs a populated Report cache. `get_cache()` returns `None`
+without Upstash, and real Upstash's free tier (~10k commands/day) can't absorb a
+100 RPS run — so use a **local** Upstash-compatible Redis via SRH.
+
+1. **Start a local Upstash-compatible Redis (SRH over Redis):**
+
+   ```bash
+   docker run -d --name si-redis -p 6379:6379 redis:7
+   docker run -d --name si-srh -p 8079:80 \
+     -e SRH_MODE=env -e SRH_TOKEN=local-token \
+     -e SRH_CONNECTION_STRING="redis://host.docker.internal:6379" \
+     hiett/serverless-redis-http:latest
+   ```
+
+2. **Start the backend pointed at SRH, with a real GitHub token, and the proxy
+   secret UNSET** (so the analyze limiter skips anonymous enforcement):
+
+   ```bash
+   UPSTASH_REDIS_REST_URL=http://localhost:8079 \
+   UPSTASH_REDIS_REST_TOKEN=local-token \
+   GITHUB_TOKEN=<your_token> \
+   uv run uvicorn app.main:app --port 8000
+   ```
+   (Ensure `INTERNAL_PROXY_SECRET` is **not** set in the environment, and send no
+   session cookie — the harness is anonymous by default.)
+
+3. **Run the load test** (the `--warmup` request cold-ingests once to prime the
+   cache; the timed run is then pure cache hits):
+
+   ```bash
+   MSYS_NO_PATHCONV=1 uv run python loadtest/run.py \
+     --target http://localhost:8000 --path /analyze/octocat \
+     --rps 100 --duration 60 --warmup 1
+   ```
+
+4. **Find the knee** with a ramp:
+
+   ```bash
+   MSYS_NO_PATHCONV=1 uv run python loadtest/run.py --path /analyze/octocat \
+     --ramp 50:100:200:400 --duration 30 --warmup 1
+   ```
+   Record the highest RPS stage that still PASSes (error rate ≤ `--max-error-rate`,
+   achieved RPS ≥ 95% of the stage's target, p95 ≤ `--p95-ms`).
+
+5. **Tear down:** `docker rm -f si-srh si-redis`.
+
+## Pointing at a deployed target
+
+```bash
+uv run python loadtest/run.py --target https://<host>/_/backend --path /analyze/octocat --rps 100 --duration 30
+```
+Mind the cost (Vercel Active-CPU) and rate limits: a deployed backend with
+`INTERNAL_PROXY_SECRET` set WILL rate-limit anonymous `/analyze` — sign in or
+raise the limits for the window. Keep deployed runs short.
+
+## Thresholds (PASS/FAIL)
+
+- error rate `<= --max-error-rate` (default 1%)
+- achieved RPS `>= 95%` of `--rps` (or of each `--ramp` stage)
+- p95 latency `<= --p95-ms` (default 250 ms; tune from the warm baseline)
+
+Exit code is 0 when every stage PASSes, 1 if any stage FAILs.
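
Review note: the three PASS/FAIL checks in the README section above can be sketched as a small standalone function. This is a minimal illustration, not the harness code itself; `StageStats` and `check_thresholds` are hypothetical names, and the comparisons follow `evaluate_thresholds` in `run.py` as shown in this diff (a stage fails only when a value is strictly beyond its threshold).

```python
from dataclasses import dataclass


@dataclass
class StageStats:
    """Hypothetical stand-in for the harness's Summary (only the fields checked)."""

    error_rate: float  # fraction in [0, 1]
    achieved_rps: float
    p95_ms: float


def check_thresholds(stats, *, target_rps, max_error_rate=0.01, p95_ms=250.0):
    """Return (ok, failures); failures names each threshold that was violated."""
    failures = []
    if stats.error_rate > max_error_rate:
        failures.append("error rate")
    if stats.achieved_rps < 0.95 * target_rps:
        failures.append("achieved RPS")
    if stats.p95_ms > p95_ms:
        failures.append("p95")
    return (not failures, failures)


# Illustrative values only, not measured results.
healthy = StageStats(error_rate=0.0, achieved_rps=99.2, p95_ms=42.0)
overloaded = StageStats(error_rate=0.03, achieved_rps=80.0, p95_ms=900.0)

print(check_thresholds(healthy, target_rps=100))
print(check_thresholds(overloaded, target_rps=100))
```

A stage passes only when all three checks hold, which is why a ramp is useful: the first stage whose failure list is non-empty marks the knee.
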
diff --git a/backend/loadtest/__init__.py b/backend/loadtest/__init__.py
diff --git a/backend/loadtest/run.py b/backend/loadtest/run.py
@@ -0,0 +1,212 @@
+"""Open-loop load-test harness for the Skill Issue backend.
+
+Drives a target endpoint at a fixed request rate and reports latency
+percentiles, error rate, and achieved throughput against pass/fail
+thresholds. See backend/loadtest/README.md for the local warm-/analyze
+runbook. Run: uv run python loadtest/run.py --help
+"""
+
+from __future__ import annotations
+
+import argparse
+import asyncio
+import contextlib
+import sys
+import time
+from dataclasses import dataclass
+
+import httpx
+
+
+@dataclass
+class Result:
+    latency_ms: float
+    status: int | None  # None = connection error / timeout
+
+
+@dataclass
+class Summary:
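+    sent: int  # scheduled ticks: completed + dropped
+    completed: int
+    dropped: int  # ticks skipped because max_inflight was saturated
+    error_count: int
+    error_rate: float  # errors / completed, in [0, 1]; 1.0 when nothing completed
+    achieved_rps: float  # completed / duration_s
+    p50_ms: float
+    p95_ms: float
+    p99_ms: float
+    errors_by_status: dict[str, int]  # keyed by status code or "connection_error"
+    duration_s: float  # stage wall time, including the drain of in-flight requests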
+
+
+def percentile(values: list[float], p: float) -> float:
+    """Linear-interpolated p-th percentile (p in [0, 100]). 0.0 for empty input."""
+    if not values:
+        return 0.0
+    s = sorted(values)
+    if len(s) == 1:
+        return s[0]
+    k = (len(s) - 1) * (p / 100.0)
+    lo = int(k)
+    hi = min(lo + 1, len(s) - 1)
+    frac = k - lo
+    return round(s[lo] * (1 - frac) + s[hi] * frac, 10)
+
+
+def summarize(results: list[Result], *, dropped: int, wall_seconds: float) -> Summary:
+    completed = len(results)
+    latencies = [r.latency_ms for r in results]
+    errors_by_status: dict[str, int] = {}
+    error_count = 0
+    for r in results:
+        if r.status is None or r.status >= 400:
+            key = "connection_error" if r.status is None else str(r.status)
+            errors_by_status[key] = errors_by_status.get(key, 0) + 1
+            error_count += 1
+    error_rate = (error_count / completed) if completed else 1.0
+    achieved_rps = (completed / wall_seconds) if wall_seconds > 0 else 0.0
+    return Summary(
+        sent=completed + dropped,
+        completed=completed,
+        dropped=dropped,
+        error_count=error_count,
+        error_rate=error_rate,
+        achieved_rps=achieved_rps,
+        p50_ms=percentile(latencies, 50),
+        p95_ms=percentile(latencies, 95),
+        p99_ms=percentile(latencies, 99),
+        errors_by_status=errors_by_status,
+        duration_s=wall_seconds,
+    )
+
+
+def evaluate_thresholds(
+    summary: Summary, *, target_rps: float, max_error_rate: float, p95_ms: float
+) -> tuple[bool, list[str]]:
+    failures: list[str] = []
+    if summary.error_rate > max_error_rate:
+        failures.append(f"error rate {summary.error_rate:.3%} > {max_error_rate:.3%}")
+    if summary.achieved_rps < target_rps * 0.95:
+        failures.append(f"achieved RPS {summary.achieved_rps:.1f} < 95% of target {target_rps:.0f}")
+    if summary.p95_ms > p95_ms:
+        failures.append(f"p95 {summary.p95_ms:.1f}ms > {p95_ms:.1f}ms")
+    return (not failures, failures)
+
+
+async def _one_request(client: httpx.AsyncClient, url: str, results: list[Result]) -> None:
+    t0 = time.perf_counter()
+    try:
+        resp = await client.get(url)
+        status: int | None = resp.status_code
+    except (httpx.HTTPError, OSError):
+        status = None
+    results.append(Result(latency_ms=(time.perf_counter() - t0) * 1000.0, status=status))
+
+
+async def run_stage(
+    client: httpx.AsyncClient,
+    url: str,
+    *,
+    rps: float,
+    duration: float,
+    max_inflight: int,
+) -> tuple[list[Result], int]:
+    """Open-loop: schedule requests at a fixed rate for `duration` seconds.
+
+    Returns (results, dropped). `dropped` counts ticks skipped because
+    `max_inflight` was saturated — a "server can't keep up" signal. The
+    scheduler never blocks on in-flight requests, so a slow server shows up as
+    dropped ticks + latency growth rather than self-throttled load.
+    """
+    results: list[Result] = []
+    tasks: set[asyncio.Task[None]] = set()
+    inflight = 0
+    dropped = 0
+    interval = 1.0 / rps
+    loop = asyncio.get_running_loop()
+
+    def _done(t: asyncio.Task[None]) -> None:
+        nonlocal inflight
+        inflight -= 1
+        tasks.discard(t)
+
+    start = loop.time()
+    i = 0
+    while loop.time() - start < duration:
+        delay = (start + i * interval) - loop.time()
+        if delay > 0:
+            await asyncio.sleep(delay)
+        i += 1
+        if inflight >= max_inflight:
+            dropped += 1
+            continue
+        inflight += 1
+        task = asyncio.create_task(_one_request(client, url, results))
+        task.add_done_callback(_done)
+        tasks.add(task)
+
+    if tasks:
+        await asyncio.gather(*tasks, return_exceptions=True)
+    return results, dropped
+
+
+def _parse_ramp(ramp: str) -> list[float]:
+    """'10:50:100' -> [10.0, 50.0, 100.0]."""
+    return [float(part) for part in ramp.split(":") if part]
+
+
+def _print_summary(url: str, rps: float, s: Summary, ok: bool, failures: list[str]) -> None:
+    print(f"\n=== {url} @ target {rps:.0f} RPS for {s.duration_s:.1f}s ===")
+    print(f"  sent={s.sent} completed={s.completed} dropped={s.dropped}")
+    print(f"  achieved_rps={s.achieved_rps:.1f}")
+    print(f"  errors={s.error_count} ({s.error_rate:.3%}) {s.errors_by_status or ''}")
+    print(f"  latency p50={s.p50_ms:.1f}ms p95={s.p95_ms:.1f}ms p99={s.p99_ms:.1f}ms")
+    print(f"  RESULT: {'PASS' if ok else 'FAIL'}")
+    for f in failures:
+        print(f"    - {f}")
+
+
+async def _amain(args: argparse.Namespace) -> int:
+    url = args.target.rstrip("/") + args.path
+    cap = args.max_inflight + 50
+    limits = httpx.Limits(max_connections=cap, max_keepalive_connections=cap)
+    overall_ok = True
+    async with httpx.AsyncClient(timeout=args.timeout, limits=limits) as client:
+        for _ in range(args.warmup):
+            with contextlib.suppress(httpx.HTTPError, OSError):
+                await client.get(url)
+        stages = _parse_ramp(args.ramp) if args.ramp else [args.rps]
+        for rps in stages:
+            wall0 = time.perf_counter()
+            results, dropped = await run_stage(
+                client, url, rps=rps, duration=args.duration, max_inflight=args.max_inflight
+            )
+            summary = summarize(results, dropped=dropped, wall_seconds=time.perf_counter() - wall0)
+            ok, failures = evaluate_thresholds(
+                summary,
+                target_rps=rps,
+                max_error_rate=args.max_error_rate,
+                p95_ms=args.p95_ms,
+            )
+            _print_summary(url, rps, summary, ok, failures)
+            overall_ok = overall_ok and ok
+    return 0 if overall_ok else 1
+
+
+def main() -> None:
+    p = argparse.ArgumentParser(description="Open-loop load tester for the Skill Issue backend.")
+    p.add_argument("--target", default="http://localhost:8000", help="backend base URL")
+    p.add_argument("--path", default="/analyze/octocat", help="path appended to --target")
+    p.add_argument("--rps", type=float, default=100.0, help="target RPS (ignored with --ramp)")
+    p.add_argument("--duration", type=float, default=60.0, help="seconds per stage")
+    p.add_argument("--warmup", type=int, default=1, help="untimed priming requests")
+    p.add_argument("--ramp", default=None, help="colon-separated RPS stages, e.g. 10:50:100")
+    p.add_argument("--max-inflight", type=int, default=500, help="cap on concurrent requests")
+    p.add_argument("--timeout", type=float, default=20.0, help="per-request timeout in seconds")
+    p.add_argument("--p95-ms", type=float, default=250.0, help="p95 latency threshold in ms")
+    p.add_argument("--max-error-rate", type=float, default=0.01, help="max error fraction")
+    sys.exit(asyncio.run(_amain(p.parse_args())))
+
+
+if __name__ == "__main__":
+    main()
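
Review note: a worked example of the linear-interpolated percentile used by the stats helpers. The function body is copied from `run.py` in this diff; the sample latencies and the assertions are hypothetical and are not the PR's actual unit tests (those are not part of the diff shown here). For four sorted samples and p = 95, the fractional rank is (4 - 1) × 0.95 = 2.85, so the result is 0.15 × 30 + 0.85 × 40 = 38.5.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Linear-interpolated p-th percentile (p in [0, 100]). 0.0 for empty input."""
    if not values:
        return 0.0
    s = sorted(values)
    if len(s) == 1:
        return s[0]
    k = (len(s) - 1) * (p / 100.0)  # fractional rank into the sorted list
    lo = int(k)  # index of the sample at or below the rank
    hi = min(lo + 1, len(s) - 1)  # next sample, clamped to the last index
    frac = k - lo  # how far the rank sits between lo and hi
    return round(s[lo] * (1 - frac) + s[hi] * frac, 10)


# Hypothetical latencies in ms (illustrative, not measured data).
latencies = [40.0, 10.0, 30.0, 20.0]

assert percentile([], 95) == 0.0  # empty input falls back to 0.0
assert percentile([7.5], 99) == 7.5  # a single sample is every percentile
assert percentile(latencies, 0) == 10.0  # p=0 is the minimum
assert percentile(latencies, 100) == 40.0  # p=100 is the maximum
assert percentile(latencies, 50) == 25.0  # midway between 20 and 30
assert math.isclose(percentile(latencies, 95), 38.5)
print("percentile checks passed")
```

With only a handful of samples, p95 and p99 sit close to the maximum, so short sanity runs say little about tail latency; the longer 60 s stages give these percentiles enough samples to be meaningful.
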