feat: dispatch-outcomes source — efficiency-trend + gate-friction findings from per-dispatch telemetry

Filed from a private dispatch-discipline layer in the zenprocess ecosystem (a Claude Code hook-set that gates dispatches for atomicity and emits per-dispatch outcome telemetry). The producer stays opaque; this issue specifies only the **afterburn-side** capability — a new `dispatch-outcomes` source that turns a structured per-dispatch telemetry stream into afterburn findings.

## Context

Afterburn today mines **conversation transcripts** (`~/.claude/projects/**/*.jsonl`) — unstructured message/tool-call history — into friction / patterns / gaps / releases findings. That's the qualitative layer.

There is a second, *structured* residual-intelligence source it does not yet read: **per-dispatch outcome telemetry**. Some Claude Code tooling emits an append-only JSONL log with one record per dispatch decision and one per completion (token counts, cache hit ratios, gate decisions, acceptance pass/fail). This is purpose-built numeric telemetry, not transcript prose — so a regex/RLM transcript pass is the wrong instrument. It needs a typed source adapter.

The question this answers is quantitative and falsifiable: **does input-tokens-per-completed-task trend down as a dispatch layer enforces atomic decomposition + prompt-cache reuse?** That's a time-series regression over a documented schema, and it belongs in afterburn — the ecosystem's spent-session intelligence product — not duplicated inside each producer.

This is the same producer→findings pattern as #8 (switchyard → afterburn): the producer emits raw signal; afterburn owns analysis, findings, retrieval, and evolution.

## The source schema (`dispatch-outcomes`)

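An append-only JSONL stream. Two record kinds, joined on `prompt_fingerprint` (plus `attempt_id` when the finalize record also carries one; the finalize table below does not list it, so the fingerprint is the only join key both kinds are documented to share):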

**`kind: "dispatch"`** — emitted at the gate, before the work runs:

| field | type | meaning |
|---|---|---|
| `kind` | `"dispatch"` | record discriminator |
| `ts_dispatched` | int (ms) | gate decision time |
| `agent` | str | dispatched agent/arm name |
| `attempt_id` | str | unique per dispatch attempt |
| `prompt_fingerprint` | str | stable hash of the task prompt (join key) |
| `prompt_chars` | int | prompt length |
| `prompt_files_named` | int | count of files named in the prompt |
| `prompt_input_token_estimate` | int | pre-dispatch token estimate |
| `has_design_verb` | bool | prompt contains a non-atomic "design" verb |
| `has_acceptance_verb` | bool | prompt contains a closed-form acceptance command |
| `atomic_hook_outcome` | str | `"allow"` \| `"deny"` (gate decision) |
| `block_reasons` | str[] | why the gate blocked (empty on allow) |
| `rationalization_match` | str\|null | which known "rationalization" pattern the prompt matched |

**`kind: "finalize"`** — emitted at completion:

| field | type | meaning |
|---|---|---|
| `kind` | `"finalize"` | record discriminator |
| `ts_completed` | int (ms) | completion time |
| `agent` / `agent_id` | str | arm + concrete instance id |
| `prompt_fingerprint` | str | join key back to the dispatch record |
| `status` | str | `"completed"` \| `"failed"` \| … |
| `ms_elapsed` | int | wall-clock |
| `total_tokens` / `input_tokens` / `output_tokens` | int | usage |
| `cache_read_tokens` / `cache_creation_tokens` | int | prompt-cache usage |
| `acceptance_passed` | bool\|null | did the closed-form acceptance command pass |
| `suggested_registry_edit` | obj\|null | producer-proposed capability/registry change |
| `error` | str\|null | failure detail |

The schema is generic dispatch telemetry — no secrets, no hostnames, no private product names. The source **path is a CLI arg / env var**, never hardcoded (the producer's state dir is private and machine-local).
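
As a rough illustration of how the two tables above could be typed on the afterburn side, here is a minimal sketch using `TypedDict`. The class names `DispatchEvent` and `FinalizeEvent` and the helper `record_kind` are hypothetical, not existing afterburn names; `total=False` is an assumption chosen so that missing or extra fields do not invalidate a record.

```python
from typing import List, Literal, Optional, TypedDict


class DispatchEvent(TypedDict, total=False):
    """Gate-time record. total=False: every field is treated as optional."""
    kind: Literal["dispatch"]
    ts_dispatched: int
    agent: str
    attempt_id: str
    prompt_fingerprint: str
    prompt_chars: int
    prompt_files_named: int
    prompt_input_token_estimate: int
    has_design_verb: bool
    has_acceptance_verb: bool
    atomic_hook_outcome: str
    block_reasons: List[str]
    rationalization_match: Optional[str]


class FinalizeEvent(TypedDict, total=False):
    """Completion-time record, joined back via prompt_fingerprint."""
    kind: Literal["finalize"]
    ts_completed: int
    agent: str
    agent_id: str
    prompt_fingerprint: str
    status: str
    ms_elapsed: int
    total_tokens: int
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int
    cache_creation_tokens: int
    acceptance_passed: Optional[bool]
    suggested_registry_edit: Optional[dict]
    error: Optional[str]


def record_kind(rec: dict) -> Optional[str]:
    """Return the discriminator if it is one of the two known kinds, else None."""
    kind = rec.get("kind")
    return kind if kind in ("dispatch", "finalize") else None
```

These types only document the shape for a type checker; they perform no runtime validation, so the parser still has to check each line itself.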

## Proposed changes

### US-A — `dispatch-outcomes` source adapter + schema parser
New flat module `afterburn/dispatch_outcomes.py`. Parse the JSONL, tolerate unknown/extra fields (forward-compat), and join `dispatch`↔`finalize` on `prompt_fingerprint` (+`attempt_id`) into a typed `DispatchRecord`. Skip malformed lines with a counted warning (never crash a run on one bad line).
**Acceptance:** `pytest tests/test_dispatch_outcomes.py -k parse_and_join` — a fixture with 2 dispatch + 2 finalize lines (one unmatched, one malformed) yields exactly 1 joined record + 1 unmatched + 1 skipped, asserted by count.
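
A minimal sketch of the parse-and-join step, to make the acceptance counts concrete. It assumes a dispatch line always precedes its finalize line in the append-only stream, and that "unmatched" covers both dispatches without a finalize and finalizes without a dispatch. `ParseResult`, `_pick`, and the field layout of `DispatchRecord` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class DispatchRecord:
    dispatch: Optional[dict]
    finalize: Optional[dict]


@dataclass
class ParseResult:
    joined: List[DispatchRecord]
    unmatched: List[DispatchRecord]
    skipped: int


def _pick(candidates: List[dict], attempt_id: Optional[str]) -> Optional[dict]:
    """Match on attempt_id when the finalize carries one, else take the oldest."""
    if attempt_id is not None:
        for cand in candidates:
            if cand.get("attempt_id") == attempt_id:
                return cand
        return None
    return candidates[0] if candidates else None


def parse_and_join(lines: Iterable[str]) -> ParseResult:
    pending = {}  # prompt_fingerprint -> dispatches still awaiting a finalize
    joined, orphans, skipped = [], [], 0
    for line in lines:
        text = line.strip()
        if not text:
            continue
        try:
            rec = json.loads(text)
        except json.JSONDecodeError:
            skipped += 1  # malformed line: count it, never crash the run
            continue
        if not isinstance(rec, dict) or not isinstance(rec.get("prompt_fingerprint"), str):
            skipped += 1
            continue
        fingerprint = rec["prompt_fingerprint"]
        kind = rec.get("kind")
        if kind == "dispatch":
            pending.setdefault(fingerprint, []).append(rec)
        elif kind == "finalize":
            candidates = pending.get(fingerprint, [])
            match = _pick(candidates, rec.get("attempt_id"))
            if match is None:
                orphans.append(DispatchRecord(dispatch=None, finalize=rec))
            else:
                candidates.remove(match)
                joined.append(DispatchRecord(dispatch=match, finalize=rec))
        else:
            skipped += 1  # unknown record kind
    leftovers = [DispatchRecord(dispatch=d, finalize=None)
                 for group in pending.values() for d in group]
    return ParseResult(joined=joined, unmatched=leftovers + orphans, skipped=skipped)
```

Unknown extra fields pass through untouched because the records stay plain dicts; only `kind` and `prompt_fingerprint` are required to accept a line.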

### US-B — efficiency-trend analyzer + verdict finding
Compute **input-tokens-per-completed-task** (restrict to `status==completed`, optionally `acceptance_passed==true`) as a time-ordered series, per-arm and global. Fit a simple slope using ordinary least squares in plain Python (no heavy deps; this stays within the `requests`-only dependency floor) and report the cache-read ratio distribution. Emit one `efficiency_trend` finding per arm and one global, each with: n, slope sign, first-vs-last-decile median, and a PASS/FAIL on "trend is decreasing" (slope < 0, so a flat series fails, matching the acceptance test below).
**Acceptance:** `pytest -k efficiency_trend` — a synthetic descending series yields `verdict: pass`; a flat/ascending series yields `verdict: fail`. The finding JSON validates against the findings schema.
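
A minimal sketch of the slope fit and verdict in plain Python. It regresses `input_tokens` against ordinal position in the time-ordered series rather than against the raw timestamp, which is an assumption. The verdict follows the acceptance criterion, under which a flat series fails, so only a strictly negative slope passes. The finding keys and the `insufficient_data` verdict are hypothetical, not taken from the findings schema.

```python
from statistics import median


def ols_slope(ys):
    """Ordinary least-squares slope of ys against x = 0, 1, ..., n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(ys))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    return sxy / sxx


def efficiency_trend(finalizes, arm=None):
    """Build one efficiency_trend finding for an arm, or globally when arm is None."""
    rows = [r for r in finalizes
            if r.get("status") == "completed" and (arm is None or r.get("agent") == arm)]
    rows.sort(key=lambda r: r["ts_completed"])
    series = [r["input_tokens"] for r in rows]
    label = arm if arm is not None else "global"
    if len(series) < 2:
        return {"type": "efficiency_trend", "arm": label,
                "n": len(series), "verdict": "insufficient_data"}
    slope = ols_slope(series)
    decile = max(1, len(series) // 10)  # at least one point per end
    return {
        "type": "efficiency_trend",
        "arm": label,
        "n": len(series),
        "slope": slope,
        "slope_sign": (slope > 0) - (slope < 0),
        "first_decile_median": median(series[:decile]),
        "last_decile_median": median(series[-decile:]),
        "verdict": "pass" if slope < 0 else "fail",
    }
```

Fitting against ordinal position keeps the slope in units of tokens per task; fitting against `ts_completed` instead would give tokens per millisecond and weight bursts of activity differently.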

### US-C — gate-friction findings (reuse the correction taxonomy from #3)
Map `atomic_hook_outcome=="deny"` + `block_reasons` + `rationalization_match` onto afterburn's existing **friction** finding type. The producer's "rationalizations" are structurally a correction taxonomy (cf. #3) — surface the top-N block reasons and their frequencies as friction findings with remediation hints ("prompt named >3 files", "design verb without decomposition", etc.).
**Acceptance:** `pytest -k gate_friction` — a fixture with 3 denies (2 sharing a rationalization) emits a friction finding ranking that rationalization first, count==2.
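
The ranking itself can be sketched with `collections.Counter`. The function name `gate_friction` and the shape of the returned finding are assumptions; afterburn's actual friction-finding fields are not specified in this issue.

```python
from collections import Counter


def gate_friction(dispatches, top_n=5):
    """Rank rationalizations and block reasons across denied dispatches."""
    denies = [d for d in dispatches
              if d.get("kind") == "dispatch" and d.get("atomic_hook_outcome") == "deny"]
    # rationalization_match may be null; count only the non-empty matches.
    rationalizations = Counter(
        d["rationalization_match"] for d in denies if d.get("rationalization_match")
    )
    # block_reasons is a list per record; flatten before counting.
    reasons = Counter(r for d in denies for r in (d.get("block_reasons") or []))
    return {
        "type": "friction",
        "denies": len(denies),
        "top_rationalizations": [
            {"rationalization": name, "count": count}
            for name, count in rationalizations.most_common(top_n)
        ],
        "top_block_reasons": [
            {"reason": name, "count": count}
            for name, count in reasons.most_common(top_n)
        ],
    }
```

`Counter.most_common` sorts by descending count, so the shared rationalization in the acceptance fixture ends up first with `count == 2`.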

### US-D — registry-edit suggestion findings
Surface non-null `suggested_registry_edit` values as candidate findings (dedupe by a signature characteristic, mirroring #8's `signature_characteristic` filter). This is the producer's half-built "outcome → capability-PR" loop; afterburn owns turning the suggestion into a reviewable, verifiable finding (status transitions via #8 US-D `PATCH …/verify`).
**Acceptance:** `pytest -k registry_suggestions` — 3 finalize records (2 identical suggestions) yield 1 deduped finding with `occurrences==2`.
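
One way to dedupe suggestions is to hash a canonical JSON form of each `suggested_registry_edit` object. This is a stand-in: the `signature_characteristic` filter from #8 is not described here, so the hashing scheme, `suggestion_signature`, and the finding keys below are all assumptions.

```python
import hashlib
import json


def suggestion_signature(edit):
    """Stable short hash of a suggestion, independent of key order."""
    canonical = json.dumps(edit, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def registry_suggestions(finalizes):
    """Collapse identical non-null suggestions into one finding each."""
    findings = {}
    for rec in finalizes:
        edit = rec.get("suggested_registry_edit")
        if edit is None:
            continue  # most finalize records carry no suggestion
        sig = suggestion_signature(edit)
        finding = findings.setdefault(sig, {
            "type": "registry_suggestion",
            "signature": sig,
            "suggestion": edit,
            "occurrences": 0,
        })
        finding["occurrences"] += 1
    return list(findings.values())
```

Sorting keys before hashing means two producers that serialize the same suggestion with different key order still collapse into a single finding.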

### US-E — CLI wiring + docs + tests
`afterburn discover --source dispatch-outcomes --path <jsonl>` (and `AFTERBURN_DISPATCH_OUTCOMES_PATH` env fallback). Default source set unchanged (transcript scan) — `dispatch-outcomes` is opt-in. README "What It Does" table gains a row; add the findings to the `.afterburn/` output summary. Wire the new test module into the suite.
**Acceptance:** `afterburn discover --source dispatch-outcomes --path tests/fixtures/outcomes.jsonl` exits 0 and writes an `efficiency_trend` finding to the configured store; `pytest tests/test_dispatch_outcomes.py` green.
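
A sketch of the flag-then-env path resolution using `argparse`. The default source name `transcripts`, the helper names, and the error message are hypothetical; only `--source`, `--path`, and `AFTERBURN_DISPATCH_OUTCOMES_PATH` come from this issue, and afterburn's real CLI may be structured differently.

```python
import argparse
import os

ENV_VAR = "AFTERBURN_DISPATCH_OUTCOMES_PATH"


def build_parser():
    parser = argparse.ArgumentParser(prog="afterburn")
    sub = parser.add_subparsers(dest="command", required=True)
    discover = sub.add_parser("discover")
    # "transcripts" is a placeholder name for the existing default scan.
    discover.add_argument("--source", default="transcripts")
    discover.add_argument("--path", default=None)
    return parser


def resolve_outcomes_path(args, environ=None):
    """Return the telemetry path for dispatch-outcomes, or None for other sources."""
    if environ is None:
        environ = os.environ
    if args.source != "dispatch-outcomes":
        return None  # opt-in: the default scan never touches this source
    path = args.path or environ.get(ENV_VAR)
    if not path:
        raise SystemExit(f"dispatch-outcomes needs --path or {ENV_VAR}")
    return path
```

The explicit flag wins over the environment variable, and nothing is hardcoded, so the producer's private state directory never appears in the repository.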

## Why this matters

- afterburn is already the **findings hub** (#8) — adding a typed numeric source is additive, not architectural.
- The producer's founding hypothesis (input-tokens-per-task ↓ as discipline is enforced) currently has **1,400+ unanalyzed records** and no instrument. Centralizing the verdict here means every producer that emits this schema gets the trend analysis for free.
- US-C reinforces the **correction taxonomy** (#3) with a second, structured signal — the gate's own block reasons.
- US-D feeds the **evolve** loop with reviewable capability-edit candidates.

## Non-ask (scope fences)

- No change to the default transcript scan — `dispatch-outcomes` is an opt-in source.
- No coupling to any specific producer — adapter reads a documented schema from a caller-supplied path; tolerates extra fields.
- No new heavy deps — stay within the current `requests`-only floor (OLS in plain Python).
- No secrets/paths/product names committed — path is runtime-supplied.

## Estimate

~250 LOC impl (`dispatch_outcomes.py` + analyzers + CLI wiring) + ~120 LOC tests/fixtures.

## Cross-links

- #8 — ingestion + retrieval (this source emits findings through the same store; US-D verification applies)
- #5 — gap detection on Agent dispatches (adjacent: both treat dispatches as first-class)
- #3 — correction taxonomy (US-C reuses the friction-finding shape)

