Filed from a private dispatch-discipline layer in the zenprocess ecosystem (a Claude Code hook-set that gates dispatches for atomicity and emits per-dispatch outcome telemetry). The producer stays opaque; this issue specifies only the afterburn-side capability — a new dispatch-outcomes source that turns a structured per-dispatch telemetry stream into afterburn findings.
Context
Afterburn today mines conversation transcripts (~/.claude/projects/**/*.jsonl) — unstructured message/tool-call history — into friction / patterns / gaps / releases findings. That's the qualitative layer.
There is a second, structured residual-intelligence source it does not yet read: per-dispatch outcome telemetry. Some Claude Code tooling emits an append-only JSONL log with one record per dispatch decision and one per completion (token counts, cache hit ratios, gate decisions, acceptance pass/fail). This is purpose-built numeric telemetry, not transcript prose — so a regex/RLM transcript pass is the wrong instrument. It needs a typed source adapter.
The question this answers is quantitative and falsifiable: does input-tokens-per-completed-task trend down as a dispatch layer enforces atomic decomposition + prompt-cache reuse? That's a time-series regression over a documented schema, and it belongs in afterburn — the ecosystem's spent-session intelligence product — not duplicated inside each producer.
This is the same producer→findings pattern as #8 (switchyard → afterburn): the producer emits raw signal; afterburn owns analysis, findings, retrieval, and evolution.
The source schema (dispatch-outcomes)
An append-only JSONL stream. Two record kinds, joined on prompt_fingerprint (+ attempt_id when present):
kind: "dispatch" — emitted at the gate, before the work runs:
| field | type | meaning |
|---|---|---|
| kind | "dispatch" | record discriminator |
| ts_dispatched | int (ms) | gate decision time |
| agent | str | dispatched agent/arm name |
| attempt_id | str | unique per dispatch attempt |
| prompt_fingerprint | str | stable hash of the task prompt (join key) |
| prompt_chars | int | prompt length |
| prompt_files_named | int | count of files named in the prompt |
| prompt_input_token_estimate | int | pre-dispatch token estimate |
| has_design_verb | bool | prompt contains a non-atomic "design" verb |
| has_acceptance_verb | bool | prompt contains a closed-form acceptance command |
| atomic_hook_outcome | str | "allow" \| "deny" (gate decision) |
| block_reasons | str[] | why the gate blocked (empty on allow) |
| rationalization_match | str\|null | which known "rationalization" pattern the prompt matched |
kind: "finalize" — emitted at completion:
| field | type | meaning |
|---|---|---|
| kind | "finalize" | record discriminator |
| ts_completed | int (ms) | completion time |
| agent / agent_id | str | arm + concrete instance id |
| prompt_fingerprint | str | join key back to the dispatch record |
| status | str | "completed" \| "failed" \| … |
| ms_elapsed | int | wall-clock |
| total_tokens / input_tokens / output_tokens | int | usage |
| cache_read_tokens / cache_creation_tokens | int | prompt-cache usage |
| acceptance_passed | bool\|null | did the closed-form acceptance command pass |
| suggested_registry_edit | obj\|null | producer-proposed capability/registry change |
| error | str\|null | failure detail |
The schema is generic dispatch telemetry — no secrets, no hostnames, no private product names. The source path is a CLI arg / env var, never hardcoded (the producer's state dir is private and machine-local).
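As a rough illustration of the schema, the two record kinds could be typed as below. This is only a sketch: the class names `DispatchEvent` and `FinalizeEvent` are hypothetical, every sample value is made up, and `total=False` is an assumption chosen so that records missing optional fields still type-check.

```python
import json
from typing import List, Optional, TypedDict


class DispatchEvent(TypedDict, total=False):
    kind: str                      # always "dispatch"
    ts_dispatched: int             # ms
    agent: str
    attempt_id: str
    prompt_fingerprint: str        # join key
    prompt_chars: int
    prompt_files_named: int
    prompt_input_token_estimate: int
    has_design_verb: bool
    has_acceptance_verb: bool
    atomic_hook_outcome: str       # "allow" | "deny"
    block_reasons: List[str]
    rationalization_match: Optional[str]


class FinalizeEvent(TypedDict, total=False):
    kind: str                      # always "finalize"
    ts_completed: int              # ms
    agent: str
    agent_id: str
    prompt_fingerprint: str        # join key back to the dispatch record
    status: str
    ms_elapsed: int
    total_tokens: int
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int
    cache_creation_tokens: int
    acceptance_passed: Optional[bool]
    suggested_registry_edit: Optional[dict]
    error: Optional[str]


# Made-up sample values, for illustration only.
dispatch: DispatchEvent = {
    "kind": "dispatch",
    "ts_dispatched": 1_700_000_000_000,
    "agent": "arm-a",
    "attempt_id": "attempt-1",
    "prompt_fingerprint": "fp-example",
    "atomic_hook_outcome": "allow",
    "block_reasons": [],
    "rationalization_match": None,
}
finalize: FinalizeEvent = {
    "kind": "finalize",
    "ts_completed": 1_700_000_004_200,
    "agent": "arm-a",
    "prompt_fingerprint": "fp-example",
    "status": "completed",
    "input_tokens": 1200,
    "acceptance_passed": True,
}

# One JSON object per line is what an append-only JSONL stream looks like.
jsonl = "\n".join(json.dumps(rec) for rec in (dispatch, finalize))
```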
Proposed changes
US-A — dispatch-outcomes source adapter + schema parser
New flat module afterburn/dispatch_outcomes.py. Parse the JSONL, tolerate unknown/extra fields (forward-compat), and join dispatch↔finalize on prompt_fingerprint (+attempt_id) into a typed DispatchRecord. Skip malformed lines with a counted warning (never crash a run on one bad line).
Acceptance: pytest tests/test_dispatch_outcomes.py -k parse_and_join — a fixture with 2 dispatch + 2 finalize lines (one unmatched, one malformed) yields exactly 1 joined record + 1 unmatched + 1 skipped, asserted by count.
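A minimal sketch of the parse-and-join step, not the proposed implementation. It assumes a finalize record may omit attempt_id (the finalize table does not list it), in which case it pairs with the earliest unconsumed dispatch sharing its prompt_fingerprint. `ParseResult`, `parse_and_join`, and the field layout of `DispatchRecord` are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class DispatchRecord:
    """A dispatch joined to its finalize record (hypothetical shape)."""
    prompt_fingerprint: str
    attempt_id: Optional[str]
    dispatch: dict
    finalize: dict


@dataclass
class ParseResult:
    joined: List[DispatchRecord] = field(default_factory=list)
    unmatched: List[dict] = field(default_factory=list)  # no counterpart found
    skipped: int = 0                                     # malformed lines


def parse_and_join(lines: Iterable[str]) -> ParseResult:
    result = ParseResult()
    dispatches, finalizes = [], []
    for line in lines:
        if not line.strip():
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            result.skipped += 1
            continue
        # Unknown extra fields are kept; only the discriminator and the
        # join key are required for a line to count as well-formed.
        if (not isinstance(rec, dict)
                or rec.get("kind") not in ("dispatch", "finalize")
                or not isinstance(rec.get("prompt_fingerprint"), str)):
            result.skipped += 1
            continue
        (dispatches if rec["kind"] == "dispatch" else finalizes).append(rec)

    open_dispatches = list(dispatches)
    for fin in finalizes:
        match = None
        for cand in open_dispatches:
            if cand["prompt_fingerprint"] != fin["prompt_fingerprint"]:
                continue
            # attempt_id narrows the join only when the finalize carries one.
            wanted = fin.get("attempt_id")
            if wanted is not None and cand.get("attempt_id") != wanted:
                continue
            match = cand
            break
        if match is None:
            result.unmatched.append(fin)
            continue
        open_dispatches.remove(match)
        result.joined.append(DispatchRecord(
            prompt_fingerprint=fin["prompt_fingerprint"],
            attempt_id=match.get("attempt_id"),
            dispatch=match,
            finalize=fin,
        ))
    result.unmatched.extend(open_dispatches)
    return result
```

The linear scan over open dispatches keeps the sketch short; a real adapter would more likely index dispatches by fingerprint.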
US-B — efficiency-trend analyzer + verdict finding
Compute input-tokens-per-completed-task (restrict to status==completed, optionally acceptance_passed==true) as a time-ordered series, per-arm and global. Fit a simple slope by ordinary least squares in plain Python, so the analyzer adds no heavy deps and stays within the current requests-only dependency floor. Also report the cache-read ratio distribution. Emit one efficiency_trend finding per arm and one global, each with: n, slope sign, first-vs-last-decile median, and a PASS/FAIL on "trend is non-increasing."
Acceptance: pytest -k efficiency_trend — a synthetic descending series yields verdict: pass; a flat/ascending series yields verdict: fail. The finding JSON validates against the findings schema.
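The slope-and-verdict part could look roughly like this. It is a sketch under several assumptions: the x-axis is the task's position in ts_completed order rather than the timestamp itself, a decile is `max(1, n // 10)` points, and the output field names are hypothetical rather than taken from the findings schema. It follows the acceptance criterion above by failing a flat series, which is stricter than a literal reading of "non-increasing". The cache-read ratio is left out because the source does not define its denominator.

```python
from statistics import median


def ols_slope(ys):
    """Least-squares slope of ys against their 0-based position."""
    n = len(ys)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(ys))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    return sxy / sxx


def efficiency_trend(finalizes, arm=None, require_acceptance=False):
    """Build one efficiency_trend finding; arm=None means the global series."""
    rows = [
        r for r in finalizes
        if r.get("status") == "completed"
        and isinstance(r.get("input_tokens"), int)
        and (arm is None or r.get("agent") == arm)
        and (not require_acceptance or r.get("acceptance_passed") is True)
    ]
    rows.sort(key=lambda r: r.get("ts_completed", 0))
    ys = [r["input_tokens"] for r in rows]
    n = len(ys)
    slope = ols_slope(ys)
    width = max(1, n // 10)  # assumed decile size
    return {
        "type": "efficiency_trend",
        "arm": arm if arm is not None else "global",
        "n": n,
        "slope_sign": (slope > 0) - (slope < 0),
        "first_decile_median": median(ys[:width]) if n else None,
        "last_decile_median": median(ys[-width:]) if n else None,
        # Flat counts as a fail, matching the acceptance criterion.
        "verdict": "pass" if slope < 0 else "fail",
    }
```

Calling `efficiency_trend(records)` gives the global finding, and `efficiency_trend(records, arm=name)` gives one per arm.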
US-C — gate-friction findings (reuse the correction taxonomy from #3)
Map atomic_hook_outcome=="deny" + block_reasons + rationalization_match onto afterburn's existing friction finding type. The producer's "rationalizations" are structurally a correction taxonomy (cf. #3) — surface the top-N block reasons and their frequencies as friction findings with remediation hints ("prompt named >3 files", "design verb without decomposition", etc.).
Acceptance: pytest -k gate_friction — a fixture with 3 denies (2 sharing a rationalization) emits a friction finding ranking that rationalization first, count==2.
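One way the deny records might be rolled up, as a sketch only. The finding shape (`type`, `source`, `rationalization`, `count`, `top_block_reasons`) and the `"unclassified"` bucket for denies with no rationalization_match are assumptions; afterburn's actual friction finding type is not reproduced here.

```python
from collections import Counter, defaultdict


def gate_friction(records, top_n=5):
    """Rank rationalizations among denied dispatches by frequency."""
    by_rationalization = Counter()
    reasons = defaultdict(Counter)
    for rec in records:
        if rec.get("kind") != "dispatch":
            continue
        if rec.get("atomic_hook_outcome") != "deny":
            continue
        # Hypothetical bucket for denies that matched no known pattern.
        label = rec.get("rationalization_match") or "unclassified"
        by_rationalization[label] += 1
        reasons[label].update(rec.get("block_reasons") or [])
    return [
        {
            "type": "friction",
            "source": "dispatch-outcomes",
            "rationalization": label,
            "count": count,
            "top_block_reasons": reasons[label].most_common(3),
        }
        for label, count in by_rationalization.most_common(top_n)
    ]
```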
US-D — registry-edit suggestion findings
Surface non-null suggested_registry_edit values as candidate findings (dedupe by a signature characteristic, mirroring #8's signature_characteristic filter). This is the producer's half-built "outcome → capability-PR" loop; afterburn owns turning the suggestion into a reviewable, verifiable finding (status transitions via #8 US-D PATCH …/verify).
Acceptance: pytest -k registry_suggestions — 3 finalize records (2 identical suggestions) yield 1 deduped finding with occurrences==2.
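The dedupe could be sketched as follows. The signature used here, a truncated SHA-256 of the suggestion's canonical JSON, is a stand-in chosen for illustration and is not claimed to match #8's signature_characteristic filter. The finding fields other than `occurrences` are likewise hypothetical.

```python
import hashlib
import json


def registry_suggestions(finalizes):
    """Collapse identical suggested_registry_edit values into one finding each."""
    by_signature = {}
    for rec in finalizes:
        edit = rec.get("suggested_registry_edit")
        if edit is None:
            continue
        # Sorted keys make the signature independent of key order.
        canonical = json.dumps(edit, sort_keys=True, separators=(",", ":"))
        signature = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
        finding = by_signature.setdefault(signature, {
            "type": "registry_suggestion",
            "signature": signature,
            "suggestion": edit,
            "occurrences": 0,
            "status": "candidate",
        })
        finding["occurrences"] += 1
    return list(by_signature.values())
```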
US-E — CLI wiring + docs + tests
afterburn discover --source dispatch-outcomes --path <jsonl> (and AFTERBURN_DISPATCH_OUTCOMES_PATH env fallback). Default source set unchanged (transcript scan) — dispatch-outcomes is opt-in. README "What It Does" table gains a row; add the findings to the .afterburn/ output summary. Wire the new test module into the suite.
Acceptance: afterburn discover --source dispatch-outcomes --path tests/fixtures/outcomes.jsonl exits 0 and writes an efficiency_trend finding to the configured store; pytest tests/test_dispatch_outcomes.py green.
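The flag and env fallback might be wired along these lines. This is a standalone argparse sketch, not afterburn's existing CLI code: the default source name `"transcripts"` and the helper names `build_parser` and `resolve_outcomes_path` are assumptions.

```python
import argparse
import os

ENV_VAR = "AFTERBURN_DISPATCH_OUTCOMES_PATH"


def build_parser():
    parser = argparse.ArgumentParser(prog="afterburn")
    sub = parser.add_subparsers(dest="command", required=True)
    discover = sub.add_parser("discover")
    # "transcripts" is a hypothetical name for the existing default source.
    discover.add_argument("--source", default="transcripts")
    discover.add_argument("--path", default=None)
    return parser


def resolve_outcomes_path(args, environ=None):
    """Return the JSONL path for dispatch-outcomes, or None for other sources."""
    environ = os.environ if environ is None else environ
    if args.source != "dispatch-outcomes":
        return None  # opt-in: the default scan never reads this path
    path = args.path or environ.get(ENV_VAR)
    if not path:
        raise SystemExit(
            f"--path or {ENV_VAR} is required with --source dispatch-outcomes"
        )
    return path
```

An explicit `--path` wins over the environment variable, and nothing is hardcoded, which keeps the producer's state dir out of the repository.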
Why this matters
Non-ask (scope fences)
- No change to the default transcript scan: dispatch-outcomes is an opt-in source.
- No coupling to any specific producer: the adapter reads a documented schema from a caller-supplied path and tolerates extra fields.
- No new heavy deps: stay within the current requests-only floor (OLS in plain Python).
- No secrets/paths/product names committed: the path is runtime-supplied.
Estimate
~250 LOC impl (dispatch_outcomes.py + analyzers + CLI wiring) + ~120 LOC tests/fixtures.
Cross-links