
feat: dispatch-outcomes source — efficiency-trend + gate-friction findings from per-dispatch telemetry #9

@zenprocess


Filed from a private dispatch-discipline layer in the zenprocess ecosystem (a Claude Code hook-set that gates dispatches for atomicity and emits per-dispatch outcome telemetry). The producer stays opaque; this issue specifies only the afterburn-side capability — a new dispatch-outcomes source that turns a structured per-dispatch telemetry stream into afterburn findings.

Context

Afterburn today mines conversation transcripts (~/.claude/projects/**/*.jsonl) — unstructured message/tool-call history — into friction / patterns / gaps / releases findings. That's the qualitative layer.

There is a second, structured residual-intelligence source it does not yet read: per-dispatch outcome telemetry. Some Claude Code tooling emits an append-only JSONL log with one record per dispatch decision and one per completion (token counts, cache hit ratios, gate decisions, acceptance pass/fail). This is purpose-built numeric telemetry, not transcript prose — so a regex/RLM transcript pass is the wrong instrument. It needs a typed source adapter.

The question this answers is quantitative and falsifiable: does input-tokens-per-completed-task trend down as a dispatch layer enforces atomic decomposition + prompt-cache reuse? That's a time-series regression over a documented schema, and it belongs in afterburn — the ecosystem's spent-session intelligence product — not duplicated inside each producer.

This is the same producer→findings pattern as #8 (switchyard → afterburn): the producer emits raw signal; afterburn owns analysis, findings, retrieval, and evolution.

The source schema (dispatch-outcomes)

An append-only JSONL stream. Two record kinds, joined on prompt_fingerprint (+ attempt_id when present):

kind: "dispatch" — emitted at the gate, before the work runs:

| field | type | meaning |
|---|---|---|
| kind | "dispatch" | record discriminator |
| ts_dispatched | int (ms) | gate decision time |
| agent | str | dispatched agent/arm name |
| attempt_id | str | unique per dispatch attempt |
| prompt_fingerprint | str | stable hash of the task prompt (join key) |
| prompt_chars | int | prompt length |
| prompt_files_named | int | count of files named in the prompt |
| prompt_input_token_estimate | int | pre-dispatch token estimate |
| has_design_verb | bool | prompt contains a non-atomic "design" verb |
| has_acceptance_verb | bool | prompt contains a closed-form acceptance command |
| atomic_hook_outcome | str | "allow" or "deny" (gate decision) |
| block_reasons | str[] | why the gate blocked (empty on allow) |
| rationalization_match | str or null | which known "rationalization" pattern the prompt matched |

kind: "finalize" — emitted at completion:

| field | type | meaning |
|---|---|---|
| kind | "finalize" | record discriminator |
| ts_completed | int (ms) | completion time |
| agent / agent_id | str | arm + concrete instance id |
| prompt_fingerprint | str | join key back to the dispatch record |
| status | str | "completed", "failed", … |
| ms_elapsed | int | wall-clock elapsed time |
| total_tokens / input_tokens / output_tokens | int | token usage |
| cache_read_tokens / cache_creation_tokens | int | prompt-cache usage |
| acceptance_passed | bool or null | did the closed-form acceptance command pass |
| suggested_registry_edit | obj or null | producer-proposed capability/registry change |
| error | str or null | failure detail |

The schema is generic dispatch telemetry — no secrets, no hostnames, no private product names. The source path is a CLI arg / env var, never hardcoded (the producer's state dir is private and machine-local).
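One way to picture the joined result is a single dataclass spanning both record kinds. This is a sketch under assumptions: only a hypothetical subset of fields is typed, the helper `to_record` is hypothetical, and every field the dataclass does not type (including anything a future producer adds) lands in an `extra` dict, so unknown fields are tolerated rather than rejected.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class DispatchRecord:
    """Joined dispatch + finalize view (hypothetical subset of the schema)."""
    prompt_fingerprint: str
    attempt_id: Optional[str] = None
    agent: Optional[str] = None
    ts_dispatched: Optional[int] = None
    ts_completed: Optional[int] = None
    atomic_hook_outcome: Optional[str] = None
    status: Optional[str] = None
    input_tokens: Optional[int] = None
    cache_read_tokens: Optional[int] = None
    acceptance_passed: Optional[bool] = None
    # Every field not typed above, kept for forward compatibility.
    extra: dict[str, Any] = field(default_factory=dict)


# Names of the explicitly typed fields (everything except the `extra` bucket).
_TYPED = {f.name for f in fields(DispatchRecord)} - {"extra"}


def to_record(dispatch: dict[str, Any],
              finalize: Optional[dict[str, Any]] = None) -> DispatchRecord:
    """Merge one dispatch dict with its optional finalize dict.

    Finalize values win on key collisions (e.g. `agent`); the `kind`
    discriminator is dropped because the merged record has no single kind.
    """
    merged = {**dispatch, **(finalize or {})}
    typed = {k: merged[k] for k in _TYPED if k in merged}
    extra = {k: v for k, v in merged.items() if k not in _TYPED and k != "kind"}
    return DispatchRecord(**typed, extra=extra)
```

A dispatch that never finalized (for example, one the gate denied) still yields a record, just with `status` and the token fields left as `None`.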

Proposed changes

US-A — dispatch-outcomes source adapter + schema parser

New flat module afterburn/dispatch_outcomes.py. Parse the JSONL, tolerate unknown/extra fields (forward-compat), and join dispatch→finalize on prompt_fingerprint (+attempt_id) into a typed DispatchRecord. Skip malformed lines with a counted warning (never crash a run on one bad line).
Acceptance: pytest tests/test_dispatch_outcomes.py -k parse_and_join — a fixture with 2 dispatch + 2 finalize lines (one unmatched, one malformed) yields exactly 1 joined record + 1 unmatched + 1 skipped, asserted by count.
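A minimal sketch of the parse-and-join step, assuming the input is an iterable of raw JSONL lines. It counts malformed lines instead of raising; it also counts lines with an unknown `kind` or a missing `prompt_fingerprint` as skipped, which is an assumption about how strict the adapter should be. Because the finalize schema does not list `attempt_id`, the sketch uses it to narrow a match only when both records carry one. The function name and return shape are illustrative, not afterburn's API.

```python
import json
import logging
from collections import defaultdict
from typing import Any, Iterable

log = logging.getLogger("afterburn.dispatch_outcomes")

Record = dict[str, Any]


def _same_attempt(dispatch: Record, finalize: Record) -> bool:
    # Fingerprints already match; narrow by attempt_id only when both sides carry one.
    a, b = dispatch.get("attempt_id"), finalize.get("attempt_id")
    return a is None or b is None or a == b


def parse_and_join(lines: Iterable[str]) -> tuple[list[tuple[Record, Record]], list[Record], int]:
    """Return (joined dispatch/finalize pairs, unmatched records, skipped-line count)."""
    by_fp: dict[str, list[Record]] = defaultdict(list)  # pending dispatches per fingerprint
    finalizes: list[Record] = []
    skipped = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are ignored, not counted
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            rec = None
        kind = rec.get("kind") if isinstance(rec, dict) else None
        if kind not in ("dispatch", "finalize") or "prompt_fingerprint" not in rec:
            skipped += 1
            log.warning("skipping malformed line %d", lineno)
            continue
        if kind == "dispatch":
            by_fp[rec["prompt_fingerprint"]].append(rec)
        else:
            finalizes.append(rec)

    joined: list[tuple[Record, Record]] = []
    unmatched: list[Record] = []
    for fin in finalizes:
        candidates = by_fp.get(fin["prompt_fingerprint"], [])
        idx = next((i for i, d in enumerate(candidates) if _same_attempt(d, fin)), None)
        if idx is None:
            unmatched.append(fin)
        else:
            joined.append((candidates.pop(idx), fin))
    # Dispatches that never saw a finalize (e.g. denied at the gate, or still running).
    unmatched.extend(d for pending in by_fp.values() for d in pending)
    return joined, unmatched, skipped
```

Reading a file is then `with open(path, encoding="utf-8") as fh: joined, unmatched, skipped = parse_and_join(fh)`. A fixture shaped like the one in the acceptance criterion (2 dispatch + 2 finalize lines, one unmatched, one malformed) should come out as 1 joined, 1 unmatched, 1 skipped.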

US-B — efficiency-trend analyzer + verdict finding

Compute input-tokens-per-completed-task (restricted to status==completed, optionally also acceptance_passed==true) as a time-ordered series, per arm and globally. Fit a simple slope with ordinary least squares in plain Python (no heavy deps; stay within the requests-only dependency floor) and report the cache-read ratio distribution. Emit one efficiency_trend finding per arm plus one global finding, each with: n, slope sign, first-decile vs last-decile median, and a PASS/FAIL verdict on "trend is decreasing" (slope < 0).
Acceptance: pytest -k efficiency_trend — a synthetic descending series yields verdict: pass; a flat/ascending series yields verdict: fail. The finding JSON validates against the findings schema.
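The trend analysis needs nothing beyond the standard library. The sketch below (all names hypothetical) regresses input_tokens on each completed task's position in time order rather than on raw ts_completed values, which ignores the gaps between completions; regressing on the timestamps themselves is an equally valid choice. It treats PASS as a strictly negative slope, so a flat series fails, as the acceptance criterion expects. The cache-read ratio definition (cache reads over all prompt-side tokens) is an assumption, and the distribution is summarised here only by its median.

```python
from statistics import median
from typing import Any, Optional, Sequence


def ols_slope(ys: Sequence[float]) -> float:
    """Ordinary least squares slope of ys against their index 0..n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    return sxy / sxx


def cache_read_ratio(rec: dict[str, Any]) -> Optional[float]:
    # Assumption: ratio = cache reads / all prompt-side tokens for the task.
    reads = rec.get("cache_read_tokens") or 0
    total = reads + (rec.get("input_tokens") or 0) + (rec.get("cache_creation_tokens") or 0)
    return reads / total if total else None


def efficiency_trend(records: list[dict[str, Any]], arm: Optional[str] = None,
                     require_acceptance: bool = False) -> dict[str, Any]:
    """Build one efficiency_trend finding for a single arm, or globally when arm is None."""
    rows = [
        r for r in records
        if r.get("status") == "completed"
        and r.get("input_tokens") is not None
        and r.get("ts_completed") is not None
        and (arm is None or r.get("agent") == arm)
        and (not require_acceptance or r.get("acceptance_passed") is True)
    ]
    rows.sort(key=lambda r: r["ts_completed"])  # time order; x is the ordinal position
    ys = [float(r["input_tokens"]) for r in rows]
    slope = ols_slope(ys)
    k = max(1, len(ys) // 10)  # decile size, at least one point
    ratios = [c for c in map(cache_read_ratio, rows) if c is not None]
    return {
        "type": "efficiency_trend",
        "arm": arm if arm is not None else "global",
        "n": len(ys),
        "slope_sign": (slope > 0) - (slope < 0),
        "first_decile_median": median(ys[:k]),
        "last_decile_median": median(ys[-k:]),
        "cache_read_ratio_median": median(ratios) if ratios else None,
        "verdict": "pass" if slope < 0 else "fail",
    }
```

Running it once per distinct `agent` value plus once with `arm=None` gives the per-arm and global findings.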

US-C — gate-friction findings (reuse the correction taxonomy from #3)

Map atomic_hook_outcome=="deny" + block_reasons + rationalization_match onto afterburn's existing friction finding type. The producer's "rationalizations" are structurally a correction taxonomy (cf. #3) — surface the top-N block reasons and their frequencies as friction findings with remediation hints ("prompt named >3 files", "design verb without decomposition", etc.).
Acceptance: pytest -k gate_friction — a fixture with 3 denies (2 sharing a rationalization) emits a friction finding ranking that rationalization first, count==2.
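Ranking denies is a counting problem. The sketch below assumes hypothetical block-reason codes (`too_many_files`, `design_verb`) whose hint text is taken from this issue's examples; the real codes, and their mapping onto the #3 correction taxonomy, would come from the producer. The finding dict shape is likewise illustrative.

```python
from collections import Counter
from typing import Any

# Hypothetical block-reason codes mapped to remediation hints; the real codes
# come from the producer and the correction taxonomy (#3).
REMEDIATION_HINTS = {
    "too_many_files": "prompt named >3 files",
    "design_verb": "design verb without decomposition",
}


def gate_friction(dispatches: list[dict[str, Any]], top_n: int = 5) -> list[dict[str, Any]]:
    """Rank rationalizations and block reasons across denied dispatches."""
    denies = [d for d in dispatches if d.get("atomic_hook_outcome") == "deny"]
    rationalizations = Counter(
        d["rationalization_match"] for d in denies if d.get("rationalization_match")
    )
    reasons = Counter(r for d in denies for r in (d.get("block_reasons") or []))
    findings = [
        {"type": "friction", "dimension": "rationalization", "key": key, "count": n}
        for key, n in rationalizations.most_common(top_n)  # ties keep first-seen order
    ]
    findings += [
        {"type": "friction", "dimension": "block_reason", "key": key, "count": n,
         "remediation": REMEDIATION_HINTS.get(key)}
        for key, n in reasons.most_common(top_n)
    ]
    return findings
```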

US-D — registry-edit suggestion findings

Surface non-null suggested_registry_edit values as candidate findings (dedupe by a signature characteristic, mirroring #8's signature_characteristic filter). This is the producer's half-built "outcome → capability-PR" loop; afterburn owns turning the suggestion into a reviewable, verifiable finding (status transitions via #8 US-D PATCH …/verify).
Acceptance: pytest -k registry_suggestions — 3 finalize records (2 identical suggestions) yield 1 deduped finding with occurrences==2.
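For deduplication, the sketch assumes the signature is a hash of the suggestion's canonical JSON (sorted keys), so two suggestions that differ only in key order collapse into one finding. That is a stand-in: #8's signature_characteristic filter may define the signature differently. The `candidate` status value and the `fingerprints` list are hypothetical.

```python
import hashlib
import json
from typing import Any


def suggestion_signature(edit: dict[str, Any]) -> str:
    """Stable signature for a suggested registry edit.

    Assumption: a hash of the canonical (sorted-key) JSON form; #8's
    signature_characteristic filter may define the signature differently.
    """
    canonical = json.dumps(edit, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def registry_suggestions(finalizes: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Deduplicate non-null suggested_registry_edit values into candidate findings."""
    findings: dict[str, dict[str, Any]] = {}
    for rec in finalizes:
        edit = rec.get("suggested_registry_edit")
        if edit is None:
            continue
        sig = suggestion_signature(edit)
        finding = findings.setdefault(sig, {
            "type": "registry_suggestion",
            "signature": sig,
            "edit": edit,
            "occurrences": 0,
            "status": "candidate",  # later transitions would go through #8's verify flow
            "fingerprints": [],
        })
        finding["occurrences"] += 1
        finding["fingerprints"].append(rec.get("prompt_fingerprint"))
    return list(findings.values())
```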

US-E — CLI wiring + docs + tests

Add afterburn discover --source dispatch-outcomes --path <jsonl>, with an AFTERBURN_DISPATCH_OUTCOMES_PATH env-var fallback. The default source set is unchanged (transcript scan); dispatch-outcomes is opt-in. Add a row to the README "What It Does" table, include the new findings in the .afterburn/ output summary, and wire the new test module into the suite.
Acceptance: afterburn discover --source dispatch-outcomes --path tests/fixtures/outcomes.jsonl exits 0 and writes an efficiency_trend finding to the configured store; pytest tests/test_dispatch_outcomes.py green.
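Path resolution for the opt-in source can be sketched with argparse. This is a standalone illustration, not afterburn's existing CLI module: the `transcripts` choice is a placeholder name for the current default scan, and `build_parser` / `resolve_outcomes_path` are hypothetical. An explicit `--path` wins over the environment variable, and a missing path fails fast instead of silently falling back to the transcript scan.

```python
import argparse
import os
from typing import Mapping, Optional

ENV_VAR = "AFTERBURN_DISPATCH_OUTCOMES_PATH"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="afterburn")
    sub = parser.add_subparsers(dest="command", required=True)
    discover = sub.add_parser("discover", help="mine findings from a source")
    # "transcripts" is a placeholder name for the existing default scan.
    discover.add_argument("--source", choices=["transcripts", "dispatch-outcomes"],
                          default="transcripts")
    discover.add_argument("--path", help="JSONL path for --source dispatch-outcomes")
    return parser


def resolve_outcomes_path(args: argparse.Namespace,
                          env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """CLI --path wins, then the env var; None when the source is not dispatch-outcomes."""
    if args.source != "dispatch-outcomes":
        return None
    env = os.environ if env is None else env
    path = args.path or env.get(ENV_VAR)
    if not path:
        raise SystemExit(f"--source dispatch-outcomes needs --path or {ENV_VAR}")
    return path
```

In an entry point this would be called as `resolve_outcomes_path(build_parser().parse_args())`, which reads `os.environ` by default; passing an explicit mapping keeps tests independent of the caller's environment.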

Why this matters

Non-ask (scope fences)

  • No change to the default transcript scan — dispatch-outcomes is an opt-in source.
  • No coupling to any specific producer — adapter reads a documented schema from a caller-supplied path; tolerates extra fields.
  • No new heavy deps — stay within the current requests-only floor (OLS in plain Python).
  • No secrets/paths/product names committed — path is runtime-supplied.

Estimate

~250 LOC impl (dispatch_outcomes.py + analyzers + CLI wiring) + ~120 LOC tests/fixtures.

Cross-links

Metadata


  • Assignees: No one assigned
  • Labels: enhancement (New feature or request)
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
