Tracking: live RL metrics suite in the statusline (gated on ruflo#2239 + RuVector#519) #8

## Tracking: live RL metrics suite in the statusline (gated on two upstream fixes)

**North star:** an RL metrics panel in the statusline footer, **live-updated from real CLI/API use and actual learnings**, not a synthetic one-shot proof. This issue collects all the material needed to start work the moment both upstream blockers close, so we don't have to re-derive context.

### Blocked by (do not start until both are closed/fixed)

- [ ] **F3 — route Q-state encoder collapse** → ruvnet/ruflo#2239
  - `q-learning-router.js` `featureVectorToKey` discards features 0–31 (the entire keyword-presence block) via a 31-bit truncating hash fold; keyword-distinct tasks collapse to one Q-state. Directly degrades a metric we want to display (`|Q|` undercounts true task diversity).
- [ ] **F4 — SONA learn→inference loop unwired at the JS/WASM boundary** → ruvnet/RuVector#519
  - `WasmSonaEngine::learn_from_feedback` is a no-op stub; JS `SonaCoordinator.processInstantLearning` is empty; single-step trajectories yield Δ=0 (REINFORCE baseline) and only `up_proj` ever adapts. This gates the **SONA / Δ-LoRA** part of the panel: Δ-LoRA must not be presented as "improvement" until the fix lands.
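
The F3 failure mode can be illustrated with a toy encoder. This is not the real `featureVectorToKey` from `q-learning-router.js`: `lossyKey`, `fullKey`, `makeTask`, and the 64-feature layout are hypothetical, chosen only to show how losing features 0–31 makes keyword-distinct tasks share one key, which is why `|Q|` undercounts.

```javascript
// Toy illustration only: NOT the real featureVectorToKey.
// Assumed layout (hypothetical): features 0-31 = keyword presence, 32-63 = task shape.
const KEYWORD_BLOCK = 32;

// Key built only from features 32 and up, mimicking an encoder that loses the keyword block.
function lossyKey(features) {
  return features.slice(KEYWORD_BLOCK).join("");
}

// Key built from every feature, so keyword differences survive.
function fullKey(features) {
  return features.join("");
}

// Count the distinct Q-states a set of tasks would occupy under a given key function.
function countStates(tasks, keyFn) {
  return new Set(tasks.map(keyFn)).size;
}

// Build a 64-feature vector with one keyword bit set and a fixed shape block.
function makeTask(keywordIndex) {
  const features = new Array(64).fill(0);
  features[keywordIndex] = 1; // distinct keyword per task
  features[40] = 1;           // identical shape for every task
  return features;
}

const tasks = [makeTask(0), makeTask(5), makeTask(17)];
console.log(countStates(tasks, lossyKey)); // 1
console.log(countStates(tasks, fullKey));  // 3
```

Three keyword-distinct tasks occupy one state under the lossy key and three under the full key, which is the same "N tasks → N states" expectation the collapse probe checks.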

### Upstream history (already resolved — context, not blockers)

- #2222 (closed) filed four findings. ruvnet fixed **F1/F2** (route-feedback persistence via `saveModel()`) in 3.10.6, with @pacphi credited. The F2b negative-reward parser fix followed in 3.10.7, the route cache and `--explore false` in 3.10.8, and the ADR-142 model-router bandit (a separate subsystem) in 3.10.9. Installed/latest = **3.10.10**.
- Full reconciliation + findings: [`docs/upstream/ruflo-self-improvement-findings.md`](../blob/feat/self-improvement-eval/docs/upstream/ruflo-self-improvement-findings.md).

---

## What to build once unblocked

Revive the **dropped Task 5** from the original design as a *live* panel. The original `📈 RL` line was deferred because its only source was the synthetic `improvement.json` eval — "thin without live value." The fix: source the genuinely-live signals from the persisted model, keep the eval as a periodic credibility anchor, and **gate** the SONA/Δ-LoRA signal behind F4.

| Panel | Source | Live from real CLI/API? | Gate |
|---|---|---|---|
| **Route Q-learner** — ε↓, δ̄ (TD error), \|Q\|, updateCount | `.swarm/q-learning-model.json` (persists since 3.10.6 F2 fix) | **Yes — now** | Footnote `\|Q\|` until **F3** lands (shape-bucketed, undercounts) |
| **Proof / verdict** — PASS/FAIL, Δpp, CI, p, d | `.claude-flow/improvement.json` (from `ruflo-improvement-eval`) | Periodic (eval/daemon) | none |
| **SONA patterns / trajectories** | `.claude-flow/neural/stats.json` | Yes (already in footer) | none |
| **Δ LoRA (adaptation signal)** | transient MicroLoRA delta | **No** | **Blocked on F4** — do not render as "improvement" until the learn path is wired |
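
The gating in the table can also be written down as data, so the renderer never has to hard-code which segment is allowed. A minimal sketch: `PANELS`, `visiblePanels`, the panel `name` values, and the `fixed` flags object are hypothetical; only the sources and gates are taken from the table.

```javascript
// Hypothetical encoding of the panel table: each panel names its source and its gate.
const PANELS = [
  { name: "route-q", source: ".swarm/q-learning-model.json", gate: null, footnoteUntil: "F3" },
  { name: "verdict", source: ".claude-flow/improvement.json", gate: null, footnoteUntil: null },
  { name: "sona-patterns", source: ".claude-flow/neural/stats.json", gate: null, footnoteUntil: null },
  { name: "delta-lora", source: "transient MicroLoRA delta", gate: "F4", footnoteUntil: null },
];

// Return the panels that may render, marking which still need a caveat footnote.
// `fixed` is an assumed object such as { F3: false, F4: false }.
function visiblePanels(fixed) {
  return PANELS
    .filter((p) => p.gate === null || fixed[p.gate] === true)
    .map((p) => ({
      name: p.name,
      footnote: p.footnoteUntil !== null && fixed[p.footnoteUntil] !== true,
    }));
}

console.log(visiblePanels({ F3: false, F4: false }).map((p) => p.name));
```

With both fixes outstanding, the Δ-LoRA panel is omitted entirely and the route-Q panel renders with its `|Q|` footnote.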

### Activation checklist (when F3 + F4 close)

- [ ] Re-verify F3 fix shipped: `q-learning-router.js` encoder no longer truncates keyword groups (re-run the collapse probe with `bin/ruflo-improvement-eval --probe-states`; expect N keyword-distinct tasks → N states).
- [ ] Re-verify F4 fix shipped: `learn_from_feedback` (or equivalent) actually moves weights; re-run the `repro_delta_zero` cargo test against the new ruvector — single-step Δ should be > 0 (or the API should expose a documented adapt+flush path).
- [ ] Implement the live route-Q panel in `shell/ruflo-functions.sh` (the `rufloActivationSegments` heredoc helper): fs-only read of `.swarm/q-learning-model.json` → `📈 RL  ε<e>↓ · δ̄<td>↓ · |Q|<n> · upd<c>`. Marker-guarded, upgrade-safe (same injector as the existing footer / `ruflo-fix-statusline-version`).
- [ ] Add the eval-verdict segment (PASS/FAIL + Δpp/CI/p/d from `improvement.json`); render `◷ RL` when the verdict is FAIL or absent.
- [ ] Un-gate the SONA Δ-LoRA segment only after F4 verification passes.
- [ ] Wire a refresh path so the eval verdict stays current (e.g. `ruflo daemon` worker or an on-demand `ruflo-improvement-eval` hook). Decide cadence.
- [ ] `ruflo-resync` must re-inject the segment after a ruflo/aqe upgrade regenerates the statusline.
- [ ] Honesty pass: docs state which signals are live vs periodic vs gated; no fabricated "improvement."
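
The route-Q segment from the checklist could be formatted roughly as follows. This is a sketch under stated assumptions: the field names `epsilon`, `avgTDError`, `qTable`, and `updateCount` are guesses at the schema of `.swarm/q-learning-model.json` and need to be checked against a real file, and `formatRouteQ` / `readRouteQSegment` are hypothetical helper names. The actual injector is a shell heredoc; the sketch only shows the read-and-format logic.

```javascript
const fs = require("fs");

// Format the route-Q segment from an already-parsed model object.
// Field names (epsilon, avgTDError, qTable, updateCount) are ASSUMED, not confirmed
// against the real .swarm/q-learning-model.json schema.
function formatRouteQ(model) {
  if (!model || typeof model !== "object") return null;
  const { epsilon, avgTDError, qTable, updateCount } = model;
  if (![epsilon, avgTDError, updateCount].every(Number.isFinite)) return null;
  // |Q| = number of distinct states currently held in the table.
  const states = qTable && typeof qTable === "object" ? Object.keys(qTable).length : 0;
  return `📈 RL  ε${epsilon.toFixed(2)}↓ · δ̄${avgTDError.toFixed(3)}↓ · |Q|${states} · upd${updateCount}`;
}

// fs-only read: no CLI calls. A missing or malformed file yields null,
// so the caller can simply skip the segment instead of printing stale data.
function readRouteQSegment(path) {
  try {
    return formatRouteQ(JSON.parse(fs.readFileSync(path, "utf8")));
  } catch {
    return null;
  }
}

const sample = { epsilon: 0.1, avgTDError: 0.0421, qTable: { a: {}, b: {} }, updateCount: 17 };
console.log(formatRouteQ(sample));
```

Returning `null` rather than a placeholder keeps the footer honest: a segment is either backed by a readable model file or not shown.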

### Materials / references (everything needed to start)

- **Design spec (historical, Task 5 dropped):** [`docs/superpowers/specs/2026-05-28-self-improvement-eval-design.md`](../blob/feat/self-improvement-eval/docs/superpowers/specs/2026-05-28-self-improvement-eval-design.md) — §3.1 statusline `📈 RL` segment contract, R9–R11, S1–S5.
- **Implementation plan (historical):** [`docs/superpowers/plans/2026-05-28-self-improvement-eval.md`](../blob/feat/self-improvement-eval/docs/superpowers/plans/2026-05-28-self-improvement-eval.md). **Task 5** has the full `rufloActivationSegments` heredoc patch, idempotency test, and live-apply steps. The patch as written reads `improvement.json`; adapt it to read `.swarm/q-learning-model.json` for the live route panel.
- **Proof harness:** `bin/ruflo-improvement-eval` — `--probe-states` (F3 collapse check), `--cli-check` (F2 persistence, version-aware), `--inspect-decision`, `--smoke`, `--json`, `--check`. Writes `.claude-flow/improvement.json`.
- **F4 reproduction test:** the `repro_delta_zero.rs` cargo test (full source in ruvnet/RuVector#519). Re-run it after the fix; a passing run is the condition for un-gating the Δ-LoRA segment.
- **Statusline injector:** `shell/ruflo-functions.sh` → `ruflo-fix-statusline-version` (marker-guarded heredoc; never rewrite ruflo's native lines).
- **Existing footer fields** (`claude/ruflo-reference.md` "Status-line activation footer"): SONA patterns/traj, ⚡ HNSW, 🛡 aidefence, 🎓 Agentic QE, Δ LoRA (currently from `ruflo-neural-train` cache — revisit under F4).
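
"Marker-guarded" and "upgrade-safe" both come down to idempotent injection: re-running the injector must replace the guarded block rather than append a second copy. A minimal sketch of that property, assuming hypothetical marker strings (`BEGIN`, `END`) and a hypothetical `injectSegment` helper; the real injector in `shell/ruflo-functions.sh` uses its own markers and a shell heredoc.

```javascript
// Hypothetical markers; the real injector's marker strings are not shown in this issue.
const BEGIN = "// >>> ruflo-rl-segment >>>";
const END = "// <<< ruflo-rl-segment <<<";

// Insert or replace the guarded block, leaving every other line untouched.
function injectSegment(fileText, segmentBody) {
  const block = `${BEGIN}\n${segmentBody}\n${END}`;
  const start = fileText.indexOf(BEGIN);
  const end = fileText.indexOf(END);
  if (start !== -1 && end > start) {
    // Already injected: replace only what sits between the markers.
    return fileText.slice(0, start) + block + fileText.slice(end + END.length);
  }
  // First injection, or a regenerated file: append after the native content.
  const sep = fileText.endsWith("\n") ? "" : "\n";
  return `${fileText}${sep}${block}\n`;
}

const once = injectSegment("native statusline line\n", "renderRl();");
const twice = injectSegment(once, "renderRl();");
console.log(once === twice); // true
```

Because the native lines are only ever sliced around, never rewritten, the same function covers the `ruflo-resync` case: a regenerated statusline has no markers, so the block is appended again.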

### Acceptance

The footer shows a live `📈 RL` line whose route-Q fields (ε/δ̄/|Q|/updates) move in response to real `ruflo route feedback` / routing use, the eval verdict reflects the last A/B run, and the SONA adaptation signal is either un-gated (F4 fixed) or honestly absent. No metric is sourced from `route stats` (broken CLI path) and none is fabricated.
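
One way to check the "fields move" criterion is to diff two snapshots of the persisted model taken before and after real routing use. This is a sketch only: `routeQMoved` is a hypothetical helper, the field names are the same assumed schema (`epsilon`, `avgTDError`, `updateCount`), and treating a higher `updateCount` as proof of movement is an assumption rather than a documented contract.

```javascript
// Compare two parsed snapshots of the model taken before and after real routing use.
// Field names and the "moved" criterion are ASSUMED, not taken from ruflo itself.
function routeQMoved(before, after) {
  const changed = ["epsilon", "avgTDError", "updateCount"].filter(
    (key) => before[key] !== after[key]
  );
  return { moved: after.updateCount > before.updateCount, changed };
}

const before = { epsilon: 0.2, avgTDError: 0.05, updateCount: 10 };
const after = { epsilon: 0.19, avgTDError: 0.05, updateCount: 11 };
console.log(routeQMoved(before, after));
```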

