Tracking: live RL metrics suite in the statusline (gated on ruflo#2239 + RuVector#519) #8

@pacphi

Tracking: live RL metrics suite in the statusline (gated on two upstream fixes)

North star: an RL metrics panel in the statusline footer, live-updated from real CLI/API use and actual learnings — not a synthetic one-shot proof. This issue collects every material needed to kick the work off the moment the two upstream blockers close, so we don't have to re-derive context.

Blocked by (do not start until both are closed/fixed)

  • ruflo#2239 and RuVector#519 (the two upstream issues named in the title).

Upstream history (already resolved — context, not blockers)

  • #2222 (closed) filed four findings; ruvnet fixed F1/F2 (route-feedback persistence via saveModel(), 3.10.6, @pacphi credited). F2b neg-reward parser → 3.10.7. Route cache + --explore false → 3.10.8. ADR-142 model-router bandit → 3.10.9 (separate subsystem). Installed/latest = 3.10.10.
  • Full reconciliation + findings: docs/upstream/ruflo-self-improvement-findings.md.

What to build once unblocked

Revive the dropped Task 5 from the original design as a live panel. The original 📈 RL line was deferred because its only source was the synthetic improvement.json eval — "thin without live value." The fix: source the genuinely-live signals from the persisted model, keep the eval as a periodic credibility anchor, and gate the SONA/Δ-LoRA signal behind F4.

| Panel | Source | Live from real CLI/API? | Gate |
| --- | --- | --- | --- |
| Route Q-learner: ε↓, δ̄ (TD error), \|Q\|, updateCount | .swarm/q-learning-model.json (persists since 3.10.6 F2 fix) | Yes (now) | Footnote \|Q\| until F3 lands (shape-bucketed, undercounts) |
| Proof / verdict: PASS/FAIL, Δpp, CI, p, d | .claude-flow/improvement.json (from ruflo-improvement-eval) | Periodic (eval/daemon) | none |
| SONA patterns / trajectories | .claude-flow/neural/stats.json | Yes (already in footer) | none |
| Δ LoRA (adaptation signal) | transient MicroLoRA delta | No | Blocked on F4: do not render as "improvement" until the learn path is wired |
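The gating in the table can be sketched as data plus one filter. This is an illustrative Python sketch only, not the shell helper the panel will actually live in; the `Segment` type, the segment names, and the `renderable` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    name: str                            # hypothetical short id for the panel row
    source: str                          # where the signal is read from
    live: str                            # "yes", "periodic", or "no"
    blocked_on: Optional[str] = None     # upstream finding that blocks rendering
    footnote_until: Optional[str] = None # finding that only requires a footnote


# One entry per table row; names are illustrative.
SEGMENTS = [
    Segment("route-q", ".swarm/q-learning-model.json", "yes", footnote_until="F3"),
    Segment("verdict", ".claude-flow/improvement.json", "periodic"),
    Segment("sona", ".claude-flow/neural/stats.json", "yes"),
    Segment("lora-delta", "transient MicroLoRA delta", "no", blocked_on="F4"),
]


def renderable(segments, fixed):
    """Names of segments that are ungated or whose blocking finding is fixed."""
    return [s.name for s in segments
            if s.blocked_on is None or s.blocked_on in fixed]
```

With no upstream fixes verified, `renderable(SEGMENTS, set())` leaves out the Δ LoRA segment; it only appears once `"F4"` is in the fixed set.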

Activation checklist (when F3 + F4 close)

  • Re-verify F3 fix shipped: q-learning-router.js encoder no longer truncates keyword groups (re-run the collapse probe in bin/ruflo-improvement-eval --probe-states; expect N tasks → N states).
  • Re-verify F4 fix shipped: learn_from_feedback (or equivalent) actually moves weights; re-run the repro_delta_zero cargo test against the new ruvector — single-step Δ should be > 0 (or the API should expose a documented adapt+flush path).
  • Implement the live route-Q panel in shell/ruflo-functions.sh (the rufloActivationSegments heredoc helper): fs-only read of .swarm/q-learning-model.json, rendered as 📈 RL ε<e>↓ · δ̄<td>↓ · |Q|<n> · upd<c>. Marker-guarded and upgrade-safe (same injector as the existing footer / ruflo-fix-statusline-version).
  • Add the eval-verdict segment (PASS/FAIL + Δpp/CI/p/d from improvement.json); render ◷ RL when the verdict is FAIL or improvement.json is absent.
  • Un-gate the SONA Δ-LoRA segment only after F4 verification passes.
  • Wire a refresh path so the eval verdict stays current (e.g. ruflo daemon worker or an on-demand ruflo-improvement-eval hook). Decide cadence.
  • ruflo-resync must re-inject the segment after a ruflo/aqe upgrade regenerates the statusline.
  • Honesty pass: docs state which signals are live vs periodic vs gated; no fabricated "improvement."
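The two rendering items in the checklist could look roughly like the sketch below. It is written in Python for illustration, not as the shell/heredoc implementation. The JSON keys `epsilon`, `avgTDError`, `qTable`, `verdict`, `deltaPp`, `p`, and `d` are assumptions about the file schemas and need checking against the real files. `updateCount` is the name used in the table, but its presence as a key is also assumed. The `*` footnote marker and the PASS text layout are placeholders.

```python
import json
from pathlib import Path


def load_json(path):
    """Plain filesystem read; None when the file is missing or not valid JSON."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None


def route_q_segment(model, f3_fixed=False):
    """Format the route-Q fields, or return None if the model is unusable."""
    if not isinstance(model, dict):
        return None
    try:
        eps = float(model["epsilon"])      # assumed key
        td = float(model["avgTDError"])    # assumed key
        states = len(model["qTable"])      # assumed key; |Q| taken as state count
        upd = int(model["updateCount"])    # name from the table; key assumed
    except (KeyError, TypeError, ValueError):
        return None
    mark = "" if f3_fixed else "*"         # footnote |Q| until F3 lands
    return f"📈 RL ε{eps:.2f}↓ · δ̄{td:.3f}↓ · |Q|{states}{mark} · upd{upd}"


def verdict_segment(improvement):
    """Return the PASS summary, or the ◷ RL placeholder on FAIL or absence."""
    if not isinstance(improvement, dict) or improvement.get("verdict") != "PASS":
        return "◷ RL"
    try:
        return "PASS Δ{:+.1f}pp p={:.3f} d={:.2f}".format(
            float(improvement["deltaPp"]),  # assumed key
            float(improvement["p"]),        # assumed key
            float(improvement["d"]),        # assumed key
        )
    except (KeyError, TypeError, ValueError):
        return "◷ RL"
```

Both functions return a fallback rather than raising, so a missing or half-written file never breaks the footer and never produces a fabricated value.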

Materials / references (everything needed to start)

Acceptance

The footer shows a live 📈 RL line whose route-Q fields (ε/δ̄/|Q|/updates) move in response to real ruflo route feedback / routing use, the eval verdict reflects the last A/B run, and the SONA adaptation signal is either un-gated (F4 fixed) or honestly absent. No metric is sourced from route stats (broken CLI path) and none is fabricated.
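One way to check the "fields move" criterion is to compare two snapshots of the persisted model taken before and after real routing use. The sketch below is a hypothetical helper, not part of ruflo; the key names are the same assumed schema as above and are not confirmed.

```python
# Assumed keys in .swarm/q-learning-model.json; verify against the real file.
LIVE_FIELDS = ("epsilon", "avgTDError", "updateCount")


def moved_fields(before, after, fields=LIVE_FIELDS):
    """Fields whose values differ between two model snapshots."""
    return [f for f in fields if before.get(f) != after.get(f)]


def is_live(before, after):
    """Count the panel as live only if updateCount strictly increased."""
    b, a = before.get("updateCount"), after.get("updateCount")
    return isinstance(b, int) and isinstance(a, int) and a > b
```

A run where `is_live` stays false after real route feedback would suggest the panel is showing a stale file rather than live learning.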
