Tracking: live RL metrics suite in the statusline (gated on two upstream fixes)
North star: an RL metrics panel in the statusline footer, live-updated from real CLI/API use and actual learnings, not a synthetic one-shot proof. This issue collects every material needed to kick the work off the moment the two upstream blockers close, so we don't have to re-derive context.
Blocked by (do not start until both are closed/fixed)
F3: q-learning-router.js featureVectorToKey discards features 0–31 (the entire keyword-presence block) via a 31-bit truncating hash fold; keyword-distinct tasks collapse to one Q-state. Directly degrades a metric we want to display (|Q| undercounts true task diversity).
F4: WasmSonaEngine::learn_from_feedback is a no-op stub; JS SonaCoordinator.processInstantLearning is empty; single-step trajectories yield Δ=0 (REINFORCE baseline) and only up_proj ever adapts. Gates the SONA / Δ-LoRA part of the panel: must not present Δ-LoRA as "improvement" until this lands.
Upstream history (already resolved; context, not blockers)
[…] (saveModel(), 3.10.6, @pacphi credited). F2b neg-reward parser → 3.10.7. Route cache + --explore false → 3.10.8. ADR-142 model-router bandit → 3.10.9 (separate subsystem). Installed/latest = 3.10.10.
docs/upstream/ruflo-self-improvement-findings.md.
What to build once unblocked
Revive the dropped Task 5 from the original design as a live panel. The original 📈 RL line was deferred because its only source was the synthetic improvement.json eval ("thin without live value"). The fix: source the genuinely-live signals from the persisted model, keep the eval as a periodic credibility anchor, and gate the SONA/Δ-LoRA signal behind F4.
.swarm/q-learning-model.json (persists since 3.10.6 F2 fix)
[…] |Q| until F3 lands (shape-bucketed, undercounts)
.claude-flow/improvement.json (from ruflo-improvement-eval)
.claude-flow/neural/stats.json
Blocked on F4: do not render as "improvement" until the learn path is wired
Activation checklist (when F3 + F4 close)
Re-verify F3 fix shipped: q-learning-router.js encoder no longer truncates keyword groups (re-run the collapse probe in bin/ruflo-improvement-eval --probe-states; expect N tasks → N states).
Re-verify F4 fix shipped: learn_from_feedback (or equivalent) actually moves weights; re-run the repro_delta_zero cargo test against the new ruvector; single-step Δ should be > 0 (or the API should expose a documented adapt+flush path).
Implement the live route-Q panel in shell/ruflo-functions.sh (the rufloActivationSegments heredoc helper): fs-only read of .swarm/q-learning-model.json → 📈 RL ε<e>↓ · δ̄<td>↓ · |Q|<n> · upd<c>. Marker-guarded, upgrade-safe (same injector as the existing footer / ruflo-fix-statusline-version).
Add the eval-verdict segment (PASS/FAIL + Δpp/CI/p/d from improvement.json); render ◷ RL on FAIL or absent.
Un-gate the SONA Δ-LoRA segment only after F4 verification passes.
Wire a refresh path so the eval verdict stays current (e.g. ruflo daemon worker or an on-demand ruflo-improvement-eval hook). Decide cadence.
ruflo-resync must re-inject the segment after a ruflo/aqe upgrade regenerates the statusline.
Honesty pass: docs state which signals are live vs periodic vs gated; no fabricated "improvement."
Materials / references (everything needed to start)
docs/superpowers/specs/2026-05-28-self-improvement-eval-design.md: §3.1 statusline 📈 RL segment contract, R9–R11, S1–S5.
Implementation plan (historical): docs/superpowers/plans/2026-05-28-self-improvement-eval.md. Task 5 has the full rufloActivationSegments heredoc patch, idempotency test, and live-apply steps (sourced improvement.json; adapt to source .swarm/q-learning-model.json for the live route panel).
bin/ruflo-improvement-eval: --probe-states (F3 collapse check), --cli-check (F2 persistence, version-aware), --inspect-decision, --smoke, --json, --check. Writes .claude-flow/improvement.json.
repro_delta_zero.rs cargo test (full source in ruvnet/RuVector#519, "Bug: SONA learn→inference loop unwired at the JS/WASM boundary — learn_from_feedback is a no-op; MicroLoRA adapts only on multi-step varying-reward trajectories"): re-run post-fix as the un-gate gate.
shell/ruflo-functions.sh → ruflo-fix-statusline-version (marker-guarded heredoc; never rewrite ruflo's native lines).
Existing footer fields (claude/ruflo-reference.md "Status-line activation footer"): SONA patterns/traj, ⚡ HNSW, 🛡 aidefence, 🎓 Agentic QE, Δ LoRA (currently from ruflo-neural-train cache; revisit under F4).
Acceptance
The footer shows a live 📈 RL line whose route-Q fields (ε/δ̄/|Q|/updates) move in response to real ruflo route feedback / routing use, the eval verdict reflects the last A/B run, and the SONA adaptation signal is either un-gated (F4 fixed) or honestly absent. No metric is sourced from route stats (broken CLI path) and none is fabricated.
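To make the F3 collapse concrete, here is a minimal sketch of how a truncating fold can erase a keyword block, and how an "N tasks → N states" probe would catch it. Everything in it is an assumption for illustration: the 32 + 31 feature layout, lossyKey, losslessKey and countStates are hypothetical and are not the real featureVectorToKey or the real --probe-states logic.

```javascript
// Toy layout (assumption): features 0-31 = keyword presence, 32-62 = task shape.
const KEYWORD_BITS = 32;
const SHAPE_BITS = 31;

// Hypothetical lossy encoder: shift-fold every feature into a 31-bit integer.
// Bits pushed past position 30 are masked away, so features 0-31 never reach the key.
function lossyKey(features) {
  let key = 0;
  for (const bit of features) {
    key = ((key << 1) | (bit ? 1 : 0)) & 0x7fffffff;
  }
  return String(key);
}

// Lossless comparison encoder: every feature contributes one character to the key.
function losslessKey(features) {
  return features.map((bit) => (bit ? '1' : '0')).join('');
}

// Collapse probe: count the distinct Q-state keys produced for a set of tasks.
function countStates(encode, tasks) {
  return new Set(tasks.map(encode)).size;
}

// Tasks that differ only in which keyword feature is set.
function makeTask(keywordIndex) {
  const features = new Array(KEYWORD_BITS + SHAPE_BITS).fill(0);
  features[keywordIndex] = 1; // one distinct keyword per task
  features[KEYWORD_BITS] = 1; // identical shape bit for every task
  return features;
}

const tasks = [0, 5, 17, 31].map(makeTask);
console.log(countStates(lossyKey, tasks), countStates(losslessKey, tasks));
```

With the lossy encoder the four keyword-distinct tasks share one key, which is the kind of undercount that would make |Q| misleading on the panel.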
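The F4 note that single-step trajectories yield Δ=0 follows from simple arithmetic if the REINFORCE baseline is the mean reward of the trajectory. The sketch below assumes exactly that baseline (the issue does not spell out which one ruvector uses), and the helper names are hypothetical.

```javascript
// ASSUMPTION: the baseline is the mean reward over the trajectory's steps.
function advantages(rewards) {
  const baseline = rewards.reduce((sum, r) => sum + r, 0) / rewards.length;
  return rewards.map((r) => r - baseline);
}

// Proxy for "does anything adapt": total absolute advantage across steps.
// A zero total means every per-step gradient term is scaled by zero.
function updateMagnitude(rewards) {
  return advantages(rewards).reduce((sum, a) => sum + Math.abs(a), 0);
}

console.log(updateMagnitude([0.9]));           // single step: reward equals its own mean
console.log(updateMagnitude([0.5, 0.5, 0.5])); // multi-step, constant reward
console.log(updateMagnitude([0, 1]));          // multi-step, varying reward
```

Under this assumption only the multi-step, varying-reward case produces a non-zero signal, which is consistent with the upstream bug title and is why a Δ-LoRA readout fed by single-step feedback should stay gated.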
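The route-Q checklist item (fs-only read of .swarm/q-learning-model.json rendered as 📈 RL ε<e>↓ · δ̄<td>↓ · |Q|<n> · upd<c>) could look roughly like the sketch below. The field names epsilon, avgTDError, qTable and updateCount are placeholders: the real schema of the model file is not given in this issue and should be checked against an actual persisted model before use.

```javascript
const fs = require('node:fs');

// ASSUMPTION: epsilon, avgTDError, qTable and updateCount are placeholder
// field names, not the confirmed schema of .swarm/q-learning-model.json.
function formatRouteQ(model) {
  const { epsilon, avgTDError, qTable, updateCount } = model;
  // Refuse to render rather than show a fabricated or NaN value.
  if (!Number.isFinite(epsilon) || !Number.isFinite(avgTDError)) return '';
  const states = Object.keys(qTable || {}).length;
  const updates = Number.isFinite(updateCount) ? updateCount : 0;
  return `📈 RL ε${epsilon.toFixed(2)}↓ · δ̄${avgTDError.toFixed(3)}↓ · |Q|${states} · upd${updates}`;
}

// fs-only read: no CLI call, no network. Any failure yields an empty segment.
function readRouteQSegment(modelPath) {
  try {
    return formatRouteQ(JSON.parse(fs.readFileSync(modelPath, 'utf8')));
  } catch {
    return '';
  }
}

console.log(formatRouteQ({
  epsilon: 0.25,
  avgTDError: 0.125,
  qTable: { stateA: {}, stateB: {} },
  updateCount: 42,
}));
```

Returning an empty string on a missing or malformed file keeps the footer honest: the segment simply disappears instead of showing stale or invented numbers.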
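For the eval-verdict segment, the only rendering rule stated in the checklist is that FAIL or an absent report shows ◷ RL. The sketch below applies that rule; the PASS layout and the field names verdict, deltaPp, ci, p and d are assumptions, since the structure of improvement.json is not described here.

```javascript
// ASSUMPTION: verdict, deltaPp, ci, p and d are placeholder field names,
// and the PASS string layout is illustrative only.
function formatEvalVerdict(report) {
  // Rule from the checklist: FAIL or absent renders the neutral marker.
  if (!report || report.verdict !== 'PASS') return '◷ RL';
  const [lo, hi] = Array.isArray(report.ci) ? report.ci : ['?', '?'];
  return `PASS Δ${report.deltaPp}pp CI[${lo}, ${hi}] p=${report.p} d=${report.d}`;
}

console.log(formatEvalVerdict(null));
console.log(formatEvalVerdict({ verdict: 'FAIL' }));
console.log(formatEvalVerdict({ verdict: 'PASS', deltaPp: 4.2, ci: [1.1, 7.3], p: 0.01, d: 0.6 }));
```

Keeping the formatter a pure function of the parsed report makes it easy to unit-test separately from whichever refresh path ends up writing the file.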
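The "marker-guarded, upgrade-safe" requirement, together with ruflo-resync re-injecting after an upgrade, amounts to an idempotent block replace. A minimal sketch of that idea follows; the marker strings and injectBlock are hypothetical, and the real injector in shell/ruflo-functions.sh may work differently.

```javascript
// ASSUMPTION: marker text is made up for illustration.
const BEGIN = '# >>> rl-segment (hypothetical marker) >>>';
const END = '# <<< rl-segment (hypothetical marker) <<<';

// Insert or refresh a guarded block without touching any other line.
function injectBlock(source, block) {
  const guarded = `${BEGIN}\n${block}\n${END}`;
  const start = source.indexOf(BEGIN);
  const end = source.indexOf(END);
  if (start !== -1 && end > start) {
    // Block already present: replace only the text between the markers.
    return source.slice(0, start) + guarded + source.slice(end + END.length);
  }
  // Block absent (fresh install or regenerated statusline): append it.
  const separator = source === '' || source.endsWith('\n') ? '' : '\n';
  return `${source}${separator}${guarded}\n`;
}

const native = 'native line 1\nnative line 2\n';
const once = injectBlock(native, 'render_rl_segment');
const twice = injectBlock(once, 'render_rl_segment');
console.log(once === twice); // re-running the injector changes nothing
```

Because the second run is a no-op and an upgrade that wipes the block just triggers the append path again, the same function can serve both first install and resync.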