feat(mc): Pool B resilience suite (A+B+C+D)#30

Merged
aimerdoux merged 1 commit into main from feat/pool-b-resilience on May 19, 2026
@aimerdoux
Owner

Summary

Four operator-visible improvements that together let WaveX run as a SaaS, not just a self-hosted toy. Each is independently shippable; they are bundled because they share verification surfaces (Pool B Health tab, mission control, connectors directory).

| Change | Surface |
|---|---|
| A · Composio managed mode | wavex-os-server env + ConnectorsSidebar UI |
| B · UUID→slug translation (fixes "0/0 ready") | plugin worker (affects every wavex-os-server call) |
| C · Inference life bar + throttle slider | Pool B Health widget + new /api/pool-b-health/operator-quota endpoint |
| D · Supabase Edge Function inference fallback | new supabase/functions/wavex-inference-fallback/ |

A · Composio managed mode

In a hosted deploy, customers shouldn't have to paste their own Composio API key: the operator runs the Composio account and the customer pays for it through their subscription tier.

  • New env WAVEX_COMPOSIO_MANAGED=1 on the server
  • /api/connectors/setup-status returns managed: boolean
  • UI's SetupScreen gate now skips the key-entry modal when managed: true — falls through to the catalog (or degraded catalog if Composio is transiently down)
  • BYOC stays the default when the env is unset — self-hosted users keep their existing flow
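
A minimal sketch of the gate described above, under stated assumptions. Only the `WAVEX_COMPOSIO_MANAGED` env name and the `managed` field come from this PR; the `hasKey` field, the `Screen` labels, and both function names are hypothetical and stand in for whatever the server and SetupScreen actually use.

```typescript
// Hypothetical response shape: only `managed` is confirmed by the PR.
interface SetupStatus {
  managed: boolean;
  hasKey: boolean; // assumed: whether a BYOC key has already been stored
}

type Screen = "key-entry" | "catalog";

// Server side: managed mode is on only when the env var is exactly "1".
function isManaged(env: Record<string, string | undefined>): boolean {
  return env["WAVEX_COMPOSIO_MANAGED"] === "1";
}

// UI side: decide which screen the setup gate should show.
function chooseScreen(status: SetupStatus): Screen {
  if (status.managed) return "catalog"; // operator holds the key, skip the modal
  return status.hasKey ? "catalog" : "key-entry"; // unchanged BYOC flow
}
```

With the env unset, `isManaged({})` is `false`, so a self-hosted user without a stored key still lands on the key-entry modal.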

B · UUID→slug translation (THE big bug fix)

Diagnosed live:

/api/companies/<uuid>/agents (wavex-os-server) → []
/api/companies/<slug>/agents (wavex-os-server) → 35 agents
/api/companies/<uuid>/agents (paperclip core)  → 35 agents

Plugin context gives paperclip's UUID; wavex-os-server keys by the wavex slug (the ~/.wavex-os/instances/<slug>/ dir name). Every ${base}/api/.../<id>/... call was silently returning [].

  • New helper resolveWavexSlug(uuid) calls paperclip on first miss, caches the mapping in-process (immutable post-finalize)
  • ~30 sites replaced from const id = String(companyId ?? "") to const id = await resolveWavexSlug(...)
  • Falls back to raw input on lookup failure — degraded behavior unchanged for partial-wiring test cases
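
The resolver's behavior (lookup on first miss, in-process cache, raw-input fallback) can be sketched roughly as follows. This is not the code in worker.ts: how the worker actually queries paperclip is not shown in this PR, so the lookup is injected as a hypothetical `SlugLookup` function, and `makeSlugResolver` is an illustrative name.

```typescript
// Hypothetical: stands in for the paperclip call that maps a UUID to a slug.
type SlugLookup = (uuid: string) => Promise<string | undefined>;

function makeSlugResolver(lookup: SlugLookup) {
  // The mapping is immutable post-finalize, so entries never expire.
  const cache = new Map<string, string>();

  return async function resolveWavexSlug(companyId: unknown): Promise<string> {
    const raw = String(companyId ?? "");
    if (raw === "") return raw;

    const cached = cache.get(raw);
    if (cached !== undefined) return cached;

    try {
      const slug = await lookup(raw); // only reached on a cache miss
      if (slug) {
        cache.set(raw, slug);
        return slug;
      }
    } catch {
      // Lookup failed: fall through and return the raw input.
    }
    return raw; // same degraded behavior as before the fix
  };
}
```

Failed lookups are deliberately not cached in this sketch, so a later call can still succeed once paperclip is reachable.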

Net effect: Inception Status sidebar shows real agent count; Mission Control's Stream/Scoreboard/Map hydrate correctly; everything downstream of ${base}/api/.../<id>/... starts working.

C · Life bar + throttle slider

Operator wants a glance check before triggering big customer flows — "am I about to blow my Claude Max quota?" — plus a control surface to throttle.

  • GET /api/pool-b-health/operator-quota returns {tokens_used_*, cost_usd_*, requests_*, last_inference_at} over 24h / 7d / 30d windows
  • Pool B Health widget gains a top section: life bar with green/yellow/red zones (soft visual ramp capped at $5/day), numeric readout, throttle slider (1-120 req/min, persisted to localStorage)
  • Throttle enforcement (queue/reject above cap on inference-server) is a follow-up PR; for now the slider is a planning surface. Life bar is fully live.
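
As an illustration of the windowed aggregation and the life bar zones, here is a sketch under assumptions. The `LedgerRow` shape is invented for the example (the real usage_ledger columns are not shown in this PR), and the 50% / 90% zone thresholds are placeholders; only the three windows and the $5/day cap come from the description above.

```typescript
// Assumed row shape, for illustration only.
interface LedgerRow {
  at: number; // epoch milliseconds of the inference call
  tokens: number;
  costUsd: number;
}

interface WindowTotals {
  tokens: number;
  costUsd: number;
  requests: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const WINDOW_DAYS = { "24h": 1, "7d": 7, "30d": 30 };
type WindowKey = keyof typeof WINDOW_DAYS;
const WINDOW_KEYS: WindowKey[] = ["24h", "7d", "30d"];

// Sum tokens, cost, and request count for each trailing window.
function aggregate(rows: LedgerRow[], now: number): Record<WindowKey, WindowTotals> {
  const totals: Record<WindowKey, WindowTotals> = {
    "24h": { tokens: 0, costUsd: 0, requests: 0 },
    "7d": { tokens: 0, costUsd: 0, requests: 0 },
    "30d": { tokens: 0, costUsd: 0, requests: 0 },
  };
  for (const row of rows) {
    const age = now - row.at;
    if (age < 0) continue; // ignore rows stamped in the future
    for (const key of WINDOW_KEYS) {
      if (age <= WINDOW_DAYS[key] * DAY_MS) {
        totals[key].tokens += row.tokens;
        totals[key].costUsd += row.costUsd;
        totals[key].requests += 1;
      }
    }
  }
  return totals;
}

const DAILY_CAP_USD = 5; // soft cap from the PR description
type Zone = "green" | "yellow" | "red";

// Map 24h spend onto a zone; the thresholds are assumed, not from the PR.
function lifeBarZone(cost24hUsd: number): Zone {
  const ratio = cost24hUsd / DAILY_CAP_USD;
  if (ratio < 0.5) return "green";
  if (ratio < 0.9) return "yellow";
  return "red";
}
```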

D · Supabase Edge Function inference fallback

When the operator's Mac is offline or rate-limited, the customer's 60s Realtime timeout currently fails silently. This adds a fallback the consumer can call:

POST /functions/v1/wavex-inference-fallback
body: { prompt, max_output_tokens, purpose, model? }
→ { ok: true, content, usage, source: "fallback" }
  • Auth-gated + subscription-gated (free-tier callers are rejected, so nobody gets free inference on the operator's dime)
  • Defaults to claude-haiku-4-5-20251001 (~$0.005 per pillar-suggest)
  • Logs to usage_ledger with device_id=NULL so fallback usage is visible separately in the Pool B Health widget
  • Returns 503 + structured error if ANTHROPIC_API_KEY secret missing — browser gracefully no-ops
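
The gating rules above can be read as a small decision function. The sketch below is an assumption-heavy illustration, not the Edge Function's code: the 503 and the default model come from this description and the 402 for free tier from the commit message, while the 401 and 400 statuses, the error strings, the order of the checks, and the single-entry allowlist are all hypothetical.

```typescript
interface GateInput {
  hasApiKey: boolean; // is ANTHROPIC_API_KEY set in project secrets?
  authenticated: boolean; // did the caller present a valid JWT?
  subscription: "active" | "trialing" | "free"; // assumed tier labels
  requestedModel?: string;
}

type GateResult =
  | { kind: "allow"; model: string }
  | { kind: "deny"; status: number; error: string };

const DEFAULT_MODEL = "claude-haiku-4-5-20251001";
// Placeholder allowlist; the real whitelist is defined in the function itself.
const ALLOWED_MODELS = new Set<string>([DEFAULT_MODEL]);

function gate(input: GateInput): GateResult {
  if (!input.hasApiKey) {
    return { kind: "deny", status: 503, error: "fallback_not_configured" };
  }
  if (!input.authenticated) {
    return { kind: "deny", status: 401, error: "unauthenticated" }; // assumed status
  }
  if (input.subscription === "free") {
    return { kind: "deny", status: 402, error: "subscription_required" };
  }
  const model = input.requestedModel ?? DEFAULT_MODEL;
  if (!ALLOWED_MODELS.has(model)) {
    return { kind: "deny", status: 400, error: "model_not_allowed" }; // assumed status
  }
  return { kind: "allow", model };
}
```

Checking the secret first means a misconfigured project answers 503 for every caller, which matches the "browser gracefully no-ops" behavior regardless of who is asking.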

Consumer-side update (cloudInference.ts calling this on timeout) lands in a follow-up PR in wavex-experience-architect.
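
For orientation only, the consumer-side pattern (try the primary path, fall back once the timeout fires) might look like the sketch below. The real cloudInference.ts change is not part of this PR, so every name here (`Infer`, `withTimeout`, `inferWithFallback`) is hypothetical, and both inference paths are injected rather than calling any real endpoint.

```typescript
// Hypothetical signature standing in for both the Realtime path and the fallback call.
type Infer = (prompt: string) => Promise<string>;

interface InferResult {
  content: string;
  source: "primary" | "fallback";
}

// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timeout")), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Try the primary path; on timeout or error, try the fallback exactly once.
async function inferWithFallback(
  prompt: string,
  primary: Infer,
  fallback: Infer,
  timeoutMs: number = 60000, // the 60s Realtime timeout mentioned above
): Promise<InferResult | null> {
  try {
    const content = await withTimeout(primary(prompt), timeoutMs);
    return { content, source: "primary" };
  } catch {
    try {
      const content = await fallback(prompt);
      return { content, source: "fallback" };
    } catch {
      return null; // caller keeps its existing failure mode
    }
  }
}
```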

Deploy steps for D

  1. Generate Anthropic API key at console.anthropic.com (a backend key, NOT a Claude Code session token)
  2. supabase secrets set ANTHROPIC_API_KEY=sk-ant-...
  3. supabase functions deploy wavex-inference-fallback

What's NOT in this PR (next-up follow-ups)

  • Throttle enforcement server-side — slider is currently planning-only
  • Consumer cloudInference.ts calling the fallback function on timeout — needs a separate PR against wavex-experience-architect
  • Device pairing widget in paperclip — was in your follow-up list, will land in next PR
  • Subscription upgrade CTA in Pool B Health — same follow-up PR
  • Routing badge per turn in paperclip terminal — same follow-up PR

Verification

  • pnpm --filter @wavex-os/paperclip-plugin-wavex build → 240.0kb dist
  • packages/wavex-os-server tsc → clean
  • Test CI runs on push

🤖 Generated with Claude Code

Bundles four operator-visible improvements to the Pool B inference
stack that together let WaveX run as a SaaS instead of a self-hosted
toy. Companion to #28 (Pool B Health widget + claude CLI prereq).

A. Composio managed mode (operator-managed key, not customer-managed)
─────────────────────────────────────────────────────────────────────

The directory modal at /<company>/connectors used to demand each
customer paste their own Composio API key. For a hosted WaveX deploy
that's wrong: the operator runs the Composio account, the customer
just gets the connectors as part of their subscription tier.

  - New env: WAVEX_COMPOSIO_MANAGED=1 on wavex-os-server
  - /api/connectors/setup-status now returns { managed: boolean }
  - ConnectorsSidebar's SetupScreen gate skips the key-entry modal
    when managed: true — falls through to the catalog (or degraded
    catalog when Composio is transiently down) instead of demanding
    a key the customer doesn't have

Self-hosted users keep BYOC by leaving WAVEX_COMPOSIO_MANAGED unset.

B. UUID→slug translation in plugin worker (fixes "0/0 ready" + fleet
   showing 0 agents even when 35 are alive)
─────────────────────────────────────────────────────────────────────

The plugin context gives the worker paperclip's company UUID, but
wavex-os-server's loadCompanyManifest() / getOnboardingDir() index by
the wavex slug (~/.wavex-os/instances/<slug>/). Every single
${base}/api/.../<uuid>/... call was silently returning [] because the
manifest lookup failed.

Diagnosed live:
  /api/companies/<uuid>/agents       → []
  /api/companies/<slug>/agents       → 35 agents
  /api/companies/<uuid>/agents on paperclip core → 35 agents

  - New helper resolveWavexSlug(uuid) in worker.ts that calls paperclip
    on first miss, caches the mapping in-process (immutable post-finalize)
  - Replaces every `const id = String(companyId ?? "")` with the async
    resolver (~30 sites mechanical, plus the inception-status handler's
    inline pattern)
  - Falls back to the raw input on lookup failure for partial-wiring
    test cases — degraded behavior is the same as before this change

Net effect: Inception Status sidebar now shows the real agent count,
Mission Control's Stream / Scoreboard / Map all hydrate correctly,
everything that hit ${base}/api/.../<id>/... starts working.

C. Inference life bar + throttle slider in Pool B Health
─────────────────────────────────────────────────────────────────────

Operator wants a glance check before triggering big customer flows —
"am I about to blow my Claude Max quota?" — plus a control surface
to throttle Pool B request rate to spread load.

  - New endpoint GET /api/pool-b-health/operator-quota returning
    {tokens_used_24h, tokens_used_7d, tokens_used_30d, cost_usd_*,
    requests_24h, requests_7d, last_inference_at}. Aggregates
    wavex_os.usage_ledger rows over the windows; 30s cache.
  - New worker data registration pool-b-health-operator-quota.
  - UI: life bar with green/yellow/red zones (cap at $5/day soft
    visual ramp), 24h/7d/30d numeric readout, throttle slider 1-120
    req/min persisted to localStorage.

Throttle ENFORCEMENT (queue/reject above cap on the inference-server
side) lands in a follow-up PR — for now the slider is a planning
surface. The life bar is fully live.

D. Supabase Edge Function: Pool B inference fallback
─────────────────────────────────────────────────────────────────────

When the operator's Mac is offline or rate-limited, the customer's
60s Realtime timeout currently fails silently (empty pillar
suggestions, hardcoded narrate copy). This PR adds an Edge Function
that the consumer-side cloudInference() can call as a fallback after
its existing timeout fires:

  POST /functions/v1/wavex-inference-fallback
  body: { prompt, max_output_tokens, purpose, model? }
  → { ok: true, content, usage, source: "fallback" }

  - Auth-gated: requires the caller's JWT + an active/trialing
    subscription. Free tier returns 402 — no anonymous Anthropic calls
    on the operator's dime.
  - Defaults to claude-haiku-4-5 (cheap; ~$0.005 per pillar-suggest).
    Whitelist also allows haiku-4-5-20251001 + sonnet-4-6.
  - Logs each call to wavex_os.usage_ledger with device_id=NULL so
    fallback usage shows up in the Pool B Health widget separately
    from Mac-served calls.
  - Returns 503 with structured error if ANTHROPIC_API_KEY is not set
    in Supabase project secrets — browser gracefully falls back to
    its existing failure mode.

Consumer-side cloudInference.ts update (browser calls this on
timeout) lands in a follow-up PR in wavex-experience-architect.

Deploy steps for D:
  1. Generate Anthropic API key at console.anthropic.com (separate
     from CLI session token — backend use only).
  2. supabase secrets set ANTHROPIC_API_KEY=sk-ant-...
  3. supabase functions deploy wavex-inference-fallback

Verification
─────────────────────────────────────────────────────────────────────

- pnpm --filter @wavex-os/paperclip-plugin-wavex build → 240.0kb dist
- packages/wavex-os-server tsc → clean
- Diff: 7 files, ~330 insertions, ~30 deletions
- Test CI runs on push

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@aimerdoux aimerdoux merged commit 1f170d8 into main May 19, 2026
1 check passed