
feat(eve): Gateway-owned realtime voice control plane (A-lite, greenfield)#132

Draft
kkdawkins wants to merge 5 commits into kdawkins.realtime-voice-channel from kdawkins.realtime-voice-gateway-control


@kkdawkins


Summary

A second, greenfield approach to realtime voice for Eve. It is opened against the client-driven branch as its base, so that path stays the shipping method and this PR reads as a clean side-by-side alternative.

  • Base branch (unchanged): the client/app-orchestrated method — /setup mints a realtime token, finalized transcripts run as normal /eve/v1/session durable turns, the client speaks on message.completed. No /turn.
  • This PR: a Gateway-owned control plane ("A-lite"). The control model is inverted — instead of the browser orchestrating turns, AI Gateway owns turn timing and dials Eve over a server-side control socket. Eve stays the brain; the realtime model is only ears + mouth.

Opt-in only: realtimeSpeechChannel({ control: true }) + useEveVoice({ controlMode: true }). Default behavior is unchanged.

How it works (control mode)

Browser ── audio WSS ──▶ AI Gateway ── provider WS ──▶ realtime model (ears+mouth)
   │ (controlMode)            │
   └─ /setup ──▶ Eve          └─ control WSS (outbound, per session) ──▶ Eve WS() route
        mints vcst_ with            Gateway→Eve: input.transcript.final, lifecycle, barge-in
        { control: {mode,url,token} } Eve→Gateway: response.delta / response.done / response.cancel
  • /setup mints a vcst_ carrying an Eve control config { mode, url, token } (sealed by the Gateway).
  • The Gateway dials Eve's WS() control route ({basePath}/ws) per live session with Authorization: Bearer <controlToken> and subprotocol eve-voice-control.v1; Eve verifies a stateless HMAC control token.
  • The Gateway transcribes browser audio and sends input.transcript.final to Eve; Eve runs a durable turn and streams semantic response.delta/response.done back; the Gateway injects them as TTS. Audio never flows through Eve.
  • A capability handshake (session.opened.engine.capabilities) lets Eve degrade gracefully (finals-only, barge-in only when supported, run the turn even without audio output).
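
The capability handshake in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's parser: the flag names `bargeIn` and `audioOutput`, the `TurnPlan` shape, and both function names are assumptions, since the description only names the `session.opened.engine.capabilities` path.

```typescript
// Hypothetical capability flags; the PR text does not list the real field names.
interface EngineCapabilities {
  bargeIn: boolean;
  audioOutput: boolean;
}

// Hypothetical summary of how Eve would behave for one session.
interface TurnPlan {
  finalsOnly: boolean; // act on finalized transcripts only
  handleBargeIn: boolean; // cancel in-flight responses on interruption
  streamSpeech: boolean; // send response.delta frames for TTS
}

// Parse an untrusted `session.opened` payload, defaulting every flag to false.
function parseCapabilities(message: unknown): EngineCapabilities {
  const caps =
    (message as { engine?: { capabilities?: Record<string, unknown> } } | null)
      ?.engine?.capabilities ?? {};
  return {
    bargeIn: caps["bargeIn"] === true,
    audioOutput: caps["audioOutput"] === true,
  };
}

// Degrade gracefully: missing capabilities switch features off, never the turn itself.
function planTurn(caps: EngineCapabilities): TurnPlan {
  return {
    finalsOnly: true,
    handleBargeIn: caps.bargeIn,
    streamSpeech: caps.audioOutput, // the durable turn still runs when this is false
  };
}
```

Defaulting unknown or missing flags to `false` keeps an older Gateway usable: the session still runs turns, just without the optional behaviors.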

What's here

New packages/eve/src/public/channels/vercel/:

  • control-token.ts — stateless HMAC control token (mint at /setup, verify at WS upgrade)
  • control-url.ts — builds control.url + deploy-protection bypass query
  • voice-control-protocol.ts — eve-voice-control.v1 envelope + capability parsing
  • voice-turn-coordinator.ts — settle-debounce, backchannel/duplicate suppression, durable turn → response.delta/done, barge-in, capability gating
  • speech.ts — control opt-in + WS() control route
  • react/voice.ts — controlMode (audio-only; no client-driven turns)
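
For readers unfamiliar with the pattern behind control-token.ts, a stateless HMAC token can be sketched as below. This is an assumption-laden illustration using Node's `node:crypto`, not the file's actual contents: the claim fields (`sessionId`, `exp`), the `body.signature` layout, and the function names are all hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical claim shape; the real token's fields are not shown in this PR text.
interface ControlClaims {
  sessionId: string;
  exp: number; // expiry, seconds since epoch
}

const sign = (body: string, secret: string): string =>
  createHmac("sha256", secret).update(body).digest("base64url");

// Mint at /setup: base64url(JSON claims) + "." + HMAC-SHA256 signature.
function mintControlToken(claims: ControlClaims, secret: string): string {
  const body = Buffer.from(JSON.stringify(claims), "utf8").toString("base64url");
  return `${body}.${sign(body, secret)}`;
}

// Verify at WS upgrade: recompute the signature, compare in constant time, check expiry.
function verifyControlToken(
  token: string,
  secret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): ControlClaims | null {
  const parts = token.split(".");
  if (parts.length !== 2) return null;
  const [body = "", sig = ""] = parts;
  const expected = Buffer.from(sign(body, secret));
  const actual = Buffer.from(sig);
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return null;
  try {
    const claims = JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as ControlClaims;
    if (typeof claims.sessionId !== "string" || typeof claims.exp !== "number") return null;
    return claims.exp > nowSeconds ? claims : null;
  } catch {
    return null;
  }
}
```

Because verification needs only the shared secret and the token itself, the WS upgrade handler does not have to look up any per-session state, which is what "stateless" refers to here.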

Plus tests (token, url, protocol, coordinator, mint + WS-upgrade auth, client mode), docs, and an env-gated controlMode toggle in the Next demo.

Verification

Verified end-to-end against a live AI Gateway preview: mint → Gateway dials Eve → control-token verify → transcribe → durable turn (with a tool call) → response.delta/done → TTS audio back. Unit, integration, lint, typecheck, and guard checks are green on the Eve side.

⚠️ Dependency / caveat (experimental)

Requires an @ai-sdk/gateway change that threads a control config through getToken into the client-secret mint body. It is vendored as a tarball (vendor/…tgz) and pinned via a pnpm-workspace.yaml overrides entry so this branch is self-contained. This is temporary — to be replaced once that change lands upstream. Also requires the matching AI Gateway control-plane support. Not for release as-is; this is a first greenfield attempt for review.

…-lite)

A second, greenfield attempt at realtime voice that inverts the control
model: instead of the browser/app orchestrating turns (the client-driven
durable-session path on the base branch), AI Gateway owns turn timing and
dials Eve over a server-side control socket. Eve stays the brain; the
realtime model is only ears + mouth.

Opt in with `realtimeSpeechChannel({ control: true })` and
`useEveVoice({ controlMode: true })`. Default behavior is unchanged — the
client-driven path remains the shipping method.

How it works (control mode):
- `/setup` mints a vcst_ carrying an Eve control config { mode, url, token }.
- AI Gateway dials Eve's WS() control route ({basePath}/ws) per live session
  with `Authorization: Bearer <controlToken>` and subprotocol
  eve-voice-control.v1; Eve verifies a stateless HMAC control token.
- Browser audio stays browser <-> Gateway; the Gateway transcribes and sends
  input.transcript.final to Eve. Eve runs a durable turn and streams semantic
  response.delta/response.done back; the Gateway injects them as TTS.
- Eve never sees raw audio.
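
The message flow in the list above might look like this on the wire. The frame type names (`input.transcript.final`, `response.delta`, `response.done`) come from the description; the `text` field, the JSON encoding, and both helper names are assumptions, and a real implementation would stream frames asynchronously rather than build an array.

```typescript
// Frame names come from the PR text; every field name here is an assumption.
type TranscriptFinal = { type: "input.transcript.final"; text: string };

// Gateway -> Eve: decode one control frame, rejecting anything malformed.
function decodeInbound(raw: string): TranscriptFinal | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const frame = value as { type?: unknown; text?: unknown };
  if (frame.type !== "input.transcript.final" || typeof frame.text !== "string") return null;
  return { type: "input.transcript.final", text: frame.text };
}

// Eve -> Gateway: one response.delta per text chunk, then a closing response.done.
function encodeResponse(chunks: readonly string[]): string[] {
  const frames: string[] = [];
  for (const text of chunks) {
    frames.push(JSON.stringify({ type: "response.delta", text }));
  }
  frames.push(JSON.stringify({ type: "response.done" }));
  return frames;
}
```

Only text crosses this socket in either direction, which matches the point that Eve never sees raw audio.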

New (packages/eve/src/public/channels/vercel/):
- control-token.ts  — stateless HMAC control token (mint at /setup, verify at upgrade)
- control-url.ts     — builds control.url + deploy-protection bypass query
- voice-control-protocol.ts — eve-voice-control.v1 envelope + capability handshake
- voice-turn-coordinator.ts — settle-debounce, backchannel/dup suppression,
  durable turn -> response.delta/done, barge-in, capability gating
- speech.ts          — `control` opt-in + WS() control route
- react/voice.ts     — `controlMode` (audio-only; no client turns)
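
As a rough picture of what "settle-debounce" and "backchannel/dup suppression" could mean in voice-turn-coordinator.ts, here is a small sketch. It is not the PR's coordinator: the class name, the backchannel word list, and the dedupe rules (by `itemId` when present, else by identical text) are assumptions for illustration.

```typescript
// Hypothetical sketch: names, word list, and rules are illustrative only.
type FinalTranscript = { itemId?: string; text: string };

const BACKCHANNELS = new Set(["mm-hmm", "uh-huh"]); // assumed filler utterances

class SettleDebouncer {
  private pending: string[] = [];
  private seenIds = new Set<string>();
  private lastText = "";
  private timer: ReturnType<typeof setTimeout> | undefined;
  private readonly settleMs: number;
  private readonly onTurn: (utterance: string) => void;

  constructor(settleMs: number, onTurn: (utterance: string) => void) {
    this.settleMs = settleMs;
    this.onTurn = onTurn;
  }

  // Returns true when the final was queued, false when it was suppressed.
  push(final: FinalTranscript): boolean {
    const text = final.text.trim();
    if (text === "" || BACKCHANNELS.has(text.toLowerCase())) return false;
    // Duplicate suppression: by itemId when present, else by identical text.
    if (final.itemId !== undefined) {
      if (this.seenIds.has(final.itemId)) return false;
      this.seenIds.add(final.itemId);
    } else if (text === this.lastText) {
      return false;
    }
    this.lastText = text;
    this.pending.push(text);
    // Restart the settle window so closely spaced finals merge into one turn.
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.settleMs);
    return true;
  }

  // Emit everything queued so far as a single utterance.
  flush(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
    if (this.pending.length === 0) return;
    const utterance = this.pending.join(" ");
    this.pending = [];
    this.onTurn(utterance);
  }
}
```

The settle window trades a little latency for fewer fragmented turns: a speaker who pauses mid-sentence produces one durable turn instead of two.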

Verified end-to-end against a live AI Gateway preview: mint -> dial -> verify
-> transcribe -> durable turn (tool call) -> response.delta/done -> TTS.

Note: depends on an @ai-sdk/gateway change that threads `control` through
getToken into the client-secret mint body; it is vendored as a tarball
(vendor/) and pinned via a pnpm-workspace.yaml override until it lands
upstream. Experimental — not for release as-is.
@vercel

vercel Bot commented Jun 20, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| eve-docs | Ready | Preview, Comment, Open in v0 | Jun 21, 2026 8:20pm |

Comment thread packages/eve/src/public/channels/vercel/voice-turn-coordinator.ts
In Gateway-control mode the durable turn runs server-side, so the typed-chat
feed never sees it. Build a live transcript feed from the realtime events the
browser does receive — the user's finalized transcript and the agent's streamed
spoken words — and show a Thinking… row in the gap while Eve runs the turn.
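
One way to picture the live transcript feed described above is a small reducer over the realtime events. This is a sketch under assumptions: the event kinds (`user.final`, `agent.delta`, `agent.done`) and the state shape are hypothetical stand-ins for whatever the hook actually consumes.

```typescript
// Hypothetical event and state shapes for illustration.
type VoiceMessage = { role: "user" | "agent"; text: string };
type FeedState = { messages: VoiceMessage[]; isThinking: boolean; streaming: boolean };
type FeedEvent =
  | { kind: "user.final"; text: string }
  | { kind: "agent.delta"; text: string }
  | { kind: "agent.done" };

const emptyFeed: FeedState = { messages: [], isThinking: false, streaming: false };

function reduceFeed(state: FeedState, event: FeedEvent): FeedState {
  switch (event.kind) {
    case "user.final":
      // The user's finalized transcript opens the gap where "Thinking…" is shown.
      return {
        messages: [...state.messages, { role: "user", text: event.text }],
        isThinking: true,
        streaming: false,
      };
    case "agent.delta": {
      const last = state.messages[state.messages.length - 1];
      if (state.streaming && last !== undefined && last.role === "agent") {
        // Append spoken words to the agent row that is already streaming.
        const merged: VoiceMessage = { role: "agent", text: last.text + event.text };
        return { messages: [...state.messages.slice(0, -1), merged], isThinking: false, streaming: true };
      }
      // First delta of a response: start a new agent row and clear "Thinking…".
      return {
        messages: [...state.messages, { role: "agent", text: event.text }],
        isThinking: false,
        streaming: true,
      };
    }
    case "agent.done":
      return { ...state, isThinking: false, streaming: false };
  }
}
```

A UI would render `messages` in order and show the Thinking… row whenever `isThinking` is true.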
- Vendor-name the channel: vercelSpeechChannel + Vercel* types (was realtimeSpeech*).
- Wrap the AI SDK realtime client-secret behind eve-owned VercelRealtimeClientSecret
  so the public channel surface no longer re-exports Experimental_* AI SDK types.
- useEveVoice: expose a live transcript projection (messages + isThinking) that works
  in both control and client-driven modes, capped at 256 (MAX_VOICE_MESSAGES); demo
  consumes it instead of hand-rolling accumulation.
- Control plane: durable VoiceControlStateStore (continuation/cursor recovery across
  reconnects, in-memory default), opt-in AI_GATEWAY_API_KEY secret fallback,
  wss-only control URL with Vercel-host-scoped deploy-protection bypass, failed
  durable turns emit error (not response.done), barge-in drains the durable stream +
  duplicate-text suppression when the Gateway omits itemId.
- Drop the now-unused control-url request field; unify on allowGatewayKeyFallback.
- Docs + Next demo updated.
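
The "wss-only control URL with Vercel-host-scoped deploy-protection bypass" item could be sketched like this. The function name, the `.vercel.app` host check, and the example base path are assumptions; `x-vercel-protection-bypass` is the query parameter Vercel documents for Protection Bypass for Automation, but whether this PR uses exactly that parameter is not stated here.

```typescript
// Hypothetical helper; not the PR's control-url.ts.
function buildControlUrl(origin: string, basePath: string, bypassSecret?: string): string {
  // Upgrade https origins to wss, then refuse anything that is still not wss.
  const base = new URL(origin.replace(/^https:/i, "wss:"));
  if (base.protocol !== "wss:") {
    throw new Error(`control URL must use wss, got ${base.protocol}`);
  }
  const url = new URL(`${basePath.replace(/\/+$/, "")}/ws`, base);
  // Only attach the bypass secret on Vercel-hosted deployments (assumed host rule).
  const isVercelHost = url.hostname === "vercel.app" || url.hostname.endsWith(".vercel.app");
  if (bypassSecret !== undefined && isVercelHost) {
    url.searchParams.set("x-vercel-protection-bypass", bypassSecret);
  }
  return url.toString();
}
```

Scoping the secret to known hosts avoids leaking it in a query string to an arbitrary custom origin.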
Tag turn.started/response.delta/response.done/response.cancel with a per-turn
turnId so the Gateway can drop frames from a superseded turn (post-barge-in
race) by id. Pairs with the Gateway-side consumer in ai-gateway #2334.
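
On the consuming side, dropping frames from a superseded turn by id might look like the sketch below. The frame type names and `turnId` come from the commit message; the `TurnGate` class and its rule (the most recent `turn.started` wins) are assumptions, and the actual Gateway consumer may differ.

```typescript
// Frame types come from the commit message; the gate itself is hypothetical.
type TurnFrame = {
  type: "turn.started" | "response.delta" | "response.done" | "response.cancel";
  turnId: string;
  text?: string;
};

class TurnGate {
  private current: string | undefined;

  // Returns true when the frame belongs to the live turn and should be processed.
  accept(frame: TurnFrame): boolean {
    if (frame.type === "turn.started") {
      // A new turn supersedes whatever was in flight (for example after barge-in).
      this.current = frame.turnId;
      return true;
    }
    // Late frames tagged with an older turnId are dropped.
    return frame.turnId === this.current;
  }
}
```

Without the id, a delta that was already in flight when the user interrupted would be indistinguishable from the start of the next response.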
Replace Eve's hand-mirrored realtime voice control protocol with a thin
re-export shim over @ai-sdk/gateway's shared speech-engine module, so the
wire contract has a single source of truth across Eve and the Gateway.
Add @ai-sdk/gateway as an optional peer + dev dependency and revendor the
locally-built gateway tarball (now including the shared protocol, codec,
and GatewaySpeechEngineSession helper).