feat(eve): Gateway-owned realtime voice control plane (A-lite, greenfield) #132
Draft
kkdawkins wants to merge 5 commits into
…-lite)
A second, greenfield attempt at realtime voice that inverts the control
model: instead of the browser/app orchestrating turns (the client-driven
durable-session path on the base branch), AI Gateway owns turn timing and
dials Eve over a server-side control socket. Eve stays the brain; the
realtime model is only ears + mouth.
Opt in with `realtimeSpeechChannel({ control: true })` and
`useEveVoice({ controlMode: true })`. Default behavior is unchanged — the
client-driven path remains the shipping method.
How it works (control mode):
- `/setup` mints a vcst_ carrying an Eve control config { mode, url, token }.
- AI Gateway dials Eve's WS() control route ({basePath}/ws) per live session
with `Authorization: Bearer <controlToken>` and subprotocol
eve-voice-control.v1; Eve verifies a stateless HMAC control token.
- Browser audio stays browser <-> Gateway; the Gateway transcribes and sends
input.transcript.final to Eve. Eve runs a durable turn and streams semantic
response.delta/response.done back; the Gateway injects them as TTS.
- Eve never sees raw audio.
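The stateless HMAC control token in the steps above could be sketched roughly as follows. This is a minimal illustration, not the contents of control-token.ts: the claims shape (`sessionId`, `expiresAt`), the `payload.mac` token layout, and the function names are all assumptions made for the example. Only the overall idea (mint at `/setup`, verify at upgrade with no server-side storage, token carried as a Bearer header) comes from the description.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical claims shape; the real token payload is not shown in this PR.
interface ControlClaims {
  sessionId: string;
  expiresAt: number; // Unix epoch milliseconds
}

function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("base64url");
}

// Mint "<base64url(JSON claims)>.<base64url(HMAC-SHA256)>". Nothing is stored.
function mintControlToken(claims: ControlClaims, secret: string): string {
  const payload = Buffer.from(JSON.stringify(claims), "utf8").toString("base64url");
  return `${payload}.${sign(payload, secret)}`;
}

// Verify: recompute the MAC, compare in constant time, then check expiry.
function verifyControlToken(
  token: string,
  secret: string,
  now: number = Date.now(),
): ControlClaims | null {
  const [payload, mac, ...rest] = token.split(".");
  if (!payload || !mac || rest.length > 0) return null;
  const expected = Buffer.from(sign(payload, secret));
  const actual = Buffer.from(mac);
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) {
    return null;
  }
  try {
    const claims = JSON.parse(
      Buffer.from(payload, "base64url").toString("utf8"),
    ) as ControlClaims;
    return typeof claims.expiresAt === "number" && claims.expiresAt > now
      ? claims
      : null;
  } catch {
    return null; // payload was not valid JSON
  }
}

// Pull the token out of an upgrade request's Authorization header.
function bearerToken(header: string | undefined): string | null {
  const match = /^Bearer (.+)$/.exec(header ?? "");
  return match?.[1] ?? null;
}
```

Because verification only needs the shared secret, any Eve instance that receives the Gateway's dial can check the token without a session lookup, which is presumably the point of keeping it stateless.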
New (packages/eve/src/public/channels/vercel/):
- control-token.ts — stateless HMAC control token (mint at /setup, verify at upgrade)
- control-url.ts — builds control.url + deploy-protection bypass query
- voice-control-protocol.ts — eve-voice-control.v1 envelope + capability handshake
- voice-turn-coordinator.ts — settle-debounce, backchannel/dup suppression,
durable turn -> response.delta/done, barge-in, capability gating
- speech.ts — `control` opt-in + WS() control route
- react/voice.ts — `controlMode` (audio-only; no client turns)
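One way to read the coordinator's settle-debounce and backchannel/duplicate suppression is sketched below. It is an interpretation, not the code in voice-turn-coordinator.ts: the class name `SettleBuffer`, the backchannel word list, and the idea of passing the clock in as `now` are assumptions chosen to keep the example small and deterministic.

```typescript
// Hypothetical names and word list; not taken from voice-turn-coordinator.ts.
const BACKCHANNELS = new Set(["uh huh", "mm hmm", "yeah", "ok", "okay", "right"]);

const normalize = (text: string): string =>
  text.toLowerCase().replace(/[^a-z0-9\s]/g, " ").replace(/\s+/g, " ").trim();

class SettleBuffer {
  private readonly settleMs: number;
  private pending: string[] = [];
  private seenItemIds = new Set<string>();
  private lastText = "";
  private lastFinalAt = 0;

  constructor(settleMs: number) {
    this.settleMs = settleMs;
  }

  // Record a finalized transcript. Returns false when it is suppressed.
  accept(text: string, now: number, itemId?: string): boolean {
    const normalized = normalize(text);
    if (normalized === "" || BACKCHANNELS.has(normalized)) return false; // backchannel
    if (itemId !== undefined) {
      if (this.seenItemIds.has(itemId)) return false; // duplicate by id
      this.seenItemIds.add(itemId);
    } else if (normalized === this.lastText) {
      return false; // duplicate by text when no id is available
    }
    this.lastText = normalized;
    this.pending.push(text.trim());
    this.lastFinalAt = now;
    return true;
  }

  // Returns the merged utterance once the settle window has elapsed, else null.
  take(now: number): string | null {
    if (this.pending.length === 0 || now - this.lastFinalAt < this.settleMs) {
      return null;
    }
    const merged = this.pending.join(" ");
    this.pending = [];
    return merged;
  }
}
```

In this sketch a caller would feed each `input.transcript.final` into `accept` and poll `take` on a timer; a non-null result is the text to run as one durable turn. Merging fragments this way avoids starting a turn per speech fragment when the speaker pauses briefly.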
Verified end-to-end against a live AI Gateway preview: mint -> dial -> verify
-> transcribe -> durable turn (tool call) -> response.delta/done -> TTS.
Note: depends on an @ai-sdk/gateway change that threads `control` through
getToken into the client-secret mint body; it is vendored as a tarball
(vendor/) and pinned via a pnpm-workspace.yaml override until it lands
upstream. Experimental — not for release as-is.
In Gateway-control mode the durable turn runs server-side, so the typed-chat feed never sees it. Build a live transcript feed from the realtime events the browser does receive — the user's finalized transcript and the agent's streamed spoken words — and show a Thinking… row in the gap while Eve runs the turn.
- Vendor-name the channel: vercelSpeechChannel + Vercel* types (was realtimeSpeech*).
- Wrap the AI SDK realtime client-secret behind eve-owned VercelRealtimeClientSecret so the public channel surface no longer re-exports Experimental_* AI SDK types.
- useEveVoice: expose a live transcript projection (messages + isThinking) that works in both control and client-driven modes, capped at 256 (MAX_VOICE_MESSAGES); demo consumes it instead of hand-rolling accumulation.
- Control plane: durable VoiceControlStateStore (continuation/cursor recovery across reconnects, in-memory default), opt-in AI_GATEWAY_API_KEY secret fallback, wss-only control URL with Vercel-host-scoped deploy-protection bypass, failed durable turns emit error (not response.done), barge-in drains the durable stream + duplicate-text suppression when the Gateway omits itemId.
- Drop the now-unused control-url request field; unify on allowGatewayKeyFallback.
- Docs + Next demo updated.
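The capped transcript projection (messages + isThinking) can be pictured as a small reducer over the events the browser receives. The sketch below is illustrative only: the event names (`user.final`, `agent.delta`, `agent.done`) and the `VoiceMessage` shape are made up for the example, and only the 256 cap and the `MAX_VOICE_MESSAGES` name come from the commit message.

```typescript
// Cap taken from the commit message; every other name here is hypothetical.
const MAX_VOICE_MESSAGES = 256;

type VoiceMessage = { id: string; role: "user" | "agent"; text: string };
type VoiceState = { messages: VoiceMessage[]; isThinking: boolean };

type VoiceEvent =
  | { type: "user.final"; id: string; text: string }
  | { type: "agent.delta"; id: string; text: string }
  | { type: "agent.done" };

// Keep only the newest MAX_VOICE_MESSAGES entries.
const cap = (messages: VoiceMessage[]): VoiceMessage[] =>
  messages.length > MAX_VOICE_MESSAGES
    ? messages.slice(-MAX_VOICE_MESSAGES)
    : messages;

function reduceVoice(state: VoiceState, event: VoiceEvent): VoiceState {
  switch (event.type) {
    case "user.final":
      // The user's turn is complete, so the agent is now "thinking".
      return {
        messages: cap([
          ...state.messages,
          { id: event.id, role: "user", text: event.text },
        ]),
        isThinking: true,
      };
    case "agent.delta": {
      const last = state.messages[state.messages.length - 1];
      if (last !== undefined && last.role === "agent" && last.id === event.id) {
        // Append streamed words to the agent message already in progress.
        const merged: VoiceMessage = { ...last, text: last.text + event.text };
        return {
          messages: [...state.messages.slice(0, -1), merged],
          isThinking: false,
        };
      }
      return {
        messages: cap([
          ...state.messages,
          { id: event.id, role: "agent", text: event.text },
        ]),
        isThinking: false,
      };
    }
    case "agent.done":
      return { ...state, isThinking: false };
  }
}
```

Setting `isThinking` on the user's final transcript and clearing it on the first agent delta is what would produce a Thinking… row in the gap while the durable turn runs.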
Tag turn.started/response.delta/response.done/response.cancel with a per-turn turnId so the Gateway can drop frames from a superseded turn (post-barge-in race) by id. Pairs with the Gateway-side consumer in ai-gateway #2334.
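A consumer could use the per-turn `turnId` roughly as sketched here to discard frames from a superseded turn. The frame type names come from the commit message; the field layout and the `TurnGate` class are assumptions, and the actual Gateway-side consumer is not shown in this PR.

```typescript
// Frame names come from the commit message; the field layout is an assumption.
type TurnFrame =
  | { type: "turn.started"; turnId: string }
  | { type: "response.delta"; turnId: string; text: string }
  | { type: "response.done"; turnId: string }
  | { type: "response.cancel"; turnId: string };

class TurnGate {
  private activeTurnId: string | null = null;

  // Returns true when the frame belongs to the current turn and should be used.
  admit(frame: TurnFrame): boolean {
    if (frame.type === "turn.started") {
      this.activeTurnId = frame.turnId; // a new turn supersedes any earlier one
      return true;
    }
    if (frame.turnId !== this.activeTurnId) return false; // stale frame: drop it
    if (frame.type === "response.done" || frame.type === "response.cancel") {
      this.activeTurnId = null; // turn is over; later frames with this id are stale
    }
    return true;
  }
}
```

After a barge-in starts turn `t2`, any `response.delta` still in flight for `t1` fails the id comparison and is never injected as TTS, which is the race the tagging is meant to close.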
Replace Eve's hand-mirrored realtime voice control protocol with a thin re-export shim over @ai-sdk/gateway's shared speech-engine module, so the wire contract has a single source of truth across Eve and the Gateway. Add @ai-sdk/gateway as an optional peer + dev dependency and revendor the locally-built gateway tarball (now including the shared protocol, codec, and GatewaySpeechEngineSession helper).
Summary
A second, greenfield approach to realtime voice for Eve, opened against the client-driven branch as the base so that path stays the shipping method and this is a clean side-by-side alternative.
Base branch (client-driven): `/setup` mints a realtime token, finalized transcripts run as normal `/eve/v1/session` durable turns, and the client speaks on `message.completed`. No `/turn`.
Opt-in only: `realtimeSpeechChannel({ control: true })` + `useEveVoice({ controlMode: true })`. Default behavior is unchanged.
How it works (control mode)
- `/setup` mints a `vcst_` carrying an Eve control config `{ mode, url, token }` (sealed by the Gateway).
- AI Gateway dials Eve's `WS()` control route (`{basePath}/ws`) per live session with `Authorization: Bearer <controlToken>` and subprotocol `eve-voice-control.v1`; Eve verifies a stateless HMAC control token.
- The Gateway transcribes and sends `input.transcript.final` to Eve; Eve runs a durable turn and streams semantic `response.delta`/`response.done` back; the Gateway injects them as TTS. Audio never flows through Eve.
- A capability handshake (`session.opened.engine.capabilities`) lets Eve degrade gracefully (finals-only, barge-in only when supported, run the turn even without audio output).
What's here
New in `packages/eve/src/public/channels/vercel/`:
- `control-token.ts`: stateless HMAC control token (mint at `/setup`, verify at WS upgrade)
- `control-url.ts`: builds `control.url` + deploy-protection bypass query
- `voice-control-protocol.ts`: `eve-voice-control.v1` envelope + capability parsing
- `voice-turn-coordinator.ts`: settle-debounce, backchannel/duplicate suppression, durable turn → `response.delta/done`, barge-in, capability gating
- `speech.ts`: `control` opt-in + `WS()` control route
- `react/voice.ts`: `controlMode` (audio-only; no client-driven turns)
Plus tests (token, url, protocol, coordinator, mint + WS-upgrade auth, client mode), docs, and an env-gated `controlMode` toggle in the Next demo.
Verification
Verified end-to-end against a live AI Gateway preview: mint → Gateway dials Eve → control-token verify → transcribe → durable turn (with a tool call) → `response.delta/done` → TTS audio back. Unit/integration/lint/typecheck/guard green on the Eve side.
Requires an `@ai-sdk/gateway` change that threads a `control` config through `getToken` into the client-secret mint body. It is vendored as a tarball (`vendor/…tgz`) and pinned via a `pnpm-workspace.yaml` `overrides` entry so this branch is self-contained. This is temporary, to be replaced once that change lands upstream. Also requires the matching AI Gateway control-plane support. Not for release as-is; this is a first greenfield attempt for review.