feat(gateway): shared speech-engine control protocol + session helper #16286
Draft
kkdawkins wants to merge 1 commit into
…helper

Introduce a vendor-neutral wire contract for Gateway-owned realtime voice control so the Gateway and any controller (Eve, or any "bring your own brain" client) share one source of truth:
- gateway-speech-engine: subprotocol constant, event schemas (engine<->controller), capabilities + permissive defaults, engine descriptor, and the envelope encode/parse codec (turnId on client events for post-barge-in stale-frame dropping).
- GatewaySpeechEngineSession: controller-side helper that surfaces finalized transcripts, streams replies back for TTS (string or LLM chunk stream), and does implicit barge-in (new transcript aborts and cancels the prior turn by id).

Thread an optional `control` config through experimental_realtime.getToken into the minted client secret, and re-export GatewayRealtimeControlConfig from `ai`.
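To make the stale-frame rule concrete, here is a minimal sketch of an envelope codec plus a turn gate. It is not the code in this PR: the event names (`reply.delta`, `reply.done`, `turn.cancel`), `encodeEvent`, `parseClientEvent`, and `TurnGate` are hypothetical stand-ins, and the real schemas may differ. It only assumes what the commit message states, namely that client events carry a `turnId` and that frames from a superseded turn should be dropped.

```typescript
// Hypothetical client-event shapes; the real schemas in the PR may differ.
type ClientEvent =
  | { type: 'reply.delta'; turnId: string; text: string }
  | { type: 'reply.done'; turnId: string }
  | { type: 'turn.cancel'; turnId: string };

// Encode an event as a JSON text frame for the control socket.
function encodeEvent(event: ClientEvent): string {
  return JSON.stringify(event);
}

// Parse a frame. Returns undefined for malformed JSON, unknown event
// types, or a missing turnId (turnId is required on client events).
function parseClientEvent(frame: string): ClientEvent | undefined {
  let value: unknown;
  try {
    value = JSON.parse(frame);
  } catch {
    return undefined;
  }
  if (typeof value !== 'object' || value === null) return undefined;
  const { type, turnId, text } = value as {
    type?: unknown;
    turnId?: unknown;
    text?: unknown;
  };
  if (typeof turnId !== 'string' || turnId.length === 0) return undefined;
  if (type === 'reply.delta') {
    return typeof text === 'string' ? { type, turnId, text } : undefined;
  }
  if (type === 'reply.done') return { type, turnId };
  if (type === 'turn.cancel') return { type, turnId };
  return undefined;
}

// Engine-side guard (assumed behavior): after a barge-in the active turn
// changes, so frames tagged with an older turnId are ignored.
class TurnGate {
  private activeTurnId: string | undefined;

  begin(turnId: string): void {
    this.activeTurnId = turnId;
  }

  accepts(event: ClientEvent): boolean {
    return event.turnId === this.activeTurnId;
  }
}
```

In this sketch the engine would call `begin` whenever it emits a new finalized transcript, then pass every parsed client frame through `accepts` before handing text to TTS.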
Summary
Adds a vendor-neutral wire contract in `@ai-sdk/gateway` for Gateway-owned realtime voice control: the model where the Gateway owns the audio loop (STT, TTS, turn-taking) and a controller ("bring your own brain") drives turns over a server-side control socket. This makes `@ai-sdk/gateway` the single source of truth for the protocol so the Gateway and any controller can't drift.
What's added
- `gateway-speech-engine`, the shared contract:
  - `GATEWAY_SPEECH_ENGINE_SUBPROTOCOL` (control-socket subprotocol)
  - Event schemas (`SpeechEngineServerEvent` / `SpeechEngineClientEvent`, with `turnId` on client events for post-barge-in stale-frame dropping)
  - `SpeechEngineCapabilities` + `DEFAULT_SPEECH_ENGINE_CAPABILITIES`, `SpeechEngineDescriptor`
  - Envelope codec: `encodeSpeechEngineEvent` / `parseSpeechEngineServerEvent` / `parseSpeechEngineClientEvent`
- `GatewaySpeechEngineSession`, a controller-side helper (the AI-SDK analogue of ElevenLabs' session): surface finalized transcripts, stream a reply back for TTS (string or LLM-chunk stream, with auto text extraction), and implicit barge-in (a new transcript aborts the prior turn's `AbortSignal` and cancels it on the wire by `turnId`).
- Threads an optional `control` config through `experimental_realtime.getToken` (sealed into the minted client secret; server-side mint only) and re-exports `GatewayRealtimeControlConfig` from `ai`.
Why
The protocol was being hand-mirrored across consumers. Centralizing it here lets the AI Gateway server and Eve (and any future client, including ElevenLabs-as-a-provider use cases) import one contract.
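For readers unfamiliar with implicit barge-in, the following is a rough sketch of the idea behind the session helper, not the helper itself. `BargeInSketch`, `WireEvent`, the `turn-N` id scheme, and the event names are all hypothetical; the only behavior taken from the description is that a new transcript aborts the prior turn's `AbortSignal` and, when the engine advertises the capability, cancels that turn on the wire by `turnId`.

```typescript
// Hypothetical wire event; the real protocol's event names may differ.
type WireEvent = { type: string; turnId: string; text?: string };
type Turn = { turnId: string; signal: AbortSignal };

class BargeInSketch {
  private readonly send: (event: WireEvent) => void;
  private readonly canCancel: boolean;
  private current: { turnId: string; abort: AbortController } | undefined;
  private counter = 0;

  constructor(send: (event: WireEvent) => void, canCancel: boolean) {
    this.send = send;
    this.canCancel = canCancel;
  }

  // Call once per finalized transcript. Any turn still in flight is
  // aborted locally and, if the capability allows, cancelled on the wire.
  beginTurn(): Turn {
    const previous = this.current;
    if (previous) {
      previous.abort.abort();
      if (this.canCancel) {
        this.send({ type: 'turn.cancel', turnId: previous.turnId });
      }
    }
    this.counter += 1;
    const turnId = `turn-${this.counter}`;
    const abort = new AbortController();
    this.current = { turnId, abort };
    return { turnId, signal: abort.signal };
  }

  // Stream a reply for TTS. Accepts a plain string or a chunk stream and
  // stops sending as soon as the turn has been superseded.
  async reply(turn: Turn, chunks: string | AsyncIterable<string>): Promise<void> {
    const source = typeof chunks === 'string' ? [chunks] : chunks;
    for await (const text of source) {
      if (turn.signal.aborted) return;
      this.send({ type: 'reply.delta', turnId: turn.turnId, text });
    }
    if (turn.signal.aborted) return;
    this.send({ type: 'reply.done', turnId: turn.turnId });
    // A completed turn no longer needs cancelling when the next one starts.
    if (this.current?.turnId === turn.turnId) this.current = undefined;
  }
}
```

A controller built this way can pass `turn.signal` straight to its LLM call, so a barge-in stops generation as well as the outgoing reply frames.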
Tests
`gateway-speech-engine.test.ts` covers the codec (round-trips, capability handshake, malformed/`turnId`-required rejection) and the session helper (transcript→reply with a consistent `turnId`, LLM-chunk extraction, supersede/cancel-by-id, no-cancel-without-capability). 8 tests, `pnpm build` clean (ESM + DTS).
Consumers (separate PRs)
Draft until the consumers are validated end-to-end against this build.