feat(voice): full Yapper voice integration (STT + TTS + streaming)#110
dimakis
left a comment
PR Review: feat(voice): full Yapper voice integration (STT + TTS + streaming)
Full rollup of phases 1–3. Solid architecture and test coverage (587 tests). Three issues to address:
🐛 Dual-recorder stream sharing bug (useVoice.ts)
startRecording creates both a StreamingRecorder and a batch Recorder on the same MediaStream. Both call stopTracks(stream) on stop/cancel. Whichever stops first kills the mic for the other — so if the streaming recorder stops tracks before the batch fallback tries to use the stream, recorder.stop() produces an empty/truncated blob.
Fix: only stop tracks once, in a single cleanup path, rather than letting both recorders independently kill the stream.
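One way to picture the single cleanup path is a small owner object that stops the tracks at most once, with both recorders told not to touch the stream themselves. This is a minimal sketch only: `createStreamOwner`, `StreamLike`, and `TrackLike` are hypothetical names for illustration and are not taken from useVoice.ts.

```typescript
// Minimal structural types so the sketch runs outside a browser.
// A real MediaStream satisfies StreamLike (getTracks() returns tracks with stop()).
interface TrackLike {
  stop(): void;
}
interface StreamLike {
  getTracks(): TrackLike[];
}

// Hypothetical helper: the hook holds one owner per stream, and recorders never
// call stop() on tracks themselves. release() is idempotent.
function createStreamOwner(stream: StreamLike): {
  release: () => void;
  released: () => boolean;
} {
  let done = false;
  return {
    release(): void {
      if (done) return; // second and later calls are no-ops
      done = true;
      for (const track of stream.getTracks()) track.stop();
    },
    released: (): boolean => done,
  };
}
```

With this shape, stop and cancel on either recorder would only end that recorder, and the hook would call `release()` once after the batch blob (if needed) has been produced.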
⚠️ Auto-speak fires during streaming, not on completion (ChatView.tsx)
The auto-speak useEffect triggers on msgState.messages changes but doesn't check msgState.running. This means TTS can fire before the assistant message is fully streamed — the messageId ref prevents duplicate speaks, but the message content may be incomplete when it first triggers.
Note: PR #108 already fixes this by extracting useAutoSpeak with a running guard, plus stripCodeForTts and truncateForTts. Merging #108 first and rebasing #110 resolves this and the stale tts.ts (which is missing those helpers and the playAudio idempotency fix).
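For reference, the kind of running guard meant here can be expressed as a pure predicate that the effect consults before speaking. This is an illustrative sketch, not the useAutoSpeak implementation from #108; `AutoSpeakState` and `shouldAutoSpeak` are hypothetical names, and the field names are assumptions.

```typescript
// Hypothetical snapshot of the state the auto-speak effect would look at.
interface AutoSpeakState {
  ttsEnabled: boolean;
  running: boolean; // true while the assistant message is still streaming
  lastAssistantId: string | null; // id of the newest assistant message, if any
  lastSpokenId: string | null; // id stored in the ref after speaking
}

// Speak only when TTS is on, streaming has finished, and this message
// has not been spoken yet.
function shouldAutoSpeak(state: AutoSpeakState): boolean {
  return (
    state.ttsEnabled &&
    !state.running &&
    state.lastAssistantId !== null &&
    state.lastAssistantId !== state.lastSpokenId
  );
}
```

Keeping the decision in a pure function also makes the "still streaming" case easy to cover in a unit test without rendering ChatView.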
💡 Use last partial as fallback on WS timeout (useVoice.ts)
When the 5s WS timeout fires without a final transcript, stopRecording returns empty string silently. The user loses their dictation with no indication. Consider returning partialTranscript as best-effort instead of ''.
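A rough sketch of the suggested fallback, assuming the final transcript is available as a promise and the latest partial can be read through a getter. `waitForTranscript` and its parameters are hypothetical and do not mirror the actual stopRecording signature.

```typescript
// Resolve with the final transcript if it arrives in time; otherwise fall back
// to the most recent partial instead of an empty string.
function waitForTranscript(
  finalTranscript: Promise<string>,
  getPartial: () => string,
  timeoutMs: number,
): Promise<string> {
  return new Promise<string>((resolve) => {
    // Timeout path: best-effort partial. Resolving twice is harmless because
    // a promise settles only once.
    const timer = setTimeout(() => resolve(getPartial()), timeoutMs);
    finalTranscript.then(
      (text) => {
        clearTimeout(timer);
        resolve(text);
      },
      () => {
        // WS error before the final arrived: same best-effort fallback.
        clearTimeout(timer);
        resolve(getPartial());
      },
    );
  });
}
```

In the hook this would be called with the 5s timeout mentioned above; whether a partial should be visually marked as unconfirmed is a separate UX decision.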
Minor notes
- mimeToFormat() is a plain function recreated every render; could be module-level
- tts-playback.md still references messages.length tracking but implementation uses messageId
- WS send queue is unbounded (not a real concern at audio chunk sizes, but worth a comment)
- No test for the dual-recorder stream conflict scenario
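If the send queue ever does need a cap, one possible shape is sketched below. It is not the yapper-ws.ts implementation: `createQueuedSender`, `SocketLike`, and the default cap of 256 are assumptions for illustration only.

```typescript
// Structural subset of WebSocket so the sketch runs without a browser.
interface SocketLike {
  send(data: string | ArrayBuffer): void;
}

// Hypothetical queue-until-open sender with an upper bound on queued messages.
function createQueuedSender(maxQueued: number = 256) {
  let queue: Array<string | ArrayBuffer> = [];
  let socket: SocketLike | null = null;
  return {
    // Returns false when the message was rejected because the queue is full,
    // so the caller can decide to fall back to batch transcription.
    send(data: string | ArrayBuffer): boolean {
      if (socket !== null) {
        socket.send(data);
        return true;
      }
      if (queue.length >= maxQueued) return false;
      queue.push(data);
      return true;
    },
    // Call once the connection is open: flush in order, then send directly.
    open(openSocket: SocketLike): void {
      socket = openSocket;
      for (const message of queue) openSocket.send(message);
      queue = [];
    },
    pending: (): number => queue.length,
  };
}
```

Rejecting rather than silently dropping matters for audio, since a missing chunk in the middle of a stream would likely corrupt the recording.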
Recommended merge order
🤖 Generated with Claude Code
Text chunking at sentence boundaries with fragment merging, synthesis fetch with AbortSignal support, singleton AudioContext management, and WAV playback via AudioBufferSourceNode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
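The sentence-boundary chunking with fragment merging described in this commit can be sketched roughly as follows. This is an assumption about the general technique, not the chunkText() in tts.ts; `splitSentences`, `chunkTextSketch`, and the `minChars` parameter are hypothetical.

```typescript
// Split on runs of . ! ? followed by whitespace, keeping the punctuation.
// "3.14" is not split because the dot is not followed by whitespace.
function splitSentences(text: string): string[] {
  const sentences: string[] = [];
  const boundary = /[.!?]+\s+/g;
  let start = 0;
  let match: RegExpExecArray | null;
  while ((match = boundary.exec(text)) !== null) {
    const end = match.index + match[0].length;
    sentences.push(text.slice(start, end).trim());
    start = end;
  }
  const tail = text.slice(start).trim();
  if (tail) sentences.push(tail);
  return sentences;
}

// Merge short fragments forward until a chunk reaches minChars; a short
// trailing fragment is appended to the previous chunk.
function chunkTextSketch(text: string, minChars: number): string[] {
  const chunks: string[] = [];
  let buffer = "";
  for (const sentence of splitSentences(text)) {
    buffer = buffer ? `${buffer} ${sentence}` : sentence;
    if (buffer.length >= minChars) {
      chunks.push(buffer);
      buffer = "";
    }
  }
  if (buffer) {
    if (chunks.length > 0) chunks[chunks.length - 1] += ` ${buffer}`;
    else chunks.push(buffer);
  }
  return chunks;
}
```

Merging avoids synthesizing very short fragments such as "Ok." as separate requests, which would add a round trip per fragment.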
Adds ttsAvailable (from health poll), ttsEnabled/selectedVoice with localStorage persistence, lazy voice list fetch, speak() with sequential chunk synthesis and AbortController cancellation, stopSpeaking(), and AudioContext lifecycle cleanup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Speaker icon toggle for TTS on/off, voice selector dropdown grouped by language. Hidden when Yapper TTS is unavailable, voice picker shown only when TTS is enabled. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Auto-speak assistant text on message completion (tracked by messageId ref). Stop playback on user send. Render VoiceSettings in chat header. Update ChatInput voice mock with TTS fields. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Speaker toggle with active/speaking states, voice picker dropdown, pulse animation during playback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Covers: WebSocket client for Yapper /v1/transcribe/stream, streaming MediaRecorder with timeslice, live partial transcript preview in ChatInput, batch fallback, and protocol gotchas. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
createStreamingRecorder uses MediaRecorder.start(timeslice) to emit audio chunks during recording via onChunk callback. Supports cancel, auto-stop timer, and onStop notification. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WebSocket wrapper for /v1/transcribe/stream with format negotiation, binary audio send, END signal, and partial/final transcript callbacks. Queues messages until connection is open. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extend useVoice with streaming transcription via WebSocket client and streaming recorder. Adds partialTranscript state, sends audio chunks over WS for live partials, and falls back to batch transcription on WS error. Updates existing tests for streaming-first flow. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Show streaming transcription preview above the input row during recording. Overlay appears only when recording with a non-empty partial transcript, and disappears on stop or cancel. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…back
- Add ownsStream option to createRecorder and createStreamingRecorder so multiple recorders can share a MediaStream without racing to kill tracks. The hook now owns stream cleanup via releaseStream().
- Return partialTranscript as best-effort fallback when WS timeout fires without a final transcript, instead of silently losing input.
- Move mimeToFormat() to module level (pure function, no hook state).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
63d7fad to
afe2c57
#110's squash merge overwrote #108's review fixes in ChatView and tts.ts. This restores:
- stripCodeForTts() and truncateForTts() so TTS doesn't read code
- TTS_MAX_SPEAK_CHARS (2000) length guard
- TTS_CHUNK_MIN_CHARS moved to constants
- useAutoSpeak hook extraction with running guard
- playAudio re-entrancy guard (started flag)
- Tests for all of the above
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
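To illustrate what helpers like stripCodeForTts() and truncateForTts() are for, here is a minimal sketch under stated assumptions. It is not the restored code: `stripCodeSketch` and `truncateSketch` are hypothetical, and only the 2000-character limit is taken from the commit message.

```typescript
const MAX_SPEAK_CHARS = 2000; // value quoted in the commit message above

// Build the fence marker programmatically so this example can itself live
// inside a fenced block.
const FENCE = "`".repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g");
const INLINE_CODE = /`[^`\n]*`/g;

// Remove fenced blocks and inline code, then collapse whitespace.
function stripCodeSketch(markdown: string): string {
  return markdown
    .replace(FENCED_BLOCK, " ")
    .replace(INLINE_CODE, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Cut at the last space before the limit so a word is not split mid-way.
function truncateSketch(text: string, max: number = MAX_SPEAK_CHARS): string {
  if (text.length <= max) return text;
  const cut = text.slice(0, max);
  const lastSpace = cut.lastIndexOf(" ");
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut).trim();
}
```

The order matters in this sketch: stripping code first means the length guard applies to the text that would actually be spoken.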
Summary
- useVoice hook with Yapper health polling, push-to-talk MicButton, wired into ChatInput/ChatView
- /v1/synthesize, AudioContext singleton, sequential playback with AbortController cancellation, VoiceSettings component (toggle + voice picker), auto-speak on assistant message completion
- /v1/transcribe/stream, streaming MediaRecorder with timeslice chunks, live partial transcript overlay in ChatInput, batch fallback on WS failure
What's new
- audio.ts: createRecorder() + createStreamingRecorder() with format negotiation
- yapper-ws.ts: createYapperStreamClient(), queued sends, JSON transcript parsing
- tts.ts: chunkText(), synthesize(), playAudio(), AudioContext singleton
- useVoice.ts
- MicButton, VoiceSettings, ChatInput updates
- ChatView.tsx: useVoice(), auto-speak effect, stops speaking on send
- streaming-stt.md, tts-playback.md
Test plan
- npm test (all 25 test files green)
🤖 Generated with Claude Code