feat(voice): tts playback via Yapper (phase 2) #108
dimakis merged 7 commits into feat/voice-stt-batch
Text chunking at sentence boundaries with fragment merging, synthesis fetch with AbortSignal support, singleton AudioContext management, and WAV playback via AudioBufferSourceNode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
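As a rough illustration of the chunking step described in this commit, the sketch below splits text at sentence terminators and merges short fragments into the preceding chunk. It is not the code from `lib/tts.ts`: the function name `chunkTextSketch`, the regex, and both limit values are assumptions made for the example.

```typescript
// Hypothetical limits for illustration; the PR keeps its real values in constants.
const SKETCH_CHUNK_MAX_CHARS = 200;
const SKETCH_CHUNK_MIN_CHARS = 10;

/** Split text at sentence boundaries, merging short fragments into the previous chunk. */
function chunkTextSketch(
  text: string,
  maxChars: number = SKETCH_CHUNK_MAX_CHARS,
  minChars: number = SKETCH_CHUNK_MIN_CHARS,
): string[] {
  // A "sentence" here is a run of non-terminators plus any trailing . ! ?
  const sentences = (text.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  for (const sentence of sentences) {
    const last = chunks[chunks.length - 1];
    // Merge when either side is a short fragment and the result still fits.
    const canMerge =
      last !== undefined &&
      (last.length < minChars || sentence.length < minChars) &&
      last.length + 1 + sentence.length <= maxChars;
    if (canMerge) {
      chunks[chunks.length - 1] = `${last} ${sentence}`;
    } else {
      chunks.push(sentence);
    }
  }
  return chunks;
}
```

This naive splitter also breaks on decimals such as `3.14` and does not split a single sentence longer than `maxChars`; a production version would need to handle both.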
Adds ttsAvailable (from health poll), ttsEnabled/selectedVoice with localStorage persistence, lazy voice list fetch, speak() with sequential chunk synthesis and AbortController cancellation, stopSpeaking(), and AudioContext lifecycle cleanup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
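The sequential synthesis and cancellation flow described here could look roughly like the sketch below. It is a minimal illustration, not the hook from this PR: `createSpeakerSketch` and the injected `synthesize` and `playAudio` callbacks are hypothetical stand-ins for the real fetch and Web Audio code.

```typescript
type SynthesizeFn = (text: string, signal: AbortSignal) => Promise<ArrayBuffer>;
type PlayAudioFn = (audio: ArrayBuffer, signal: AbortSignal) => Promise<void>;

/** Hypothetical factory: speaks chunks one at a time and supports cancellation. */
function createSpeakerSketch(synthesize: SynthesizeFn, playAudio: PlayAudioFn) {
  let controller: AbortController | null = null;

  function stopSpeaking(): void {
    // Aborting signals both an in-flight synthesis request and playback.
    controller?.abort();
    controller = null;
  }

  async function speak(chunks: string[]): Promise<void> {
    stopSpeaking(); // a new utterance cancels the previous one
    const own = new AbortController();
    controller = own;
    try {
      for (const chunk of chunks) {
        if (own.signal.aborted) return;
        // Sequential: synthesize, then play, before touching the next chunk.
        const audio = await synthesize(chunk, own.signal);
        if (own.signal.aborted) return;
        await playAudio(audio, own.signal);
      }
    } catch (err) {
      // Errors caused by our own abort are expected; rethrow anything else.
      if (!own.signal.aborted) throw err;
    } finally {
      if (controller === own) controller = null;
    }
  }

  return { speak, stopSpeaking };
}
```

Injecting the two callbacks keeps the control flow testable without a browser; in the PR itself these roles are filled by the synthesis fetch and the `AudioBufferSourceNode` playback.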
Speaker icon toggle for TTS on/off, voice selector dropdown grouped by language. Hidden when Yapper TTS is unavailable, voice picker shown only when TTS is enabled. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Auto-speak assistant text on message completion (tracked by messageId ref). Stop playback on user send. Render VoiceSettings in chat header. Update ChatInput voice mock with TTS fields. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Speaker toggle with active/speaking states, voice picker dropdown, pulse animation during playback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e-entrancy
- Guard auto-speak effect with !msgState.running so TTS waits for the full message instead of speaking partial streaming content
- Add re-entrancy guard to playAudio.play() to prevent leaking duplicate AudioBufferSourceNodes
- Destructure voice hook values in ChatView to stabilize useEffect deps
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
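The re-entrancy guard mentioned above can be pictured as a one-shot flag around source creation. The sketch below is an assumption about the shape of such a guard, not the PR's `playAudio`: `AudioSourceLike` stands in for `AudioBufferSourceNode`, and `createPlaybackSketch` is a hypothetical name.

```typescript
// Stand-in for AudioBufferSourceNode so the sketch runs outside a browser.
interface AudioSourceLike {
  start(): void;
  stop(): void;
}

/** Hypothetical one-shot playback handle: a second play() call is a no-op. */
function createPlaybackSketch(createSource: () => AudioSourceLike) {
  let started = false;
  let source: AudioSourceLike | null = null;

  return {
    play(): void {
      if (started) return; // guard: never create a second source node
      started = true;
      source = createSource();
      source.start();
    },
    stop(): void {
      source?.stop();
      source = null;
    },
  };
}
```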
dimakis left a comment
PR Review: feat(voice): tts playback via Yapper (phase 2)
Clean implementation overall — good singleton AudioContext, proper AbortSignal cancellation, solid test coverage (34 tests). A few things to address:
🐛 Code blocks read aloud (ChatView.tsx)
The auto-speak effect joins all blockType === 'text' blocks, but these can contain markdown code fences. TTS will read raw code aloud — import React from react semicolon.... Should strip fenced code blocks (and possibly inline code) before passing to speak().
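A minimal sketch of the kind of stripping suggested here, assuming regex-based removal is acceptable for an MVP. The name `stripCodeForSpeech` and the exact patterns are illustrative assumptions, and the patterns do not cover every markdown edge case (for example, unterminated or tilde fences).

```typescript
/** Hypothetical helper: drop fenced and inline code so TTS only reads prose. */
function stripCodeForSpeech(markdown: string): string {
  return markdown
    .replace(/`{3}[\s\S]*?`{3}/g, " ") // fenced blocks delimited by three backticks
    .replace(/`[^`\n]*`/g, " ") // inline code spans on a single line
    .replace(/\s+/g, " ") // collapse the whitespace left behind
    .trim();
}
```

Fenced blocks are removed first so that their backticks are not misread as inline code spans.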
⚠️ No length guard on TTS (ChatView.tsx)
Very long assistant responses (multi-paragraph explanations, large diffs in text blocks) will produce huge synthesis requests. Consider a max character limit with a truncation notice, or at least chunking at the ChatView level before delegating to speak(). The chunking in tts.ts handles synthesis-level splitting but doesn't guard against sending a 10k-char wall to begin with.
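One possible shape for such a length guard is sketched below. The name `truncateForSpeech`, the default cap, and the choice to cut at the last sentence end are assumptions for illustration rather than the behaviour of any helper in this PR.

```typescript
// Hypothetical default cap for the sketch.
const SKETCH_MAX_SPEAK_CHARS = 2000;

/** Hypothetical helper: cap the text, preferring to cut at a sentence end. */
function truncateForSpeech(
  text: string,
  maxChars: number = SKETCH_MAX_SPEAK_CHARS,
): string {
  if (text.length <= maxChars) return text;
  const head = text.slice(0, maxChars);
  // Find the last sentence terminator followed by a space inside the cap.
  const cut = Math.max(
    head.lastIndexOf(". "),
    head.lastIndexOf("! "),
    head.lastIndexOf("? "),
  );
  // Fall back to a hard cut when no sentence boundary is available.
  return cut > 0 ? head.slice(0, cut + 1) : head;
}
```

Applying the cap before chunking keeps the number of synthesis requests bounded regardless of how the chunker splits the text.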
📝 Minor: MIN_FRAGMENT_LEN not in constants (tts.ts)
TTS_CHUNK_MAX_CHARS is in constants.ts but MIN_FRAGMENT_LEN (10) is hardcoded in tts.ts. Inconsistent — either move it to constants or document why it's different.
🧪 Missing test: ChatView auto-speak effect
The auto-speak useEffect has non-trivial logic (messageId tracking, running guard, text extraction) but no test coverage. The hook and component tests are solid, but this integration point is untested.
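One way to make this logic testable without rendering ChatView is to isolate the decision in a pure function. The sketch below assumes hypothetical names throughout (`AutoSpeakInput`, `textToAutoSpeak`, and the `text` field on a block); only the `blockType === 'text'` filter, the messageId tracking, and the running guard come from the discussion above.

```typescript
interface MessageBlockLike {
  blockType: string;
  text?: string; // hypothetical field name for the block's content
}

interface AutoSpeakInput {
  messageId: string | null;
  blocks: MessageBlockLike[];
  running: boolean;
  ttsEnabled: boolean;
  lastSpokenId: string | null;
}

/** Returns the text to speak, or null when nothing should be spoken. */
function textToAutoSpeak(input: AutoSpeakInput): string | null {
  // Wait for the full message: never speak while streaming is in progress.
  if (!input.ttsEnabled || input.running || input.messageId === null) return null;
  // Deduplicate by message id so re-renders do not repeat the same message.
  if (input.messageId === input.lastSpokenId) return null;
  const text = input.blocks
    .filter((b) => b.blockType === "text")
    .map((b) => b.text ?? "")
    .join("\n")
    .trim();
  return text.length > 0 ? text : null;
}
```

A hook or effect would then call this on each render and, when it returns text, record the message id and pass the text to `speak()`.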
None of these are blockers for the current MVP scope, but the code-blocks-read-aloud issue would be worth fixing before merge — it'll be immediately noticeable in use.
🤖 Generated with Claude Code
…peak
- Add stripCodeForTts() to remove fenced code blocks and inline code before speaking, which prevents reading raw source aloud
- Add truncateForTts() with TTS_MAX_SPEAK_CHARS (2000) cap to guard against enormous synthesis requests
- Move MIN_FRAGMENT_LEN to constants as TTS_CHUNK_MIN_CHARS
- Extract auto-speak logic into useAutoSpeak hook for testability
- Add useAutoSpeak tests covering: streaming guard, deduplication, disabled states, empty text, multi-block joining
- Add playAudio re-entrancy test
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
#110's squash merge overwrote #108's review fixes in ChatView and tts.ts. This restores:
- stripCodeForTts() and truncateForTts() so TTS doesn't read code
- TTS_MAX_SPEAK_CHARS (2000) length guard
- TTS_CHUNK_MIN_CHARS moved to constants
- useAutoSpeak hook extraction with running guard
- playAudio re-entrancy guard (started flag)
- Tests for all of the above
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- (lib/tts.ts): text chunking at sentence boundaries, synthesis fetch with AbortSignal cancellation, singleton AudioContext management, WAV playback via AudioBufferSourceNode
- ttsAvailable from health poll, ttsEnabled/selectedVoice with localStorage persistence, lazy voice list fetch on first enable, speak() with sequential chunk synthesis, stopSpeaking() with abort

Addresses all review feedback from #106: singleton AudioContext, AbortSignal on synthesize, messageId tracking instead of messages.length, sequential playback (no pipelining for MVP), lazy voice fetch, dynamic default voice.
Test plan