
feat(voice): tts playback via Yapper (phase 2)#108

Merged
dimakis merged 7 commits into feat/voice-stt-batch from feat/voice-tts-playback-impl on Apr 6, 2026


@dimakis dimakis commented Apr 5, 2026

Summary

  • TTS module (lib/tts.ts): text chunking at sentence boundaries, synthesis fetch with AbortSignal cancellation, singleton AudioContext management, WAV playback via AudioBufferSourceNode
  • useVoice extension: ttsAvailable from health poll, ttsEnabled/selectedVoice with localStorage persistence, lazy voice list fetch on first enable, speak() with sequential chunk synthesis, stopSpeaking() with abort
  • VoiceSettings component: speaker toggle (hidden when TTS unavailable), voice picker dropdown grouped by language
  • ChatView wiring: auto-speak on new assistant message (tracked by messageId ref, not array length), stop playback on user send
  • CSS: speaker toggle with active/speaking pulse, compact voice picker
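To make the chunking bullet concrete, here is a minimal sketch of sentence-boundary splitting with fragment merging. It assumes a naive punctuation regex and illustrative limits: maxChars stands in for TTS_CHUNK_MAX_CHARS (its real value is not stated in this PR), and minChars for the 10-character fragment threshold the review mentions. chunkText is a hypothetical name, not the actual lib/tts.ts export.

```typescript
// Sketch only: a hypothetical chunkText(), not the actual lib/tts.ts implementation.
// maxChars stands in for TTS_CHUNK_MAX_CHARS (value assumed); minChars mirrors the
// 10-character fragment threshold (MIN_FRAGMENT_LEN, later TTS_CHUNK_MIN_CHARS).
function chunkText(text: string, maxChars = 200, minChars = 10): string[] {
  // Naive sentence split: non-terminators followed by ., ! or ?, or a trailing remainder.
  // Abbreviations and decimals ("e.g.", "3.5") will also split; acceptable for a sketch.
  const sentences = (text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    // Grow the chunk while it fits, or while it is still a tiny fragment.
    // A single sentence longer than maxChars is kept whole rather than cut mid-sentence.
    if (candidate.length <= maxChars || current.length < minChars) {
      current = candidate;
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current) {
    // Fold a short trailing fragment into the previous chunk instead of synthesizing it alone.
    if (current.length < minChars && chunks.length > 0) {
      chunks[chunks.length - 1] += ` ${current}`;
    } else {
      chunks.push(current);
    }
  }
  return chunks;
}
```

Merging fragments keeps very short utterances ("Ok.") from becoming their own synthesis request, which would otherwise add a round trip for a fraction of a second of audio.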

Addresses all review feedback from #106: singleton AudioContext, AbortSignal on synthesize, messageId tracking instead of messages.length, sequential playback (no pipelining for MVP), lazy voice fetch, dynamic default voice.
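Sequential playback (no pipelining) means each chunk is synthesized and played to completion before the next request starts, with the AbortSignal checked between steps. A minimal sketch, assuming synthesize and play are injected async functions; speakSequentially and both callback types are hypothetical names, not the useVoice code.

```typescript
// Sketch only: hypothetical sequential chunk playback, not the PR's speak() implementation.
type Synthesize = (text: string, signal: AbortSignal) => Promise<ArrayBuffer>;
type Play = (audio: ArrayBuffer, signal: AbortSignal) => Promise<void>;

// Returns how many chunks were fully played before finishing or being aborted.
async function speakSequentially(
  chunks: string[],
  synthesize: Synthesize,
  play: Play,
  signal: AbortSignal,
): Promise<number> {
  let played = 0;
  for (const chunk of chunks) {
    // Stop before doing more work once stopSpeaking() (or a new user message) aborts.
    if (signal.aborted) break;
    const audio = await synthesize(chunk, signal);
    // The abort may have landed while the synthesis request was in flight.
    if (signal.aborted) break;
    // Wait for this chunk to finish before requesting the next one (no pipelining).
    await play(audio, signal);
    played++;
  }
  return played;
}
```

Pipelining (synthesizing chunk n+1 while chunk n plays) would shorten the gaps between chunks, at the cost of more in-flight requests to cancel when playback is stopped, which is presumably why the MVP leaves it out.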

Test plan

  • 34 new tests across 4 files (tts, useVoice TTS, VoiceSettings, ChatInput mock update)
  • Full suite passes: 579/579
  • Lint: 0 errors
  • Manual: enable TTS toggle → send message → hear response spoken
  • Manual: speaker toggle hidden when Yapper offline / TTS model not loaded
  • Manual: stop playback when sending new message
  • Manual: change voice → next response uses new voice
  • Manual: long response → chunked synthesis

🤖 Generated with Claude Code

dimakis and others added 5 commits April 5, 2026 20:32
Text chunking at sentence boundaries with fragment merging, synthesis
fetch with AbortSignal support, singleton AudioContext management,
and WAV playback via AudioBufferSourceNode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
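A singleton AudioContext is typically created lazily on first playback and reused afterwards. The sketch below takes a factory parameter so it can run outside a browser (in the app it would be () => new AudioContext()); createAudioContextManager and AudioContextLike are hypothetical names. The resume() call matters because browsers may create a context in the "suspended" state until there has been a user gesture.

```typescript
// Sketch only: a lazily created, shared AudioContext; not the actual lib/tts.ts code.
// AudioContextLike is the subset of the Web Audio AudioContext used here.
interface AudioContextLike {
  state: string; // "suspended" | "running" | "closed" in browsers
  resume(): Promise<void>;
  close(): Promise<void>;
}

function createAudioContextManager<T extends AudioContextLike>(factory: () => T) {
  let ctx: T | null = null;
  return {
    // Reuse one context for every chunk instead of creating a new one per playback.
    async get(): Promise<T> {
      if (ctx === null || ctx.state === "closed") ctx = factory();
      const current = ctx;
      // Autoplay policies can leave a new context suspended until a user gesture.
      if (current.state === "suspended") await current.resume();
      return current;
    },
    // Called from hook cleanup (e.g. unmount) to release audio resources.
    async dispose(): Promise<void> {
      if (ctx && ctx.state !== "closed") await ctx.close();
      ctx = null;
    },
  };
}
```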
Adds ttsAvailable (from health poll), ttsEnabled/selectedVoice with
localStorage persistence, lazy voice list fetch, speak() with
sequential chunk synthesis and AbortController cancellation,
stopSpeaking(), and AudioContext lifecycle cleanup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
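Persisting ttsEnabled and selectedVoice can be as simple as storing strings in localStorage. A sketch, assuming hypothetical key names and treating a null voice as "let the server pick its default" (in the spirit of the dynamic default voice mentioned in the summary). StorageLike is a subset of the Web Storage API, so window.localStorage satisfies it.

```typescript
// Sketch only: storage keys and the null-voice convention are assumptions, not the PR's code.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

interface TtsPrefs {
  ttsEnabled: boolean;
  selectedVoice: string | null; // null = use the server's default voice
}

const TTS_ENABLED_KEY = "voice.ttsEnabled"; // hypothetical key name
const TTS_VOICE_KEY = "voice.selectedVoice"; // hypothetical key name

function loadTtsPrefs(storage: StorageLike): TtsPrefs {
  return {
    // Anything other than the exact string "true" (including a missing key) means off.
    ttsEnabled: storage.getItem(TTS_ENABLED_KEY) === "true",
    selectedVoice: storage.getItem(TTS_VOICE_KEY),
  };
}

function saveTtsPrefs(storage: StorageLike, prefs: TtsPrefs): void {
  storage.setItem(TTS_ENABLED_KEY, String(prefs.ttsEnabled));
  // Clearing the key lets a later load fall back to the server default again.
  if (prefs.selectedVoice === null) storage.removeItem(TTS_VOICE_KEY);
  else storage.setItem(TTS_VOICE_KEY, prefs.selectedVoice);
}
```

In a hook, loadTtsPrefs(window.localStorage) would typically feed a lazy useState initializer, with saveTtsPrefs called whenever either value changes.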
Speaker icon toggle for TTS on/off, voice selector dropdown grouped
by language. Hidden when Yapper TTS is unavailable, voice picker
shown only when TTS is enabled.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
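Grouping the voice dropdown by language amounts to bucketing voices and emitting one <optgroup> per bucket. A sketch, assuming each voice carries id, name and language fields (Yapper's actual voice-list payload is not shown in this PR); groupVoicesByLanguage is a hypothetical helper.

```typescript
// Sketch only: the Voice shape is assumed; the real Yapper response may differ.
interface Voice {
  id: string;
  name: string;
  language: string;
}

// Returns [language, voices] pairs sorted by language, each group sorted by name,
// ready to render as one <optgroup> per language in the picker.
function groupVoicesByLanguage(voices: Voice[]): Array<[string, Voice[]]> {
  const groups = new Map<string, Voice[]>();
  for (const voice of voices) {
    const list = groups.get(voice.language) ?? [];
    list.push(voice);
    groups.set(voice.language, list);
  }
  return Array.from(groups.entries())
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([language, list]): [string, Voice[]] => [
      language,
      [...list].sort((x, y) => x.name.localeCompare(y.name)),
    ]);
}
```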
Auto-speak assistant text on message completion (tracked by messageId
ref). Stop playback on user send. Render VoiceSettings in chat header.
Update ChatInput voice mock with TTS fields.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
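The messageId-ref approach can be modeled without React as a small closure that remembers the last id it handled. A sketch, assuming a simplified message shape; createAutoSpeakGate is hypothetical, and it also folds in the running guard added in a later commit.

```typescript
// Sketch only: a framework-free model of the auto-speak decision. In ChatView the
// last spoken id would live in a useRef; the message shape here is assumed.
interface AssistantMessage {
  id: string;
  running: boolean; // true while the reply is still streaming
  text: string;
}

function createAutoSpeakGate() {
  let lastSpokenId: string | null = null;
  // Returns the text to speak for the latest assistant message, or null to stay silent.
  return (message: AssistantMessage | undefined, ttsEnabled: boolean): string | null => {
    // Wait for the complete reply rather than speaking partial streamed content.
    if (!message || message.running) return null;
    // Keyed on id, not messages.length: the length also changes when the user's own
    // message is appended, so it is a poor signal that a new reply has finished.
    if (message.id === lastSpokenId) return null;
    // Mark as seen even while TTS is off, so enabling TTS later does not replay
    // an old reply (a design choice of this sketch).
    lastSpokenId = message.id;
    if (!ttsEnabled) return null;
    return message.text.trim() ? message.text : null;
  };
}
```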
Speaker toggle with active/speaking states, voice picker dropdown,
pulse animation during playback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e-entrancy

- Guard auto-speak effect with !msgState.running so TTS waits for
  the full message instead of speaking partial streaming content
- Add re-entrancy guard to playAudio.play() to prevent leaking
  duplicate AudioBufferSourceNodes
- Destructure voice hook values in ChatView to stabilize useEffect deps

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
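The re-entrancy guard boils down to a started flag: the first play() creates the source node, and any later call returns the same promise. A sketch, assuming a hypothetical startSource callback in place of the Web Audio calls (outlined in the comment) so it can run outside a browser; createPlayback is not the PR's playAudio.

```typescript
// Sketch only: models the "started" re-entrancy guard described in the commit above.
// startSource is a hypothetical callback; in the browser it would do roughly:
//   const src = ctx.createBufferSource(); src.buffer = buffer;
//   src.connect(ctx.destination); src.onended = () => onEnded(); src.start(); return src;
interface Stoppable {
  stop(): void;
}

interface Playback {
  play(): Promise<void>;
  stop(): void;
}

function createPlayback(startSource: (onEnded: () => void) => Stoppable): Playback {
  let started = false;
  let done: Promise<void> = Promise.resolve();
  let source: Stoppable | null = null;
  return {
    play() {
      // A second play() reuses the first call's promise instead of creating
      // (and leaking) another source node for the same audio.
      if (started) return done;
      started = true;
      done = new Promise<void>((resolve) => {
        source = startSource(resolve);
      });
      return done;
    },
    stop() {
      // On a real AudioBufferSourceNode, stop() fires onended, which resolves `done`.
      source?.stop();
      source = null;
    },
  };
}
```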

@dimakis dimakis left a comment

PR Review: feat(voice): tts playback via Yapper (phase 2)

Clean implementation overall — good singleton AudioContext, proper AbortSignal cancellation, solid test coverage (34 tests). A few things to address:

🐛 Code blocks read aloud (ChatView.tsx)

The auto-speak effect joins all blockType === 'text' blocks, but these can contain markdown code fences, so TTS will read raw code aloud ("import React from react semicolon..."). It should strip fenced code blocks (and possibly inline code) before passing the text to speak().
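One way to do the stripping is with two regex passes, one for fenced blocks and one for inline spans. A minimal sketch, assuming CommonMark-style triple-backtick and ~~~ fences; the PR's own stripCodeForTts() may differ, and indented (four-space) code blocks are not handled here.

```typescript
// Sketch only: a regex-based stripCodeForTts(); the PR's actual implementation may differ.
function stripCodeForTts(markdown: string): string {
  return (
    markdown
      // Fenced blocks opened with ``` or ~~~; an unclosed fence (e.g. mid-stream) runs to the end.
      .replace(/(```|~~~)[\s\S]*?(?:\1|$)/g, " ")
      // Inline code spans such as `useState`.
      .replace(/`[^`\n]*`/g, "")
      // Collapse the whitespace left behind by the removals.
      .replace(/\s+/g, " ")
      .trim()
  );
}
```

Dropping inline spans entirely can leave sentences like "Use for state."; keeping the span text without backticks is the obvious alternative if that reads worse in practice.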

⚠️ No length guard on TTS (ChatView.tsx)

Very long assistant responses (multi-paragraph explanations, large diffs in text blocks) will produce huge synthesis requests. Consider a max character limit with a truncation notice, or at least chunking at the ChatView level before delegating to speak(). The chunking in tts.ts handles synthesis-level splitting but doesn't guard against sending a 10k-char wall to begin with.
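A length guard with a truncation notice could look like the sketch below. The 2000-character cap matches the TTS_MAX_SPEAK_CHARS value a follow-up commit settled on; cutting at the last sentence end and the wording of the spoken notice are assumptions of this sketch.

```typescript
// Sketch only: a hypothetical truncateForTts(); the cut strategy and notice text are assumed.
const TTS_MAX_SPEAK_CHARS = 2000;

function truncateForTts(text: string, maxChars = TTS_MAX_SPEAK_CHARS): string {
  if (text.length <= maxChars) return text;
  const head = text.slice(0, maxChars);
  // Prefer ending on a sentence boundary so speech does not stop mid-word,
  // but only if that keeps at least half of the allowed length.
  const lastStop = Math.max(head.lastIndexOf(". "), head.lastIndexOf("! "), head.lastIndexOf("? "));
  const cut = lastStop > maxChars / 2 ? head.slice(0, lastStop + 1) : head;
  // The appended notice pushes the result slightly past maxChars.
  return `${cut.trimEnd()} The rest of this response was not read aloud.`;
}
```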

📝 Minor: MIN_FRAGMENT_LEN not in constants (tts.ts)

TTS_CHUNK_MAX_CHARS is in constants.ts but MIN_FRAGMENT_LEN (10) is hardcoded in tts.ts. Inconsistent — either move it to constants or document why it's different.

🧪 Missing test: ChatView auto-speak effect

The auto-speak useEffect has non-trivial logic (messageId tracking, running guard, text extraction) but no test coverage. The hook and component tests are solid, but this integration point is untested.


None of these are blockers for the current MVP scope, but the code-blocks-read-aloud issue would be worth fixing before merge — it'll be immediately noticeable in use.

🤖 Generated with Claude Code

…peak

- Add stripCodeForTts() to remove fenced code blocks and inline code
  before speaking — prevents reading raw source aloud
- Add truncateForTts() with TTS_MAX_SPEAK_CHARS (2000) cap to guard
  against enormous synthesis requests
- Move MIN_FRAGMENT_LEN to constants as TTS_CHUNK_MIN_CHARS
- Extract auto-speak logic into useAutoSpeak hook for testability
- Add useAutoSpeak tests covering: streaming guard, deduplication,
  disabled states, empty text, multi-block joining
- Add playAudio re-entrancy test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dimakis dimakis merged commit 5082f9b into feat/voice-stt-batch Apr 6, 2026
dimakis added a commit that referenced this pull request Apr 6, 2026
#110's squash merge overwrote #108's review fixes in ChatView and
tts.ts. This restores:

- stripCodeForTts() and truncateForTts() so TTS doesn't read code
- TTS_MAX_SPEAK_CHARS (2000) length guard
- TTS_CHUNK_MIN_CHARS moved to constants
- useAutoSpeak hook extraction with running guard
- playAudio re-entrancy guard (started flag)
- Tests for all of the above

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>