feat(voice): tts playback via Yapper (phase 2) by dimakis · Pull Request #108 · dimakis/mitzo

dimakis · 2026-04-05T19:38:22Z

Summary

- TTS module (lib/tts.ts): text chunking at sentence boundaries, synthesis fetch with AbortSignal cancellation, singleton AudioContext management, WAV playback via AudioBufferSourceNode
- useVoice extension: ttsAvailable from health poll, ttsEnabled/selectedVoice with localStorage persistence, lazy voice list fetch on first enable, speak() with sequential chunk synthesis, stopSpeaking() with abort
- VoiceSettings component: speaker toggle (hidden when TTS unavailable), voice picker dropdown grouped by language
- ChatView wiring: auto-speak on new assistant message (tracked by messageId ref, not array length), stop playback on user send
- CSS: speaker toggle with active/speaking pulse, compact voice picker

Addresses all review feedback from #106: singleton AudioContext, AbortSignal on synthesize, messageId tracking instead of messages.length, sequential playback (no pipelining for MVP), lazy voice fetch, dynamic default voice.
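For reference, the "voice picker dropdown grouped by language" item in the summary could be backed by a small grouping helper along these lines. This is only a sketch: the `Voice` shape and the `groupVoicesByLanguage` name are assumptions, since the actual Yapper voice list format is not shown in this PR.

```typescript
// Hypothetical voice shape; the real Yapper response format is not shown in this PR.
interface Voice {
  id: string;
  name: string;
  language: string; // e.g. "en-US"
}

// Group voices by language, keeping languages in first-seen order
// and sorting voices by name inside each group.
function groupVoicesByLanguage(voices: Voice[]): Map<string, Voice[]> {
  const groups = new Map<string, Voice[]>();
  for (const voice of voices) {
    const bucket = groups.get(voice.language);
    if (bucket) {
      bucket.push(voice);
    } else {
      groups.set(voice.language, [voice]);
    }
  }
  groups.forEach((bucket) => bucket.sort((a, b) => a.name.localeCompare(b.name)));
  return groups;
}
```

Each map entry would then map naturally onto one `<optgroup>` in the dropdown.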

Test plan

34 new tests across 4 files (tts, useVoice TTS, VoiceSettings, ChatInput mock update)
Full suite passes: 579/579
Lint: 0 errors
Manual: enable TTS toggle → send message → hear response spoken
Manual: speaker toggle hidden when Yapper offline / TTS model not loaded
Manual: stop playback when sending new message
Manual: change voice → next response uses new voice
Manual: long response → chunked synthesis

🤖 Generated with Claude Code

Text chunking at sentence boundaries with fragment merging, synthesis fetch with AbortSignal support, singleton AudioContext management, and WAV playback via AudioBufferSourceNode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
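To illustrate the idea of sentence-boundary chunking with fragment merging, here is a minimal sketch. It is not the code in lib/tts.ts: `chunkBySentence` and both constants are hypothetical names, the 10-character minimum mirrors the value mentioned in review, and the 200-character maximum is a placeholder rather than the project's TTS_CHUNK_MAX_CHARS.

```typescript
// Assumed limits for illustration only.
const ASSUMED_CHUNK_MAX_CHARS = 200;
const ASSUMED_CHUNK_MIN_CHARS = 10;

function chunkBySentence(
  text: string,
  maxChars: number = ASSUMED_CHUNK_MAX_CHARS,
  minChars: number = ASSUMED_CHUNK_MIN_CHARS,
): string[] {
  // A "sentence" is a run of non-terminator characters plus any trailing . ! ?
  const sentences = (text.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const chunks: string[] = [];
  for (const sentence of sentences) {
    const lastIndex = chunks.length - 1;
    if (lastIndex >= 0) {
      const last = chunks[lastIndex];
      const fits = last.length + 1 + sentence.length <= maxChars;
      const hasShortFragment = last.length < minChars || sentence.length < minChars;
      // Merge short fragments into the previous chunk when the result still fits.
      if (fits && hasShortFragment) {
        chunks[lastIndex] = last + " " + sentence;
        continue;
      }
    }
    chunks.push(sentence);
  }
  return chunks;
}
```

This sketch leaves sentences longer than `maxChars` intact and treats abbreviations such as "e.g." as sentence ends, both of which a production splitter would likely need to handle.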

Adds ttsAvailable (from health poll), ttsEnabled/selectedVoice with localStorage persistence, lazy voice list fetch, speak() with sequential chunk synthesis and AbortController cancellation, stopSpeaking(), and AudioContext lifecycle cleanup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
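The sequential, cancellable flow described for speak() can be sketched roughly as follows. The `speakSequentially`, `Synthesize`, and `Play` names are illustrative assumptions; the real hook presumably calls the Yapper synthesis endpoint and the playback helper from lib/tts.ts, neither of which is reproduced here.

```typescript
// Injected dependencies, so the loop itself stays independent of fetch and Web Audio.
type Synthesize = (text: string, signal: AbortSignal) => Promise<ArrayBuffer>;
type Play = (audio: ArrayBuffer, signal: AbortSignal) => Promise<void>;

// Synthesize and play one chunk at a time (no pipelining).
// Resolves with the number of chunks that finished playing.
async function speakSequentially(
  chunks: string[],
  synthesize: Synthesize,
  play: Play,
  signal: AbortSignal,
): Promise<number> {
  let played = 0;
  try {
    for (const chunk of chunks) {
      if (signal.aborted) break;
      const audio = await synthesize(chunk, signal);
      if (signal.aborted) break; // aborted while the request was in flight
      await play(audio, signal);
      played += 1;
    }
  } catch (err) {
    // A rejection caused by our own abort is expected; anything else is a real error.
    if (!signal.aborted) throw err;
  }
  return played;
}
```

A caller would create one AbortController per utterance, pass `controller.signal` in, and call `controller.abort()` from stopSpeaking().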

Speaker icon toggle for TTS on/off, voice selector dropdown grouped by language. Hidden when Yapper TTS is unavailable, voice picker shown only when TTS is enabled. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Auto-speak assistant text on message completion (tracked by messageId ref). Stop playback on user send. Render VoiceSettings in chat header. Update ChatInput voice mock with TTS fields. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Speaker toggle with active/speaking states, voice picker dropdown, pulse animation during playback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…e-entrancy

- Guard auto-speak effect with !msgState.running so TTS waits for the full message instead of speaking partial streaming content
- Add re-entrancy guard to playAudio.play() to prevent leaking duplicate AudioBufferSourceNodes
- Destructure voice hook values in ChatView to stabilize useEffect deps

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
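A re-entrancy guard of the kind described for play() might look like the sketch below. `StartableSource` and `createPlayer` are hypothetical names standing in for AudioBufferSourceNode and the playAudio helper; only the "started flag" idea comes from the commit messages in this PR.

```typescript
// The minimal slice of AudioBufferSourceNode this sketch relies on.
interface StartableSource {
  start(): void;
  stop(): void;
}

function createPlayer(createSource: () => StartableSource) {
  let started = false;
  let source: StartableSource | null = null;

  return {
    // Returns true only for the call that actually starts playback.
    play(): boolean {
      if (started) return false; // later calls must not create another node
      started = true;
      source = createSource();
      source.start();
      return true;
    },
    stop(): void {
      if (source !== null) {
        source.stop();
        source = null;
      }
    },
  };
}
```

Because the flag is set before the node is created, a second play() call returns early instead of allocating a duplicate source.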

dimakis

PR Review: feat(voice): tts playback via Yapper (phase 2)

Clean implementation overall — good singleton AudioContext, proper AbortSignal cancellation, solid test coverage (34 tests). A few things to address:

🐛 Code blocks read aloud (ChatView.tsx)

The auto-speak effect joins all blockType === 'text' blocks, but these can contain markdown code fences. TTS will read raw code aloud — import React from react semicolon.... Should strip fenced code blocks (and possibly inline code) before passing to speak().
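One possible way to strip code before speaking is sketched here. `stripCodeForSpeech` is a hypothetical helper, not code from this PR, and it assumes the message is complete, so every fence is closed.

```typescript
// Build the fence marker programmatically so this example nests cleanly in markdown.
const FENCE = "`".repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g");
const INLINE_CODE = /`[^`\n]*`/g;

function stripCodeForSpeech(markdown: string): string {
  return markdown
    .replace(FENCED_BLOCK, " ") // drop fenced blocks, including their language tag
    .replace(INLINE_CODE, " ") // drop inline code spans
    .replace(/\s+/g, " ") // collapse the gaps left behind
    .trim();
}
```

Fenced blocks are removed first so the inline pattern never sees stray fence backticks. Whether inline code should be dropped or read as plain words is a judgment call; this sketch drops it.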

⚠️ No length guard on TTS (ChatView.tsx)

Very long assistant responses (multi-paragraph explanations, large diffs in text blocks) will produce huge synthesis requests. Consider a max character limit with a truncation notice, or at least chunking at the ChatView level before delegating to speak(). The chunking in tts.ts handles synthesis-level splitting but doesn't guard against sending a 10k-char wall to begin with.
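A length guard could be as simple as the sketch below. `truncateForSpeech`, its default notice text, and the cut heuristics are assumptions for illustration; the 2000-character default matches the TTS_MAX_SPEAK_CHARS value mentioned in later commits on this PR.

```typescript
function truncateForSpeech(
  text: string,
  maxChars: number = 2000,
  notice: string = " Response truncated.",
): string {
  if (text.length <= maxChars) return text;

  const head = text.slice(0, maxChars);
  // Prefer to cut after the last complete sentence, then at the last space,
  // and only fall back to a hard cut when neither exists.
  const sentenceEnd = Math.max(
    head.lastIndexOf(". "),
    head.lastIndexOf("! "),
    head.lastIndexOf("? "),
  );
  const lastSpace = head.lastIndexOf(" ");
  let cut = maxChars;
  if (sentenceEnd > 0) {
    cut = sentenceEnd + 1; // keep the terminating punctuation
  } else if (lastSpace > 0) {
    cut = lastSpace;
  }
  return head.slice(0, cut).replace(/\s+$/, "") + notice;
}
```

Applying this after code stripping and before chunking keeps the number of synthesis requests bounded regardless of response size.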

📝 Minor: `MIN_FRAGMENT_LEN` not in constants (tts.ts)

TTS_CHUNK_MAX_CHARS is in constants.ts but MIN_FRAGMENT_LEN (10) is hardcoded in tts.ts. Inconsistent — either move it to constants or document why it's different.

🧪 Missing test: ChatView auto-speak effect

The auto-speak useEffect has non-trivial logic (messageId tracking, running guard, text extraction) but no test coverage. The hook and component tests are solid, but this integration point is untested.
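One way to make this logic testable is to pull the decision into a pure function that the effect calls. The sketch below is illustrative only: the `Block`, `AssistantMessage`, and `AutoSpeakInput` shapes and the `textToAutoSpeak` name are assumptions, modeled loosely on the messageId, running, and blockType === 'text' details mentioned in this review.

```typescript
// Assumed data shapes; the real ChatView types are not shown in this PR.
interface Block {
  blockType: string;
  text?: string;
}
interface AssistantMessage {
  messageId: string;
  blocks: Block[];
}
interface AutoSpeakInput {
  ttsEnabled: boolean;
  running: boolean; // true while the response is still streaming
  lastSpokenId: string | null;
  message: AssistantMessage | null;
}

// Returns the text to speak, or null when nothing should be spoken.
function textToAutoSpeak(input: AutoSpeakInput): string | null {
  const { ttsEnabled, running, lastSpokenId, message } = input;
  if (!ttsEnabled || running || message === null) return null;
  if (message.messageId === lastSpokenId) return null; // already spoken

  const parts: string[] = [];
  for (const block of message.blocks) {
    if (block.blockType === "text" && typeof block.text === "string") {
      const trimmed = block.text.trim();
      if (trimmed.length > 0) parts.push(trimmed);
    }
  }
  return parts.length > 0 ? parts.join("\n") : null;
}
```

The effect would then only need to call speak() with the result and record the messageId in its ref, which keeps the React-specific part thin.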

None of these are blockers for the current MVP scope, but the code-blocks-read-aloud issue would be worth fixing before merge — it'll be immediately noticeable in use.

🤖 Generated with Claude Code

…peak

- Add stripCodeForTts() to remove fenced code blocks and inline code before speaking (prevents reading raw source aloud)
- Add truncateForTts() with TTS_MAX_SPEAK_CHARS (2000) cap to guard against enormous synthesis requests
- Move MIN_FRAGMENT_LEN to constants as TTS_CHUNK_MIN_CHARS
- Extract auto-speak logic into useAutoSpeak hook for testability
- Add useAutoSpeak tests covering: streaming guard, deduplication, disabled states, empty text, multi-block joining
- Add playAudio re-entrancy test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

#110's squash merge overwrote #108's review fixes in ChatView and tts.ts. This restores:

- stripCodeForTts() and truncateForTts() so TTS doesn't read code
- TTS_MAX_SPEAK_CHARS (2000) length guard
- TTS_CHUNK_MIN_CHARS moved to constants
- useAutoSpeak hook extraction with running guard
- playAudio re-entrancy guard (started flag)
- Tests for all of the above

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


dimakis and others added 5 commits April 5, 2026 20:32

style(voice): add VoiceSettings css with toggle and picker styling

0180db3


dimakis mentioned this pull request Apr 5, 2026

docs: update all docs to reflect post-v0.1.0 state #109

Merged

2 tasks

dimakis commented Apr 6, 2026


dimakis mentioned this pull request Apr 6, 2026

feat(voice): full Yapper voice integration (STT + TTS + streaming) #110

Merged

7 tasks

dimakis merged commit 5082f9b into feat/voice-stt-batch Apr 6, 2026

dimakis mentioned this pull request Apr 6, 2026

fix(voice): restore TTS review fixes lost in #110 squash merge #114

Merged

3 tasks

dimakis merged 7 commits into feat/voice-stt-batch from feat/voice-tts-playback-impl
