docs(design): tts playback design doc (phase 2) by dimakis · Pull Request #106 · dimakis/mitzo

dimakis · 2026-04-05T19:15:03Z

Summary

Design doc for Phase 2 of voice integration: TTS playback via Yapper
Extends useVoice hook with speak(), stopSpeaking(), voice selection, and TTS toggle
New lib/tts.ts for text chunking (sentence-aware, pipelined synthesis) and AudioContext playback
New VoiceSettings.tsx component for speaker toggle + voice picker
ChatView auto-speaks on MESSAGE_END when TTS is enabled
Interruption rules, error handling, localStorage persistence, mobile autoplay considerations

No implementation code — design review only. Depends on #105 (Phase 1 batch STT).

Review focus

Chunking/pipelining approach — is two-ahead pipeline overkill for MVP?
AudioContext vs <audio> element tradeoff
Open questions at the bottom (session restore, speed control, visual indicator)

🤖 Generated with Claude Code

Covers: useVoice TTS extension, text chunking with pipelining, AudioContext playback, VoiceSettings component, ChatView auto-speak on message_end, interruption rules, voice selection, and error handling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimakis · 2026-04-05T19:19:30Z

Design Review

Issues to address before implementation

AudioContext reuse — The playback code sample creates a new AudioContext() per call. Browsers limit the number of open contexts (Chrome has historically capped it at about 6). Reuse a single instance, created lazily on the first setTtsEnabled(true), as the mobile autoplay section suggests but the code sample does not reflect. It also needs an AudioContext.close() cleanup on unmount.
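
A minimal sketch of the lazy, reusable instance described here. The names `createLazyContext`, `get`, and `dispose` are hypothetical and do not come from the design doc. The holder is generic over anything with a `close()` method, so the logic can be exercised outside a browser.

```typescript
// Hypothetical helper: holds at most one context, created on first use.
interface Closable {
  close(): Promise<void>;
}

function createLazyContext<T extends Closable>(factory: () => T) {
  let instance: T | null = null;
  return {
    // Returns the shared instance, creating it on the first call only.
    get(): T {
      if (instance === null) {
        instance = factory();
      }
      return instance;
    },
    // Closes and forgets the instance; safe to call when nothing was created.
    async dispose(): Promise<void> {
      const current = instance;
      instance = null;
      if (current !== null) {
        await current.close();
      }
    },
  };
}

// Possible browser wiring (an assumption, not taken from the design doc):
//   const audio = createLazyContext(() => new AudioContext());
//   call audio.get() from setTtsEnabled(true), audio.dispose() on unmount.
```

Creating the context inside the enable handler also keeps construction within a user gesture, which is what the mobile autoplay section appears to rely on.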

AbortController on synthesize() — The design says "cancel all pending synthesis requests" on interrupt, but the synthesize() signature (Promise<Blob>) has no cancellation mechanism. Add an AbortSignal param so in-flight fetches can actually be cancelled.
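
A sketch of what a cancellable signature could look like. The endpoint URL, the JSON body fields, and the injected `fetchFn` parameter are assumptions for illustration; this thread only names `/v1/voices`, so the real Yapper synthesis request may differ.

```typescript
// Minimal shapes so the sketch does not depend on a specific fetch typing.
interface SynthResponse {
  ok: boolean;
  status: number;
  blob(): Promise<Blob>;
}
interface SynthRequestInit {
  method: string;
  headers: Record<string, string>;
  body: string;
  signal: AbortSignal | undefined;
}
type FetchLike = (url: string, init: SynthRequestInit) => Promise<SynthResponse>;

// Hypothetical signature: the URL and request body are placeholders.
async function synthesize(
  fetchFn: FetchLike,
  url: string,
  text: string,
  voice: string,
  signal?: AbortSignal,
): Promise<Blob> {
  const res = await fetchFn(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text, voice }),
    signal, // aborting the owning controller cancels the in-flight request
  });
  if (!res.ok) {
    throw new Error(`synthesis failed with status ${res.status}`);
  }
  return res.blob();
}
```

A caller would keep one `AbortController` per utterance and call `controller.abort()` from `stopSpeaking()`. With the real `fetch`, the pending promise then rejects with an `AbortError`, which the playback code can treat as a normal stop and not as a failure.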

useEffect dependency on messages.length — Fragile trigger. If a message is deleted and a new one added in the same render, length stays the same and TTS won't fire. Consider watching a message ID or a completion flag instead.
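
To illustrate the failure mode and the ID-based alternative, here is a small sketch. The `ChatMessage` shape and `nextMessageToSpeak` are hypothetical; the design doc may model messages and completion differently.

```typescript
// Hypothetical message shape; only `id`, `role`, and `complete` matter here.
interface ChatMessage {
  id: string;
  role: "user" | "assistant";
  complete: boolean;
}

// Returns the ID to speak next, or null when nothing new has finished.
function nextMessageToSpeak(
  messages: readonly ChatMessage[],
  lastSpokenId: string | null,
): string | null {
  const latest = messages[messages.length - 1];
  if (latest === undefined) return null;
  if (latest.role !== "assistant" || !latest.complete) return null;
  return latest.id === lastSpokenId ? null : latest.id;
}

// The case from the review: one message deleted, one added, same length.
const previousList: ChatMessage[] = [
  { id: "m1", role: "assistant", complete: true },
  { id: "m2", role: "assistant", complete: true },
];
const currentList: ChatMessage[] = [
  { id: "m1", role: "assistant", complete: true },
  { id: "m3", role: "assistant", complete: true },
];
console.log(previousList.length === currentList.length); // true
console.log(nextMessageToSpeak(currentList, "m2")); // m3
```

In the hook, the effect would then depend on the returned ID and not on `messages.length`, so the swap above still triggers playback.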

Suggestions (non-blocking)

Two-ahead pipeline — Agree this is overkill for MVP. Sequential synth-then-play is simpler and avoids concurrency bugs (race on stop, ordering). Add pipelining later if latency is a real problem.
chunkText regex (?<=[.!?])\s+ — Splits after abbreviations ("Dr. Smith") and after mid-sentence ellipses ("wait... what"). Decimals such as "3.14" stay intact because no whitespace follows the period. Acceptable for MVP but expect odd splits.
Voice list fetch — Currently fetches on mount regardless of ttsEnabled. Fetch on first enable instead to avoid a wasted request.
Hardcoded af_heart default — If Yapper's default changes, this silently breaks. Consider falling back to the first voice from /v1/voices.
Streaming TTS — Chunking at MESSAGE_END means the user waits for the entire response before hearing anything. Fine for MVP, but worth noting this design doesn't support streaming TTS without rework.
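
On the sequential playback suggestion above, the loop could be as small as the sketch below. `speakSequentially` and the injected `synth` and `play` functions are hypothetical stand-ins for the design's synthesize() and AudioContext playback.

```typescript
// Hypothetical dependencies, injected so the loop itself stays testable.
type Synth<T> = (chunk: string, signal: AbortSignal) => Promise<T>;
type Play<T> = (audio: T, signal: AbortSignal) => Promise<void>;

// Synthesizes and plays one chunk at a time; returns how many chunks played.
async function speakSequentially<T>(
  chunks: readonly string[],
  synth: Synth<T>,
  play: Play<T>,
  signal: AbortSignal,
): Promise<number> {
  let played = 0;
  for (const chunk of chunks) {
    if (signal.aborted) break; // stop requested between chunks
    const audio = await synth(chunk, signal);
    if (signal.aborted) break; // stop requested while synthesizing
    await play(audio, signal);
    played += 1;
  }
  return played;
}
```

Because only one chunk is in flight at any time, there is no ordering to protect and a stop only has to abort a single request. If `synth` rejects on abort, the caller should catch that rejection and treat it as a stop.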
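
On the chunkText regex, a quick check shows which inputs actually split. `splitSentences` is a hypothetical stand-in for the design's chunkText and uses only the pattern quoted in the review.

```typescript
// The splitter quoted in the review, built with the RegExp constructor.
const SENTENCE_BOUNDARY = new RegExp("(?<=[.!?])\\s+");

// Hypothetical stand-in for chunkText: split on whitespace after ., ! or ?.
function splitSentences(text: string): string[] {
  return text.split(SENTENCE_BOUNDARY).filter((part) => part.length > 0);
}

console.log(splitSentences("Dr. Smith is here. Say hello!"));
// [ 'Dr.', 'Smith is here.', 'Say hello!' ]
console.log(splitSentences("Pi is about 3.14 today."));
// [ 'Pi is about 3.14 today.' ]
console.log(splitSentences("wait... what?"));
// [ 'wait...', 'what?' ]
```

The abbreviation case yields a one-word chunk, which is the kind of odd split to expect. The decimal is left alone because the pattern requires whitespace after the punctuation.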
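
On the hardcoded default voice, the fallback could look like the sketch below. It assumes the voice list has already been reduced to an array of voice IDs; the actual shape of the /v1/voices response is not given in this thread, and `resolveVoice` is a hypothetical name.

```typescript
// Picks the stored voice when the server still offers it, else the first one.
// Assumption: `available` is a plain list of voice IDs from /v1/voices.
function resolveVoice(
  available: readonly string[],
  stored: string | null,
): string | null {
  if (stored !== null && available.indexOf(stored) !== -1) {
    return stored;
  }
  const first = available[0];
  return first === undefined ? null : first;
}

console.log(resolveVoice(["voice_a", "voice_b"], "voice_b")); // voice_b
console.log(resolveVoice(["voice_a", "voice_b"], "af_heart")); // voice_a
console.log(resolveVoice([], "af_heart")); // null
```

Returning null for an empty list lets the caller leave TTS disabled when no voice is available.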

What looks good

Clean separation (lib/tts.ts → hook → component) for testability
"Voice is postprocessing" keeps the architecture simple
Block type filtering is correct and complete
Silent error handling is right for a non-critical feature
TDD implementation plan is solid

- AudioContext reuse: singleton with lazy creation and cleanup
- AbortController on synthesize() for cancellable fetches
- Track messageId instead of messages.length for TTS trigger
- Simplify to sequential playback for MVP (no pipelining)
- Lazy voice list fetch (on first TTS enable, not mount)
- Dynamic default voice from /v1/voices response

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(design): tts playback design doc (phase 2)

  Covers: useVoice TTS extension, text chunking with pipelining, AudioContext playback, VoiceSettings component, ChatView auto-speak on message_end, interruption rules, voice selection, and error handling.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(design): address review feedback on tts design

  - AudioContext reuse: singleton with lazy creation and cleanup
  - AbortController on synthesize() for cancellable fetches
  - Track messageId instead of messages.length for TTS trigger
  - Simplify to sequential playback for MVP (no pipelining)
  - Lazy voice list fetch (on first TTS enable, not mount)
  - Dynamic default voice from /v1/voices response

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

dimakis merged commit 9c410a3 into feat/voice-stt-batch Apr 5, 2026

dimakis deleted the feat/voice-tts-playback branch April 5, 2026 19:22

dimakis mentioned this pull request Apr 5, 2026

feat(voice): tts playback via Yapper (phase 2) #108

Merged

8 tasks

dimakis merged 2 commits into feat/voice-stt-batch from feat/voice-tts-playback
