docs(design): tts playback design doc (phase 2) #106

Merged
dimakis merged 2 commits into feat/voice-stt-batch from feat/voice-tts-playback
Apr 5, 2026
dimakis commented Apr 5, 2026

Summary

  • Design doc for Phase 2 of voice integration: TTS playback via Yapper
  • Extends useVoice hook with speak(), stopSpeaking(), voice selection, and TTS toggle
  • New lib/tts.ts for text chunking (sentence-aware, pipelined synthesis) and AudioContext playback
  • New VoiceSettings.tsx component for speaker toggle + voice picker
  • ChatView auto-speaks on MESSAGE_END when TTS is enabled
  • Interruption rules, error handling, localStorage persistence, mobile autoplay considerations

No implementation code — design review only. Depends on #105 (Phase 1 batch STT).

Review focus

  • Chunking/pipelining approach — is two-ahead pipeline overkill for MVP?
  • AudioContext vs <audio> element tradeoff
  • Open questions at the bottom (session restore, speed control, visual indicator)

🤖 Generated with Claude Code

Covers: useVoice TTS extension, text chunking with pipelining,
AudioContext playback, VoiceSettings component, ChatView auto-speak
on message_end, interruption rules, voice selection, and error handling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimakis commented Apr 5, 2026

Design Review

Issues to address before implementation

AudioContext reuse — The playback code sample creates new AudioContext() per call. Browsers cap these at ~6. Reuse a single instance, created lazily on first setTtsEnabled(true) (as the mobile autoplay section suggests but the code doesn't reflect). Also needs AudioContext.close() cleanup on unmount.
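A minimal sketch of the reuse pattern described above, not the design doc's actual code. `createContextHolder` and `ClosableContext` are hypothetical names introduced for illustration; the factory is injected so the holder can be exercised outside a browser.

```typescript
// Minimal shape needed from a context; the browser AudioContext satisfies it structurally.
interface ClosableContext {
  state: string;
  close(): Promise<void>;
}

// Hypothetical helper: creates the context at most once and releases it on demand.
function createContextHolder<T extends ClosableContext>(factory: () => T) {
  let ctx: T | null = null;
  return {
    // Lazily create on first use, for example inside the handler that
    // calls setTtsEnabled(true), so creation happens during a user gesture.
    get(): T {
      if (ctx === null || ctx.state === "closed") {
        ctx = factory();
      }
      return ctx;
    },
    // Call from the unmount cleanup; safe when nothing was ever created.
    async dispose(): Promise<void> {
      const current = ctx;
      ctx = null;
      if (current !== null && current.state !== "closed") {
        await current.close();
      }
    },
  };
}

// Browser usage (assumption, shown as a comment so the sketch runs anywhere):
//   const audio = createContextHolder(() => new AudioContext());
//   const ctx = audio.get();          // same instance on every call
//   useEffect(() => () => { void audio.dispose(); }, []);
```

Every `speak()` call would go through `get()`, so repeated playback never allocates a second context.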

AbortController on synthesize() — The design says "cancel all pending synthesis requests" on interrupt, but the synthesize() signature (Promise<Blob>) has no cancellation mechanism. Add an AbortSignal param so in-flight fetches can actually be cancelled.
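One possible shape for a cancellable `synthesize()`, sketched under assumptions: the request URL and the JSON body (`text`, `voice`) are placeholders, since Yapper's synthesis API is not described here, and `FetchLike`, `SynthDeps`, and `isAbortError` are hypothetical names. The fetch function is injected so the signal forwarding can be checked without a network.

```typescript
// Structural subset of fetch; the global fetch is assignable to it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<{ ok: boolean; status: number; blob(): Promise<Blob> }>;

interface SynthDeps {
  url: string; // placeholder: the real Yapper endpoint is not specified in this PR
  fetchFn: FetchLike;
}

// Forwards the caller's AbortSignal so an interrupt cancels the in-flight request.
async function synthesize(
  deps: SynthDeps,
  text: string,
  voice: string,
  signal: AbortSignal,
): Promise<Blob> {
  const res = await deps.fetchFn(deps.url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text, voice }), // assumed request shape
    signal,
  });
  if (!res.ok) {
    throw new Error(`TTS request failed with status ${res.status}`);
  }
  return res.blob();
}

// An aborted fetch rejects with an error named "AbortError"; callers can
// treat that as a normal stop rather than a failure.
function isAbortError(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { name?: unknown }).name === "AbortError";
}
```

With this signature, `stopSpeaking()` only needs to call `abort()` on the controller whose signal was passed in.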

useEffect dependency on messages.length — Fragile trigger. If a message is deleted and a new one added in the same render, length stays the same and TTS won't fire. Consider watching a message ID or a completion flag instead.
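To illustrate the ID-based trigger, here is a small sketch of the decision as a pure function. The `ChatMessage` fields (`id`, `role`, `complete`) and the name `nextMessageToSpeak` are assumptions for illustration, not the project's actual types.

```typescript
// Hypothetical message shape; the real ChatView types may differ.
interface ChatMessage {
  id: string;
  role: "user" | "assistant";
  complete: boolean; // set once MESSAGE_END has been received
}

// Returns the id of the message that should be spoken now, or null.
// Keyed on identity, so it still fires when the list length is unchanged.
function nextMessageToSpeak(
  messages: readonly ChatMessage[],
  lastSpokenId: string | null,
): string | null {
  const last = messages[messages.length - 1];
  if (!last || last.role !== "assistant" || !last.complete) {
    return null;
  }
  return last.id === lastSpokenId ? null : last.id;
}

// Delete-then-add in one render: length stays 2, but the id changes.
const beforeEdit: ChatMessage[] = [
  { id: "u1", role: "user", complete: true },
  { id: "a1", role: "assistant", complete: true },
];
const afterEdit: ChatMessage[] = [
  { id: "u1", role: "user", complete: true },
  { id: "a2", role: "assistant", complete: true },
];
const spokenFirst = nextMessageToSpeak(beforeEdit, null); // "a1"
const spokenSecond = nextMessageToSpeak(afterEdit, spokenFirst); // "a2"
```

In the hook, the effect would depend on the returned id (or on the last message's id and completion flag) rather than on `messages.length`.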

Suggestions (non-blocking)

  • Two-ahead pipeline — Agree this is overkill for MVP. Sequential synth-then-play is simpler and avoids concurrency bugs (race on stop, ordering). Add pipelining later if latency is a real problem.
  • chunkText regex (?<=[.!?])\s+ splits after abbreviations ("Dr. Smith") and mid-sentence ellipses ("wait... what"). Decimals ("3.14") stay intact because the pattern requires whitespace after the punctuation. Acceptable for MVP but expect odd splits.
  • Voice list fetch — Currently fetches on mount regardless of ttsEnabled. Fetch on first enable instead to avoid a wasted request.
  • Hardcoded af_heart default — If Yapper's default changes, this silently breaks. Consider falling back to the first voice from /v1/voices.
  • Streaming TTS — Chunking at MESSAGE_END means the user waits for the entire response before hearing anything. Fine for MVP, but worth noting this design doesn't support streaming TTS without rework.
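The chunking and sequential-playback suggestions above can be sketched together. This is an illustration under assumptions, not the design doc's code: `speakSequentially` is a hypothetical name, and the `synth` and `play` callbacks stand in for the real synthesis and AudioContext playback. The regex is the one quoted in the review; note that it only splits where whitespace follows the punctuation.

```typescript
// Split on whitespace that follows ., ! or ? (the pattern quoted in the review).
function chunkText(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Sequential synth-then-play: one request in flight at a time, with the
// stop signal checked before each step so ordering and races stay simple.
// Resolves to the number of chunks that finished playing.
async function speakSequentially<A>(
  text: string,
  synth: (chunk: string, signal: AbortSignal) => Promise<A>,
  play: (audio: A, signal: AbortSignal) => Promise<void>,
  signal: AbortSignal,
): Promise<number> {
  let played = 0;
  for (const chunk of chunkText(text)) {
    if (signal.aborted) break;
    const audio = await synth(chunk, signal);
    if (signal.aborted) break; // stop requested while synthesizing
    await play(audio, signal);
    played += 1;
  }
  return played;
}

// The odd splits the review warns about:
const sampleChunks = chunkText("Dr. Smith paid 3.14 dollars. Wait... what?");
// → ["Dr.", "Smith paid 3.14 dollars.", "Wait...", "what?"]
```

The cost of this simplicity is a gap between chunks while the next one is synthesized; pipelining would hide that gap but reintroduces the ordering and stop races mentioned above.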

What looks good

  • Clean separation (lib/tts.ts → hook → component) for testability
  • "Voice is postprocessing" keeps the architecture simple
  • Block type filtering is correct and complete
  • Silent error handling is right for a non-critical feature
  • TDD implementation plan is solid

- AudioContext reuse: singleton with lazy creation and cleanup
- AbortController on synthesize() for cancellable fetches
- Track messageId instead of messages.length for TTS trigger
- Simplify to sequential playback for MVP (no pipelining)
- Lazy voice list fetch (on first TTS enable, not mount)
- Dynamic default voice from /v1/voices response

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dimakis merged commit 9c410a3 into feat/voice-stt-batch Apr 5, 2026
dimakis deleted the feat/voice-tts-playback branch April 5, 2026 19:22
dimakis added a commit that referenced this pull request Apr 5, 2026
* docs(design): tts playback design doc (phase 2)
* docs(design): address review feedback on tts design