Read Aloud Feels Fragmented: Large Chunk Gaps and Incorrect Document Switching #3

@godlikeexcellence

Title: Read Aloud UX: buffering gaps, document switching issues, and playback state synchronization

First: Mayari is the closest thing I've found to a true local AI audiobook reader. The overall concept is excellent and the native Kokoro integration is exactly the direction I'd like to see for long-form reading.

However, there are several issues around Read Aloud mode that make audiobook listening feel fragmented and unreliable.

1. Large pauses between chunks during live Read Aloud

Current behavior:

  • Document is split into chunks/paragraphs
  • A chunk is spoken
  • Playback pauses
  • "Preparing audio..." appears
  • The next chunk is generated
  • Playback resumes

This creates noticeable gaps between chunks.

As a listener, it constantly reminds me that I'm hearing thousands of separate generated clips rather than one continuous audiobook.

Interestingly, generated audiobook files sound much better because the gaps between chunks are far less noticeable there.

Suggestions:

  • Generate upcoming chunks while the current chunk is still playing
  • Maintain a rolling audio buffer
  • Add an aggressive pre-buffering mode
  • Allow users to configure pause length between chunks
  • Optionally merge multiple paragraphs into larger synthesis units
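To make the first two suggestions concrete, here is a minimal sketch of a rolling buffer: a worker thread synthesizes upcoming chunks while the current one plays. I have not looked at Mayari's internals, so `synthesize`, `play`, and `lookahead` are hypothetical stand-ins, not the app's or Kokoro's real API.

```python
import queue
import threading
from typing import Callable, Iterable

_DONE = object()  # sentinel marking the end of the document


def play_with_prefetch(
    chunks: Iterable[str],
    synthesize: Callable[[str], bytes],  # hypothetical: text -> audio bytes
    play: Callable[[bytes], None],       # hypothetical: blocks until audio ends
    lookahead: int = 3,                  # assumed size of the rolling buffer
) -> None:
    """Play chunks in order while a worker thread synthesizes ahead."""
    buffer: queue.Queue = queue.Queue(maxsize=lookahead)

    def producer() -> None:
        try:
            for text in chunks:
                # put() blocks once `lookahead` clips are waiting,
                # so synthesis never runs unboundedly far ahead.
                buffer.put(synthesize(text))
        finally:
            buffer.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        audio = buffer.get()  # waits only if synthesis has fallen behind
        if audio is _DONE:
            break
        play(audio)
```

With something like this, "Preparing audio..." should only appear when synthesis is slower than playback for long enough to drain the buffer. An aggressive pre-buffering mode could simply be a larger `lookahead`.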

For audiobook listening, seamless playback is arguably more important than voice quality.

2. Read Aloud sometimes remains attached to a previous document

I also frequently encounter situations where the visible document and the active Read Aloud source become desynchronized.

Example:

  1. Open Book A
  2. Start Read Aloud
  3. Open Book B
  4. The UI correctly displays Book B
  5. Press Read Aloud
  6. Playback continues from Book A

The text pane clearly shows the new document, but the TTS engine appears to remain attached to the previous document/session.

This creates a confusing situation where:

  • The screen shows one book
  • Audio comes from another book

Sometimes the app eventually switches to the new document on its own, but often the only reliable workaround is restarting the application.

Expected behavior:

Whenever a new document becomes active, Read Aloud should:

  • Automatically bind to the newly active document

or

  • Stop playback and clearly ask whether playback should continue from the new document

A visible indicator such as:

"Current TTS source: [Document Name]"

would also make the current playback source obvious.
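One possible way to get this behavior, sketched under the assumption that playback state lives in a single controller object: each playback session gets a token, and switching documents invalidates it so stale audio from the old book is dropped. The class and method names below are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadAloudController:
    """Hypothetical controller tying Read Aloud to the visible document."""

    active_doc: Optional[str] = None  # document shown in the text pane
    source_doc: Optional[str] = None  # document the TTS session reads from
    _generation: int = 0              # bumped whenever a session starts or ends

    def open_document(self, doc_id: str) -> None:
        self.active_doc = doc_id
        # A session bound to another document is stale: stop it.
        if self.source_doc is not None and self.source_doc != doc_id:
            self.stop()

    def stop(self) -> None:
        self._generation += 1
        self.source_doc = None

    def start(self) -> int:
        """Bind playback to the active document and return a session token."""
        if self.active_doc is None:
            raise RuntimeError("no document is open")
        self._generation += 1
        self.source_doc = self.active_doc
        return self._generation

    def is_current(self, token: int) -> bool:
        # The synthesis/playback loop checks this before playing each chunk.
        return token == self._generation

    def status_label(self) -> str:
        return f"Current TTS source: {self.source_doc or 'none'}"
```

In the Book A / Book B example above, opening Book B would invalidate Book A's token, so pressing Read Aloud afterwards could only start from Book B.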

Overall

Both issues feel related to playback state management.

The document reader itself updates correctly, but the Read Aloud subsystem often feels detached from what is currently visible on screen.

The result is that the reading experience sometimes feels fragmented:

  • Audible pauses between chunks
  • Unclear playback ownership
  • Occasional playback from the wrong document

Mayari already has the foundations of an excellent local AI audiobook reader. Improving buffering and document/TTS synchronization would dramatically improve the day-to-day listening experience.
