fix: UTF-8 corruption in streaming — read chunks instead of byte-by-byte by mitre88 · Pull Request #11 · walter-grace/mac-code

mitre88 · 2026-04-18T02:26:23Z

Problem

stream_llm() in agent.py:547 reads SSE responses byte-by-byte (resp.read(1)) and decodes each byte individually as UTF-8:

ch = resp.read(1)
buf += ch.decode("utf-8", errors="replace")

This corrupts all multi-byte UTF-8 characters:

Emojis: 🍎 → `����` (4 bytes decoded as 4 replacement chars)
Accented: ñ, é, ü → `��` (2 bytes each)
CJK: 中文 → `������` (3 bytes each)

Every streamed response with non-ASCII characters is garbled.
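The corruption can be reproduced without a server. The sketch below is illustrative only: `decode_bytewise` and `decode_whole` are hypothetical helpers, not code from agent.py. It mimics the effect of decoding each byte separately with errors="replace" and compares that against decoding the full byte string once.

```python
def decode_bytewise(data: bytes) -> str:
    """Decode every byte on its own, mimicking the resp.read(1) loop."""
    return "".join(
        data[i:i + 1].decode("utf-8", errors="replace")
        for i in range(len(data))
    )


def decode_whole(data: bytes) -> str:
    """Decode the complete byte string in one call."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for sample in ("ok", "ñ", "中", "🍎"):
        raw = sample.encode("utf-8")
        # repr() makes the U+FFFD replacement characters visible.
        print(len(raw), repr(decode_bytewise(raw)), repr(decode_whole(raw)))
```

Each byte of a multi-byte sequence is invalid UTF-8 when seen alone, so every one of them becomes U+FFFD. ASCII text passes through unchanged, which is why the bug only shows up with non-ASCII output.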

Fix

Read 4096-byte chunks instead of single bytes:

chunk = resp.read(4096)
buf += chunk.decode("utf-8", errors="replace")

Multi-byte characters whose bytes all arrive in the same chunk are now decoded as a unit instead of one byte at a time.
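One case the chunked read does not cover by itself is a character whose bytes straddle two chunks: decoding each chunk independently with errors="replace" still yields replacement characters at the boundary. A possible hardening, which is not part of this PR's diff, is the standard library's incremental decoder, which holds back incomplete sequences until the remaining bytes arrive. A minimal sketch, assuming the chunks come from something like resp.read(4096); `decode_stream` is a hypothetical helper name:

```python
import codecs
from typing import Iterable, Iterator


def decode_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield decoded text, carrying partial UTF-8 sequences across chunks."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for chunk in chunks:
        text = decoder.decode(chunk)  # incomplete trailing bytes stay buffered
        if text:
            yield text
    # Flush at end of stream; a truncated sequence becomes U+FFFD here.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


if __name__ == "__main__":
    raw = "🍎".encode("utf-8")
    split = [raw[:2], raw[2:]]  # simulate a chunk boundary inside the emoji
    naive = "".join(c.decode("utf-8", errors="replace") for c in split)
    print(repr(naive), repr("".join(decode_stream(split))))
```

In a read loop, the same decoder object would be created once before the loop and fed each chunk in turn, leaving the SSE parsing that consumes buf untouched.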

Impact

Before: Any emoji, accent, or non-ASCII char in streamed output is corrupted
After: Multi-byte UTF-8 characters stream correctly whenever all of their bytes arrive in the same chunk
Performance: 4096-byte chunks are also faster (fewer syscalls)
Zero breaking changes: Same SSE parsing logic, just reads bigger chunks

stream_llm() reads SSE response byte-by-byte (resp.read(1)) and decodes each byte individually as UTF-8. This corrupts all multi-byte characters: emojis (🍎→????), accented chars (ñ→??), CJK text, etc. Fix: read 4096-byte chunks and decode the full chunk. Multi-byte characters are now correctly assembled before decoding. This is the same issue reported in PR walter-grace#10.

mitre88 · 2026-04-18T02:26:29Z

@walter-grace One-line fix: stream_llm() reads byte-by-byte which corrupts all multi-byte UTF-8 (emojis, accents, CJK). This PR reads 4KB chunks instead. Same issue as PR #10.

mitre88 closed this May 16, 2026

mitre88 wants to merge 1 commit into walter-grace:main from mitre88:fix/utf8-streaming-corruption
