fix: two bugs in Sampler — missing f-string in `_normalize_token` and XOR logic error in `_remove_eos_token` (#618)
Open · Dhakshin2007 wants to merge 2 commits into google-deepmind:main
The `done` flag was updated using XOR (`^`):

```python
done = state.done ^ (state.last_token == EOS)
```

In a batched setting, this incorrectly flips `done` to `True` for any batch element whose `last_token` happens to equal EOS while `state.done` is `False`. The correct operation is AND-NOT (`& ~`), which only ever clears the `done` flag and never sets it:

```python
done = state.done & ~(state.last_token == EOS)
```
## Summary
This PR fixes two independent bugs in the sampling pipeline that can cause incorrect runtime behavior or silently swallow debugging information.
## Bug 1 — Missing `f` prefix on f-string in `_normalize_token` (`_sampler.py`)

### What's broken
When a user passes a `stop_tokens` or `forbidden_tokens` string that tokenizes to more than one token, a `ValueError` is raised. However, the error message uses a plain string instead of an f-string, so the `{...}` placeholders are never interpolated:
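A minimal sketch of the pattern (the message wording and the `token` name are illustrative, not the exact code in `_sampler.py`):

```python
# Broken: the `f` prefix is missing, so Python treats this as a plain
# string and `{token!r}` is never substituted.
raise ValueError(
    'Token {token!r} does not map to a single token id.'
)
```

The user sees the braces verbatim:

```
ValueError: Token {token!r} does not map to a single token id.
```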
The message carries no information about which token caused the problem.
### Fix
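The fix is to add the `f` prefix so the placeholder interpolates (same illustrative wording as above):

```python
# Fixed: with the `f` prefix, `{token!r}` is replaced by the offending token.
raise ValueError(
    f'Token {token!r} does not map to a single token id.'
)
```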
### Impact
Affects every user of `Sampler` and `ChatSampler` who provides multi-token `stop_tokens` or `forbidden_tokens`. Without the fix, the error message is completely uninformative and makes debugging impossible.

## Bug 2 — XOR (`^`) logic error in `_remove_eos_token` (`_chat_sampler.py`)

### What's broken
`_remove_eos_token` is called in tool-calling flows to undo the EOS token that was generated before a `<|tool_response>` tag. It needs to clear `done=True` back to `done=False` for the affected batch elements. The current code uses XOR:
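```python
done = state.done ^ (state.last_token == EOS)
```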
In a batched scenario, XOR can incorrectly set `done=True` for batch elements where `done=False` but `last_token == EOS`. For example:

| `state.done` | `last_token == EOS` | XOR (`^`) result | AND-NOT (`& ~`) result |
| --- | --- | --- | --- |
| True | True | False ✅ | False ✅ |
| False | False | False ✅ | False ✅ |
| False | True | True ❌ | False ✅ |
| True | False | True ✅ | True ✅ |

Row 3 is the problematic case: XOR incorrectly marks a non-done sequence as done.
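A self-contained repro of the table above (the EOS id and batch values are made up for illustration):

```python
import jax.numpy as jnp

EOS = 1  # hypothetical token id, used only for this sketch

# One batch element per row of the table above.
done = jnp.array([True, False, False, True])
last_token = jnp.array([EOS, 0, EOS, 0])

is_eos = last_token == EOS
print(done ^ is_eos)   # [False False  True  True] -> row 3 wrongly marked done
print(done & ~is_eos)  # [False False False  True] -> row 3 correctly stays not-done
```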
### Fix
Use AND-NOT (`& ~`), which only ever clears the `done` flag, never sets it:
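```python
done = state.done & ~(state.last_token == EOS)
```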
### Impact
Affects batched multi-turn tool-calling inference. In affected batches, sequences can be prematurely marked as done, causing generation to stop early and producing truncated or missing tool responses.
## Files changed

- `gemma/gm/text/_sampler.py` — Bug 1 fix (1-char change: add the `f` prefix)
- `gemma/gm/text/_chat_sampler.py` — Bug 2 fix (`^` → `& ~`)