Strip <think> blocks before evaluation #418

Merged
DhavalRepo18 merged 2 commits into main from opencode_agent on Jun 28, 2026
@ChathurangiShyalika (Collaborator)

Summary

This PR updates the evaluation flow to remove model reasoning blocks wrapped in <think>...</think> before scoring agent answers. This ensures the evaluator scores only the actual final answer, which improves reliability for strict-answer tasks such as integer counts and JSON outputs.

Changes

  • Added answer normalization in src/evaluation/evaluator.py.
  • Introduced _strip_think_blocks() to remove <think>...</think> content.
  • Updated evaluation to pass the cleaned answer to the scorer.
  • Reports now store the cleaned answer used for scoring.

Why

Without this cleanup, the scorer may receive reasoning text that is not part of the requested answer. A strict scorer can then mark the answer as wrong even when the correct final answer follows the think block.
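To make the failure mode concrete, here is a small sketch with two hypothetical strict scorers, one for integer answers and one for JSON answers. The scorer names and the sample answers are invented for illustration and are not taken from the repository.

```python
import json


def exact_int_match(answer: str, expected: int) -> bool:
    """Hypothetical strict scorer: the whole answer must parse as an integer."""
    try:
        return int(answer.strip()) == expected
    except ValueError:
        return False


def json_match(answer: str, expected: dict) -> bool:
    """Hypothetical strict scorer: the whole answer must parse as JSON."""
    try:
        return json.loads(answer) == expected
    except json.JSONDecodeError:
        return False


raw_int = "<think>3 pumps plus 4 chillers gives 7 assets.</think>\n7"
raw_json = '<think>Build the object.</think>{"count": 7}'

# The raw answers fail both scorers; the bare final answers pass.
results = {
    "raw_int": exact_int_match(raw_int, 7),
    "clean_int": exact_int_match("7", 7),
    "raw_json": json_match(raw_json, {"count": 7}),
    "clean_json": json_match('{"count": 7}', {"count": 7}),
}
```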

Testing

  • Verified that answers with <think>...</think> blocks are cleaned before scoring.
  • Verified that answers without think blocks continue to evaluate normally.
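The two checks above could be captured as a small table of cases, sketched below. `strip_think` is a stand-in written for this example because the real `_strip_think_blocks()` body is not shown in this description, and the cases themselves are invented.

```python
import re


def strip_think(answer: str) -> str:
    # Stand-in for the PR's _strip_think_blocks(); assumed behavior only.
    return re.sub(r"<think>.*?</think>", "", answer, flags=re.DOTALL).strip()


# (raw answer, expected cleaned answer)
CASES = [
    ("<think>3 + 4 = 7</think>\n7", "7"),                           # integer answer
    ('<think>build the object</think>{"count": 7}', '{"count": 7}'),  # JSON answer
    ("7", "7"),                                                      # no think block
    ("<think>a</think>x<think>b</think>", "x"),                      # several blocks
]


def run_cases() -> list[bool]:
    """Return one pass/fail flag per case."""
    return [strip_think(raw) == expected for raw, expected in CASES]
```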

Signed-off-by: Chathurangi Shyalika <chathurangishyalika@Chathurangis-MacBook-Pro.local>
@DhavalRepo18 DhavalRepo18 merged commit 54a3438 into main Jun 28, 2026
5 checks passed