Add EVIDENCE_BASED_INFORMATION section to system prompt by juanmichelini · Pull Request #3385 · OpenHands/software-agent-sdk

juanmichelini · 2026-05-26T03:23:18Z

Summary

Adds an always-on <EVIDENCE_BASED_INFORMATION> section to the default system prompt (system_prompt.j2) that instructs the agent to back every field value or factual claim with official documentation and verifiable sources.

When providing field values, the agent will now be instructed to include:

Field: the field name
Value: the value being assigned
Quote: the exact text from the source that supports this value
Source: the official link (URL) where the quote can be found
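The four-part shape above can be modeled as a small record type. This is a minimal sketch, not code from the PR: `EvidenceRecord`, `format_record`, and the example values (including the URL) are hypothetical names and placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRecord:
    """One evidence-backed claim (hypothetical container, not part of the PR)."""

    field: str   # the field name
    value: str   # the value being assigned
    quote: str   # exact text from the source that supports the value
    source: str  # official URL where the quote can be found


def format_record(record: EvidenceRecord) -> str:
    """Render the record as the four labeled lines described in the summary."""
    return "\n".join(
        [
            f"Field: {record.field}",
            f"Value: {record.value}",
            f"Quote: {record.quote}",
            f"Source: {record.source}",
        ]
    )


# Placeholder data for illustration only; the URL is not a real citation.
example = EvidenceRecord(
    field="default_port",
    value="8000",
    quote="The server listens on port 8000 by default.",
    source="https://example.com/docs/server",
)
print(format_record(example))
```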

Changes

openhands-sdk/openhands/sdk/agent/prompts/system_prompt.j2: Added <EVIDENCE_BASED_INFORMATION> section between <PROCESS_MANAGEMENT> and the model-specific includes block.
tests/sdk/agent/test_system_prompt.py: Added test_default_system_prompt_contains_evidence_based_information to verify the section and its key fields appear in the rendered system prompt.
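The kind of check the new test performs might look roughly like the sketch below. It is an assumption about the test's shape, not the actual test body: `REQUIRED_MARKERS`, `missing_markers`, the closing tag, and the literal sample prompt are hypothetical, and the real test would render system_prompt.j2 through the SDK rather than use a hard-coded string.

```python
# Markers the rendered prompt is expected to contain (assumed list).
REQUIRED_MARKERS = (
    "<EVIDENCE_BASED_INFORMATION>",
    "Field",
    "Value",
    "Quote",
    "Source",
)


def missing_markers(rendered_prompt: str) -> list[str]:
    """Return the markers that do not appear in the rendered prompt."""
    return [m for m in REQUIRED_MARKERS if m not in rendered_prompt]


# Stand-in for the rendered system prompt, used here so the sketch runs
# without the SDK. The closing tag is an assumption about the template.
sample_prompt = """
<EVIDENCE_BASED_INFORMATION>
Field: ...
Value: ...
Quote: ...
Source: ...
</EVIDENCE_BASED_INFORMATION>
"""

assert missing_markers(sample_prompt) == []
assert missing_markers("no section here") == list(REQUIRED_MARKERS)
```

A presence check like this confirms the text is rendered, but it says nothing about how the agent behaves once the instruction is in place.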

Why system prompt (not a skill)

As discussed in the issue, this is core agent behavior that should always be active - not an optional skill that requires a trigger. When the agent provides factual information, it should always cite sources. The system prompt (system_prompt.j2) is the correct location for always-on instructions, not AGENTS.md.

Fixes OpenHands/extensions#157


Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22-slim`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d97422d-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-d97422d-python \
  ghcr.io/openhands/agent-server:d97422d-python

All tags pushed for this build

ghcr.io/openhands/agent-server:d97422d-golang-amd64
ghcr.io/openhands/agent-server:d97422d8d7fa584f2923ce84959f100efc2a8286-golang-amd64
ghcr.io/openhands/agent-server:openhands-add-evidence-based-information-to-system-prompt-golang-amd64
ghcr.io/openhands/agent-server:d97422d-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:d97422d-golang-arm64
ghcr.io/openhands/agent-server:d97422d8d7fa584f2923ce84959f100efc2a8286-golang-arm64
ghcr.io/openhands/agent-server:openhands-add-evidence-based-information-to-system-prompt-golang-arm64
ghcr.io/openhands/agent-server:d97422d-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:d97422d-java-amd64
ghcr.io/openhands/agent-server:d97422d8d7fa584f2923ce84959f100efc2a8286-java-amd64
ghcr.io/openhands/agent-server:openhands-add-evidence-based-information-to-system-prompt-java-amd64
ghcr.io/openhands/agent-server:d97422d-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:d97422d-java-arm64
ghcr.io/openhands/agent-server:d97422d8d7fa584f2923ce84959f100efc2a8286-java-arm64
ghcr.io/openhands/agent-server:openhands-add-evidence-based-information-to-system-prompt-java-arm64
ghcr.io/openhands/agent-server:d97422d-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:d97422d-python-amd64
ghcr.io/openhands/agent-server:d97422d8d7fa584f2923ce84959f100efc2a8286-python-amd64
ghcr.io/openhands/agent-server:openhands-add-evidence-based-information-to-system-prompt-python-amd64
ghcr.io/openhands/agent-server:d97422d-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:d97422d-python-arm64
ghcr.io/openhands/agent-server:d97422d8d7fa584f2923ce84959f100efc2a8286-python-arm64
ghcr.io/openhands/agent-server:openhands-add-evidence-based-information-to-system-prompt-python-arm64
ghcr.io/openhands/agent-server:d97422d-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:d97422d-golang
ghcr.io/openhands/agent-server:d97422d8d7fa584f2923ce84959f100efc2a8286-golang
ghcr.io/openhands/agent-server:openhands-add-evidence-based-information-to-system-prompt-golang
ghcr.io/openhands/agent-server:d97422d-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:d97422d-java
ghcr.io/openhands/agent-server:d97422d8d7fa584f2923ce84959f100efc2a8286-java
ghcr.io/openhands/agent-server:openhands-add-evidence-based-information-to-system-prompt-java
ghcr.io/openhands/agent-server:d97422d-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:d97422d-python
ghcr.io/openhands/agent-server:d97422d8d7fa584f2923ce84959f100efc2a8286-python
ghcr.io/openhands/agent-server:openhands-add-evidence-based-information-to-system-prompt-python
ghcr.io/openhands/agent-server:d97422d-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

Each variant tag (e.g., d97422d-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., d97422d-python-amd64) are also available if needed
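The tag naming scheme can be sketched as follows. The rules here are inferred from the tag list above rather than taken from the build scripts, so treat them as assumptions: `sanitize_base_image` and `tags_for` are hypothetical helpers, and the full-SHA and branch-name tags are left out for brevity.

```python
def sanitize_base_image(image: str) -> str:
    """Turn a base image reference into a tag-safe fragment.

    Inferred from the listed tags: "/" becomes "_s_" and ":" becomes "_tag_".
    """
    return image.replace("/", "_s_").replace(":", "_tag_")


def tags_for(short_sha: str, variant: str, base_image: str) -> list[str]:
    """Build the short-SHA tags for one variant, with and without an arch suffix."""
    prefix = "ghcr.io/openhands/agent-server"
    stems = [
        f"{short_sha}-{variant}",
        f"{short_sha}-{sanitize_base_image(base_image)}",
    ]
    tags = []
    for stem in stems:
        tags.append(f"{prefix}:{stem}")  # tag without an architecture suffix
        for arch in ("amd64", "arm64"):
            tags.append(f"{prefix}:{stem}-{arch}")  # single-architecture tag
    return tags


for tag in tags_for(
    "d97422d", "python", "nikolaik/python-nodejs:python3.13-nodejs22-slim"
):
    print(tag)
```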

When providing field values or factual claims, the agent should always support them with official documentation and verifiable sources. This adds an always-on section to the default system prompt that instructs the agent to provide Field, Value, Quote, and Source for each claim.

Fixes OpenHands/extensions#157

Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-05-26T03:23:42Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-05-26T03:23:51Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-05-26T03:25:44Z

Coverage Report

File	Stmts	Miss	Cover	Missing
TOTAL	27831	8182	70%

report-only-changed-files is enabled. No files were changed during this commit :)

juanmichelini · 2026-05-26T03:42:58Z

@OpenHands /codereview-roasted

openhands-ai · 2026-05-26T03:43:16Z

I'm on it! juanmichelini can track my progress at all-hands.dev

juanmichelini

🔥 Code Review: Add EVIDENCE_BASED_INFORMATION section to system prompt

Taste Rating: 🟡 Acceptable — The idea is reasonable but the always-on, always-apply-everywhere scope is too broad for a default coding agent prompt.

Three Questions

Is this solving a real problem? Yes — the linked issue (OpenHands/extensions#157) asks for evidence-backed responses with field/value/quote/source. That is a legitimate use case.
Is there a simpler way? The implementation is already simple (just prompt text). The question is whether it belongs here at all.
What will this break? Every agent using the default system prompt will now be instructed to cite sources on every factual claim or field value. This is a behavioral change to all downstream agents.

[CRITICAL ISSUES]

[system_prompt.j2, Line 135] "always" is too broad for a default coding agent prompt.

The instruction says: "When providing field values or factual claims, always support them with official documentation and verifiable sources."

For a general-purpose coding agent, most interactions involve writing code, debugging, refactoring, and answering technical questions — not producing research reports with citations. This instruction will push the agent to add Field/Value/Quote/Source scaffolding to responses where it adds zero value, like:

"The function name should be process_data" → Now the agent feels compelled to cite where it got that name
"Set timeout=30" → Now it needs to find the RFC or docs that justify 30 seconds
"This error means the file was not found" → Now it needs to source that claim

This will make routine coding interactions verbose and annoying. The instruction was designed for a specific use case (structured research/documentation lookup) and should not be applied universally.

Better alternatives:

Make it a skill (contradicting the PR description): The PR argues this should be a system prompt section because it is "core agent behavior that should always be active." But citation-of-sources is not core behavior for a coding agent — it is domain-specific behavior for research/documentation tasks. Skills are the right mechanism for opt-in specialized behavior.
Make it conditional: Add a template flag like {% if enable_evidence_citation %} so consumers can opt in.
Soften the scope: Change "always" to qualify when this applies (e.g., "When explicitly asked to provide evidence-backed information" or "When filling structured fields that require source verification").
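The conditional-flag alternative could look roughly like this. It is a minimal sketch using the Jinja2 library directly, not the SDK's actual rendering path: the flag name `enable_evidence_citation` comes from the suggestion above, while the inline template text, the closing tag, and the softened wording are placeholders rather than the contents of system_prompt.j2.

```python
from jinja2 import Template

# Inline stand-in for the relevant part of the template (assumed content).
TEMPLATE = Template(
    "<PROCESS_MANAGEMENT>...</PROCESS_MANAGEMENT>\n"
    "{% if enable_evidence_citation %}"
    "<EVIDENCE_BASED_INFORMATION>\n"
    "When asked for evidence-backed information, "
    "give Field, Value, Quote, and Source.\n"
    "</EVIDENCE_BASED_INFORMATION>\n"
    "{% endif %}"
)

# An undefined variable is falsy in a Jinja2 `if`, so the section is
# omitted unless the consumer opts in explicitly.
default_prompt = TEMPLATE.render()
opt_in_prompt = TEMPLATE.render(enable_evidence_citation=True)

assert "<EVIDENCE_BASED_INFORMATION>" not in default_prompt
assert "<EVIDENCE_BASED_INFORMATION>" in opt_in_prompt
```

With this shape, existing consumers of the default prompt see no change in behavior, and only callers that pass the flag get the citation instruction.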

[IMPROVEMENT OPPORTUNITIES]

[system_prompt.j2, Lines 134-142] No conditional rendering, unlike other sections.

Other sections in this template use conditional includes ({% if enable_browser %}, {% if llm_security_analyzer %}). This new section is always included. If the intent is truly "always on for every agent," then it should at least be justified why this is the only behavioral instruction that applies universally. If not, add a flag.

[system_prompt.j2, Line 136] "field values" is ambiguous in a coding context.

What counts as a "field value"? A function parameter? A config key? A variable assignment? The term makes sense for form-filling or documentation lookups but is unclear for general coding tasks. Consider scoping the language to make the applicability clear.

[TESTING GAPS]

The test (test_default_system_prompt_contains_evidence_based_information) verifies the section exists in the rendered prompt, which is fine. But this is a prompt change that modifies agent behavior — the real validation needed is eval/benchmark evidence.

Per the repo-specific review guidelines: "Do NOT submit an APPROVE review when the PR changes agent behavior or anything that could plausibly affect benchmark/evaluation performance." This is a system prompt change → behavioral change → eval risk.

No eval evidence has been provided in the PR description.

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM

This is a behavioral change to the default system prompt that affects every agent using the default template. The always-on, always-apply-everywhere scope of the citation instruction will likely degrade the agent's efficiency on routine coding tasks (more tokens spent on unnecessary citations, more verbose responses). The eval risk is real and unvalidated.

If the scope were narrowed (conditional flag, skill-based, or qualified with "when asked"), the risk would drop to 🟢 LOW.

VERDICT:

❌ Needs rework: The always-on, unconditional scope is the core issue. Either narrow the applicability (conditional flag, softened language), or move it to a skill where it belongs. Flagging for human maintainer to decide after evals.

KEY INSIGHT:

Citation-of-sources is a domain-specific behavior (research/documentation), not a universal coding-agent behavior. Putting it in the always-on system prompt is like requiring a bibliography for every code comment.

Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.

Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.

When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.

openhands-ai · 2026-05-26T03:48:49Z

Summary of Work

I performed a code review on PR #3385 ("Add EVIDENCE_BASED_INFORMATION section to system prompt") in response to the /codereview-roasted comment from @juanmichelini, and posted the review as a COMMENT on the PR.

What I did:

Retrieved the PR diff (2 files changed: system_prompt.j2 and test_system_prompt.py)
Read the linked issue (OpenHands/extensions#157, "Search for official links and evidence") for context
Verified all existing tests pass (9/9 passed)
Posted a COMMENT review (not an APPROVE) on PR #3385 ("Add EVIDENCE_BASED_INFORMATION section to system prompt")

Review verdict: ❌ Needs rework (🟡 Acceptable taste, 🟡 MEDIUM risk)

Key findings:

Critical: The "always" qualifier is too broad — requiring citation scaffolding (Field/Value/Quote/Source) on every factual claim will make routine coding interactions verbose and counterproductive. This instruction was designed for a specific research/documentation use case but is applied universally to all agents.
Better alternatives proposed: (1) Make it a skill, (2) Add a conditional template flag, (3) Soften the scope to "when explicitly asked"
Testing gap: No eval/benchmark evidence provided for a system prompt change that modifies agent behavior
Per repo guidelines: Did NOT approve because this PR changes agent behavior that could plausibly affect benchmark/evaluation performance — flagged for human maintainer decision after evals

openhands-ai Bot mentioned this pull request on May 26, 2026, in OpenHands/extensions#157, "Search for official links and evidence" (Open)

juanmichelini wants to merge 1 commit into main from openhands/add-evidence-based-information-to-system-prompt
