Add EVIDENCE_BASED_INFORMATION section to system prompt #3385

Draft
juanmichelini wants to merge 1 commit into main from openhands/add-evidence-based-information-to-system-prompt

@juanmichelini juanmichelini commented May 26, 2026

Summary

Adds an always-on <EVIDENCE_BASED_INFORMATION> section to the default system prompt (system_prompt.j2) that instructs the agent to back every field value or factual claim with official documentation and verifiable sources.

When providing field values, the agent will now be instructed to include:

  • Field: the field name
  • Value: the value being assigned
  • Quote: the exact text from the source that supports this value
  • Source: the official link (URL) where the quote can be found
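The four parts listed above can be modeled as a small record. The following is a minimal sketch, not code from this PR: the `EvidenceEntry` class, its `render` method, and the sample data are hypothetical and only illustrate the Field/Value/Quote/Source shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceEntry:
    """One evidence-backed claim; the class name and layout are hypothetical."""

    field: str   # the field name
    value: str   # the value being assigned
    quote: str   # exact supporting text copied from the source
    source: str  # official URL where the quote can be found

    def render(self) -> str:
        # Emit the four parts in the order the PR description lists them.
        return "\n".join(
            [
                f"Field: {self.field}",
                f"Value: {self.value}",
                f"Quote: {self.quote}",
                f"Source: {self.source}",
            ]
        )


# Placeholder data for illustration only; not taken from the PR.
entry = EvidenceEntry(
    field="example_field",
    value="example value",
    quote="exact sentence copied from the source",
    source="https://example.com/docs",
)
print(entry.render())
```

Keeping the quote and the URL as separate fields lets a reader check the quote against the source without parsing free text.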

Changes

  • openhands-sdk/openhands/sdk/agent/prompts/system_prompt.j2: Added <EVIDENCE_BASED_INFORMATION> section between <PROCESS_MANAGEMENT> and the model-specific includes block.
  • tests/sdk/agent/test_system_prompt.py: Added test_default_system_prompt_contains_evidence_based_information to verify the section and its key fields appear in the rendered system prompt.
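A check of the kind the new test describes might look roughly like the sketch below. It is an assumption-laden illustration: the marker strings (including the closing tag), the `missing_markers` helper, and the sample prompt text are all hypothetical, and the real test renders `system_prompt.j2` rather than using a hard-coded string.

```python
# Assumed marker strings: the section tags plus the four field labels
# named in the PR description. The real test may check different text.
REQUIRED_MARKERS = (
    "<EVIDENCE_BASED_INFORMATION>",
    "</EVIDENCE_BASED_INFORMATION>",
    "Field",
    "Value",
    "Quote",
    "Source",
)


def missing_markers(rendered_prompt: str) -> list[str]:
    """Return the required markers that do not appear in the rendered prompt."""
    return [m for m in REQUIRED_MARKERS if m not in rendered_prompt]


# Stand-in for the rendered template; the wording here is invented.
sample_prompt = (
    "<EVIDENCE_BASED_INFORMATION>\n"
    "Provide Field, Value, Quote, and Source for each claim.\n"
    "</EVIDENCE_BASED_INFORMATION>\n"
)

print(missing_markers(sample_prompt))
```

A test built this way confirms only that the text is present in the prompt; it says nothing about how the agent behaves once the section is active.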

Why system prompt (not a skill)

As discussed in the issue, this is core agent behavior that should always be active - not an optional skill that requires a trigger. When the agent provides factual information, it should always cite sources. The system prompt (system_prompt.j2) is the correct location for always-on instructions, not AGENTS.md.

Fixes OpenHands/extensions#157


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---|---|---|---|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22-slim | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:d97422d-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-d97422d-python \
  ghcr.io/openhands/agent-server:d97422d-python

All tags pushed for this build

ghcr.io/openhands/agent-server:d97422d-golang-amd64
ghcr.io/openhands/agent-server:d97422d8d7fa584f2923ce84959f100efc2a8286-golang-amd64
ghcr.io/openhands/agent-server:openhands-add-evidence-based-information-to-system-prompt-golang-amd64
ghcr.io/openhands/agent-server:d97422d-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:d97422d-golang-arm64
ghcr.io/openhands/agent-server:d97422d8d7fa584f2923ce84959f100efc2a8286-golang-arm64
ghcr.io/openhands/agent-server:openhands-add-evidence-based-information-to-system-prompt-golang-arm64
ghcr.io/openhands/agent-server:d97422d-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:d97422d-java-amd64
ghcr.io/openhands/agent-server:d97422d8d7fa584f2923ce84959f100efc2a8286-java-amd64
ghcr.io/openhands/agent-server:openhands-add-evidence-based-information-to-system-prompt-java-amd64
ghcr.io/openhands/agent-server:d97422d-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:d97422d-java-arm64
ghcr.io/openhands/agent-server:d97422d8d7fa584f2923ce84959f100efc2a8286-java-arm64
ghcr.io/openhands/agent-server:openhands-add-evidence-based-information-to-system-prompt-java-arm64
ghcr.io/openhands/agent-server:d97422d-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:d97422d-python-amd64
ghcr.io/openhands/agent-server:d97422d8d7fa584f2923ce84959f100efc2a8286-python-amd64
ghcr.io/openhands/agent-server:openhands-add-evidence-based-information-to-system-prompt-python-amd64
ghcr.io/openhands/agent-server:d97422d-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:d97422d-python-arm64
ghcr.io/openhands/agent-server:d97422d8d7fa584f2923ce84959f100efc2a8286-python-arm64
ghcr.io/openhands/agent-server:openhands-add-evidence-based-information-to-system-prompt-python-arm64
ghcr.io/openhands/agent-server:d97422d-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:d97422d-golang
ghcr.io/openhands/agent-server:d97422d8d7fa584f2923ce84959f100efc2a8286-golang
ghcr.io/openhands/agent-server:openhands-add-evidence-based-information-to-system-prompt-golang
ghcr.io/openhands/agent-server:d97422d-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:d97422d-java
ghcr.io/openhands/agent-server:d97422d8d7fa584f2923ce84959f100efc2a8286-java
ghcr.io/openhands/agent-server:openhands-add-evidence-based-information-to-system-prompt-java
ghcr.io/openhands/agent-server:d97422d-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:d97422d-python
ghcr.io/openhands/agent-server:d97422d8d7fa584f2923ce84959f100efc2a8286-python
ghcr.io/openhands/agent-server:openhands-add-evidence-based-information-to-system-prompt-python
ghcr.io/openhands/agent-server:d97422d-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., d97422d-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., d97422d-python-amd64) are also available if needed
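The tag list above appears to follow a regular naming scheme. The sketch below reproduces it under inferred rules: the helper names are hypothetical, and the substitutions ("/" to "_s_" and ":" to "_tag_" in the base image, "/" to "-" in the branch name) are read off the listed tags rather than taken from the build scripts.

```python
REPO = "ghcr.io/openhands/agent-server"


def slugify_base_image(base_image: str) -> str:
    # Inferred from the listed tags: "/" becomes "_s_" and ":" becomes "_tag_".
    return base_image.replace("/", "_s_").replace(":", "_tag_")


def tags_for_variant(short_sha, full_sha, branch, variant, base_image, arch=None):
    """Build the four tags listed per variant, optionally for a single arch."""
    suffix = f"-{arch}" if arch else ""  # no suffix means the multi-arch manifest
    branch_slug = branch.replace("/", "-")  # inferred: slashes are not kept in tags
    return [
        f"{REPO}:{short_sha}-{variant}{suffix}",
        f"{REPO}:{full_sha}-{variant}{suffix}",
        f"{REPO}:{branch_slug}-{variant}{suffix}",
        f"{REPO}:{short_sha}-{slugify_base_image(base_image)}{suffix}",
    ]


python_arm64_tags = tags_for_variant(
    short_sha="d97422d",
    full_sha="d97422d8d7fa584f2923ce84959f100efc2a8286",
    branch="openhands/add-evidence-based-information-to-system-prompt",
    variant="python",
    base_image="nikolaik/python-nodejs:python3.13-nodejs22-slim",
    arch="arm64",
)
for tag in python_arm64_tags:
    print(tag)
```

Calling the function with `arch=None` yields the multi-arch manifest tags, such as `ghcr.io/openhands/agent-server:d97422d-python`.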

When providing field values or factual claims, the agent should always
support them with official documentation and verifiable sources. This
adds an always-on section to the default system prompt that instructs
the agent to provide: Field, Value, Quote, and Source for each claim.

Fixes OpenHands/extensions#157

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Contributor

Python API breakage checks — ✅ PASSED

Result: PASSED


@github-actions
Contributor

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED


@github-actions
Contributor

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| TOTAL | 27831 | 8182 | 70% | |

report-only-changed-files is enabled. No files were changed during this commit :)

@juanmichelini
Collaborator Author

@OpenHands /codereview-roasted

@openhands-ai

openhands-ai Bot commented May 26, 2026

I'm on it! juanmichelini can track my progress at all-hands.dev

Collaborator Author

@juanmichelini juanmichelini left a comment

🔥 Code Review: Add EVIDENCE_BASED_INFORMATION section to system prompt

Taste Rating: 🟡 Acceptable — The idea is reasonable but the always-on, always-apply-everywhere scope is too broad for a default coding agent prompt.


Three Questions

  1. Is this solving a real problem? Yes — the linked issue (OpenHands/extensions#157) asks for evidence-backed responses with field/value/quote/source. That is a legitimate use case.
  2. Is there a simpler way? The implementation is already simple (just prompt text). The question is whether it belongs here at all.
  3. What will this break? Every agent using the default system prompt will now be instructed to cite sources on every factual claim or field value. This is a behavioral change to all downstream agents.

[CRITICAL ISSUES]

[system_prompt.j2, Line 135] "always" is too broad for a default coding agent prompt.

The instruction says: "When providing field values or factual claims, always support them with official documentation and verifiable sources."

For a general-purpose coding agent, most interactions involve writing code, debugging, refactoring, and answering technical questions — not producing research reports with citations. This instruction will push the agent to add Field/Value/Quote/Source scaffolding to responses where it adds zero value, like:

  • "The function name should be process_data" → Now the agent feels compelled to cite where it got that name
  • "Set timeout=30" → Now it needs to find the RFC or docs that justify 30 seconds
  • "This error means the file was not found" → Now it needs to source that claim

This will make routine coding interactions verbose and annoying. The instruction was designed for a specific use case (structured research/documentation lookup) and should not be applied universally.

Better alternatives:

  1. Make it a skill (contradicting the PR description): The PR argues this should be a system prompt section because it is "core agent behavior that should always be active." But citation-of-sources is not core behavior for a coding agent — it is domain-specific behavior for research/documentation tasks. Skills are the right mechanism for opt-in specialized behavior.
  2. Make it conditional: Add a template flag like {% if enable_evidence_citation %} so consumers can opt in.
  3. Soften the scope: Change "always" to qualify when this applies (e.g., "When explicitly asked to provide evidence-backed information" or "When filling structured fields that require source verification").
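Alternatives 2 and 3 can be combined: gate the section behind an opt-in flag and qualify its wording. The sketch below is a plain-Python analogue of the `{% if enable_evidence_citation %}` idea, not the template change itself. `BASE_PROMPT`, `EVIDENCE_SECTION`, and `build_system_prompt` are hypothetical names, and the section text is invented for illustration.

```python
BASE_PROMPT = "You are a coding agent."  # placeholder, not the real prompt

# Invented wording that uses the softened "when explicitly asked" scope.
EVIDENCE_SECTION = (
    "<EVIDENCE_BASED_INFORMATION>\n"
    "When explicitly asked to provide evidence-backed information, give "
    "Field, Value, Quote, and Source for each claim.\n"
    "</EVIDENCE_BASED_INFORMATION>"
)


def build_system_prompt(enable_evidence_citation: bool = False) -> str:
    """Append the evidence section only when the caller opts in."""
    parts = [BASE_PROMPT]
    if enable_evidence_citation:
        parts.append(EVIDENCE_SECTION)
    return "\n\n".join(parts)


default_prompt = build_system_prompt()
opted_in_prompt = build_system_prompt(enable_evidence_citation=True)
print(opted_in_prompt)
```

Defaulting the flag to off leaves existing consumers of the default prompt unaffected, which is the property the review is asking for.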

[IMPROVEMENT OPPORTUNITIES]

[system_prompt.j2, Lines 134-142] No conditional rendering, unlike other sections.

Other sections in this template use conditional includes ({% if enable_browser %}, {% if llm_security_analyzer %}). This new section is always included. If the intent is truly "always on for every agent," then it should at least be justified why this is the only behavioral instruction that applies universally. If not, add a flag.

[system_prompt.j2, Line 136] "field values" is ambiguous in a coding context.

What counts as a "field value"? A function parameter? A config key? A variable assignment? The term makes sense for form-filling or documentation lookups but is unclear for general coding tasks. Consider scoping the language to make the applicability clear.


[TESTING GAPS]

The test (test_default_system_prompt_contains_evidence_based_information) verifies the section exists in the rendered prompt, which is fine. But this is a prompt change that modifies agent behavior — the real validation needed is eval/benchmark evidence.

Per the repo-specific review guidelines: "Do NOT submit an APPROVE review when the PR changes agent behavior or anything that could plausibly affect benchmark/evaluation performance." This is a system prompt change → behavioral change → eval risk.

No eval evidence has been provided in the PR description.


[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM

This is a behavioral change to the default system prompt that affects every agent using the default template. The always-on, always-apply-everywhere scope of the citation instruction will likely degrade the agent's efficiency on routine coding tasks (more tokens spent on unnecessary citations, more verbose responses). The eval risk is real and unvalidated.

If the scope were narrowed (conditional flag, skill-based, or qualified with "when asked"), the risk would drop to 🟢 LOW.


VERDICT:

Needs rework: The always-on, unconditional scope is the core issue. Either narrow the applicability (conditional flag, softened language), or move it to a skill where it belongs. Flagging for human maintainer to decide after evals.

KEY INSIGHT:

Citation-of-sources is a domain-specific behavior (research/documentation), not a universal coding-agent behavior. Putting it in the always-on system prompt is like requiring a bibliography for every code comment.


Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

  1. Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.
  2. Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
  3. When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.

Was this review helpful? React with 👍 or 👎 to give feedback.

@openhands-ai

openhands-ai Bot commented May 26, 2026

Summary of Work

I performed a code review on PR #3385 ("Add EVIDENCE_BASED_INFORMATION section to system prompt") in response to the /codereview-roasted comment from @juanmichelini, and posted the review as a COMMENT on the PR.

What I did:

  1. Retrieved the PR diff (2 files changed: system_prompt.j2 and test_system_prompt.py)
  2. Read the linked issue (OpenHands/extensions#157, "Search for official links and evidence") for context
  3. Verified all existing tests pass (9/9 passed)
  4. Posted a COMMENT review (not an APPROVE) on PR #3385 ("Add EVIDENCE_BASED_INFORMATION section to system prompt")

Review verdict: ❌ Needs rework (🟡 Acceptable taste, 🟡 MEDIUM risk)

Key findings:

  • Critical: The "always" qualifier is too broad — requiring citation scaffolding (Field/Value/Quote/Source) on every factual claim will make routine coding interactions verbose and counterproductive. This instruction was designed for a specific research/documentation use case but is applied universally to all agents.
  • Better alternatives proposed: (1) Make it a skill, (2) Add a conditional template flag, (3) Soften the scope to "when explicitly asked"
  • Testing gap: No eval/benchmark evidence provided for a system prompt change that modifies agent behavior
  • Per repo guidelines: Did NOT approve because this PR changes agent behavior that could plausibly affect benchmark/evaluation performance — flagged for human maintainer decision after evals
