Trogers/sre hard assertion improvements#85
Open
teriyakichild wants to merge 2 commits into
Add deterministic tool-trace assertions grounded in the mock MCP's canned
responses instead of relying solely on answer-text substring matching.
New assertion types:
- tool_args_contains: verify tool arguments (exact or substring)
- tools_not_called: negative tool assertions (routing discipline)
- answer_contains_any: alternative acceptable strings
- answer_not_contains: hallucination guards
Updated all 5 SRE-hard prompt specs with worker dispatch, tool name,
tool argument, and negative assertions. Fixed answer_contains values to
match actual probe paths (/healthz/ready, /actuator/health/ready).
Tightened category_min from 6 to 7 (all data is in one API call).
Bug fixes:
- Readiness loops in runner now fail on timeout instead of silently
falling through
- model_name_from_config strips sre-hard-e2e- prefix
- extract-continuation-frames.sh path iteration fixed
Refs: #32
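To make the four assertion types concrete, here is a minimal Python sketch of how each one might be evaluated against a recorded tool trace and a final answer. The function names match the assertion names in the commit message, but everything else is an assumption for illustration: the `ToolCall` shape, the function signatures, and the `substring` flag are hypothetical and are not taken from the repository's runner.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """One recorded tool invocation (hypothetical trace shape)."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)


def tool_args_contains(trace, tool, expected, substring=False):
    """Return True if some call to `tool` carries every expected argument.

    substring=False requires equal values; substring=True only requires the
    expected value to occur inside the stringified actual value.
    """
    for call in trace:
        if call.name != tool:
            continue
        if all(
            key in call.args
            and (
                str(want) in str(call.args[key])
                if substring
                else call.args[key] == want
            )
            for key, want in expected.items()
        ):
            return True
    return False


def tools_not_called(trace, forbidden):
    """Return True if none of the forbidden tool names appear in the trace."""
    return {call.name for call in trace}.isdisjoint(forbidden)


def answer_contains_any(answer, options):
    """Return True if at least one acceptable string occurs in the answer."""
    return any(option in answer for option in options)


def answer_not_contains(answer, banned):
    """Return True if no banned string (hallucination guard) occurs in the answer."""
    return not any(item in answer for item in banned)
```

With a trace such as `[ToolCall("get_pod", {"namespace": "payments"})]` (tool and argument names invented here), `tool_args_contains(trace, "get_pod", {"namespace": "pay"}, substring=True)` passes, while the exact-match form of the same check fails. Checking the trace rather than the answer text is what makes these assertions deterministic: the mock's canned responses fix which tool calls are correct.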
…prompt
The coordinator was producing meta-summaries referencing task numbers
("see Task 8") instead of inlining concrete findings from task results.
The user never sees task results directly, so these references were
opaque.
Root cause: neither the continuation prompt nor the respond_directly
tool description told the coordinator that its response IS the final
user-facing answer. The tool description actively misdirected toward
"general knowledge."
Changes:
- continuation_prompt.md: add synthesis rules requiring the coordinator
to inline all concrete data points and never reference tasks by number
- respond_directly tool: update description and response field to
clarify that the response is the only text the user sees
Before: multi-category-findings 0/9 alert names in answer
After: multi-category-findings 9/9 alert names in answer
Refs: #32
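The before/after measurement above can be approximated with a small check on the coordinator's final answer. The sketch below is only an illustration under stated assumptions: the regular expression, the helper names `task_references` and `names_found`, and the alert names in the usage note are hypothetical and do not come from the repository.

```python
import re

# Matches opaque task references such as "Task 8" or "see task #12".
TASK_REF = re.compile(r"\btask\s+#?\d+\b", re.IGNORECASE)


def task_references(answer):
    """Return every task-number reference found in the answer text."""
    return TASK_REF.findall(answer)


def names_found(answer, expected_names):
    """Return the expected names (e.g. alert names) that the answer inlines."""
    return [name for name in expected_names if name in answer]
```

For an answer like "Several alerts are firing; see Task 8 for details.", `task_references` reports `Task 8` and `names_found` returns an empty list, which corresponds to the 0/9 case. An answer that inlines the alert names directly yields no task references and a full list of names.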
No description provided.