Add context generation for BlueBoxAgent session replay #194
Allow the agent to save a structured context file (JSON + Markdown) capturing the successful path through a session — routines used, parameters, post-processing code, and output description — so a new agent instance can replay it without trial and error. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
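A minimal sketch of what such a context file could hold, assuming the field names mentioned across these commits (goal, summary, routines_used with routine_id/routine_name/parameters, post-processing code, output description). It uses stdlib dataclasses instead of Pydantic so it runs without dependencies. `ContextSketch` and `RoutineStep` are hypothetical stand-ins, not the PR's `BlueBoxAgentContext`.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class RoutineStep:
    # Hypothetical: one routine on the successful path and the parameters it ran with.
    routine_id: str
    routine_name: str
    parameters: dict[str, Any] = field(default_factory=dict)


@dataclass
class ContextSketch:
    # Hypothetical stand-in for BlueBoxAgentContext (field names taken from the PR text).
    goal: str
    summary: str
    routines_used: list[RoutineStep]
    post_processing_code: str = ""
    output_description: str = ""

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, so routines become plain dicts.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "ContextSketch":
        data = json.loads(text)
        # Rebuild the nested routine objects from their dict form.
        data["routines_used"] = [RoutineStep(**r) for r in data["routines_used"]]
        return cls(**data)


ctx = ContextSketch(
    goal="Find the cheapest SFO to JFK flight",
    summary="Ran a flight search, then sorted the results by price",
    routines_used=[RoutineStep("r1", "flight_search", {"origin": "SFO", "destination": "JFK"})],
    post_processing_code="rows.sort(key=lambda r: r['price'])",
    output_description="Flights sorted by ascending price",
)
restored = ContextSketch.from_json(ctx.to_json())
```

Replaying then amounts to iterating `routines_used` and calling each routine with its recorded parameters, rather than rediscovering them by trial and error.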
Users can now type `/generate_context focus on the flight search part` to guide the agent toward a specific aspect of the session when generating the context file. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
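One way the TUI could split the command name from the optional focus text. `parse_generate_context` and `build_instruction` are hypothetical helpers, and the instruction wording is an assumption rather than the PR's actual prompt.

```python
COMMAND = "/generate_context"


def parse_generate_context(line: str) -> tuple[bool, str]:
    """Return (is_command, focus) for input such as '/generate_context focus on X'."""
    stripped = line.strip()
    # Match the bare command or the command followed by a space, not '/generate_contextual'.
    if stripped != COMMAND and not stripped.startswith(COMMAND + " "):
        return False, ""
    # Everything after the command name is free-form guidance.
    return True, stripped[len(COMMAND):].strip()


def build_instruction(focus: str) -> str:
    """Hypothetical instruction text; the PR's prompt wording is not shown here."""
    base = "Distill the successful path through this session into a reusable context file."
    return f"{base}\nUser guidance: {focus}" if focus else base


is_cmd, focus = parse_generate_context("/generate_context focus on the flight search part")
```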
…formatting Single source of truth — the markdown rendering logic lives on the model, not duplicated in the agent. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
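A sketch of keeping Markdown rendering and parsing on the model itself, so the agent and TUI call one method instead of duplicating formatting. `ContextDoc`, its section titles, and the `## ` heading layout are assumptions; the PR's `to_markdown()`/`from_markdown()` may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContextDoc:
    # Hypothetical, trimmed model with string fields only.
    goal: str
    summary: str
    output_description: str

    # (heading title, attribute name) pairs; a plain class attribute, not a dataclass field.
    _SECTIONS = (("Goal", "goal"), ("Summary", "summary"), ("Output", "output_description"))

    def to_markdown(self) -> str:
        # One rendering path on the model: every caller gets identical Markdown.
        parts = [f"## {title}\n\n{getattr(self, attr)}" for title, attr in self._SECTIONS]
        return "\n\n".join(parts) + "\n"

    @classmethod
    def from_markdown(cls, text: str) -> "ContextDoc":
        by_title = dict(cls._SECTIONS)
        values: dict[str, str] = {}
        current: Optional[str] = None
        buffer: list[str] = []
        for line in text.splitlines():
            title = line[3:] if line.startswith("## ") else None
            if title in by_title:
                # A known heading closes the previous section and opens a new one.
                if current is not None:
                    values[current] = "\n".join(buffer).strip()
                current, buffer = by_title[title], []
            else:
                buffer.append(line)
        if current is not None:
            values[current] = "\n".join(buffer).strip()
        return cls(**values)


doc = ContextDoc(
    goal="Find the cheapest flight",
    summary="Searched SFO to JFK.\nSorted by price.",
    output_description="A price-sorted table",
)
markdown = doc.to_markdown()
```

The parser strips surrounding whitespace from each section, so this simple format only round-trips values without leading or trailing blank lines.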
The nested Pydantic model produced a JSON schema with $defs/$ref that LLMs couldn't reliably fill. Switch to flat parameters (goal, summary, routines_used, etc.) and construct the model internally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
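To illustrate the flat-parameter approach: the tool schema lists only top-level arguments with the item shape inlined (no `$defs`/`$ref`), and the nested object is assembled in code afterwards. The schema dict, `generate_context_from_args`, and both dataclasses are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Routine:
    routine_id: str
    routine_name: str
    parameters: dict[str, Any] = field(default_factory=dict)


@dataclass
class Context:
    goal: str
    summary: str
    routines_used: list[Routine]


# Hypothetical flat tool schema: top-level arguments, item shape inlined, no $defs/$ref.
FLAT_TOOL_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "goal": {"type": "string"},
        "summary": {"type": "string"},
        "routines_used": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "routine_id": {"type": "string"},
                    "routine_name": {"type": "string"},
                    "parameters": {"type": "object"},
                },
            },
        },
    },
    "required": ["goal", "summary", "routines_used"],
}


def generate_context_from_args(goal: str, summary: str, routines_used: list[dict[str, Any]]) -> Context:
    # The LLM only fills plain arguments; the nested model is assembled here in code.
    return Context(goal=goal, summary=summary, routines_used=[Routine(**r) for r in routines_used])


ctx = generate_context_from_args(
    goal="Find flights",
    summary="Searched and sorted",
    routines_used=[{"routine_id": "r1", "routine_name": "flight_search", "parameters": {"origin": "SFO"}}],
)
```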
_auto_discover_context returns paths like 'workspace/context/file.json' (already relative to cwd). _load_context_from_path was unconditionally prepending workspace root_path again, producing 'workspace/workspace/...'. Fix: try the path as-is first, only prepend workspace root if it doesn't exist. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
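The path fix in a few lines, using a hypothetical `resolve_context_path` in place of `_load_context_from_path`: the given path wins if it already exists relative to the current directory, and the workspace root is prepended only as a fallback.

```python
import tempfile
from pathlib import Path


def resolve_context_path(path: str, workspace_root: Path) -> Path:
    """Hypothetical resolver: use the path as given if it exists, else join it to the workspace root."""
    candidate = Path(path)
    if candidate.exists():
        # e.g. 'workspace/context/file.json' from auto-discovery is already cwd-relative.
        return candidate
    # Fallback for workspace-relative input such as 'context/file.json'.
    return workspace_root / candidate


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "workspace"
    (root / "context").mkdir(parents=True)
    target = root / "context" / "file.json"
    target.write_text("{}")
    as_is = resolve_context_path(str(target), root)  # already valid: no double prefix
    joined = resolve_context_path("context/file.json", root)  # workspace-relative: root prepended
    both_exist = as_is.exists() and joined.exists()
```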
The agent was generating context files with routines_used=[] despite having executed routines. Two fixes:
1. _generate_context now falls back to _extract_routines_from_raw(), which reads the raw/ execution result files and extracts routine_id, routine_name, and parameters from completed executions. This is a safety net: if the agent provides routines, those are used instead.
2. Strengthened the /generate_context prompt and system prompt rule to emphasize that routines_used must never be empty.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
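A sketch of the raw/ fallback. The record layout (one JSON object per file with top-level `routine_id`, `routine_name`, `parameters`, `status`) is an assumption about the execution result files, and both function names are hypothetical. It applies the stricter `status == "completed"` filter adopted later in the review.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def extract_routines_from_raw(raw_dir: Path) -> list[dict[str, Any]]:
    """Collect routine_id, routine_name and parameters from completed runs in raw/.

    Assumes one JSON object per file with top-level 'routine_id', 'routine_name',
    'parameters' and 'status' keys (a hypothetical layout).
    """
    routines: list[dict[str, Any]] = []
    for path in sorted(raw_dir.glob("*.json")):
        try:
            record = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or partially written files
        if not isinstance(record, dict) or record.get("status") != "completed":
            continue  # only completed executions belong on the replay path
        routines.append({
            "routine_id": record.get("routine_id"),
            "routine_name": record.get("routine_name"),
            "parameters": record.get("parameters", {}),
        })
    return routines


def choose_routines(agent_routines: list[dict[str, Any]], raw_dir: Path) -> list[dict[str, Any]]:
    # Routines supplied by the agent win; the raw/ scan only fills an empty list.
    return agent_routines or extract_routines_from_raw(raw_dir)


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp)
    (raw / "a.json").write_text(json.dumps({
        "routine_id": "r1", "routine_name": "flight_search",
        "parameters": {"origin": "SFO"}, "status": "completed",
    }))
    (raw / "b.json").write_text(json.dumps({
        "routine_id": "r2", "routine_name": "hotel_search",
        "parameters": {}, "status": "failed",
    }))
    fallback = choose_routines([], raw)
    preferred = choose_routines([{"routine_id": "x", "routine_name": "manual", "parameters": {}}], raw)
```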
Remove the @agent_tool generate_context (LLM kept failing to fill the schema) and replace it with a direct LLMClient.call_sync using response_model=BlueBoxAgentContext via OpenAI structured output. The /generate_context slash command now calls agent.generate_context() directly instead of prompting the agent loop, using previous_response_id to preserve conversation context without polluting the chat history. Also:
- Rename RoutineUsed → UsedRoutine, add UsedRoutineParameter with concrete types (str|bool|int|float) to satisfy OpenAI strict schema
- Create all workspace subdirs (raw/, outputs/, context/) in LocalWorkspace.__init__
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
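OpenAI's strict structured-output mode requires every object schema to list its properties and set `additionalProperties` to false, which is presumably why an open-ended `dict[str, Any]` of parameters had to become typed name/value pairs. A dataclass sketch of that shape; the `name`/`value` fields and the `to_parameter_list` converter are assumptions, not the PR's exact `UsedRoutineParameter`.

```python
from dataclasses import dataclass
from typing import Any, Union

# Closed set of value types, matching the str|bool|int|float list in the commit.
ParamValue = Union[str, bool, int, float]


@dataclass
class UsedRoutineParameterSketch:
    # Hypothetical name/value pair replacing an open dict[str, Any].
    name: str
    value: ParamValue


def to_parameter_list(params: dict[str, Any]) -> list[UsedRoutineParameterSketch]:
    """Convert a free-form dict into typed pairs, rejecting values outside the allowed types."""
    result = []
    for name, value in params.items():
        if not isinstance(value, (str, bool, int, float)):
            raise TypeError(f"parameter {name!r} has unsupported type {type(value).__name__}")
        result.append(UsedRoutineParameterSketch(name=name, value=value))
    return result


params = to_parameter_list({"origin": "SFO", "adults": 2, "nonstop": True})
try:
    to_parameter_list({"legs": ["SFO", "JFK"]})  # list values fall outside the allowed types
    rejected = False
except TypeError:
    rejected = True
```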
@claude review!
Claude finished @dimavrem22's task in 2m 28s
Code Review
Overall this is clean, well-structured work. The model design, round-trip serialization, auto-discovery logic, and context injection are all solid. The test coverage is thorough. A few issues are worth fixing before merge:
Issues
1. Flaky test
time.sleep(0.05)  # ensure mtime differs
On some Linux filesystems (ext4 with …
import os
import time

old_file = context_dir / "old.json"
old_file.write_text(old.model_dump_json())
past = time.time() - 1
os.utime(old_file, (past, past))  # force mtime 1s into the past
2.
import argparse
import shutil
from datetime import datetime
from pathlib import Path
from typing import TYPE_CHECKING, Any
import sys  # ← should be grouped with the other stdlib imports above
Per CLAUDE.md ("Group imports: stdlib, third-party, local (with blank lines between groups)"), …
3. Auto-discovery only scans …
json_files = sorted(context_dir.glob("*.json"), key=lambda p: p.stat().st_mtime, reverse=True)
Explicit …
4. Status …
# Only include completed executions
if rr.get("status") not in ("completed", None):
    continue
The comment says "Only include completed executions" but …
5. Truncation can slightly exceed …
if len(section) > self._CONTEXT_PROMPT_MAX_CHARS:
section = section[:self._CONTEXT_PROMPT_MAX_CHARS] + (
"\n\n... (context truncated — use `read_workspace_file` to read "
"the full context files in `context/` for more detail)"
    )
After slicing to …
6.
def _show_context_success(self, context: Any) -> None:
Since …
7.
assert isinstance(self._agent, BlueBoxAgent)
A bare …
if not isinstance(self._agent, BlueBoxAgent):
    raise TypeError(f"Expected BlueBoxAgent, got {type(self._agent).__name__}")
Nits
Strengths
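For item 5 in the review (truncation exceeding `_CONTEXT_PROMPT_MAX_CHARS`), one fix is to reserve room for the hint before slicing. `truncate_section` is hypothetical, and `HINT` is a shortened stand-in for the real hint string.

```python
# Shortened, hypothetical stand-in for the PR's truncation hint.
HINT = "\n\n... (context truncated; read the files in `context/` for more detail)"


def truncate_section(section: str, max_chars: int, hint: str = HINT) -> str:
    """Truncate so the result, hint included, never exceeds max_chars."""
    if len(section) <= max_chars:
        return section
    # Reserve space for the hint before slicing the body.
    keep = max(max_chars - len(hint), 0)
    # Final slice covers the edge case where the hint alone is longer than max_chars.
    return (section[:keep] + hint)[:max_chars]


short = truncate_section("abc", 300)
clipped = truncate_section("x" * 1000, 300)
```

Subtracting the hint length up front keeps the cap a hard upper bound rather than "cap plus roughly 100 characters".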
- Remove unused re-exports from data_models/agents/__init__.py
- Fix import sys grouping in run_bluebox_agent.py
- Auto-discover .md context files (prefer .json, fall back to .md)
- Tighten _extract_routines_from_raw to only accept status=="completed"
- Type _show_context_success param as BlueBoxAgentContext instead of Any
- Replace bare assert isinstance with explicit TypeError
- Replace flaky time.sleep with os.utime in test_auto_discovers_most_recent
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
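A sketch of the auto-discovery rule from this commit (newest `.json` first, newest `.md` as fallback), plus the `os.utime` backdating trick that replaced `time.sleep` in the test. `auto_discover_context` is a hypothetical name standing in for `_auto_discover_context`.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Optional


def auto_discover_context(context_dir: Path) -> Optional[Path]:
    """Return the newest .json context file, or the newest .md if no .json exists."""
    if not context_dir.is_dir():
        return None
    for pattern in ("*.json", "*.md"):
        files = sorted(context_dir.glob(pattern), key=lambda p: p.stat().st_mtime, reverse=True)
        if files:
            return files[0]
    return None


with tempfile.TemporaryDirectory() as tmp:
    ctx_dir = Path(tmp)
    old_file, new_file = ctx_dir / "old.json", ctx_dir / "new.json"
    old_file.write_text("{}")
    new_file.write_text("{}")
    # Backdate instead of sleeping: deterministic even with coarse mtime granularity.
    past = time.time() - 1
    os.utime(old_file, (past, past))
    newest = auto_discover_context(ctx_dir).name
    (ctx_dir / "notes.md").write_text("# context")
    still_json = auto_discover_context(ctx_dir).name  # .json wins over a newer .md
    old_file.unlink()
    new_file.unlink()
    md_only = auto_discover_context(ctx_dir).name

missing = auto_discover_context(ctx_dir)  # directory is gone after the with-block
```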
@claude review again
Claude finished @dimavrem22's task in 2m 39s
Code Review (re-review)
Great work addressing the feedback: 6 of 7 issues from the previous review are fixed. One new issue spotted, and a few nits remain.
Fixed Since Last Review ✓
New Issue
Lazy imports in test file violate CLAUDE.md
# Repeated in _make_agent() of every test class, and in two standalone tests:
from bluebox.agents.bluebox_agent import BlueBoxAgent
from bluebox.agents.workspace import LocalWorkspace
CLAUDE.md is explicit: "NO lazy imports! All imports must be at the top of the file."
Remaining from Previous Review
Truncation slightly exceeds …
The appended hint (~100 chars) pushes the result over the …
Remaining Nits
Strengths
All the core logic is clean and the new …
…tamp assertion
- Move BlueBoxAgent/LocalWorkspace imports to file top (CLAUDE.md: no lazy imports)
- Fix _make_agent return types from Any to BlueBoxAgent
- Remove unused Any import
- Add generated_at assertion in test_from_markdown_roundtrip
- Update save_file docstring to include "context" subdirectory
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude check again!
Claude finished @dimavrem22's task in 1m 25s
Code Review (re-review #3)
All previously flagged issues and nits from review #2 are now fixed. This PR is ready to merge.
All Issues Resolved ✓
All Nits Resolved ✓
Still Minor (not blocking)
Truncation slightly exceeds …
The appended hint (~100 chars) pushes the result over …
No new issues found. The code is clean, all feedback has been addressed, and the test coverage is solid.
- generate_context now returns GenerateContextResult (NamedTuple) with context, json_path, and md_path so callers can display file locations
- TUI _show_context_success displays the saved JSON and Markdown paths
- TUI _print_welcome shows the loaded context goal and routine count when a context file was auto-discovered or explicitly loaded
- Add loaded_context property on BlueBoxAgent to expose context state
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
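A sketch of the return type described here. The three field names come from the commit; the plain dict used for `context` (instead of `BlueBoxAgentContext`) and the `describe_saved` message are hypothetical.

```python
from pathlib import Path
from typing import NamedTuple


class GenerateContextResult(NamedTuple):
    # Field names from the commit; context is simplified to a dict for this sketch.
    context: dict
    json_path: Path
    md_path: Path


def describe_saved(result: GenerateContextResult) -> str:
    """Hypothetical message a TUI could show after saving."""
    goal = result.context.get("goal", "")
    return f"Saved context for {goal!r}:\n  JSON: {result.json_path}\n  Markdown: {result.md_path}"


result = GenerateContextResult(
    context={"goal": "Find cheapest flight"},
    json_path=Path("context/agent_context.json"),
    md_path=Path("context/agent_context.md"),
)
context, json_path, md_path = result  # a NamedTuple still unpacks like a plain tuple
```

Because it is a `NamedTuple`, callers that only unpack positionally keep working, while the TUI can read `result.json_path` and `result.md_path` by name.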
… Files panel
Callers now build their own filenames: routine results use timestamps, context files use fixed names (agent_context.json/md). Context files are dispatched to the TUI Saved Files pane via _add_saved_file(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
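The two naming policies side by side. The fixed names `agent_context.json`/`agent_context.md` come from the commit; the timestamp format and both helper names are assumptions.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def routine_result_filename(routine_name: str, now: Optional[datetime] = None) -> str:
    # Timestamped (hypothetical format) so repeated runs of a routine never overwrite each other.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"{routine_name}_{stamp}.json"


def context_filenames(context_dir: Path) -> tuple[Path, Path]:
    # Fixed names: regenerating the context replaces the previous pair in place.
    return context_dir / "agent_context.json", context_dir / "agent_context.md"


json_file, md_file = context_filenames(Path("context"))
```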
…irectly Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update readme for `context`
Summary
- `BlueBoxAgentContext` Pydantic model with JSON and Markdown round-trip serialization (`to_markdown()`/`from_markdown()`)
- `generate_context` agent tool that saves both a canonical JSON and companion Markdown file to the workspace `context/` directory
- … `context/` dir or explicit `--context-file` CLI flag)
- `/generate_context` slash command in the TUI that prompts the agent to distill the current session into a reusable context file