Skip to content

Comments

Add context generation for BlueBoxAgent session replay#194

Merged
dimavrem22 merged 14 commits intomainfrom
context-generation
Feb 23, 2026
Merged

Add context generation for BlueBoxAgent session replay#194
dimavrem22 merged 14 commits intomainfrom
context-generation

Conversation

@dimavrem22
Copy link
Contributor

@dimavrem22 dimavrem22 commented Feb 23, 2026

Summary

  • Add BlueBoxAgentContext Pydantic model with JSON and Markdown round-trip serialization (to_markdown() / from_markdown())
  • Add generate_context agent tool that saves both a canonical JSON and companion Markdown file to the workspace context/ directory
  • Auto-discover and inject prior context into the system prompt on agent init (from workspace context/ dir or explicit --context-file CLI flag)
  • Add /generate_context slash command in the TUI that prompts the agent to distill the current session into a reusable context file
  • unit tests covering model serialization, markdown round-trip, context loading, prompt injection, truncation, and tool registration

Allow the agent to save a structured context file (JSON + Markdown) capturing
the successful path through a session — routines used, parameters, post-processing
code, and output description — so a new agent instance can replay it without
trial and error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dimavrem22 and others added 5 commits February 22, 2026 21:08
Users can now type `/generate_context focus on the flight search part` to
guide the agent toward a specific aspect of the session when generating
the context file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…formatting

Single source of truth — the markdown rendering logic lives on the model,
not duplicated in the agent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The nested Pydantic model produced a JSON schema with $defs/$ref that
LLMs couldn't reliably fill. Switch to flat parameters (goal, summary,
routines_used, etc.) and construct the model internally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
_auto_discover_context returns paths like 'workspace/context/file.json'
(already relative to cwd). _load_context_from_path was unconditionally
prepending workspace root_path again, producing 'workspace/workspace/...'.

Fix: try the path as-is first, only prepend workspace root if it doesn't
exist.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The agent was generating context files with routines_used=[] despite having
executed routines. Two fixes:

1. _generate_context now falls back to _extract_routines_from_raw() which
   reads the raw/ execution result files and extracts routine_id, routine_name,
   and parameters from completed executions. This is a safety net — if the
   agent provides routines, those are used instead.

2. Strengthened the /generate_context prompt and system prompt rule to
   emphasize that routines_used must never be empty.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dimavrem22 dimavrem22 marked this pull request as draft February 23, 2026 02:30
Remove the @agent_tool generate_context (LLM kept failing to fill
the schema) and replace it with a direct LLMClient.call_sync using
response_model=BlueBoxAgentContext via OpenAI structured output.

The /generate_context slash command now calls agent.generate_context()
directly instead of prompting the agent loop, using
previous_response_id to preserve conversation context without
polluting the chat history.

Also:
- Rename RoutineUsed → UsedRoutine, add UsedRoutineParameter with
  concrete types (str|bool|int|float) to satisfy OpenAI strict schema
- Create all workspace subdirs (raw/, outputs/, context/) in
  LocalWorkspace.__init__

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dimavrem22
Copy link
Contributor Author

@claude review!

@claude
Copy link

claude bot commented Feb 23, 2026

Claude finished @dimavrem22's task in 2m 28s —— View job


Code Review

Tasks:

  • Gather context and understand request
  • Read all changed files
  • Analyze code quality, correctness, and test coverage
  • Post review feedback

Overall this is clean, well-structured work. The model design, round-trip serialization, auto-discovery logic, and context injection are all solid. The test coverage is thorough. A few issues worth fixing before merge:


Issues

1. Flaky test — time.sleep(0.05) may not be enough to differentiate mtimes

tests/unit/agents/test_bluebox_agent_context.py:233

time.sleep(0.05)  # ensure mtime differs

On some Linux filesystems (ext4 with relatime, tmpfs, CI NFS mounts), mtime granularity can be coarser than 50ms, making this test intermittently fail. Use os.utime to explicitly set the mtime instead:

old_file = context_dir / "old.json"
old_file.write_text(old.model_dump_json())
import os, time
os.utime(old_file, (time.time() - 1, time.time() - 1))  # force 1s in the past

Fix this →


2. import sys breaks import grouping

bluebox/scripts/run_bluebox_agent.py:28

import argparse
import shutil
from datetime import datetime
from pathlib import Path
from typing import TYPE_CHECKING, Any
import sys   # ← should be grouped with the other stdlib imports above

Per CLAUDE.md ("Group imports: stdlib, third-party, local (with blank lines between groups)"), import sys must be in the stdlib block at the top.

Fix this →


3. Auto-discovery only scans .json, not .md

bluebox/agents/bluebox_agent.py:340

json_files = sorted(context_dir.glob("*.json"), key=lambda p: p.stat().st_mtime, reverse=True)

Explicit context_file= accepts both .json and .md, but auto-discovery silently ignores .md files. If a user manually provides only a Markdown context file (no companion JSON), auto-discovery misses it. Either add .md discovery with JSON taking priority, or document this limitation in the docstring.


4. Status None included in "completed" filter — misleading comment

bluebox/agents/bluebox_agent.py:382

# Only include completed executions
if rr.get("status") not in ("completed", None):
    continue

The comment says "Only include completed executions" but status=None (missing field) is also accepted. If a raw file is missing the status key entirely, it passes through. Either update the comment to reflect the intent ("include completed or unknown-status executions") or tighten the check to != "completed" if failed routines should truly be excluded.


5. Truncation can slightly exceed _CONTEXT_PROMPT_MAX_CHARS

bluebox/agents/bluebox_agent.py:359–363

if len(section) > self._CONTEXT_PROMPT_MAX_CHARS:
    section = section[:self._CONTEXT_PROMPT_MAX_CHARS] + (
        "\n\n... (context truncated — use `read_workspace_file` to read "
        "the full context files in `context/` for more detail)"
    )

After slicing to _CONTEXT_PROMPT_MAX_CHARS, the appended hint adds ~100 characters, pushing the result over the limit. This is unlikely to matter in practice (100 extra chars on a 20K limit), but worth noting if the limit is ever used for strict token budgeting.


6. _show_context_success type annotation is Any

bluebox/scripts/run_bluebox_agent.py:170

def _show_context_success(self, context: Any) -> None:

Since generate_context returns BlueBoxAgentContext, this should be typed context: BlueBoxAgentContext. Per CLAUDE.md every function must have type hints.

Fix this →


7. assert isinstance(...) in TUI gives a cryptic error on failure

bluebox/scripts/run_bluebox_agent.py:164

assert isinstance(self._agent, BlueBoxAgent)

A bare assert produces an unhelpful AssertionError. Prefer an explicit check:

if not isinstance(self._agent, BlueBoxAgent):
    raise TypeError(f"Expected BlueBoxAgent, got {type(self._agent).__name__}")

Nits

  • TestContextLoading._make_agent return type (tests/unit/agents/test_bluebox_agent_context.py:185): annotated as Any but should be BlueBoxAgent. Same for TestContextPromptInjection._make_agent at line 291 and TestGenerateContext._make_agent at line 354.

  • test_from_markdown_roundtrip missing generated_at assertion (tests/unit/agents/test_bluebox_agent_context.py:138): the sample context has a fixed generated_at; the round-trip test checks all other fields but skips the timestamp. Adding assert restored.generated_at == sample_context.generated_at would increase confidence in the ISO format parsing.

  • AgentWorkspace.save_file docstring (bluebox/agents/workspace.py:51): the subdirectory description still says ("raw" or "outputs") — should include "context" now that context/ is a first-class directory.


Strengths

  • Clean Pydantic model with well-described fields and sensible defaults
  • The dual-format (JSON + Markdown) strategy is practical: JSON for machine parsing, Markdown for human readability and system-prompt injection
  • Safety net at generate_context:823 auto-populating routines_used from raw/ when the LLM leaves it empty is a nice guard
  • Truncation with a read_workspace_file hint is agent-friendly
  • Test coverage is comprehensive — good range of happy-path, edge cases (empty fields, bad JSON, missing files), and integration-level tests
  • Workspace test cleanup (removing redundant .mkdir() calls) is a clean side-effect fix
  • The "one-shot structured output call that doesn't advance _previous_response_id" pattern is well-commented and correct

- Remove unused re-exports from data_models/agents/__init__.py
- Fix import sys grouping in run_bluebox_agent.py
- Auto-discover .md context files (prefer .json, fall back to .md)
- Tighten _extract_routines_from_raw to only accept status=="completed"
- Type _show_context_success param as BlueBoxAgentContext instead of Any
- Replace bare assert isinstance with explicit TypeError
- Replace flaky time.sleep with os.utime in test_auto_discovers_most_recent

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dimavrem22
Copy link
Contributor Author

@claude review again

@dimavrem22 dimavrem22 marked this pull request as ready for review February 23, 2026 03:24
@claude
Copy link

claude bot commented Feb 23, 2026

Claude finished @dimavrem22's task in 2m 39s —— View job


Code Review (re-review)

Tasks:

  • Gather context and understand request
  • Read all changed files
  • Compare against previous review findings
  • Post updated review feedback

Great work addressing the feedback — 6 of 7 issues from the previous review are fixed. One new issue spotted, and a few nits remain.


Fixed Since Last Review ✓

# Issue Status
1 Flaky time.sleep(0.05) test ✅ Fixed — now uses os.utime with 10s offset
2 import sys import grouping ✅ Fixed — moved to stdlib block at line 25
3 Auto-discovery only scanned .json ✅ Fixed — now falls back to .md with JSON priority
4 status=None accepted in "completed" filter ✅ Fixed — now strictly != "completed"
5 _show_context_success typed as Any ✅ Fixed — now BlueBoxAgentContext at line 172
6 assert isinstance(...) cryptic error ✅ Fixed — TypeError with message at line 165

New Issue

Lazy imports in test file violate CLAUDE.md

tests/unit/agents/test_bluebox_agent_context.py:189, 294, 336, 357, 374

# Repeated in _make_agent() of every test class, and in two standalone tests:
from bluebox.agents.bluebox_agent import BlueBoxAgent
from bluebox.agents.workspace import LocalWorkspace

CLAUDE.md is explicit: "NO lazy imports! All imports must be at the top of the file." BlueBoxAgent and LocalWorkspace are imported inside method bodies in 5 separate places. They should be top-level imports alongside the existing from bluebox.data_models.agents.context import ... at line 18.

Fix this →


Remaining from Previous Review

Truncation slightly exceeds _CONTEXT_PROMPT_MAX_CHARS (still minor)

bluebox/agents/bluebox_agent.py:365–369

The appended hint (~100 chars) pushes the result over the 20_000 limit. Still acceptable unless this limit is ever used for strict token budgeting.


Remaining Nits

  • _make_agent return type Anytests/unit/agents/test_bluebox_agent_context.py:183, 293, 356: all three should return BlueBoxAgent instead of Any

  • Missing generated_at in test_from_markdown_roundtriptest_bluebox_agent_context.py:140: the round-trip check skips the timestamp field; adding assert restored.generated_at == sample_context.generated_at would confirm ISO format parsing works

  • AgentWorkspace.save_file docstringbluebox/agents/workspace.py:52: still says ("raw" or "outputs"), should include "context" now that it's a first-class subdirectory


Strengths

All the core logic is clean and the new _auto_discover_context fallback to .md is a nice improvement over the original. The strict != "completed" filter is correct and the TypeError message in the TUI is much more helpful. Ready to merge once the lazy-import issue is addressed.


…tamp assertion

- Move BlueBoxAgent/LocalWorkspace imports to file top (CLAUDE.md: no lazy imports)
- Fix _make_agent return types from Any to BlueBoxAgent
- Remove unused Any import
- Add generated_at assertion in test_from_markdown_roundtrip
- Update save_file docstring to include "context" subdirectory

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dimavrem22
Copy link
Contributor Author

@claude check again!

@claude
Copy link

claude bot commented Feb 23, 2026

Claude finished @dimavrem22's task in 1m 25s —— View job


Code Review (re-review #3)

Tasks:

  • Gather context and understand request
  • Read all changed files
  • Compare against previous review findings
  • Post updated review feedback

All previously flagged issues and nits from review #2 are now fixed. This PR is ready to merge.


All Issues Resolved ✓

# Issue Status
1 Flaky time.sleep(0.05) test ✅ Fixed — os.utime with 10s offset
2 import sys import grouping ✅ Fixed — stdlib block at line 25
3 Auto-discovery only scanned .json ✅ Fixed — falls back to .md
4 status=None accepted in "completed" filter ✅ Fixed — strictly != "completed"
5 _show_context_success typed as Any ✅ Fixed — BlueBoxAgentContext at line 172
6 assert isinstance(...) cryptic error ✅ Fixed — TypeError with message at line 166
7 Lazy imports in test file ✅ Fixed — top-level at lines 17–18

All Nits Resolved ✓

Nit Status
_make_agent return type Any (×3) ✅ Fixed — all three return BlueBoxAgent
Missing generated_at in test_from_markdown_roundtrip ✅ Fixed — assertion at line 156
AgentWorkspace.save_file docstring ✅ Fixed — includes "context" at line 52

Still Minor (not blocking)

Truncation slightly exceeds _CONTEXT_PROMPT_MAX_CHARSbluebox/agents/bluebox_agent.py:365–369

The appended hint (~100 chars) pushes the result over 20_000. Acceptable unless this limit is used for strict token budgeting.


No new issues found. The code is clean, all feedback has been addressed, and the test coverage is solid.


@rayruizhiliao

This comment was marked as duplicate.

dimavrem22 and others added 5 commits February 22, 2026 23:32
- generate_context now returns GenerateContextResult (NamedTuple) with
  context, json_path, and md_path so callers can display file locations
- TUI _show_context_success displays the saved JSON and Markdown paths
- TUI _print_welcome shows loaded context goal and routine count when
  a context file was auto-discovered or explicitly loaded
- Add loaded_context property on BlueBoxAgent to expose context state

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… Files panel

Callers now build their own filenames — routine results use timestamps,
context files use fixed names (agent_context.json/md). Context files are
dispatched to the TUI Saved Files pane via _add_saved_file().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…irectly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dimavrem22 dimavrem22 merged commit 6285191 into main Feb 23, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants