Add context generation for BlueBoxAgent session replay by dimavrem22 · Pull Request #194 · VectorlyApp/bluebox

dimavrem22 · 2026-02-23T01:54:02Z

Summary

Add BlueBoxAgentContext Pydantic model with JSON and Markdown round-trip serialization (to_markdown() / from_markdown())
Add generate_context agent tool that saves both a canonical JSON and companion Markdown file to the workspace context/ directory
Auto-discover and inject prior context into the system prompt on agent init (from workspace context/ dir or explicit --context-file CLI flag)
Add /generate_context slash command in the TUI that prompts the agent to distill the current session into a reusable context file
Add unit tests covering model serialization, markdown round-trip, context loading, prompt injection, truncation, and tool registration
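The Markdown round-trip in the first bullet can be illustrated with a small stand-in. This is only a sketch: `ContextSketch`, its three fields, and the `## ` section layout are assumptions made for illustration; the real `BlueBoxAgentContext` is a Pydantic model whose Markdown format is not shown in this PR page.

```python
from dataclasses import dataclass, field


@dataclass
class ContextSketch:
    """Hypothetical stand-in for BlueBoxAgentContext (not the real model)."""

    goal: str
    summary: str
    routines_used: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        # One "## " heading per field; routines are rendered as a bullet list.
        lines = ["## Goal", self.goal, "", "## Summary", self.summary, "", "## Routines Used"]
        lines += [f"- {name}" for name in self.routines_used]
        return "\n".join(lines) + "\n"

    @classmethod
    def from_markdown(cls, text: str) -> "ContextSketch":
        # Group non-blank lines under the most recent "## " heading.
        sections: dict[str, list[str]] = {}
        current = None
        for line in text.splitlines():
            if line.startswith("## "):
                current = line[3:].strip()
                sections[current] = []
            elif current is not None and line.strip():
                sections[current].append(line)
        return cls(
            goal="\n".join(sections.get("Goal", [])),
            summary="\n".join(sections.get("Summary", [])),
            routines_used=[
                item[2:] for item in sections.get("Routines Used", []) if item.startswith("- ")
            ],
        )
```

The sketch drops blank lines inside a field and would misparse a field containing a line that starts with `## `, so a real implementation needs a more careful format than this one.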

Allow the agent to save a structured context file (JSON + Markdown) capturing the successful path through a session — routines used, parameters, post-processing code, and output description — so a new agent instance can replay it without trial and error. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Users can now type `/generate_context focus on the flight search part` to guide the agent toward a specific aspect of the session when generating the context file. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…formatting Single source of truth — the markdown rendering logic lives on the model, not duplicated in the agent. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The nested Pydantic model produced a JSON schema with $defs/$ref that LLMs couldn't reliably fill. Switch to flat parameters (goal, summary, routines_used, etc.) and construct the model internally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

_auto_discover_context returns paths like 'workspace/context/file.json' (already relative to cwd). _load_context_from_path was unconditionally prepending workspace root_path again, producing 'workspace/workspace/...'. Fix: try the path as-is first, only prepend workspace root if it doesn't exist. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
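The fix described in this commit message could look roughly like the sketch below. `resolve_context_path` and its parameters are hypothetical names chosen for illustration; the actual `_load_context_from_path` code is not shown on this page.

```python
from pathlib import Path


def resolve_context_path(path: str, workspace_root: Path) -> Path:
    """Resolve a context file path without doubling the workspace prefix."""
    candidate = Path(path)
    # Auto-discovered paths are already relative to the cwd, so try them as-is first.
    if candidate.exists():
        return candidate
    # Only fall back to treating the path as relative to the workspace root.
    return workspace_root / candidate
```

With this ordering, a path such as `workspace/context/file.json` that already exists is returned unchanged, and the root is only prepended for paths that cannot be found as given.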

The agent was generating context files with routines_used=[] despite having executed routines. Two fixes:
1. _generate_context now falls back to _extract_routines_from_raw(), which reads the raw/ execution result files and extracts routine_id, routine_name, and parameters from completed executions. This is a safety net: if the agent provides routines, those are used instead.
2. Strengthened the /generate_context prompt and system prompt rule to emphasize that routines_used must never be empty.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
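A minimal sketch of the fallback, under stated assumptions: each file in raw/ is taken to be a single JSON object with top-level `status`, `routine_id`, `routine_name`, and `parameters` keys, which is a guess at the layout and not confirmed by the PR. The function names are illustrative, and the sketch keeps only results whose status is exactly "completed".

```python
import json
from pathlib import Path
from typing import Any


def extract_routines_from_raw(raw_dir: Path) -> list[dict[str, Any]]:
    """Collect routine_id, routine_name, and parameters from completed executions."""
    routines: list[dict[str, Any]] = []
    for path in sorted(raw_dir.glob("*.json")):
        try:
            result = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or malformed result files
        if not isinstance(result, dict) or result.get("status") != "completed":
            continue  # ignore failed, pending, or unknown-status executions
        routines.append(
            {
                "routine_id": result.get("routine_id"),
                "routine_name": result.get("routine_name"),
                "parameters": result.get("parameters", {}),
            }
        )
    return routines


def routines_or_fallback(provided: list[dict[str, Any]], raw_dir: Path) -> list[dict[str, Any]]:
    """Prefer routines supplied by the agent; scan raw/ only when none were given."""
    return provided if provided else extract_routines_from_raw(raw_dir)
```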

Remove the @agent_tool generate_context (LLM kept failing to fill the schema) and replace it with a direct LLMClient.call_sync using response_model=BlueBoxAgentContext via OpenAI structured output. The /generate_context slash command now calls agent.generate_context() directly instead of prompting the agent loop, using previous_response_id to preserve conversation context without polluting the chat history. Also:
- Rename RoutineUsed → UsedRoutine, add UsedRoutineParameter with concrete types (str|bool|int|float) to satisfy OpenAI strict schema
- Create all workspace subdirs (raw/, outputs/, context/) in LocalWorkspace.__init__
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimavrem22 · 2026-02-23T03:15:35Z

@claude review!

claude · 2026-02-23T03:15:51Z

Claude finished @dimavrem22's task in 2m 28s

Code Review

Tasks:

Gather context and understand request
Read all changed files
Analyze code quality, correctness, and test coverage
Post review feedback

Overall this is clean, well-structured work. The model design, round-trip serialization, auto-discovery logic, and context injection are all solid. The test coverage is thorough. A few issues worth fixing before merge:

Issues

1. Flaky test — time.sleep(0.05) may not be enough to differentiate mtimes

tests/unit/agents/test_bluebox_agent_context.py:233

time.sleep(0.05)  # ensure mtime differs

On filesystems with coarse timestamp resolution (for example ext3 or HFS+ at 1 second, FAT at 2 seconds, and some NFS mounts), two writes 50ms apart can receive the same mtime, making this test intermittently fail. Use os.utime to explicitly set the mtime instead:

import os
import time

old_file = context_dir / "old.json"
old_file.write_text(old.model_dump_json())
past = time.time() - 1  # one second in the past
os.utime(old_file, (past, past))  # set atime and mtime explicitly

2. import sys breaks import grouping

bluebox/scripts/run_bluebox_agent.py:28

import argparse
import shutil
from datetime import datetime
from pathlib import Path
from typing import TYPE_CHECKING, Any
import sys   # ← should be grouped with the other stdlib imports above

Per CLAUDE.md ("Group imports: stdlib, third-party, local (with blank lines between groups)"), import sys must be in the stdlib block at the top.

3. Auto-discovery only scans .json, not .md

bluebox/agents/bluebox_agent.py:340

json_files = sorted(context_dir.glob("*.json"), key=lambda p: p.stat().st_mtime, reverse=True)

Explicit context_file= accepts both .json and .md, but auto-discovery silently ignores .md files. If a user manually provides only a Markdown context file (no companion JSON), auto-discovery misses it. Either add .md discovery with JSON taking priority, or document this limitation in the docstring.
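One way to implement the suggested "JSON taking priority" discovery is sketched below. `discover_context_file` is a hypothetical helper, not the PR's `_auto_discover_context`; it assumes the most recently modified file of the preferred extension should win.

```python
from pathlib import Path
from typing import Optional


def discover_context_file(context_dir: Path) -> Optional[Path]:
    """Return the newest .json context file, or the newest .md if no JSON exists."""
    if not context_dir.is_dir():
        return None
    # Patterns are tried in priority order: JSON first, Markdown as a fallback.
    for pattern in ("*.json", "*.md"):
        candidates = sorted(
            context_dir.glob(pattern),
            key=lambda p: p.stat().st_mtime,
            reverse=True,
        )
        if candidates:
            return candidates[0]
    return None
```

A Markdown file is only returned when the directory contains no JSON file at all, even if the Markdown file is newer.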

4. Status None included in "completed" filter — misleading comment

bluebox/agents/bluebox_agent.py:382

# Only include completed executions
if rr.get("status") not in ("completed", None):
    continue

The comment says "Only include completed executions" but status=None (missing field) is also accepted. If a raw file is missing the status key entirely, it passes through. Either update the comment to reflect the intent ("include completed or unknown-status executions") or tighten the check to != "completed" if failed routines should truly be excluded.

5. Truncation can slightly exceed _CONTEXT_PROMPT_MAX_CHARS

bluebox/agents/bluebox_agent.py:359–363

if len(section) > self._CONTEXT_PROMPT_MAX_CHARS:
    section = section[:self._CONTEXT_PROMPT_MAX_CHARS] + (
        "\n\n... (context truncated — use `read_workspace_file` to read "
        "the full context files in `context/` for more detail)"
    )

After slicing to _CONTEXT_PROMPT_MAX_CHARS, the appended hint adds ~100 characters, pushing the result over the limit. This is unlikely to matter in practice (100 extra chars on a 20K limit), but worth noting if the limit is ever used for strict token budgeting.
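If the limit ever needs to be strict, the slice can reserve room for the hint. The following is a sketch only: `truncate_section`, `TRUNCATION_HINT`, and the shortened hint text are illustrative names and values, not the code in bluebox_agent.py.

```python
TRUNCATION_HINT = (
    "\n\n... (context truncated; use `read_workspace_file` to read "
    "the full context files in `context/` for more detail)"
)


def truncate_section(section: str, max_chars: int) -> str:
    """Truncate so that the result, hint included, never exceeds max_chars."""
    if len(section) <= max_chars:
        return section
    # Reserve space for the hint before slicing the section body.
    keep = max(max_chars - len(TRUNCATION_HINT), 0)
    # The final slice guards the edge case where max_chars is shorter than the hint.
    return (section[:keep] + TRUNCATION_HINT)[:max_chars]
```

The trade-off is that slightly less of the original context survives truncation, in exchange for a hard upper bound on the injected section length.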

6. _show_context_success type annotation is Any

bluebox/scripts/run_bluebox_agent.py:170

def _show_context_success(self, context: Any) -> None:

Since generate_context returns BlueBoxAgentContext, this should be typed context: BlueBoxAgentContext. Per CLAUDE.md every function must have type hints.

7. assert isinstance(...) in TUI gives a cryptic error on failure

bluebox/scripts/run_bluebox_agent.py:164

assert isinstance(self._agent, BlueBoxAgent)

A bare assert produces an unhelpful AssertionError. Prefer an explicit check:

if not isinstance(self._agent, BlueBoxAgent):
    raise TypeError(f"Expected BlueBoxAgent, got {type(self._agent).__name__}")

Nits

TestContextLoading._make_agent return type (tests/unit/agents/test_bluebox_agent_context.py:185): annotated as Any but should be BlueBoxAgent. Same for TestContextPromptInjection._make_agent at line 291 and TestGenerateContext._make_agent at line 354.
test_from_markdown_roundtrip missing generated_at assertion (tests/unit/agents/test_bluebox_agent_context.py:138): the sample context has a fixed generated_at; the round-trip test checks all other fields but skips the timestamp. Adding assert restored.generated_at == sample_context.generated_at would increase confidence in the ISO format parsing.
AgentWorkspace.save_file docstring (bluebox/agents/workspace.py:51): the subdirectory description still says ("raw" or "outputs") — should include "context" now that context/ is a first-class directory.

Strengths

Clean Pydantic model with well-described fields and sensible defaults
The dual-format (JSON + Markdown) strategy is practical: JSON for machine parsing, Markdown for human readability and system-prompt injection
Safety net at generate_context:823 auto-populating routines_used from raw/ when the LLM leaves it empty is a nice guard
Truncation with a read_workspace_file hint is agent-friendly
Test coverage is comprehensive — good range of happy-path, edge cases (empty fields, bad JSON, missing files), and integration-level tests
Workspace test cleanup (removing redundant .mkdir() calls) is a clean side-effect fix
The "one-shot structured output call that doesn't advance _previous_response_id" pattern is well-commented and correct

- Remove unused re-exports from data_models/agents/__init__.py
- Fix import sys grouping in run_bluebox_agent.py
- Auto-discover .md context files (prefer .json, fall back to .md)
- Tighten _extract_routines_from_raw to only accept status=="completed"
- Type _show_context_success param as BlueBoxAgentContext instead of Any
- Replace bare assert isinstance with explicit TypeError
- Replace flaky time.sleep with os.utime in test_auto_discovers_most_recent
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimavrem22 · 2026-02-23T03:24:09Z

@claude review again

claude · 2026-02-23T03:24:21Z

Claude finished @dimavrem22's task in 2m 39s

Code Review (re-review)

Tasks:

Gather context and understand request
Read all changed files
Compare against previous review findings
Post updated review feedback

Great work addressing the feedback — 6 of 7 issues from the previous review are fixed. One new issue spotted, and a few nits remain.

Fixed Since Last Review ✓

| # | Issue | Status |
|---|-------|--------|
| 1 | Flaky `time.sleep(0.05)` test | ✅ Fixed: now uses `os.utime` with 10s offset |
| 2 | `import sys` import grouping | ✅ Fixed: moved to stdlib block at line 25 |
| 3 | Auto-discovery only scanned `.json` | ✅ Fixed: now falls back to `.md` with JSON priority |
| 4 | `status=None` accepted in "completed" filter | ✅ Fixed: now strictly `!= "completed"` |
| 5 | `_show_context_success` typed as `Any` | ✅ Fixed: now `BlueBoxAgentContext` at line 172 |
| 6 | `assert isinstance(...)` cryptic error | ✅ Fixed: `TypeError` with message at line 165 |

New Issue

Lazy imports in test file violate CLAUDE.md

tests/unit/agents/test_bluebox_agent_context.py:189, 294, 336, 357, 374

# Repeated in _make_agent() of every test class, and in two standalone tests:
from bluebox.agents.bluebox_agent import BlueBoxAgent
from bluebox.agents.workspace import LocalWorkspace

CLAUDE.md is explicit: "NO lazy imports! All imports must be at the top of the file." BlueBoxAgent and LocalWorkspace are imported inside method bodies in 5 separate places. They should be top-level imports alongside the existing from bluebox.data_models.agents.context import ... at line 18.

Remaining from Previous Review

Truncation slightly exceeds _CONTEXT_PROMPT_MAX_CHARS (still minor)

bluebox/agents/bluebox_agent.py:365–369

The appended hint (~100 chars) pushes the result over the 20_000 limit. Still acceptable unless this limit is ever used for strict token budgeting.

Remaining Nits

_make_agent return type Any — tests/unit/agents/test_bluebox_agent_context.py:183, 293, 356: all three should return BlueBoxAgent instead of Any
Missing generated_at in test_from_markdown_roundtrip — test_bluebox_agent_context.py:140: the round-trip check skips the timestamp field; adding assert restored.generated_at == sample_context.generated_at would confirm ISO format parsing works
AgentWorkspace.save_file docstring — bluebox/agents/workspace.py:52: still says ("raw" or "outputs"), should include "context" now that it's a first-class subdirectory

Strengths

All the core logic is clean and the new _auto_discover_context fallback to .md is a nice improvement over the original. The strict != "completed" filter is correct and the TypeError message in the TUI is much more helpful. Ready to merge once the lazy-import issue is addressed.

…tamp assertion
- Move BlueBoxAgent/LocalWorkspace imports to file top (CLAUDE.md: no lazy imports)
- Fix _make_agent return types from Any to BlueBoxAgent
- Remove unused Any import
- Add generated_at assertion in test_from_markdown_roundtrip
- Update save_file docstring to include "context" subdirectory
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dimavrem22 · 2026-02-23T03:37:17Z

@claude check again!

claude · 2026-02-23T03:37:33Z

Claude finished @dimavrem22's task in 1m 25s

Code Review (re-review #3)

Tasks:

Gather context and understand request
Read all changed files
Compare against previous review findings
Post updated review feedback

All previously flagged issues and nits from review #2 are now fixed. This PR is ready to merge.

All Issues Resolved ✓

| # | Issue | Status |
|---|-------|--------|
| 1 | Flaky `time.sleep(0.05)` test | ✅ Fixed: `os.utime` with 10s offset |
| 2 | `import sys` import grouping | ✅ Fixed: stdlib block at line 25 |
| 3 | Auto-discovery only scanned `.json` | ✅ Fixed: falls back to `.md` |
| 4 | `status=None` accepted in "completed" filter | ✅ Fixed: strictly `!= "completed"` |
| 5 | `_show_context_success` typed as `Any` | ✅ Fixed: `BlueBoxAgentContext` at line 172 |
| 6 | `assert isinstance(...)` cryptic error | ✅ Fixed: `TypeError` with message at line 166 |
| 7 | Lazy imports in test file | ✅ Fixed: top-level at lines 17–18 |

All Nits Resolved ✓

| Nit | Status |
|-----|--------|
| `_make_agent` return type `Any` (×3) | ✅ Fixed: all three return `BlueBoxAgent` |
| Missing `generated_at` in `test_from_markdown_roundtrip` | ✅ Fixed: assertion at line 156 |
| `AgentWorkspace.save_file` docstring | ✅ Fixed: includes `"context"` at line 52 |

Still Minor (not blocking)

Truncation slightly exceeds _CONTEXT_PROMPT_MAX_CHARS — bluebox/agents/bluebox_agent.py:365–369

The appended hint (~100 chars) pushes the result over 20_000. Acceptable unless this limit is used for strict token budgeting.

No new issues found. The code is clean, all feedback has been addressed, and the test coverage is solid.

bluebox/agents/bluebox_agent.py

- generate_context now returns GenerateContextResult (NamedTuple) with context, json_path, and md_path so callers can display file locations
- TUI _show_context_success displays the saved JSON and Markdown paths
- TUI _print_welcome shows loaded context goal and routine count when a context file was auto-discovered or explicitly loaded
- Add loaded_context property on BlueBoxAgent to expose context state
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… Files panel Callers now build their own filenames — routine results use timestamps, context files use fixed names (agent_context.json/md). Context files are dispatched to the TUI Saved Files pane via _add_saved_file(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…irectly Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Update readme for `context`

dimavrem22 requested review from alex-w-99 and rayruizhiliao as code owners February 23, 2026 01:54

dimavrem22 and others added 5 commits February 22, 2026 21:08

Allow /generate_context to accept an optional focus prompt

08b7eba

Use to_markdown() for context prompt injection instead of hand-built …

24a3d20

dimavrem22 marked this pull request as draft February 23, 2026 02:30

dimavrem22 marked this pull request as ready for review February 23, 2026 03:24

rayruizhiliao reviewed Feb 23, 2026: three review threads on bluebox/agents/bluebox_agent.py, all resolved

dimavrem22 and others added 5 commits February 22, 2026 23:32

Remove GenerateContextResult NamedTuple, return BlueBoxAgentContext d…

2b86171

update readme

4d67767

Merge pull request #195 from VectorlyApp/context-readme

1e530ab

rayruizhiliao approved these changes Feb 23, 2026

dimavrem22 merged commit 6285191 into main Feb 23, 2026
9 checks passed

dimavrem22 merged 14 commits into main from context-generation