Problem
Two modules have no dedicated unit tests:
src/millstone/runtime/context.py — ContextManager
- 3 public methods:
get_group_context, accumulate_group_context, extract_context_summary
- ~248 lines; zero test coverage
- Cross-task context sharing is a key feature that could silently break
src/millstone/utils.py
- Core utility functions used throughout the orchestrator:
is_empty_response (all schema types + edge cases)
extract_claude_result (JSON wrapper extraction, non-JSON passthrough)
summarize_output (short/long text)
filter_reasoning_traces (Codex thinking block removal)
summarize_diff (multi-file diffs, empty input)
is_whitespace_or_comment_only_change
progress() (BrokenPipeError handling)
- Zero unit tests; exercised only indirectly through integration tests
Fix
Create:
tests/unit/test_context_manager.py (~100 LoC): mock callbacks, test empty diffs, missing group files, LLM extraction failures
tests/unit/test_utils.py (~150 LoC): one test per function, covering main paths and edge cases
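As a rough illustration of the style tests/unit/test_utils.py could take, here is a minimal sketch of the progress() BrokenPipeError case. The `progress` function below is a hypothetical stand-in defined inline so the example runs on its own. Its signature, its use of stderr, and its swallowing of the error are assumptions, not the actual millstone implementation. A real test would import the function from millstone.utils instead.

```python
import sys
from unittest import mock


def progress(message: str) -> None:
    """Hypothetical stand-in for millstone.utils.progress (signature assumed)."""
    try:
        print(message, file=sys.stderr, flush=True)
    except BrokenPipeError:
        # Assumed behavior: a closed output pipe should not crash the caller.
        pass


def test_progress_swallows_broken_pipe() -> None:
    # Simulate a closed pipe by making print() raise BrokenPipeError.
    with mock.patch("builtins.print", side_effect=BrokenPipeError):
        progress("working...")  # the test fails if this raises


def test_progress_writes_message() -> None:
    # On the normal path the message should be passed through to print().
    with mock.patch("builtins.print") as fake_print:
        progress("working...")
    assert fake_print.call_args.args[0] == "working..."
```

Patching `builtins.print` keeps the test independent of the real output stream. If the actual function writes through `sys.stdout.write` or a logger, the patch target would need to change accordingly.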