test: add synthesizer deterministic contract tests by kavishkartha05 · Pull Request #47 · Codelab-Davis/Anchor

kavishkartha05 · 2026-05-07T01:56:28Z

Pull Request

Use a Conventional Commit title: feat: ..., fix: ..., docs: ..., chore: ..., refactor: ..., test: ..., build: ..., ci: ..., perf: ..., or revert: ....

Summary

Adds deterministic offline tests for all three branches of Synthesizer.synthesize(): empty chunks, single-chunk fast path, and multi-chunk prompt construction. Verifies the model is never called for the first two paths and that prompt content, source labels, and (current) question marking are correct for the multi-chunk path.

Linked context

Closes #18

PR Size

Small (< 100 LOC delta)

PR Type

Test

Context / Motivation

Synthesizer had no unit tests despite having three distinct code paths with specific output contracts. This PR pins those contracts so future changes to synthesis logic have a regression baseline.

Changes

Added tests/unit/test_synthesizer.py with 6 tests:

Empty chunk list returns fallback string, model never called
Single chunk returns formatted content directly, model never called
Multi-chunk with questions — prompt includes chunk content and [Source: X] labels
Single question in multi-chunk case is marked (current)
Multiple questions in multi-chunk case — only the final one is marked (current), others are plain
Multi-chunk without questions — prompt still includes chunk content and source labels

How to test

uv run pytest -m unit -v
All 6 tests in test_synthesizer.py should pass with no network or Ollama dependency

Checklist

I ran the relevant local checks.
I updated documentation or confirmed no docs changes were needed.
I linked related issues or pull requests and confirmed this is not duplicate work.
This change stays within the stated scope of the PR.

Notes for reviewers

The _spy_model helper follows the same pattern as test_decomposer_prompts.py, in that it captures messages[0]["content"] so assertions read against real prompt text. The single-chunk fast path test asserts the model is never called, which essentially documents the existing short-circuit behaviour and catches any future accidental removal

Summary by CodeRabbit

Tests
- Added comprehensive test coverage for synthesizer functionality to ensure reliability across various input scenarios.

coderabbitai · 2026-05-07T01:56:45Z

📝 Walkthrough

Walkthrough

A new test module is added for Synthesizer.synthesize() with a _spy_model helper that captures model prompts and six test cases covering empty input, single-chunk fast path, and multi-chunk scenarios with question marking behavior.

Changes

Synthesizer Unit Tests

Layer / File(s)	Summary
Test Helper & Empty Case `tests/unit/test_synthesizer.py` (lines 1–25)	`_spy_model` helper captures the first message content from the synthesizer and records model calls. Empty chunk list returns fallback message without invoking the model.
Single Chunk Fast Path `tests/unit/test_synthesizer.py` (lines 28–34)	Single chunk returns formatted output with `[Source: ...]` header without calling the model.
Multi-Chunk Prompt Construction `tests/unit/test_synthesizer.py` (lines 37–93)	Multi-chunk scenarios (with and without questions) generate a prompt containing all chunk contents and source tags, and invoke the model exactly once. Question marking tests verify that only the last question is suffixed with `"(current)"` in multi-question cases.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related issues

Issue #33: Related eval tests for Synthesizer.synthesize() using a real model to verify prompt-driven outputs.

Suggested reviewers

AlexanderGardiner

Poem

🐰 A spy on prompts, with tests so keen,
Chunks and questions, a deterministic scene,
Empty, single, multi—all paths explored,
The Synthesizer's contract now assured! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically identifies the main change: adding deterministic contract tests for the Synthesizer class.
Linked Issues check	✅ Passed	All coding requirements from issue `#18` are met: tests cover empty chunks, single-chunk fast path, and multi-chunk scenarios; model invocation is verified; prompt content and source labels are validated; current-marking is tested for single and multiple questions.
Out of Scope Changes check	✅ Passed	All changes are within scope—the PR adds only unit tests for Synthesizer.synthesize() with no evaluation of synthesis quality, prompt wording changes, or loop orchestration coverage.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (2)

tests/unit/test_synthesizer.py (2)

8-14: ⚡ Quick win

Make _spy_model resilient to message ordering.

Capturing messages[0]["content"] assumes index 0 is always the user prompt. Filtering by role makes these tests less brittle to harmless internal message-structure changes.

Suggested diff

 def _spy_model(response: str = "synthesized answer"):
     """Returns (model_fn, calls). Each call appends the user-role prompt text."""
     calls: list[str] = []

     def model_fn(messages: list[dict]) -> str:
-        calls.append(messages[0]["content"])
+        user_msg = next((m for m in messages if m.get("role") == "user"), messages[-1])
+        calls.append(user_msg["content"])
         return response

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_synthesizer.py` around lines 8 - 14, The test helper
_spy_model currently captures messages[0]["content"], which breaks if message
order changes; update the model_fn inside _spy_model to locate the user prompt
by filtering the messages list for entries where message["role"] == "user"
(e.g., pick the first matching user message or join multiple user contents) and
append that content to calls so the test records the user prompt regardless of
message ordering.

37-50: ⚡ Quick win

Assert model-response passthrough and explicit call count in all multi-chunk tests.

For multi-chunk paths, asserting the returned value equals the model response (and consistently asserting len(calls) == 1) strengthens the contract and gives clearer failures.

Suggested diff

 def test_multi_chunk_with_questions_prompt_contains_chunk_content_and_sources() -> None:
     model_fn, calls = _spy_model()
@@
-    Synthesizer(model_fn).synthesize(chunks, questions=["What is X?"])
+    result = Synthesizer(model_fn).synthesize(chunks, questions=["What is X?"])
     assert len(calls) == 1
+    assert result == "synthesized answer"
     prompt = calls[0]
@@
 def test_multi_chunk_with_single_question_marks_it_current() -> None:
@@
-    Synthesizer(model_fn).synthesize(chunks, questions=["Tell me about alpha"])
+    result = Synthesizer(model_fn).synthesize(chunks, questions=["Tell me about alpha"])
+    assert len(calls) == 1
+    assert result == "synthesized answer"
     assert "Tell me about alpha (current)" in calls[0]
@@
 def test_multi_chunk_with_multiple_questions_only_last_is_marked_current() -> None:
@@
-    Synthesizer(model_fn).synthesize(chunks, questions=questions)
+    result = Synthesizer(model_fn).synthesize(chunks, questions=questions)
+    assert len(calls) == 1
+    assert result == "synthesized answer"
     prompt = calls[0]
@@
 def test_multi_chunk_without_questions_prompt_contains_chunk_content_and_sources() -> None:
@@
-    Synthesizer(model_fn).synthesize(chunks)
+    result = Synthesizer(model_fn).synthesize(chunks)
     assert len(calls) == 1
+    assert result == "synthesized answer"
     prompt = calls[0]

Also applies to: 53-60, 64-78, 81-93

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_synthesizer.py` around lines 37 - 50, The multi-chunk tests
should assert the Synthesizer.synthesize return value equals the fake model
response and consistently assert a single model invocation; update each
multi-chunk test (e.g.,
test_multi_chunk_with_questions_prompt_contains_chunk_content_and_sources and
the other multi-chunk test functions that use _spy_model and calls) to capture
the synthesizer result (result = Synthesizer(model_fn).synthesize(...)), add
assert len(calls) == 1 immediately after calling synthesize, and add an
assertion like assert result == calls[0] (or the expected model response) so the
tests verify both single-call behavior and passthrough of the model response.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/test_synthesizer.py`:
- Around line 8-14: The test helper _spy_model currently captures
messages[0]["content"], which breaks if message order changes; update the
model_fn inside _spy_model to locate the user prompt by filtering the messages
list for entries where message["role"] == "user" (e.g., pick the first matching
user message or join multiple user contents) and append that content to calls so
the test records the user prompt regardless of message ordering.
- Around line 37-50: The multi-chunk tests should assert the
Synthesizer.synthesize return value equals the fake model response and
consistently assert a single model invocation; update each multi-chunk test
(e.g., test_multi_chunk_with_questions_prompt_contains_chunk_content_and_sources
and the other multi-chunk test functions that use _spy_model and calls) to
capture the synthesizer result (result = Synthesizer(model_fn).synthesize(...)),
add assert len(calls) == 1 immediately after calling synthesize, and add an
assertion like assert result == calls[0] (or the expected model response) so the
tests verify both single-call behavior and passthrough of the model response.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 390e0c01-f583-4c29-adbe-c9ed03032cb0

📥 Commits

Reviewing files that changed from the base of the PR and between 6c7558c and cc3453f.

📒 Files selected for processing (1)

tests/unit/test_synthesizer.py

test: add synthesizer deterministic contract tests

cc3453f

kavishkartha05 requested a review from AlexanderGardiner as a code owner May 7, 2026 01:56

github-actions Bot added the area: tests Automated tests and test infrastructure. label May 7, 2026

coderabbitai Bot reviewed May 7, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: add synthesizer deterministic contract tests#47

test: add synthesizer deterministic contract tests#47
kavishkartha05 wants to merge 1 commit intoCodelab-Davis:mainfrom
kavishkartha05:kavish/synthesizer-contract-tests

kavishkartha05 commented May 7, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 7, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kavishkartha05 commented May 7, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request

Summary

Linked context

PR Size

PR Type

Context / Motivation

Changes

How to test

Checklist

Notes for reviewers

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

kavishkartha05 commented May 7, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 7, 2026 •

edited

Loading