Skip to content

test: add synthesizer deterministic contract tests#47

Open
kavishkartha05 wants to merge 1 commit intoCodelab-Davis:mainfrom
kavishkartha05:kavish/synthesizer-contract-tests
Open

test: add synthesizer deterministic contract tests#47
kavishkartha05 wants to merge 1 commit intoCodelab-Davis:mainfrom
kavishkartha05:kavish/synthesizer-contract-tests

Conversation

@kavishkartha05
Copy link
Copy Markdown
Contributor

@kavishkartha05 kavishkartha05 commented May 7, 2026

Pull Request

Use a Conventional Commit title: feat: ..., fix: ..., docs: ..., chore: ..., refactor: ..., test: ..., build: ..., ci: ..., perf: ..., or revert: ....

Summary

Adds deterministic offline tests for all three branches of Synthesizer.synthesize(): empty chunks, single-chunk fast path, and multi-chunk prompt construction. Verifies the model is never called for the first two paths and that prompt content, source labels, and (current) question marking are correct for the multi-chunk path.

Linked context

Closes #18

PR Size

  • Small (< 100 LOC delta)

PR Type

  • Test

Context / Motivation

Synthesizer had no unit tests despite having three distinct code paths with specific output contracts. This PR pins those contracts so future changes to synthesis logic have a regression baseline.

Changes

Added tests/unit/test_synthesizer.py with 6 tests:

  • Empty chunk list returns fallback string, model never called
  • Single chunk returns formatted content directly, model never called
  • Multi-chunk with questions — prompt includes chunk content and [Source: X] labels
  • Single question in multi-chunk case is marked (current)
  • Multiple questions in multi-chunk case — only the final one is marked (current), others are plain
  • Multi-chunk without questions — prompt still includes chunk content and source labels

How to test

  1. uv run pytest -m unit -v
  2. All 6 tests in test_synthesizer.py should pass with no network or Ollama dependency

Checklist

  • I ran the relevant local checks.
  • I updated documentation or confirmed no docs changes were needed.
  • I linked related issues or pull requests and confirmed this is not duplicate work.
  • This change stays within the stated scope of the PR.

Notes for reviewers

The _spy_model helper follows the same pattern as test_decomposer_prompts.py, in that it captures messages[0]["content"] so assertions read against real prompt text. The single-chunk fast path test asserts the model is never called, which essentially documents the existing short-circuit behaviour and catches any future accidental removal

Summary by CodeRabbit

  • Tests
    • Added comprehensive test coverage for synthesizer functionality to ensure reliability across various input scenarios.

@github-actions github-actions Bot added the area: tests Automated tests and test infrastructure. label May 7, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 7, 2026

📝 Walkthrough

Walkthrough

A new test module is added for Synthesizer.synthesize() with a _spy_model helper that captures model prompts and six test cases covering empty input, single-chunk fast path, and multi-chunk scenarios with question marking behavior.

Changes

Synthesizer Unit Tests

Layer / File(s) Summary
Test Helper & Empty Case
tests/unit/test_synthesizer.py (lines 1–25)
_spy_model helper captures the first message content from the synthesizer and records model calls. Empty chunk list returns fallback message without invoking the model.
Single Chunk Fast Path
tests/unit/test_synthesizer.py (lines 28–34)
Single chunk returns formatted output with [Source: ...] header without calling the model.
Multi-Chunk Prompt Construction
tests/unit/test_synthesizer.py (lines 37–93)
Multi-chunk scenarios (with and without questions) generate a prompt containing all chunk contents and source tags, and invoke the model exactly once. Question marking tests verify that only the last question is suffixed with "(current)" in multi-question cases.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related issues

  • Issue #33: Related eval tests for Synthesizer.synthesize() using a real model to verify prompt-driven outputs.

Suggested reviewers

  • AlexanderGardiner

Poem

🐰 A spy on prompts, with tests so keen,
Chunks and questions, a deterministic scene,
Empty, single, multi—all paths explored,
The Synthesizer's contract now assured! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically identifies the main change: adding deterministic contract tests for the Synthesizer class.
Linked Issues check ✅ Passed All coding requirements from issue #18 are met: tests cover empty chunks, single-chunk fast path, and multi-chunk scenarios; model invocation is verified; prompt content and source labels are validated; current-marking is tested for single and multiple questions.
Out of Scope Changes check ✅ Passed All changes are within scope—the PR adds only unit tests for Synthesizer.synthesize() with no evaluation of synthesis quality, prompt wording changes, or loop orchestration coverage.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
tests/unit/test_synthesizer.py (2)

8-14: ⚡ Quick win

Make _spy_model resilient to message ordering.

Capturing messages[0]["content"] assumes index 0 is always the user prompt. Filtering by role makes these tests less brittle to harmless internal message-structure changes.

Suggested diff
 def _spy_model(response: str = "synthesized answer"):
     """Returns (model_fn, calls). Each call appends the user-role prompt text."""
     calls: list[str] = []

     def model_fn(messages: list[dict]) -> str:
-        calls.append(messages[0]["content"])
+        user_msg = next((m for m in messages if m.get("role") == "user"), messages[-1])
+        calls.append(user_msg["content"])
         return response
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_synthesizer.py` around lines 8 - 14, The test helper
_spy_model currently captures messages[0]["content"], which breaks if message
order changes; update the model_fn inside _spy_model to locate the user prompt
by filtering the messages list for entries where message["role"] == "user"
(e.g., pick the first matching user message or join multiple user contents) and
append that content to calls so the test records the user prompt regardless of
message ordering.

37-50: ⚡ Quick win

Assert model-response passthrough and explicit call count in all multi-chunk tests.

For multi-chunk paths, asserting the returned value equals the model response (and consistently asserting len(calls) == 1) strengthens the contract and gives clearer failures.

Suggested diff
 def test_multi_chunk_with_questions_prompt_contains_chunk_content_and_sources() -> None:
     model_fn, calls = _spy_model()
@@
-    Synthesizer(model_fn).synthesize(chunks, questions=["What is X?"])
+    result = Synthesizer(model_fn).synthesize(chunks, questions=["What is X?"])
     assert len(calls) == 1
+    assert result == "synthesized answer"
     prompt = calls[0]
@@
 def test_multi_chunk_with_single_question_marks_it_current() -> None:
@@
-    Synthesizer(model_fn).synthesize(chunks, questions=["Tell me about alpha"])
+    result = Synthesizer(model_fn).synthesize(chunks, questions=["Tell me about alpha"])
+    assert len(calls) == 1
+    assert result == "synthesized answer"
     assert "Tell me about alpha (current)" in calls[0]
@@
 def test_multi_chunk_with_multiple_questions_only_last_is_marked_current() -> None:
@@
-    Synthesizer(model_fn).synthesize(chunks, questions=questions)
+    result = Synthesizer(model_fn).synthesize(chunks, questions=questions)
+    assert len(calls) == 1
+    assert result == "synthesized answer"
     prompt = calls[0]
@@
 def test_multi_chunk_without_questions_prompt_contains_chunk_content_and_sources() -> None:
@@
-    Synthesizer(model_fn).synthesize(chunks)
+    result = Synthesizer(model_fn).synthesize(chunks)
     assert len(calls) == 1
+    assert result == "synthesized answer"
     prompt = calls[0]

Also applies to: 53-60, 64-78, 81-93

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_synthesizer.py` around lines 37 - 50, The multi-chunk tests
should assert the Synthesizer.synthesize return value equals the fake model
response and consistently assert a single model invocation; update each
multi-chunk test (e.g.,
test_multi_chunk_with_questions_prompt_contains_chunk_content_and_sources and
the other multi-chunk test functions that use _spy_model and calls) to capture
the synthesizer result (result = Synthesizer(model_fn).synthesize(...)), add
assert len(calls) == 1 immediately after calling synthesize, and add an
assertion like assert result == calls[0] (or the expected model response) so the
tests verify both single-call behavior and passthrough of the model response.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unit/test_synthesizer.py`:
- Around line 8-14: The test helper _spy_model currently captures
messages[0]["content"], which breaks if message order changes; update the
model_fn inside _spy_model to locate the user prompt by filtering the messages
list for entries where message["role"] == "user" (e.g., pick the first matching
user message or join multiple user contents) and append that content to calls so
the test records the user prompt regardless of message ordering.
- Around line 37-50: The multi-chunk tests should assert the
Synthesizer.synthesize return value equals the fake model response and
consistently assert a single model invocation; update each multi-chunk test
(e.g., test_multi_chunk_with_questions_prompt_contains_chunk_content_and_sources
and the other multi-chunk test functions that use _spy_model and calls) to
capture the synthesizer result (result = Synthesizer(model_fn).synthesize(...)),
add assert len(calls) == 1 immediately after calling synthesize, and add an
assertion like assert result == calls[0] (or the expected model response) so the
tests verify both single-call behavior and passthrough of the model response.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 390e0c01-f583-4c29-adbe-c9ed03032cb0

📥 Commits

Reviewing files that changed from the base of the PR and between 6c7558c and cc3453f.

📒 Files selected for processing (1)
  • tests/unit/test_synthesizer.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: tests Automated tests and test infrastructure.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[task] Add synthesizer deterministic contract tests

1 participant