Backport agent e2e suite infrastructure and first backport test by TomasKorbar · Pull Request #510 · packit/ai-workflows

TomasKorbar · 2026-05-21T07:29:39Z

No description provided.

gemini-code-assist

Code Review

This pull request introduces end-to-end (E2E) tests for the backport agent, featuring a new LLM-as-judge evaluation framework and artifact capture utilities. The backport agent's workflow has been refactored for improved modularity, and the triage skill documentation was significantly expanded to provide a more detailed workflow description. Infrastructure updates include new Makefile targets, Docker Compose services, and support for global git configurations in mock environments. Feedback on the new E2E tests identifies critical issues regarding the use of asyncio.run() within fixtures and test functions while pytest-asyncio is active, which can lead to runtime errors; the reviewer recommends converting these to async def and using await directly.
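The asyncio.run() concern can be reproduced with the standard library alone. The sketch below is not taken from the PR: run_backport is a hypothetical stand-in for the agent workflow, and nested_call mimics a test body that is already executing inside an event loop (as it would be when pytest-asyncio drives an async test).

```python
import asyncio


async def run_backport() -> str:
    # Hypothetical stand-in for the agent workflow under test.
    await asyncio.sleep(0)
    return "done"


async def nested_call() -> str:
    # We are already inside a running loop here, so asyncio.run() refuses to start another.
    coro = run_backport()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine never ran; close it to avoid a "never awaited" warning
        return f"error: {exc}"
    return "no error"


async def awaited_call() -> str:
    # The recommended shape: an async function that awaits the coroutine directly.
    return await run_backport()


nested_result = asyncio.run(nested_call())
awaited_result = asyncio.run(awaited_call())
```

With pytest-asyncio, the second shape would correspond to declaring fixtures and tests as async def and awaiting the workflow inside them.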

opohorel

LGTM, nice job!
I've verified that DRY_RUN backport works and I've run the test locally, which also passed

TomasTomecek

Opus is truly impressed:

Verdict

The core refactoring is solid and achieves its goal of making the backport workflow testable. The e2e test infrastructure is thoughtfully designed with the mock repo setup, artifact capture, and optional LLM judge layers. The mutable default field and type annotation issues are minor but worth fixing. The SKILL.md regeneration adds noise to the diff but doesn't change behavior. Overall this is good work — address the mutable default and it's ready.

I think these two issues it found are worth addressing:

Mutable default in BackportState (backport_agent.py around line 839 in the PR):
backport_log: list[str] = Field(default=[])

Pydantic's Field(default=[]) shares the same list object across instances. This should be Field(default_factory=list). The original code had the same bug inside main() but it was less likely to trigger since only one State was created per process. With the class now at module level and used in tests, this is more risky.
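A minimal sketch of the default_factory idiom, using the standard library's dataclasses as a stand-in for the Pydantic model. SharedLog and BackportStateSketch are hypothetical names, not classes from the PR. As far as I know, Pydantic's documentation describes copying non-hashable defaults per instance, so the sharing shown for the plain class may not reproduce with Field(default=[]); default_factory is still the explicit way to state the intent.

```python
from dataclasses import dataclass, field


class SharedLog:
    # Class-level mutable default: every instance sees the same list object.
    backport_log: list[str] = []


@dataclass
class BackportStateSketch:
    # default_factory builds a fresh list for each instance.
    backport_log: list[str] = field(default_factory=list)


a, b = SharedLog(), SharedLog()
a.backport_log.append("step 1")
shared_leak = b.backport_log  # the append through `a` is visible through `b`

c, d = BackportStateSketch(), BackportStateSketch()
c.backport_log.append("step 1")
isolated = d.backport_log  # `d` keeps its own empty list
```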

local_tool_options initialized outside the mcp_tools context (backport_agent.py line ~862):
local_tool_options = {"working_directory": None}
silent_run = os.getenv("SILENT_RUN", "false").lower() == "true"

local_tool_options is now created at the top of run_workflow(), but silent_run reads from env. This is fine for production, but in tests SILENT_RUN is never set explicitly, so it defaults to false. Not a bug but worth documenting.
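One way to document the default is to make the parsing testable. The helper below is hypothetical (read_silent_run does not appear in the PR); it applies the same expression as the quoted line to an explicit mapping, so a test can state which value it relies on.

```python
import os
from collections.abc import Mapping


def read_silent_run(env: Mapping[str, str] = os.environ) -> bool:
    # Same parsing as the quoted line: only the string "true" (any case) enables it.
    return env.get("SILENT_RUN", "false").lower() == "true"


unset = read_silent_run({})                       # variable absent: falls back to "false"
enabled = read_silent_run({"SILENT_RUN": "True"})  # case-insensitive match
other = read_silent_run({"SILENT_RUN": "1"})       # "1" is not treated as true
```

Note that values such as "1" or "yes" do not enable silent mode under this parsing.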

TomasTomecek · 2026-05-21T13:19:32Z

+    VERDICT: FAIL
+
+Before the verdict, provide a brief explanation for each criterion.
+"""


truly fascinating that the evaluation is this short, 200 lines of python, no need for a dedicated framework - brilliant!
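For illustration, a verdict line like the one in the prompt excerpt could be read back with a few lines of standard-library code. This is a sketch under assumptions, not the PR's implementation: parse_verdict is a hypothetical name, and VERDICT: PASS is assumed as the counterpart of the VERDICT: FAIL shown above. A later commit in this PR changed the judge output to a JSON object, so this reflects only the text format quoted here.

```python
import re


def parse_verdict(reply: str) -> bool:
    """Return True for PASS, False for FAIL; raise if no verdict line is found."""
    matches = re.findall(r"^\s*VERDICT:\s*(PASS|FAIL)\s*$", reply, flags=re.MULTILINE)
    if not matches:
        raise ValueError("judge reply contains no VERDICT line")
    # Use the last match, since the explanations come before the verdict.
    return matches[-1] == "PASS"


sample_reply = """Criterion 1: the patch applies cleanly.
Criterion 2: the spec file references the new patch.

    VERDICT: PASS
"""
passed = parse_verdict(sample_reply)
```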

TomasTomecek · 2026-05-21T13:20:43Z

+{diff_section}
+
+{spec_section}
+
+{patches_section}
+
+{result_section}
+
+{reference_section}


shouldn't all of these have headers so the model knows what that output is?

The headers are in the definition of these strings.
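A sketch of what "headers in the definition of these strings" could look like. The section helper, the header texts, and the sample bodies are all hypothetical; only the placeholder names come from the diff above.

```python
from typing import Optional


def section(header: str, body: Optional[str]) -> str:
    # An absent body yields an empty string, so optional sections vanish from the prompt.
    if not body:
        return ""
    return f"## {header}\n{body.strip()}\n"


PROMPT_TEMPLATE = "{diff_section}\n{spec_section}\n{reference_section}"

prompt = PROMPT_TEMPLATE.format(
    diff_section=section("Upstream diff", "--- a/foo.c\n+++ b/foo.c"),
    spec_section=section("Spec file", "Patch1: fix.patch"),
    reference_section=section("Reference patch", None),
)
```

Keeping the header inside each section string means the template itself stays a plain list of placeholders, and a missing section leaves no orphaned header behind.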

lbarcziova

this is great! 🏅

nforro

LGTM

TomasKorbar added 3 commits May 21, 2026 08:44

Add first test for backport

f7b3d48

Add new constraint concerning gitweb repositories to triage

e1caec3

Regenerate triage skill

7d9ac4b

TomasKorbar requested review from nforro and opohorel May 21, 2026 07:29

gemini-code-assist Bot reviewed May 21, 2026

Comment thread ymir/agents/tests/e2e/backport_agent/test_backport.py

Comment thread ymir/agents/tests/e2e/backport_agent/test_backport.py

Comment thread ymir/agents/tests/e2e/backport_agent/test_backport.py

opohorel previously approved these changes May 21, 2026

nforro previously approved these changes May 21, 2026

TomasTomecek previously approved these changes May 21, 2026

TomasKorbar added 6 commits May 21, 2026 16:04

Fix mock_triage script

0b40642

Make Judge output in backport testing a json object

a1fa68e

Add diff as allowed extension of new patch in backport testing

f84be8e

Rename load_all_mock_configs to load_all_fixture_configs

59f3465

Use unidiff for parsing of patches in testing

6fc1e79

Set immutable default to BackportState

683e71a

lbarcziova previously approved these changes May 21, 2026

TomasKorbar dismissed stale reviews from lbarcziova, TomasTomecek, nforro, and opohorel via 683e71a May 21, 2026 14:57

nforro approved these changes May 21, 2026

TomasKorbar merged commit 91fe1b1 into packit:main May 21, 2026
17 checks passed

TomasKorbar merged 9 commits into packit:main from TomasKorbar:backport-first-fix

5 participants
