perf(scoring): parallelize per-requirement evaluation with asyncio.gather by renegatux · Pull Request #63 · ISG-Siegen/AutoRecLab

renegatux · 2026-06-28T16:32:36Z

Summary

Refactors MinimalAgent.score_code so that per-requirement evaluations run concurrently via asyncio.gather instead of a sequential for loop. Each requirement is evaluated by an independent LLM call, so parallelizing them is safe and substantially reduces wall-clock time.

Motivation

Currently, score_code iterates over node.requirements and awaits one LLM call per requirement before moving to the next:

for req in node.requirements:
    scoring_result = await Query(...).run(scoring_prompt, ScoreCode)

This loop runs for every executed node, both drafts and tree-search iterations. With the default configuration (num_draft_nodes=3, max_iterations=10), that is roughly 3 + 10 = 13 executed nodes, so with N requirements per node, scoring contributes roughly 13 * N strictly sequential LLM calls to the critical path of a single run.

Change

Wraps the per-requirement scoring logic into an inner async def _score_requirement(req).
Dispatches all requirements concurrently with await asyncio.gather(...).
Existing per-requirement exception handling (mark unfulfilled, leave fallback feedback) is preserved verbatim.
Adds import asyncio at the top of the module.
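
The shape of this change can be sketched as follows. This is a minimal, self-contained illustration rather than the code in the PR: Requirement, fake_llm_score, and score_requirements are hypothetical stand-ins for the real requirement objects, the Query(...).run(...) call, and score_code, and the LLM call is simulated with asyncio.sleep.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Requirement:
    """Hypothetical stand-in for one entry of node.requirements."""
    text: str
    fulfilled: bool = False
    feedback: str = ""


async def fake_llm_score(req: Requirement) -> bool:
    """Simulated LLM call; the real code awaits Query(...).run(...)."""
    await asyncio.sleep(0.01)
    if "boom" in req.text:
        raise RuntimeError("simulated LLM failure")
    return True


async def score_requirements(requirements: list[Requirement]) -> None:
    async def _score_requirement(req: Requirement) -> None:
        # Errors are handled per requirement, so one failing call
        # cannot cancel or hide the results of the others.
        try:
            req.fulfilled = await fake_llm_score(req)
            req.feedback = "ok"
        except Exception as exc:
            req.fulfilled = False
            req.feedback = f"scoring failed: {exc}"

    # Start every evaluation at once and wait for all of them.
    await asyncio.gather(*(_score_requirement(r) for r in requirements))


if __name__ == "__main__":
    reqs = [Requirement("loads data"), Requirement("boom"), Requirement("reports NDCG")]
    asyncio.run(score_requirements(reqs))
    for r in reqs:
        print(r.text, r.fulfilled, r.feedback)
```

Keeping the try/except inside the inner coroutine means gather never sees an exception, so there is no need for return_exceptions=True in this sketch.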

No public API changes. No behavioral differences are expected for individual requirements. The only new non-determinism is the order in which the evaluations complete (asyncio.gather itself returns results in input order), and the surrounding code does not rely on completion order: all_fulfilled, the feedback assembly loop, and the coverage confirmation step are all order-independent.
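
A small experiment makes the ordering behavior concrete: asyncio.gather returns its results in the order the awaitables were passed in, even when they finish in a different order. The evaluate and demo names and the delays below are hypothetical and only serve the illustration.

```python
import asyncio


async def evaluate(name: str, delay: float, completed: list[str]) -> str:
    await asyncio.sleep(delay)
    completed.append(name)  # records the order in which calls finish
    return name


async def demo() -> tuple[list[str], list[str]]:
    completed: list[str] = []
    # "slow" is submitted first but finishes last.
    results = await asyncio.gather(
        evaluate("slow", 0.1, completed),
        evaluate("fast", 0.01, completed),
    )
    return list(results), completed


if __name__ == "__main__":
    results, completed = asyncio.run(demo())
    print("gather results:  ", results)
    print("completion order:", completed)
```

Code that only reads per-requirement state after the gather has finished is therefore unaffected by which call happens to complete first.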

Impact

Wall-clock time of the scoring phase now scales with the slowest individual call instead of the sum of all calls. For a node with N requirements, this is up to an N-fold speedup for that phase, assuming the calls are not serialized by rate limits or by the shared MCP client. Total API cost is unchanged (same number of calls).
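
The "slowest call instead of the sum" claim can be checked with simulated latencies. This is a rough sketch: the DELAYS values are made up, and fake_call stands in for a real LLM request, so it shows only the scheduling effect, not real API timings.

```python
import asyncio
import time

# Hypothetical per-call latencies in seconds (sum = 0.5, max = 0.2).
DELAYS = [0.05, 0.10, 0.15, 0.20]


async def fake_call(delay: float) -> None:
    await asyncio.sleep(delay)


async def run_sequential() -> float:
    start = time.perf_counter()
    for d in DELAYS:
        await fake_call(d)  # each call waits for the previous one
    return time.perf_counter() - start


async def run_concurrent() -> float:
    start = time.perf_counter()
    await asyncio.gather(*(fake_call(d) for d in DELAYS))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {asyncio.run(run_sequential()):.2f}s")
    print(f"concurrent: {asyncio.run(run_concurrent()):.2f}s")
```

With these numbers the sequential run should take about the sum of the delays and the concurrent run about the largest one, which is less than an N-fold gain because the calls are not equally slow.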

Notes for reviewers

The inner async function captures node, self.task_desc, and self._mcp_docs from the enclosing scope — the same references the original loop used.
The bug-review LLM call earlier in score_code (the ReviewFunction call) already runs before this block, so the shared _MCP_CACHE in query.py is guaranteed to be warm before the parallel section. No cache-race window is introduced.

Optional follow-up: bounded concurrency

The cached MultiServerMCPClient is shared across all concurrent Query.run() calls via a single stdio transport to docs_search_server. If langchain_mcp_adapters does not safely multiplex concurrent tool calls over one stdio MCP client, or if a user's OpenAI tier has tight RPM limits, the unbounded gather could become a problem.

In that case, the fix is to bound the concurrency with a small semaphore:

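# Allow at most 3 scoring calls in flight at once.
_scoring_sem = asyncio.Semaphore(3)
async def _bounded_score(req):
    # Wait for a free slot before issuing the LLM call.
    async with _scoring_sem:
        await _score_requirement(req)
await asyncio.gather(*(_bounded_score(req) for req in node.requirements))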

This caps in-flight scoring calls at 3, which preserves much of the wall-clock benefit (up to ~3x speedup) while reducing the risk of overloading the shared MCP stdio channel or hitting RPM limits. Happy to incorporate this into the PR directly if preferred.
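
The effect of the cap can be verified in isolation by counting how many simulated calls are in flight at once. The run_bounded helper below is hypothetical and uses asyncio.sleep in place of the real scoring call. It only demonstrates that a semaphore enforces the limit.

```python
import asyncio


async def run_bounded(n_requirements: int, limit: int) -> int:
    """Run n simulated scoring calls with at most `limit` in flight.

    Returns the highest number of calls observed running at once.
    """
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def _score(_index: int) -> None:
        nonlocal in_flight, peak
        async with sem:  # blocks while `limit` calls are already running
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # simulated LLM call
            in_flight -= 1

    await asyncio.gather(*(_score(i) for i in range(n_requirements)))
    return peak


if __name__ == "__main__":
    print("peak in-flight calls:", asyncio.run(run_bounded(10, 3)))
```

With a limit of 3, N calls of similar duration run in roughly ceil(N / 3) rounds, so the gain over the sequential loop approaches 3x only when N is at least 3.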

Commit 32e43ae: perf(scoring): parallelize per-requirement evaluation with asyncio.gather

renegatux wants to merge 1 commit into ISG-Siegen:develop from renegatux:feature/parallel-requirement-scoring
