perf(scoring): parallelize per-requirement evaluation with asyncio.gather #63

Open
renegatux wants to merge 1 commit into ISG-Siegen:develop from renegatux:feature/parallel-requirement-scoring

Summary

Refactors MinimalAgent.score_code so that per-requirement evaluations run concurrently via asyncio.gather instead of a sequential for loop. Each requirement is evaluated by an independent LLM call, so parallelizing them is safe and substantially reduces wall-clock time.

Motivation

Currently, score_code iterates over node.requirements and awaits one LLM call per requirement before moving to the next:

for req in node.requirements:
    scoring_result = await Query(...).run(scoring_prompt, ScoreCode)

This loop runs for every executed node, covering both drafts and tree-search iterations. With the default configuration (num_draft_nodes=3, max_iterations=10), that is up to 3 + 10 = 13 nodes per run, so with N requirements per node, scoring contributes roughly 13 * N strictly sequential LLM calls to the critical path of a single run.

Change

  • Wraps the per-requirement scoring logic into an inner async def _score_requirement(req).
  • Dispatches all requirements concurrently with await asyncio.gather(...).
  • Existing per-requirement exception handling (mark unfulfilled, leave fallback feedback) is preserved verbatim.
  • Adds import asyncio at the top of the module.

No public API changes. No behavioral differences are expected for individual requirements. Only the order in which the individual evaluations complete becomes non-deterministic, and the surrounding code does not rely on it (all_fulfilled, the feedback assembly loop, and the coverage confirmation step are all order-independent).
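For reviewers who want to see the shape of the change in isolation, here is a minimal sketch of the pattern. It is not the code from this PR: Requirement, fake_llm_score, and score_requirements are hypothetical stand-ins, and the simulated failure only illustrates that the per-requirement fallback keeps one failing call from aborting the others.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Requirement:
    # Hypothetical stand-in for the project's requirement objects.
    text: str
    fulfilled: bool = False
    feedback: str = ""


async def fake_llm_score(req: Requirement) -> bool:
    # Placeholder for the real Query(...).run(scoring_prompt, ScoreCode) call.
    await asyncio.sleep(0.01)
    if "boom" in req.text:
        raise RuntimeError("simulated LLM failure")
    return True


async def score_requirements(requirements: list[Requirement]) -> None:
    async def _score_requirement(req: Requirement) -> None:
        try:
            req.fulfilled = await fake_llm_score(req)
            req.feedback = "ok"
        except Exception:
            # Per-requirement fallback: mark unfulfilled, leave fallback feedback.
            req.fulfilled = False
            req.feedback = "Scoring failed; treated as unfulfilled."

    # Start all evaluations at once and wait until every one has finished.
    await asyncio.gather(*(_score_requirement(r) for r in requirements))


reqs = [Requirement("loads data"), Requirement("boom"), Requirement("plots results")]
asyncio.run(score_requirements(reqs))
print([(r.text, r.fulfilled) for r in reqs])
```

Because each coroutine writes only to its own requirement object, the list keeps its original order no matter which call finishes first.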

Impact

Wall-clock time of the scoring phase is bounded by the slowest individual call instead of the sum of all calls. For a node with N requirements, this is up to an Nx speedup for that phase, provided the calls are not serialized downstream. Total API cost is unchanged (same number of calls).
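The timing claim can be illustrated with simulated latencies. This is a sketch under stated assumptions, not a benchmark of the real scoring phase: fake_call and the DELAYS values are made up, and asyncio.sleep stands in for the LLM round trip.

```python
import asyncio
import time

# Simulated per-requirement LLM latencies in seconds (made-up values).
DELAYS = [0.05, 0.10, 0.15]


async def fake_call(delay: float) -> float:
    # Stand-in for one scoring call; it only waits.
    await asyncio.sleep(delay)
    return delay


async def run_sequential() -> float:
    start = time.perf_counter()
    for delay in DELAYS:
        await fake_call(delay)  # each call waits for the previous one
    return time.perf_counter() - start


async def run_concurrent() -> float:
    start = time.perf_counter()
    await asyncio.gather(*(fake_call(d) for d in DELAYS))  # calls overlap
    return time.perf_counter() - start


sequential = asyncio.run(run_sequential())
concurrent = asyncio.run(run_concurrent())
print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
```

The sequential run takes roughly the sum of the delays, while the concurrent run takes roughly the largest single delay.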

Notes for reviewers

  • The inner async function captures node, self.task_desc, and self._mcp_docs from the enclosing scope — the same references the original loop used.
  • The bug-review LLM call earlier in score_code (the ReviewFunction call) already runs before this block, so the shared _MCP_CACHE in query.py is guaranteed to be warm before the parallel section. No cache-race window is introduced.

Optional follow-up: bounded concurrency

The cached MultiServerMCPClient is shared across all concurrent Query.run() calls via a single stdio transport to docs_search_server. If langchain_mcp_adapters does not safely multiplex concurrent tool calls over one stdio MCP client, or if a user's OpenAI tier has tight RPM limits, the unbounded gather could become a problem.

In that case, the fix is to bound the concurrency with a small semaphore:

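# Create the semaphore inside score_code, within the running event loop.
_scoring_sem = asyncio.Semaphore(3)

async def _bounded_score(req):
    # At most 3 scoring calls hold the semaphore at any time.
    async with _scoring_sem:
        await _score_requirement(req)

await asyncio.gather(*(_bounded_score(req) for req in node.requirements))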

This caps in-flight scoring calls at 3, which still yields up to a ~3x speedup for nodes with three or more requirements while reducing the risk of overloading the shared MCP stdio channel. Happy to incorporate this into the PR directly if preferred.
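To check that the semaphore really enforces the cap, a small self-contained sketch can record the peak number of calls in flight. Everything here is hypothetical: run_bounded and its counters are not part of the PR, and asyncio.sleep stands in for the scoring call.

```python
import asyncio


async def run_bounded(n_requirements: int, limit: int) -> int:
    # Created inside the coroutine, so it belongs to the running event loop.
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def _score_requirement(index: int) -> None:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)  # record the highest concurrency seen
        await asyncio.sleep(0.01)    # stand-in for the LLM scoring call
        in_flight -= 1

    async def _bounded_score(index: int) -> None:
        async with sem:
            await _score_requirement(index)

    await asyncio.gather(*(_bounded_score(i) for i in range(n_requirements)))
    return peak


peak = asyncio.run(run_bounded(n_requirements=10, limit=3))
print(f"peak concurrent scoring calls: {peak}")
```

With fewer requirements than the limit, the semaphore never blocks and the behavior matches the unbounded gather.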
