perf(scoring): parallelize per-requirement evaluation with asyncio.gather #63
Open
renegatux wants to merge 1 commit into
Summary
Refactors `MinimalAgent.score_code` so that per-requirement evaluations run concurrently via `asyncio.gather` instead of a sequential `for` loop. Each requirement is evaluated by an independent LLM call, so parallelizing them is safe and substantially reduces wall-clock time.

Motivation
Currently, `score_code` iterates over `node.requirements` and awaits one LLM call per requirement before moving to the next.

This loop runs for every executed node, both drafts and tree-search iterations. With the default configuration (`num_draft_nodes=3`, `max_iterations=10`), a run executes 3 + 10 = 13 nodes, so with N requirements per node, scoring contributes roughly `13 * N` strictly sequential LLM calls in the critical path of a single run.

Change
- The loop body moves into a nested `async def _score_requirement(req)`.
- The per-requirement calls are awaited together with `await asyncio.gather(...)`.
- `import asyncio` is added at the top of the module.

No public API changes. No behavioral differences expected for individual requirements; only the order in which the calls complete is non-deterministic. `asyncio.gather` returns results in input order, and the surrounding code does not rely on ordering in any case (`all_fulfilled`, the feedback assembly loop, and the coverage confirmation step are all order-independent).

Impact
Wall-clock time of the scoring phase scales with the slowest individual call instead of the sum of all calls. For a node with N requirements, this is up to an Nx speedup for that phase. Total API cost is unchanged (same number of calls).
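The scaling claim above can be illustrated with a minimal sketch, assuming each LLM call behaves like a fixed-latency `asyncio.sleep`. The names `evaluate_requirement`, `score_sequential`, `score_parallel`, and `compare` are illustrative and are not taken from this PR's code.

```python
import asyncio
import time


# Hypothetical stand-in for the per-requirement LLM call; the real
# score_code logic is not reproduced here.
async def evaluate_requirement(req: str, delay: float = 0.05) -> dict:
    await asyncio.sleep(delay)  # simulates network latency of one call
    return {"requirement": req, "fulfilled": True}


async def score_sequential(requirements: list[str]) -> list[dict]:
    # One call at a time: total time is roughly the sum of all delays.
    results = []
    for req in requirements:
        results.append(await evaluate_requirement(req))
    return results


async def score_parallel(requirements: list[str]) -> list[dict]:
    # All calls in flight at once: total time is roughly the slowest delay.
    # gather returns results in the same order as its inputs.
    return list(
        await asyncio.gather(*(evaluate_requirement(r) for r in requirements))
    )


async def timed(coro):
    # Await a coroutine and return (result, elapsed seconds).
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start


def compare(requirements: list[str]):
    seq, t_seq = asyncio.run(timed(score_sequential(requirements)))
    par, t_par = asyncio.run(timed(score_parallel(requirements)))
    return seq, t_seq, par, t_par


if __name__ == "__main__":
    _, t_seq, _, t_par = compare(["r1", "r2", "r3", "r4"])
    print(f"sequential: {t_seq:.3f}s, parallel: {t_par:.3f}s")
```

Both variants issue the same number of calls, which matches the point that total API cost is unchanged; only the elapsed time differs.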
Notes for reviewers
- The nested helper closes over `node`, `self.task_desc`, and `self._mcp_docs` from the enclosing scope, the same references the original loop used.
- The `ReviewFunction` call in `score_code` already runs before this block, so the shared `_MCP_CACHE` in `query.py` is guaranteed to be warm before the parallel section. No cache-race window is introduced.

Optional follow-up: bounded concurrency
The cached `MultiServerMCPClient` is shared across all concurrent `Query.run()` calls via a single stdio transport to `docs_search_server`. If `langchain_mcp_adapters` does not safely multiplex concurrent tool calls over one stdio MCP client, or if a user's OpenAI tier has tight RPM limits, the unbounded `gather` could become a problem.

In that case, the fix is to bound the concurrency with a small semaphore, for example `asyncio.Semaphore(3)`.
This caps in-flight scoring calls at 3, which preserves most of the wall-clock benefit (up to a ~3x speedup for nodes with three or more requirements) while limiting the load on the shared MCP stdio channel. Happy to incorporate this into the PR directly if preferred.
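A minimal sketch of the bounded variant, assuming the per-requirement call can be wrapped in an `asyncio.Semaphore`. The `evaluate_requirement` stub, the `tracker` dictionary, and `score_bounded` are hypothetical and exist only to make the cap observable; the actual helper in this PR may differ.

```python
import asyncio

MAX_IN_FLIGHT = 3  # cap mentioned in the PR text


# Hypothetical stand-in for the per-requirement LLM call. The tracker
# dictionary records how many calls are active at once.
async def evaluate_requirement(req: str, tracker: dict) -> dict:
    tracker["active"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["active"])
    await asyncio.sleep(0.01)  # simulates call latency
    tracker["active"] -= 1
    return {"requirement": req, "fulfilled": True}


async def score_bounded(
    requirements: list[str], tracker: dict, limit: int = MAX_IN_FLIGHT
) -> list[dict]:
    # Create the semaphore inside the running event loop.
    semaphore = asyncio.Semaphore(limit)

    async def _score_requirement(req: str) -> dict:
        # At most `limit` coroutines pass this point at the same time.
        async with semaphore:
            return await evaluate_requirement(req, tracker)

    # gather still schedules every requirement, but the semaphore gates
    # how many of them are actually calling out concurrently.
    return list(
        await asyncio.gather(*(_score_requirement(r) for r in requirements))
    )


if __name__ == "__main__":
    stats = {"active": 0, "peak": 0}
    asyncio.run(score_bounded([f"r{i}" for i in range(8)], stats))
    print("peak concurrent calls:", stats["peak"])
```

Keeping the limit as a parameter would make it straightforward to tune per deployment, for instance against an account's rate limits.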