feat: parallel FMSR dispatch strategies for N×M LLM call bottleneck by yassinejebbouri · Pull Request #414 · IBM/AssetOpsBench

yassinejebbouri · 2026-06-27T07:06:44Z

Summary

Implements four execution strategies for the FMSR N×M bottleneck, where each
(failure mode, sensor) pair requires one sequential LLM call by default.

- Sequential (baseline): one call at a time
- Parallel dispatch: fixed thread pool with configurable concurrency cap
- Adaptive ceiling-start (AIMD): starts at max concurrency, halves on rate-limit errors, increments on consecutive successes
- Hedged execution: fires a duplicate request after 8 s of silence, uses whichever response arrives first and cancels the other; caps p95 latency at ~16 s
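For reviewers, the ceiling-start AIMD rule can be sketched roughly as follows. This is not the code in src/servers/fmsr/main.py: the class name `CeilingStartLimiter`, the `successes_per_step` parameter, and the example numbers are all assumptions made for illustration, and the sketch only tracks the limit (wiring it to a semaphore is left out).

```python
import threading


class CeilingStartLimiter:
    """Hypothetical helper: concurrency limit that starts at the ceiling,
    halves on a rate-limit error, and grows by one after a success streak."""

    def __init__(self, ceiling: int, successes_per_step: int = 5) -> None:
        self.ceiling = ceiling
        self.limit = ceiling  # ceiling-start: begin at max concurrency
        self.successes_per_step = successes_per_step  # assumed streak length
        self._streak = 0
        self._lock = threading.Lock()

    def on_success(self) -> None:
        with self._lock:
            self._streak += 1
            if self._streak >= self.successes_per_step:
                self._streak = 0
                # Additive increase, never above the ceiling.
                self.limit = min(self.ceiling, self.limit + 1)

    def on_rate_limit(self) -> None:
        with self._lock:
            self._streak = 0
            # Multiplicative decrease, never below one in-flight call.
            self.limit = max(1, self.limit // 2)


limiter = CeilingStartLimiter(ceiling=16, successes_per_step=3)
limiter.on_rate_limit()  # 16 -> 8
limiter.on_rate_limit()  # 8 -> 4
for _ in range(3):
    limiter.on_success()  # one full streak: 4 -> 5
print(limiter.limit)
```

Starting at the ceiling means a clean run never pays a ramp-up cost; the limit only moves once the backend pushes back.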

Results

- Hedged achieves up to 36× speedup on worst-case scenarios (559 s → 15.5 s)
- Adaptive ceiling is the most cost-efficient strategy (~20× speedup, no doubled API usage)
- Strategy is selected at runtime via the FMSR_STRATEGY environment variable

Files changed

src/servers/fmsr/main.py

Test plan

- Run an FMSR scenario with FMSR_STRATEGY=hedged
- Run an FMSR scenario with FMSR_STRATEGY=adaptive_ceiling
- Run src/benchmarking/bench_fmsr.py to reproduce the full comparison

The previous implementation retried failures with a simple loop inside _call_relevancy. Retry/backoff logic now lives in the LiteLLM Router (exponential backoff, circuit breaker). A per-request timeout was also missing: the Router constructor timeout is not forwarded automatically to individual .completion() calls, so hung WatsonX requests could block indefinitely. The timeout is now passed explicitly on every call.

Two new parallel execution strategies for the N×M FM↔sensor mapping:

- _mapping_adaptive_ceiling: fires all pairs concurrently from t=0 and halves the semaphore limit immediately on any 500 error. Avoids the AIMD ramp-up penalty for small N where additive increase finishes only after all work is already done.
- _mapping_hedged: ceiling-start combined with speculative duplicate requests. If any call stalls past FMSR_HEDGE_AFTER_S (default 8s), a rescue copy is fired on a background thread. Whichever copy responds first wins, capping tail latency at ~2×hedge_after_s instead of the full 90s Router timeout.

Also removed unused intermediate implementations (_mapping_batched_parallel, _mapping_async, _call_relevancy_async) that were never wired into the benchmark, and cleaned up the asyncio import left behind after their removal.
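The hedging idea (duplicate a stalled call, take whichever copy finishes first) can be sketched with the standard library alone. This is an illustration under assumptions, not the body of _mapping_hedged: `hedged_call`, `fake_llm_call`, and the short delays are hypothetical, and the real call goes through the LiteLLM Router rather than a sleep.

```python
import concurrent.futures as cf
import itertools
import time


def hedged_call(fn, *args, hedge_after_s: float, pool: cf.ThreadPoolExecutor):
    """Run fn; if it is still pending after hedge_after_s, start a duplicate
    and return the result of whichever copy completes first."""
    primary = pool.submit(fn, *args)
    done, _ = cf.wait([primary], timeout=hedge_after_s)
    if done:
        return primary.result()
    rescue = pool.submit(fn, *args)  # speculative duplicate request
    done, pending = cf.wait([primary, rescue], return_when=cf.FIRST_COMPLETED)
    for fut in pending:
        fut.cancel()  # best effort: an already-running thread is not interrupted
    return next(iter(done)).result()


attempt_counter = itertools.count(1)


def fake_llm_call(pair):
    # Hypothetical stand-in for one relevancy call: the first attempt stalls,
    # any later attempt returns quickly.
    attempt = next(attempt_counter)
    time.sleep(0.3 if attempt == 1 else 0.01)
    return pair, attempt


with cf.ThreadPoolExecutor(max_workers=4) as pool:
    result = hedged_call(
        fake_llm_call, ("bearing wear", "vibration"), hedge_after_s=0.05, pool=pool
    )
print(result)  # the rescue copy (attempt 2) wins in this setup
```

Because a thread cannot be cancelled once it is running, the losing copy still occupies a worker until it returns or times out. That is where the doubled API usage of this strategy comes from.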

- psutil: hardware sampling (CPU%, memory RSS, thread count) during runs.
- filelock: safe append-only writes to the shared JSONL results file when multiple processes run the benchmark concurrently or resume after a crash.
- matplotlib: benchmark visualization plots (wall time, speedup, per-call latency distribution, hardware utilization).

…nv var

The benchmark controls which parallelization path runs by passing FMSR_STRATEGY in the subprocess environment when spawning the server. The tool interface (inputs, outputs) is unchanged; only the internal N×M dispatch is selected at startup from the env var:

- sequential: one LLM call at a time (baseline)
- parallel: fixed thread pool (FMSR_PARALLEL_WORKERS workers)
- adaptive_ceiling: ceiling-start semaphore, halves on 500 error
- hedged: ceiling-start + speculative duplicate on stall

Default remains parallel with 2 workers, matching the original behaviour when FMSR_STRATEGY is not set.
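A minimal sketch of startup-time selection from the environment, assuming a plain dict registry. The function `select_strategy`, the placeholder strategy functions, and the choice to raise on an unknown value are assumptions; only the four strategy names and the parallel default come from this PR.

```python
import os


def select_strategy(registry, env=os.environ, default="parallel"):
    """Pick the dispatch function named by FMSR_STRATEGY, else the default."""
    name = env.get("FMSR_STRATEGY", default)
    try:
        return registry[name]
    except KeyError:
        # Assumed behaviour: fail loudly rather than fall back silently.
        raise ValueError(
            f"unknown FMSR_STRATEGY {name!r}; expected one of {sorted(registry)}"
        ) from None


# Placeholder strategies; the real ones dispatch the N×M LLM calls.
def run_sequential(pairs): return ("sequential", len(pairs))
def run_parallel(pairs): return ("parallel", len(pairs))
def run_adaptive_ceiling(pairs): return ("adaptive_ceiling", len(pairs))
def run_hedged(pairs): return ("hedged", len(pairs))


REGISTRY = {
    "sequential": run_sequential,
    "parallel": run_parallel,
    "adaptive_ceiling": run_adaptive_ceiling,
    "hedged": run_hedged,
}

unset = select_strategy(REGISTRY, env={})
hedged = select_strategy(REGISTRY, env={"FMSR_STRATEGY": "hedged"})
print(unset.__name__, hedged.__name__)
```

Passing `env` explicitly keeps the selection testable without mutating the process environment.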

…ntation

- bench_hardware.py: samples CPU%, memory RSS, and thread count via psutil at a configurable interval during each run
- bench_stats.py: t-distribution confidence intervals, per-call stat aggregation, and the build_summary() roll-up used by the main benchmark and plot generator
- bench_instrumentation.py: timing hooks that patch fmsr._call_relevancy to capture per-call latency and phase boundaries; used by the in-process debug runner (test_scenario.py), not by the MCP benchmark where the server runs out-of-process
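With only 3 runs per cell, the normal approximation understates uncertainty, which is presumably why a t-distribution interval is used. A small standard-library sketch of that calculation follows; `mean_ci95`, the hard-coded critical-value table, and the sample wall times are assumptions for illustration, and bench_stats.py may compute it differently.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776}


def mean_ci95(samples):
    """Return (mean, half_width) of a 95% t-interval for a small sample."""
    n = len(samples)
    mean = statistics.mean(samples)
    # Sample standard deviation (n - 1 in the denominator).
    half_width = T_CRIT_95[n - 1] * statistics.stdev(samples) / math.sqrt(n)
    return mean, half_width


# Hypothetical wall times in seconds for three runs of one scenario.
mean, half = mean_ci95([15.0, 16.0, 17.0])
print(f"{mean:.2f} s ± {half:.2f} s")
```

For three runs the critical value is 4.303 rather than the 1.96 of a normal interval, so the reported band is more than twice as wide.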

Covers wall time grouped bars, speedup line charts, per-call latency box plots, hardware utilization, phase breakdowns, and scenario scaling. Strategies and colors are configurable; defaults match the 4-strategy set.

Calls get_failure_mode_sensor_mapping through the FMSR MCP server (stdio subprocess) for each of 4 strategies across 15 scenarios × 3 runs. This matches the real agent execution path exactly: same subprocess spawn, same stdio protocol, same tool interface as workflow/executor.py.

Key design points:

- Sensors fetched live from iot-mcp-server (CouchDB), not hardcoded
- Failure modes fetched live from fmsr-mcp-server (YAML), not hardcoded
- Per-scenario sensor/FM slices derived from the real fetched lists using keyword filtering that mirrors what the agent query implies
- Strategy selected by passing FMSR_STRATEGY to the server subprocess env
- Resume support: skips (run, scenario, strategy) triples already in JSONL
- Results written to results_mcp/ to preserve existing results/
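The resume behaviour can be pictured as a set lookup over the existing JSONL file. The sketch below is an assumption about how that might look: the record keys `run`, `scenario`, and `strategy`, the helper `completed_triples`, and the tolerance for a truncated last line are illustrative and not taken from the benchmark code.

```python
import json
import os
import tempfile


def completed_triples(jsonl_path):
    """Collect (run, scenario, strategy) triples already recorded on disk."""
    done = set()
    if not os.path.exists(jsonl_path):
        return done
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip a line left half-written by a crash
            done.add((rec["run"], rec["scenario"], rec["strategy"]))
    return done


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.jsonl")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(json.dumps({"run": 1, "scenario": 109, "strategy": "hedged"}) + "\n")
        fh.write('{"run": 1, "scenario": 109, "stra')  # simulated truncated record
    done = completed_triples(path)
    planned = [
        (1, 109, s) for s in ("sequential", "parallel", "adaptive_ceiling", "hedged")
    ]
    todo = [t for t in planned if t not in done]
print(todo)
```

Only the hedged triple is skipped here; the truncated record is ignored, so that cell would be re-run.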

- test_scenario.py: debug tool that imports the FMSR server in-process and patches _call_relevancy to print every (sensor, FM) pair live as it executes. Sensors and failure modes are fetched from the live MCP servers at startup. Useful for tracing individual LLM calls without running the full benchmark.
  Usage: uv run python -m src.benchmarking.test_scenario --scenario 109 --strategy hedged
- eval_fmsr.py: original sequential-vs-parallel evaluation script that predates the multi-run benchmark; kept as a reference baseline and for quick one-off comparisons.
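In-process patching of this kind usually amounts to wrapping the module attribute. A rough sketch, assuming a module-like object with a `_call_relevancy(sensor, fm)` callable; `patch_with_trace`, `fake_fmsr`, and that call signature are hypothetical stand-ins, not the real server module.

```python
import functools
import time
import types


def patch_with_trace(module, name, log):
    """Replace module.<name> with a wrapper that records args and latency.
    Returns the original callable so the patch can be undone."""
    original = getattr(module, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            log.append((args, elapsed))
            print(f"{name}{args} took {elapsed:.4f}s")

    setattr(module, name, wrapper)
    return original


# Stand-in for the imported FMSR server module (assumed call signature).
fake_fmsr = types.SimpleNamespace(
    _call_relevancy=lambda sensor, fm: f"{sensor} ~ {fm}"
)

trace = []
original = patch_with_trace(fake_fmsr, "_call_relevancy", trace)
answer = fake_fmsr._call_relevancy("vibration", "bearing wear")
setattr(fake_fmsr, "_call_relevancy", original)  # restore after tracing
```

A patch like this only affects the process that applies it, which is why it suits the in-process debug runner and not a server spawned as a separate subprocess.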

DariefMaes and others added 9 commits April 6, 2026 17:50

profiling of times

76e30aa

yassinejebbouri wants to merge 9 commits into IBM:main from yassinejebbouri:parallelization-results
