feat: DB context prefetching to eliminate planner discovery steps by yassinejebbouri · Pull Request #416 · IBM/AssetOpsBench

yassinejebbouri · 2026-06-27T07:09:18Z

Summary

Before planning begins, prefetch assets, sensors, and failure modes from the
MCP servers and inject the results into the planner prompt. This allows the
planner to skip discovery steps entirely and go directly to the answer.
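As a rough illustration of the prefetch step, the sketch below times each lookup and joins the results into a text block for the planner prompt. It is not the code in runner.py: the `prefetch_context` helper, the `fetchers` mapping, the sample asset and sensor names, and the output layout are all assumptions made for this example. Only the phase names (prefetch_assets, prefetch_sensors, prefetch_failure_modes) and the (context_str, call_timings) return shape come from the PR description.

```python
import time


def prefetch_context(fetchers):
    """Call each fetcher once, time it, and build a planner context block.

    `fetchers` maps a label (e.g. "assets") to a zero-argument callable
    returning a list of strings. Hypothetical interface, for illustration.
    """
    sections = []
    call_timings = {}
    for label, fetch in fetchers.items():
        start = time.perf_counter()
        items = fetch()
        # Record one named phase per prefetch call.
        call_timings[f"prefetch_{label}"] = time.perf_counter() - start
        sections.append(f"{label}: " + ", ".join(items))
    return "\n".join(sections), call_timings


# Stand-in fetchers with made-up data; the PR queries live MCP servers instead.
fake_fetchers = {
    "assets": lambda: ["Chiller 6", "Chiller 9"],
    "sensors": lambda: ["Supply Temperature", "Power Input"],
    "failure_modes": lambda: ["Compressor Overheating"],
}

context_str, call_timings = prefetch_context(fake_fetchers)
print(context_str)
print(sorted(call_timings))
```

Keeping one timing entry per call is what allows the overhead of each lookup to be reported separately later on.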

Results

Up to 5.7× faster on broad multi-asset queries (planner skips 5–11 discovery steps)
Regression on fast or targeted scenarios where the 3 s prefetch overhead exceeds the marginal savings
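The trade-off can be made concrete with the net-time-saved formula that the benchmark in this PR reports: (baseline_execute - prefetch_execute) - prefetch_overhead. The timings below are invented to show the two regimes and are not benchmark results; only the 3 s overhead figure is taken from the description above.

```python
def net_time_saved(baseline_execute_s, prefetch_execute_s, prefetch_overhead_s):
    """Net seconds saved by prefetching; negative means a regression."""
    return (baseline_execute_s - prefetch_execute_s) - prefetch_overhead_s


PREFETCH_OVERHEAD_S = 3.0  # overhead figure quoted in the PR description

# Hypothetical broad query: many discovery steps removed, overhead is repaid.
broad = net_time_saved(40.0, 7.0, PREFETCH_OVERHEAD_S)

# Hypothetical targeted query: little to skip, so the overhead dominates.
targeted = net_time_saved(6.0, 4.5, PREFETCH_OVERHEAD_S)

print(broad, targeted)  # 30.0 -1.5
```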

Files changed

src/workflow/runner.py — _prefetch_context() method and prefetch=True flag on run()

Test plan

Run runner.run(question, prefetch=True) and confirm discovery steps are absent from the plan
Run src/benchmarking/bench_opt0.py to reproduce speedup results

The previous implementation retried failures with a simple loop inside _call_relevancy. Moved retry/backoff logic to LiteLLM Router (exponential backoff, circuit breaker). Per-request timeout was also missing: the Router constructor timeout is not forwarded automatically to individual .completion() calls, so hung WatsonX requests could block indefinitely. Now passed explicitly on every call.

Two new parallel execution strategies for the N×M FM↔sensor mapping:
- _mapping_adaptive_ceiling: fires all pairs concurrently from t=0 and halves the semaphore limit immediately on any 500 error. Avoids the AIMD ramp-up penalty for small N where additive increase finishes only after all work is already done.
- _mapping_hedged: ceiling-start combined with speculative duplicate requests. If any call stalls past FMSR_HEDGE_AFTER_S (default 8s), a rescue copy is fired on a background thread. Whichever copy responds first wins, capping tail latency at ~2×hedge_after_s instead of the full 90s Router timeout.

Also removed unused intermediate implementations (_mapping_batched_parallel, _mapping_async, _call_relevancy_async) that were never wired into the benchmark, and cleaned up asyncio import left behind after their removal.
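The two strategies can be sketched with the standard library alone. What follows is a minimal illustration of the ideas and not the PR's implementation: `HalvingLimiter`, `hedged_call`, and `flaky_llm` are hypothetical names, the limiter is a condition-variable gate, not a literal semaphore, and error handling is omitted.

```python
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


class HalvingLimiter:
    """Concurrency gate that starts at a ceiling and halves when told to."""

    def __init__(self, ceiling):
        self.limit = ceiling
        self._active = 0
        self._cond = threading.Condition()

    def acquire(self):
        with self._cond:
            while self._active >= self.limit:
                self._cond.wait()
            self._active += 1

    def release(self, saw_500=False):
        with self._cond:
            self._active -= 1
            if saw_500:
                # Back off at once; calls already in flight simply finish.
                self.limit = max(1, self.limit // 2)
            self._cond.notify_all()


def hedged_call(fn, arg, hedge_after_s, pool):
    """Run fn(arg); if it stalls past hedge_after_s, race a duplicate."""
    first = pool.submit(fn, arg)
    done, _ = wait([first], timeout=hedge_after_s)
    if done:
        return first.result()
    rescue = pool.submit(fn, arg)
    done, _ = wait([first, rescue], return_when=FIRST_COMPLETED)
    return next(iter(done)).result()  # whichever copy finished first


# Demo: the first attempt stalls, the rescue copy answers immediately.
_calls = {"n": 0}
_calls_lock = threading.Lock()


def flaky_llm(pair):
    with _calls_lock:
        _calls["n"] += 1
        attempt = _calls["n"]
    if attempt == 1:
        time.sleep(0.5)  # simulated stalled request
    return pair, attempt


with ThreadPoolExecutor(max_workers=2) as pool:
    winner = hedged_call(flaky_llm, ("sensor", "fm"), hedge_after_s=0.05, pool=pool)

limiter = HalvingLimiter(ceiling=8)
limiter.acquire()
limiter.release(saw_500=True)  # limit drops from 8 to 4
print(winner, limiter.limit)
```

In this sketch the stalled original is not cancelled, so it still occupies a worker until it returns. Hedging therefore trades extra requests for lower tail latency.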

- psutil: hardware sampling (CPU%, memory RSS, thread count) during runs.
- filelock: safe append-only writes to the shared JSONL results file when multiple processes run the benchmark concurrently or resume after a crash.
- matplotlib: benchmark visualization plots (wall time, speedup, per-call latency distribution, hardware utilization).

…nv var

The benchmark controls which parallelization path runs by passing FMSR_STRATEGY in the subprocess environment when spawning the server. The tool interface (inputs, outputs) is unchanged; only the internal N×M dispatch is selected at startup from the env var:
- sequential: one LLM call at a time (baseline)
- parallel: fixed thread pool (FMSR_PARALLEL_WORKERS workers)
- adaptive_ceiling: ceiling-start semaphore, halves on 500 error
- hedged: ceiling-start + speculative duplicate on stall

Default remains parallel with 2 workers, matching the original behaviour when FMSR_STRATEGY is not set.
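A minimal sketch of how the env-var selection could look on both sides of the subprocess boundary. The `select_strategy` helper and the choice to raise on an unknown value are assumptions for illustration; the variable names, the four strategy names, and the defaults come from the description above.

```python
import os

STRATEGIES = ("sequential", "parallel", "adaptive_ceiling", "hedged")


def select_strategy(env=None):
    """Read the dispatch strategy and worker count from an environment mapping."""
    env = os.environ if env is None else env
    name = env.get("FMSR_STRATEGY", "parallel")  # default when unset
    if name not in STRATEGIES:
        # Assumption: fail loudly instead of silently falling back.
        raise ValueError(f"unknown FMSR_STRATEGY: {name!r}")
    workers = int(env.get("FMSR_PARALLEL_WORKERS", "2"))
    return name, workers


# Benchmark side: copy the parent environment and override one variable
# before handing it to the server subprocess.
child_env = {**os.environ, "FMSR_STRATEGY": "hedged"}

print(select_strategy({}))            # ('parallel', 2)
print(select_strategy(child_env)[0])  # hedged
```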

…ntation

- bench_hardware.py: samples CPU%, memory RSS, and thread count via psutil at a configurable interval during each run
- bench_stats.py: t-distribution confidence intervals, per-call stat aggregation, and the build_summary() roll-up used by the main benchmark and plot generator
- bench_instrumentation.py: timing hooks that patch fmsr._call_relevancy to capture per-call latency and phase boundaries; used by the in-process debug runner (test_scenario.py), not by the MCP benchmark where the server runs out-of-process
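For readers unfamiliar with t-distribution confidence intervals, here is a small standard-library sketch suited to the handful of runs per scenario used in this benchmark. It is not the code in bench_stats.py: the function name and the hard-coded table of 95% critical values (covering 2 to 6 samples) are assumptions made to avoid a SciPy dependency.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def t_confidence_interval(samples):
    """Return (mean, low, high) for a 95% CI on the mean of 2 to 6 samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)  # standard error of the mean
    half_width = T_CRIT_95[n - 1] * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical wall times (seconds) from three runs of one scenario.
mean, low, high = t_confidence_interval([10.0, 12.0, 14.0])
print(f"{mean:.2f} s, 95% CI [{low:.2f}, {high:.2f}]")
```

With only three runs the critical value is 4.303 instead of the familiar 1.96, so the interval is wide. That is the point of using the t-distribution for small samples.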

Covers wall time grouped bars, speedup line charts, per-call latency box plots, hardware utilization, phase breakdowns, and scenario scaling. Strategies and colors are configurable; defaults match the 4-strategy set.

Calls get_failure_mode_sensor_mapping through the FMSR MCP server (stdio subprocess) for each of 4 strategies across 15 scenarios × 3 runs. This matches the real agent execution path exactly: same subprocess spawn, same stdio protocol, same tool interface as workflow/executor.py.

Key design points:
- Sensors fetched live from iot-mcp-server (CouchDB), not hardcoded
- Failure modes fetched live from fmsr-mcp-server (YAML), not hardcoded
- Per-scenario sensor/FM slices derived from the real fetched lists using keyword filtering that mirrors what the agent query implies
- Strategy selected by passing FMSR_STRATEGY to the server subprocess env
- Resume support: skips (run, scenario, strategy) triples already in JSONL
- Results written to results_mcp/ to preserve existing results/
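The resume behaviour can be sketched as follows. This is an assumption-laden illustration, not the benchmark's code: the helper names and the JSON field names ("run", "scenario", "strategy") are hypothetical, and skipping an unparsable line is one possible way to tolerate a record truncated by a crash.

```python
import json
import os
import tempfile


def load_done_triples(path):
    """Collect the (run, scenario, strategy) triples already in a JSONL file."""
    done = set()
    if not os.path.exists(path):
        return done
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # e.g. a partial last line left by a crash
            done.add((rec["run"], rec["scenario"], rec["strategy"]))
    return done


def pending(runs, scenarios, strategies, done):
    """List the triples that still need to be executed, in run order."""
    return [
        (r, s, st)
        for r in runs
        for s in scenarios
        for st in strategies
        if (r, s, st) not in done
    ]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.jsonl")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(json.dumps({"run": 0, "scenario": 109, "strategy": "hedged"}) + "\n")
        fh.write('{"run": 0, "scenario": 109, "strat')  # truncated record
    done = load_done_triples(path)
    todo = pending(range(1), [109], ["sequential", "hedged"], done)

print(todo)  # [(0, 109, 'sequential')]
```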

- test_scenario.py: debug tool that imports the FMSR server in-process and patches _call_relevancy to print every (sensor, FM) pair live as it executes. Sensors and failure modes are fetched from the live MCP servers at startup. Useful for tracing individual LLM calls without running the full benchmark.
  Usage: uv run python -m src.benchmarking.test_scenario --scenario 109 --strategy hedged
- eval_fmsr.py: original sequential-vs-parallel evaluation script that predates the multi-run benchmark; kept as a reference baseline and for quick one-off comparisons.
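In-process patching of this kind can be sketched as below. The `install_tracer` helper and the `fake_fmsr` stand-in object are hypothetical; the real tool patches the actual FMSR module, whose function signature is not shown in this PR description.

```python
import functools
import time
import types


def install_tracer(module, name="_call_relevancy", log=print):
    """Replace module.<name> with a wrapper that logs and times each call."""
    original = getattr(module, name)
    records = []

    @functools.wraps(original)
    def traced(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            records.append((args, elapsed))
            log(f"{name}{args} took {elapsed:.3f}s")

    setattr(module, name, traced)
    return records


# Stand-in for the FMSR module; the signature here is an assumption.
fake_fmsr = types.SimpleNamespace(_call_relevancy=lambda sensor, fm: True)

records = install_tracer(fake_fmsr)
result = fake_fmsr._call_relevancy("Supply Temperature", "Compressor Overheating")
print(result, len(records))
```

A patch like this only affects callers in the same process that look the function up on the module at call time, which is consistent with the note that the hooks are not used when the server runs out-of-process.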

Planner (planner.py):
- Add optional context_block injection into the plan prompt
- When real asset/sensor/failure-mode data is available up-front, the LLM receives it as a concrete context block and is instructed to skip discovery steps and write direct tool arguments instead

Runner (runner.py):
- Add _prefetch_context() which calls IoTAgent.assets, IoTAgent.sensors, and FMSRAgent.get_failure_modes via live MCP servers before planning
- Returns (context_str, call_timings) with per-call wall times for assets, sensors, and failure modes separately
- run() accepts prefetch=True flag; sub-call timings are recorded as named phases (prefetch_assets, prefetch_sensors, prefetch_failure_modes) for fine-grained breakdown in downstream analysis

Benchmark (bench_opt0.py):
- N_RUNS configurable runs per scenario (default 3, override via env)
- Records wall time, plan steps, failed steps, and full phase breakdown including per-prefetch-call timings per individual run
- Aggregates mean +/- std / min / max across runs per scenario x condition
- Computes net time saved = (baseline_execute - prefetch_execute) - prefetch_overhead
- Prints side-by-side comparison table with speedup and net-saved columns
- Generates 7 plots: wall_time, plan_steps, speedup, phase_breakdown, prefetch_overhead_breakdown, net_time_saved, failed_steps
- Baseline strategy: sequential (matching the proposal 70-call serial baseline)
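The per-scenario aggregation described for the benchmark might look roughly like this. The record layout, the field names, the "baseline"/"prefetch" condition labels, and the use of the sample standard deviation are assumptions for illustration; the numbers are invented and are not results from this PR.

```python
import statistics
from collections import defaultdict


def aggregate(records, metric="wall_time_s"):
    """Group records by (scenario, condition) and summarise one metric."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["scenario"], rec["condition"])].append(rec[metric])
    summary = {}
    for key, values in groups.items():
        summary[key] = {
            "mean": statistics.mean(values),
            # Sample standard deviation; undefined for one run, so report 0.0.
            "std": statistics.stdev(values) if len(values) > 1 else 0.0,
            "min": min(values),
            "max": max(values),
        }
    return summary


# Hypothetical records: three runs per condition for a single scenario.
records = [
    {"scenario": 109, "condition": "baseline", "wall_time_s": t}
    for t in (10.0, 12.0, 14.0)
] + [
    {"scenario": 109, "condition": "prefetch", "wall_time_s": t}
    for t in (4.0, 5.0, 6.0)
]

summary = aggregate(records)
speedup = summary[(109, "baseline")]["mean"] / summary[(109, "prefetch")]["mean"]
print(summary[(109, "baseline")], round(speedup, 2))
```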

DariefMaes and others added 10 commits April 6, 2026 17:50

profiling of times

76e30aa

yassinejebbouri wants to merge 10 commits into IBM:main from yassinejebbouri:opt0-prefetch-benchmark
