A simulated business environment where an LLM agent autonomously sets prices, allocates marketing budget, and manages inventory. It uses Monte Carlo Tree Search (MCTS) to plan multi-week strategies and Bayesian belief tracking to learn demand curves from noisy observations.
Inspired by the challenge of AI systems that autonomously plan and execute business strategies under uncertainty.
- Experiment with agentic AI for business decisions in a simulated environment
- Try MCTS + Bayesian estimation for node uncertainty (as in the PlanU paper, https://arxiv.org/pdf/2510.18442) and quantify the advantages
- Compare LLM-guided vs. random action proposals for MCTS
- Experiment deeply with probabilistic programming tools like PyMC
Start simple (2 products, no seasonality, no cross-product effects, known costs) and progressively add complexity. This makes it possible to demonstrate the value of uncertainty handling at each level.
We expect:
- With 2 products and no seasonality, even a simple agent does okay.
- Add unknown elasticities and the Bayesian component becomes necessary.
- Add seasonality and multi-week inventory lag, and MCTS planning becomes necessary.
- Add cross-product substitution effects and you need both working together.
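To make the second expectation concrete, here is a minimal sketch of how a Bayesian component could learn an unknown base demand and elasticity for one product from noisy sales, using a grid approximation. The constant-elasticity demand curve, the Gaussian noise, the parameter ranges, the flat prior, and all function names are assumptions made for illustration; the project text does not specify them.

```python
import numpy as np

# Assumed parameter ranges and a flat prior; none of these come from the project text.
base_grid = np.linspace(50.0, 150.0, 101)
elasticity_grid = np.linspace(0.5, 3.0, 101)
BASE, ELAS = np.meshgrid(base_grid, elasticity_grid, indexing="ij")


def grid_update(log_post, price, observed, noise_sd):
    """Add one observation's Gaussian log-likelihood to the log-posterior grid."""
    pred = BASE * price ** (-ELAS)  # assumed constant-elasticity demand
    log_post = log_post - 0.5 * ((observed - pred) / noise_sd) ** 2
    return log_post - log_post.max()  # shift for numerical stability


def posterior_mean(log_post):
    """Return the posterior means of (base demand, elasticity)."""
    prob = np.exp(log_post)
    prob /= prob.sum()
    return float((prob * BASE).sum()), float((prob * ELAS).sum())


# Simulated weeks with a hidden "true" curve: base 100, elasticity 1.5.
rng = np.random.default_rng(2)
log_post = np.zeros_like(BASE)
for _ in range(40):
    price = rng.uniform(1.0, 3.0)
    obs = 100.0 * price ** (-1.5) + rng.normal(0.0, 3.0)
    log_post = grid_update(log_post, price, obs, noise_sd=3.0)

grid_base, grid_elasticity = posterior_mean(log_post)
print(grid_base, grid_elasticity)
```

A grid stays tractable here because there are only two unknowns per product; the cost grows exponentially with the number of parameters, which is one reason to switch methods as complexity is added.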
Test methods against baselines at every step to verify that each critical piece contributes.
The system under test is PlanU-style: the full agent keeps quantile distributions on the MCTS nodes, so the search uses distributional value estimates rather than mean values.
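As an illustration of what a distributional value estimate on a node could look like, the sketch below keeps a handful of return quantiles per node and updates them with a stochastic quantile-regression step. This is not taken from the PlanU paper; the class name `QuantileNode`, the fixed step size, the number of quantiles, and the sorting step are all assumptions chosen to keep the example small.

```python
import numpy as np


class QuantileNode:
    """Hypothetical MCTS node that tracks return quantiles instead of one mean."""

    def __init__(self, n_quantiles=5, step=0.1, init_value=0.0):
        # Midpoint quantile levels, e.g. 0.1, 0.3, 0.5, 0.7, 0.9 for five quantiles.
        self.taus = (np.arange(n_quantiles) + 0.5) / n_quantiles
        self.quantiles = np.full(n_quantiles, init_value, dtype=float)
        self.step = step  # assumed fixed step size, in the units of the return
        self.visits = 0

    def update(self, ret):
        """Back up one simulated return with a stochastic quantile-regression step."""
        below = (ret < self.quantiles).astype(float)
        # Each estimate moves up with weight tau and down with weight (1 - tau).
        self.quantiles += self.step * (self.taus - below)
        self.quantiles.sort()  # keep the estimates monotone in tau
        self.visits += 1

    def median(self):
        return float(self.quantiles[len(self.quantiles) // 2])

    def spread(self):
        # Distance between the highest and lowest tracked quantile.
        return float(self.quantiles[-1] - self.quantiles[0])


rng = np.random.default_rng(0)
noisy, steady = QuantileNode(), QuantileNode()
for _ in range(5000):
    noisy.update(rng.normal(10.0, 2.0))  # uncertain action
    steady.update(10.0)                  # deterministic action, same mean
print(noisy.median(), noisy.spread())
print(steady.median(), steady.spread())
```

Both nodes end up with a similar median, but only the quantile spread reveals that one action is much riskier than the other. That is the information a mean-only node discards.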
The LLM's primary role is proposing candidate actions for MCTS exploration; the mathematical demand model handles state transitions and the particle filter handles uncertainty tracking.
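The uncertainty-tracking step can be sketched as a basic particle filter over demand parameters. The demand curve (`base * price ** -elasticity`), the Gaussian observation noise, the resampling threshold, and the jitter scales are assumptions for illustration only; `particle_filter_step` is a hypothetical name, not part of the project.

```python
import numpy as np


def expected_demand(base, elasticity, price):
    # Assumed constant-elasticity demand curve (not specified in the text above).
    return base * price ** (-elasticity)


def particle_filter_step(particles, weights, price, observed, noise_sd, rng):
    """Reweight (base, elasticity) particles by one noisy demand observation."""
    pred = expected_demand(particles[:, 0], particles[:, 1], price)
    log_w = np.log(weights + 1e-300) - 0.5 * ((observed - pred) / noise_sd) ** 2
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # Resample when the effective sample size falls below half the particle count.
    if 1.0 / np.sum(w ** 2) < 0.5 * len(w):
        idx = rng.choice(len(w), size=len(w), p=w)
        jitter = rng.normal(0.0, [1.0, 0.02], size=particles.shape)
        particles = particles[idx] + jitter  # small jitter to limit degeneracy
        w = np.full(len(w), 1.0 / len(w))
    return particles, w


rng = np.random.default_rng(1)
n = 2000
particles = np.column_stack([rng.uniform(50, 150, n), rng.uniform(0.5, 3.0, n)])
weights = np.full(n, 1.0 / n)
true_base, true_elasticity, noise_sd = 100.0, 1.5, 3.0
for _ in range(60):
    price = rng.uniform(1.0, 3.0)
    obs = expected_demand(true_base, true_elasticity, price) + rng.normal(0, noise_sd)
    particles, weights = particle_filter_step(
        particles, weights, price, obs, noise_sd, rng
    )

# Weighted posterior means of the two parameters.
post_base, post_elasticity = weights @ particles
print(post_base, post_elasticity)
```

In a planner, each rollout could draw one particle and simulate transitions under that parameter set, so that parameter uncertainty shows up as variation in simulated returns.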
For each phase, the table below lists the least complex inference and planning methods that are expected to maximize performance.
| Phase | Complexity | Unknown parameters | Inference Method | Planning Method |
|---|---|---|---|---|
| 1. Basics | 2 products, fixed costs, no seasonality, no inventory constraint | 4 (base demand + elasticity per product) | Grid approximation | Greedy 1-step optimization (no MCTS) |
| 2. Add marketing | 2 products, marketing budget allocation | 6 (+ marketing response per product) | Grid approximation or Conjugate priors | Greedy 1-step optimization (no MCTS) |
| 3. Add seasonality | 2–3 products, seasonality, inventory lag, stockout penalties | 14–21 (+ 4 seasonal multipliers per product, for 2–3 products) | Particle filtering | MCTS |
| 4. Cross-product effects | 4–5 products, substitution/complementarity, seasonality, inventory lag | 40–55 (all previous + cross-elasticity per product pair, for 4–5 products) | Particle filtering; variational inference | MCTS (with wider branching) |
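The parameter counts in the table can be reproduced with a small helper. It assumes that cross-elasticities are counted per ordered product pair (the effect of A's price on B and of B's price on A are separate parameters), which is the reading that matches the 40–55 range; the function name and flags are hypothetical.

```python
def unknown_parameters(n_products, marketing=False, seasonality=False,
                       cross_effects=False):
    """Count unknown demand parameters for a given phase configuration."""
    per_product = 2  # base demand + own-price elasticity
    if marketing:
        per_product += 1  # marketing response
    if seasonality:
        per_product += 4  # four seasonal multipliers
    total = per_product * n_products
    if cross_effects:
        # Assumption: one cross-elasticity per ordered pair of distinct products.
        total += n_products * (n_products - 1)
    return total


print(unknown_parameters(2))                                    # 4
print(unknown_parameters(2, marketing=True))                    # 6
print([unknown_parameters(n, True, True) for n in (2, 3)])      # [14, 21]
print([unknown_parameters(n, True, True, True) for n in (4, 5)])  # [40, 55]
```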
The full agent uses MCTS planning + Bayesian posteriors (particle filter) + LLM-guided action proposals. To isolate the contribution of each component, baselines vary one axis at a time across three dimensions: planning horizon, uncertainty handling, and action proposal method.
Additionally, a fixed heuristic baseline (set all markups to 1.5×, split marketing budget equally, reorder inventory when stock drops below a threshold) serves as a "no ML at all" reference point.
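A minimal sketch of that heuristic, assuming per-product dictionaries for costs and stock. The reorder point and reorder quantity are hypothetical parameters, since the text only says "a threshold".

```python
from dataclasses import dataclass


@dataclass
class HeuristicDecision:
    prices: dict
    marketing: dict
    orders: dict


def heuristic_policy(unit_costs, stock, marketing_budget,
                     reorder_point, reorder_qty, markup=1.5):
    """Fixed rules: constant markup, equal marketing split, threshold reorder."""
    products = list(unit_costs)
    prices = {p: markup * unit_costs[p] for p in products}
    share = marketing_budget / len(products)
    marketing = {p: share for p in products}
    # Order a fixed quantity only when stock is below the (assumed) reorder point.
    orders = {p: (reorder_qty if stock[p] < reorder_point else 0) for p in products}
    return HeuristicDecision(prices, marketing, orders)


decision = heuristic_policy(
    unit_costs={"A": 10.0, "B": 4.0},
    stock={"A": 5, "B": 50},
    marketing_budget=100.0,
    reorder_point=20,
    reorder_qty=40,
)
print(decision)
```

In this example product A is priced at 15.0 and reordered, product B is priced at 6.0 and not reordered, and each product receives 50.0 of marketing budget.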
| # | Planning | Uncertainty | Action Proposals | What It Tests |
|---|---|---|---|---|
| B1 | Greedy | Point estimate | Random | Floor baseline — no intelligence |
| B2 | Greedy | Point estimate | LLM-guided | Value of LLM alone (no planning, no Bayesian) |
| B3 | Greedy | Bayesian | LLM-guided | Value of Bayesian alone (no planning) |
| B4 | MCTS | Point estimate | LLM-guided | Value of planning alone (no Bayesian) |
| B5 | MCTS | Bayesian | Random | Value of MCTS + Bayesian without LLM guidance |
| Full | MCTS | Bayesian | LLM-guided | Everything together |
"Bayesian" baselines use grid approximation in Phase 1–2 and particle filtering in Phase 3–4, matching the inference method from the Phases table.
Each adjacent comparison isolates exactly one variable:
| Comparison | Variable Isolated | Question Answered |
|---|---|---|
| B3 vs. Full | Greedy → MCTS | Does multi-step planning help? |
| B4 vs. Full | Point estimate → Bayesian | Does uncertainty handling help? |
| B5 vs. Full | Random → LLM-guided | Does the LLM as policy prior help? |
| B2 vs. B3 | Point estimate → Bayesian (both greedy) | Does Bayesian help even without planning? |
| B2 vs. B4 | Greedy → MCTS (both point estimate) | Does planning help even without Bayesian? |
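The claim that each comparison isolates one variable can be checked mechanically. The sketch below encodes the baseline table as tuples and reports which axes differ for each pair; the names `AGENTS`, `COMPARISONS`, and `differing_axes` are illustrative only.

```python
AXES = ("planning", "uncertainty", "proposals")

# One row per agent, copied from the baseline table.
AGENTS = {
    "B1": ("Greedy", "Point estimate", "Random"),
    "B2": ("Greedy", "Point estimate", "LLM-guided"),
    "B3": ("Greedy", "Bayesian", "LLM-guided"),
    "B4": ("MCTS", "Point estimate", "LLM-guided"),
    "B5": ("MCTS", "Bayesian", "Random"),
    "Full": ("MCTS", "Bayesian", "LLM-guided"),
}

# Each comparison and the single axis it is meant to isolate.
COMPARISONS = {
    ("B3", "Full"): "planning",
    ("B4", "Full"): "uncertainty",
    ("B5", "Full"): "proposals",
    ("B2", "B3"): "uncertainty",
    ("B2", "B4"): "planning",
}


def differing_axes(a, b):
    """Return the names of the axes on which agents a and b differ."""
    return [axis for axis, x, y in zip(AXES, AGENTS[a], AGENTS[b]) if x != y]


for (a, b), expected in COMPARISONS.items():
    print(a, "vs.", b, "->", differing_axes(a, b), "expected:", expected)
```

The same helper shows that B1 vs. B2 differs only in the proposal method, so that pair could serve as a greedy-only check on the value of LLM guidance.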
In Phase 1–2 (no inventory lag, no seasonality), baselines B1–B3 should match or be competitive with the MCTS agents (B4–Full); that is, MCTS is not expected to show a significant advantage over greedy. The cumulative-profit chart would show all the intelligent agents clustered together, well above random and heuristic. The story here is: "planning doesn't help much when decisions are independent."
In Phase 3–4 (inventory lag + seasonality), the separation is expected to appear. Greedy agents (B1–B2) would show periodic profit crashes: they get caught by stockouts when demand spikes seasonally because they do not order inventory in advance. B3 is expected to fare slightly better because it can learn the demand parameters. The MCTS agents would show smoother, higher cumulative profit because they anticipate the demand shift. The expected ordering is: MCTS+Bayesian > MCTS-only > Greedy+Bayesian > Greedy > Heuristic > Random.
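The stockout mechanism can be illustrated with a toy, fully deterministic simulation. It is only a stand-in: the `reactive` policy mimics a greedy agent that reorders what it just sold, and the `lookahead` policy mimics a planner with a perfect forecast, which real MCTS would not have. The demand profile, the two-week lead time, and all names are assumptions.

```python
def simulate(demand, lead_time, policy):
    """Return total lost sales for an ordering policy under a delivery lag."""
    horizon = len(demand)
    arrivals = [0] * (horizon + lead_time)
    for t in range(lead_time):
        arrivals[t] = demand[0]  # assume the first weeks are already covered
    stock = lost = 0
    for t in range(horizon):
        stock += arrivals[t]                 # receive the order placed earlier
        sold = min(stock, demand[t])
        lost += demand[t] - sold             # unmet demand is a stockout
        stock -= sold
        arrivals[t + lead_time] += policy(t, demand, lead_time)
    return lost


def reactive(t, demand, lead_time):
    # Greedy stand-in: reorder exactly what was demanded this week.
    return demand[t]


def lookahead(t, demand, lead_time):
    # Planner stand-in: order what will be demanded when the shipment arrives.
    arrival_week = t + lead_time
    return demand[arrival_week] if arrival_week < len(demand) else 0


# Eight normal weeks, a four-week seasonal spike, then eight normal weeks.
demand = [10] * 8 + [30] * 4 + [10] * 8
print(simulate(demand, 2, reactive))   # 40 units of lost sales
print(simulate(demand, 2, lookahead))  # 0 units of lost sales
```

The reactive policy loses 20 units in each of the first two spike weeks, because the larger orders it places only arrive after the lag.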
The most interesting comparison is Bayesian-without-planning vs. planning-without-Bayesian, because the winner is expected to depend on the scenario. Bayesian-without-planning excels when the main challenge is not knowing the demand parameters (early in the simulation, high uncertainty). Planning-without-Bayesian excels when parameters are roughly known but the challenge is sequential dependencies (inventory lag, seasonality). Planning-with-Bayesian is expected to outperform both.