An AI agent that acts as an intelligent supply chain analyst. Given a retail dataset, it autonomously explores data, engineers features, trains forecasting models, and answers natural language questions about demand — combining machine learning fundamentals with modern AI agent architecture.
Supply chain planning depends on accurate demand forecasts. Traditional approaches require analysts to manually load data, run models, interpret metrics, and generate reports — a time-consuming pipeline that doesn't scale.
Instead of a static forecasting script, this project implements a reasoning agent that decides what to do based on the question asked. Ask "which products are most volatile?" and it runs a volatility analysis. Ask "what if demand spikes 30%?" and it simulates the scenario with inventory impact calculations. The agent chains multiple tools together when needed — finding the hardest product to forecast, predicting its demand, and generating a chart, all from a single request.
The agent is also backed by a RAG pipeline: it can retrieve product descriptions and client policy documents from a vector store to ground its answers in domain context, not just model outputs.
```text
User (HTTP request or terminal)
               │
               ▼
┌───────────────────────────────┐
│       FastAPI (api.py)        │  ← REST endpoint (POST /ask)
│   Receives JSON { question }  │
└──────────────┬────────────────┘
               │
               ▼
┌───────────────────────────────┐
│       Agent (LangGraph)       │
│  Claude LLM  ←→  Agent State  │
│  Reasoning + tool selection   │
└──────────────┬────────────────┘
               │ tool calls
   ┌────────┬──┴─────┬────────┬──────────┐
   ▼        ▼        ▼        ▼          ▼
┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────────┐
│ Data │ │Fore- │ │Analy-│ │ Viz  │ │   RAG    │
│Tools │ │cast  │ │sis   │ │Tools │ │   Tool   │
│      │ │Tools │ │Tools │ │      │ │          │
└──┬───┘ └──┬───┘ └──┬───┘ └──┬───┘ └────┬─────┘
   └────────┴───┬────┴────────┘          │
                │                        ▼
      ┌─────────┴─────────┐    ┌────────────────────┐
      │    ML Pipeline    │    │      ChromaDB      │
      │   (pure Python)   │    │    Vector Store    │
      │                   │    │                    │
      │  data_loader      │    │  Product docs      │
      │  feature_eng.     │    │  Policy docs       │
      │  model (XGBoost)  │    │  all-MiniLM-L6-v2  │
      │  visualizations   │    └────────────────────┘
      └───────────────────┘
```
Key design decisions:
- ML pipeline is framework-agnostic. `model.py` and `feature_engineering.py` are pure Python with pandas, scikit-learn, and XGBoost, with zero dependency on LangChain. The tools layer is a thin wrapper, so if the agent framework changes, only the wrapper needs updating.
- RAG is decoupled from the ML pipeline. ChromaDB and sentence-transformers run independently. The agent decides when to call the RAG tool based on the question; it doesn't run on every request.
- Model caching in agent state. The trained XGBoost model is cached in memory after the first training call. Subsequent prediction and analysis requests reuse the cached model instead of retraining.
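A minimal plain-Python sketch of the caching decision, assuming a single process. `AgentState`, `get_or_train_model`, and `train_fn` below are hypothetical names for illustration; the project's shared state lives in `src/agent/__init__.py` and may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class AgentState:
    """Hypothetical shared state: keeps the trained model and loaded data in memory."""
    model: Optional[Any] = None
    data_cache: dict = field(default_factory=dict)


def get_or_train_model(state: AgentState, train_fn: Callable[[], Any]) -> Any:
    """Return the cached model, calling train_fn only when nothing is cached yet."""
    if state.model is None:
        # First request: train once and store the result in the shared state.
        state.model = train_fn()
    # Later prediction/analysis requests reuse the same in-memory object.
    return state.model


if __name__ == "__main__":
    calls = []

    def fake_train():
        # Stand-in for the real XGBoost training routine (assumption).
        calls.append(1)
        return {"name": "xgb-model"}

    state = AgentState()
    first = get_or_train_model(state, fake_train)
    second = get_or_train_model(state, fake_train)
    print(first is second, len(calls))  # True 1
```

One consequence of in-process caching: the model lives only as long as the process, so a restart (or each separate Uvicorn worker process) pays for its own first training call.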
End-to-end pipeline: load data → train model → predict → visualize.
A real supply chain investigation: identify volatile products → compare across stores → simulate demand spike → visualize.
One prompt, three chained tool calls. The agent finds the hardest product to forecast, predicts its demand, and generates the chart — all from a single ambiguous request.
End-to-end HTTP workflow: query the agent via FastAPI → RAG retrieval grounds the answer in product context → what-if simulation with chart.
- Data exploration — Loads and summarizes 913K rows of retail sales data (10 stores × 50 items × 5 years)
- Automated feature engineering — Generates 24 features: time-based (day of week, month, quarter), lag features (1/7/14/28 day), rolling statistics (mean and std over 7/14/30 day windows), and holiday indicators
- XGBoost forecasting — Trains with early stopping, time-based train/test split, and per-item evaluation
- Natural language Q&A — Ask questions like "predict demand for item 5 in store 1" or "which products are hardest to forecast?"
- What-if simulation — Simulate demand spikes and see inventory shortfall impact
- Store comparison — Compare demand patterns across locations for inventory allocation
- RAG-powered context — Retrieve product descriptions and client policy documents via semantic search (ChromaDB + sentence-transformers)
- REST API — Query the agent over HTTP via FastAPI (`POST /ask`)
- Visualization — Generates sales trends, forecast vs actual charts, weekly patterns, volatility rankings, feature importance, demand distributions, and store comparisons
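The what-if simulation above can be framed as: stock each day to the model's forecast, scale real demand by the spike, and measure what goes uncovered. The sketch below assumes exactly that framing; `spike_shortfall_sketch` is a hypothetical helper, not the `simulate_spike` tool in `analysis_tools.py`, whose logic may differ.

```python
import pandas as pd


def spike_shortfall_sketch(actual: pd.Series, forecast: pd.Series, spike_pct: float) -> dict:
    """Hypothetical what-if check: stock each day to the forecast, then spike real demand.

    Assumes the spike multiplies actual demand uniformly over the whole period.
    """
    spiked_demand = actual * (1 + spike_pct / 100)
    # Units of spiked demand that a forecast-sized stock level fails to cover (0 if covered).
    gap = (spiked_demand - forecast).clip(lower=0)
    return {
        "understocked_days_pct": round(float((gap > 0).mean() * 100), 1),
        "total_shortfall_units": round(float(gap.sum()), 1),
    }


actual = pd.Series([20, 22, 18, 25])
forecast = pd.Series([21, 35, 20, 24])
print(spike_shortfall_sketch(actual, forecast, spike_pct=50))
# {'understocked_days_pct': 75.0, 'total_shortfall_units': 29.5}
```

Stocking exactly to the forecast ignores safety stock; adding a buffer on top of `forecast` and re-running the check is the natural way to size one.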
| Component | Technology | Why |
|---|---|---|
| Language | Python 3.10+ | Industry standard for ML |
| ML Model | XGBoost | Strong track record on structured tabular data with engineered features |
| Data | pandas, NumPy | Standard data manipulation |
| Evaluation | scikit-learn | MAE, RMSE, MAPE metrics |
| Agent Framework | LangGraph | Explicit state management, stable API, replaces deprecated AgentExecutor |
| LLM | Claude (Anthropic) | Strong reasoning and tool-calling capabilities |
| Visualization | matplotlib | Reliable static chart generation |
| Features | holidays | US holiday calendar for demand signals |
| RAG | ChromaDB + sentence-transformers | Local vector store, no external API needed |
| API | FastAPI + Uvicorn | Lightweight async REST framework |
| Containerization | Docker + docker-compose | Reproducible one-command deployment |
XGBoost over LSTM/Neural Networks: Tree-based models generally outperform deep learning on structured tabular data with hand-crafted features. XGBoost trains in seconds, provides interpretable feature importances, and doesn't require normalization or sequence windowing. An LSTM would add significant complexity for little expected benefit on this data type.
Time-based split over random split: The train/test split is by date (train up to Sept 2017, test Oct-Dec 2017), not random. Random splitting would leak future information into training data, invalidating the evaluation entirely.
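A chronological split is a boolean mask on the date column. This is a sketch, assuming a `date` column as in the Kaggle file and a cutoff of 2017-10-01 to match "train up to Sept 2017, test Oct-Dec 2017"; `time_based_split` is a hypothetical helper.

```python
import pandas as pd


def time_based_split(df: pd.DataFrame, cutoff: str = "2017-10-01"):
    """Chronological split: rows dated before `cutoff` train, the rest test."""
    dates = pd.to_datetime(df["date"])
    cutoff_ts = pd.Timestamp(cutoff)
    return df[dates < cutoff_ts], df[dates >= cutoff_ts]


# Toy frame spanning the assumed Sept/Oct 2017 boundary.
df = pd.DataFrame({
    "date": pd.date_range("2017-09-28", periods=6, freq="D"),
    "store": 1,
    "item": 1,
    "sales": [10, 12, 11, 13, 9, 14],
})
train, test = time_based_split(df)
print(len(train), len(test))  # 3 3
```

Every training date precedes every test date, which is the property a random split would break.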
Shift inside groupby transform: Rolling features use .transform(lambda x: x.rolling(...).mean().shift(1)) with the shift inside the transform. Placing shift outside would cause cross-group leakage — the first row of one product would incorrectly use another product's last rolling value.
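The leakage is easy to reproduce on toy data. This sketch simplifies to a 2-day window and a single `item` key (the project uses 7/14/30-day windows and, presumably, groups by store and item); rows are assumed sorted by date within each group.

```python
import pandas as pd

# Toy data: two items, already sorted by date within each item.
df = pd.DataFrame({
    "item": [1, 1, 1, 2, 2, 2],
    "sales": [10, 20, 30, 100, 200, 300],
})
grouped = df.groupby("item")["sales"]

# Shift inside the transform: each item's series is shifted on its own,
# so the first row of every item gets NaN instead of another item's value.
df["shift_inside"] = grouped.transform(
    lambda x: x.rolling(window=2, min_periods=1).mean().shift(1)
)

# Shift outside the transform: the shift runs over the whole concatenated column,
# so row 3 (item 2's first day) inherits item 1's last rolling mean.
df["shift_outside"] = grouped.transform(
    lambda x: x.rolling(window=2, min_periods=1).mean()
).shift(1)

print(df.loc[3, ["shift_inside", "shift_outside"]].tolist())  # [nan, 25.0]
```

The `.shift(1)` also keeps each row's rolling mean strictly in the past: the value on day t summarizes days before t, never day t itself.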
ChromaDB over a hosted vector DB: All embeddings are computed locally with all-MiniLM-L6-v2 (sentence-transformers). There is no external API key, no network call at query time (the embedding model only needs to be downloaded once), and the store is persisted to disk. The right choice for a self-contained portfolio project.
FastAPI over Flask: Async by default, automatic OpenAPI docs at /docs, and native Pydantic validation. Lower boilerplate for a single-endpoint agent wrapper.
Store Item Demand Forecasting from Kaggle.
- 10 stores × 50 items × 1,826 days = 913,000 rows
- Daily sales data from 2013-01-01 to 2017-12-31
- Clean data with zero missing values
- No built-in features (price, promotions) — all 24 features are engineered
- Item descriptions are simulated: the Kaggle competition provides no metadata, so product categories and policy documents were generated to demonstrate RAG retrieval on realistic supply chain text
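The dataset properties above can be checked programmatically. A sketch, assuming the Kaggle `train.csv` columns `date`, `store`, `item`, `sales`; `validate_sales_frame` is a hypothetical helper, not the project's `data_loader.py`.

```python
import pandas as pd

EXPECTED_COLUMNS = ["date", "store", "item", "sales"]


def validate_sales_frame(df: pd.DataFrame) -> dict:
    """Hypothetical checks mirroring the dataset description."""
    missing_cols = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing_cols:
        raise ValueError(f"missing columns: {sorted(missing_cols)}")
    dates = pd.to_datetime(df["date"])
    n_series = df.groupby(["store", "item"]).ngroups
    n_days = dates.nunique()
    return {
        "rows": len(df),
        "stores": df["store"].nunique(),
        "items": df["item"].nunique(),
        "days": n_days,
        # Complete panel: every store-item pair has exactly one row per day.
        "complete_panel": len(df) == n_series * n_days,
        "missing_values": int(df.isna().sum().sum()),
    }
```

Per the description, `validate_sales_frame(pd.read_csv("data/raw/train.csv"))` should report 913,000 rows, 10 stores, 50 items, and 1,826 days (500 series × 1,826 days = 913,000), a complete panel, and zero missing values.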
| Metric | Value | Meaning |
|---|---|---|
| MAE | 5.93 | Forecast is off by ~6 units on average |
| RMSE | 7.68 | Typical error with large misses penalized more |
| MAPE | 13.0% | Average percentage error (acceptable for retail) |
Top features by importance: sales_rolling_mean_7 (34%), sales_rolling_mean_14 (26%), sales_lag_7 (25%). This is consistent with strong weekly seasonality (the 7-day lag) and a dependence on the recent sales level (the 7- and 14-day rolling means).
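The three metrics in the table follow their standard definitions. A NumPy sketch, including the per-item breakdown mentioned under XGBoost forecasting; `forecast_metrics` is a hypothetical helper, and skipping zero-sales days in MAPE is an assumption (the project's handling of zeros isn't stated).

```python
import numpy as np
import pandas as pd


def forecast_metrics(actual, predicted) -> dict:
    """MAE, RMSE and MAPE (as a percentage), computed directly with NumPy."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = predicted - actual
    # MAPE divides by actual sales, so zero-sales days are excluded (assumption).
    nonzero = actual != 0
    return {
        "MAE": float(np.mean(np.abs(errors))),
        "RMSE": float(np.sqrt(np.mean(errors ** 2))),
        "MAPE": float(np.mean(np.abs(errors[nonzero] / actual[nonzero])) * 100),
    }


# Toy test-period results for two items.
results = pd.DataFrame({
    "item": [1, 1, 2, 2],
    "actual": [10, 20, 50, 40],
    "predicted": [12, 18, 45, 40],
})
overall = forecast_metrics(results["actual"], results["predicted"])
per_item = {
    item: forecast_metrics(g["actual"], g["predicted"])
    for item, g in results.groupby("item")
}
print(overall["MAE"], per_item[1]["MAE"], per_item[2]["MAE"])  # 2.25 2.0 2.5
```

scikit-learn offers the same quantities as `mean_absolute_error`, `mean_squared_error` (take the square root for RMSE), and `mean_absolute_percentage_error`; note the latter returns a fraction rather than a percentage and guards zero actuals with a tiny epsilon instead of skipping them.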
```text
demand-forecasting-agent/
│
├── main.py                       # Entry point: run the agent in the terminal
├── api.py                        # FastAPI endpoint wrapping the agent
├── config.py                     # All hyperparameters and paths
├── docker-compose.yml            # One-command Docker setup
├── Dockerfile
├── requirements.txt
├── .env.example                  # API key template
│
├── src/
│   ├── __init__.py
│   ├── data_loader.py            # Load and validate raw data
│   ├── feature_engineering.py    # Feature creation (lags, rolling, holidays)
│   ├── model.py                  # XGBoost training, prediction, evaluation
│   ├── visualizations.py         # Chart generation
│   │
│   ├── tools/                    # Agent tool wrappers (@tool decorator)
│   │   ├── __init__.py           # Tool registry (all_tools list)
│   │   ├── data_tools.py         # explore_dataset, get_item_details
│   │   ├── forecast_tools.py     # train_forecast_model, predict_demand
│   │   ├── analysis_tools.py     # find_volatile, simulate_spike, compare_stores
│   │   ├── viz_tools.py          # Chart generation tools
│   │   └── rag_tool.py           # @tool wrapper: exposes similarity search to the agent
│   │
│   ├── agent/
│   │   ├── __init__.py           # Shared AgentState (model cache, data cache)
│   │   ├── graph.py              # LangGraph ReAct agent definition
│   │   └── prompts.py            # System prompt for the LLM
│   │
│   └── rag/
│       ├── __init__.py
│       ├── documents.py          # Simulated product descriptions and client policy docs
│       └── vector_store.py       # ChromaDB setup, document ingestion, similarity search
│
├── tests/
│   ├── test_data_loader.py
│   ├── test_feature_engineering.py
│   ├── test_model.py
│   ├── test_visualizations.py
│   ├── test_tools.py
│   └── test_agent.py
│
├── data/
│   ├── raw/                      # Place train.csv here
│   └── chroma_db/                # ChromaDB persisted vector store (auto-created)
└── outputs/                      # Agent-generated charts
```
Prerequisites:
- An Anthropic API key (available from the Anthropic Console)
- `train.csv` from the Kaggle competition, placed in `data/raw/`
```bash
git clone https://github.com/Massi99RM/demand-forecasting-agent.git
cd demand-forecasting-agent
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY
```

No Python environment setup is needed to run with Docker; it only requires Docker Desktop.

```bash
docker compose up
```

The API will be available at http://localhost:8000. Query the agent:

```bash
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What data do we have?"}'
```

Interactive API docs are available at http://localhost:8000/docs.
Note: `docker-compose.yml` mounts `data/raw`, `data/chroma_db`, and `outputs` automatically. Make sure `train.csv` is in `data/raw/` before running.
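For scripted access, the same `POST /ask` call can be made with Python's standard library. This is a sketch: `build_ask_request`, `ask`, and the 120-second timeout are illustrative assumptions; only the URL and the `{"question": ...}` body come from the curl example.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/ask"  # default address from the Docker setup


def build_ask_request(question: str, url: str = API_URL) -> urllib.request.Request:
    """Build the same JSON POST request as the curl example."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, url: str = API_URL, timeout: float = 120.0):
    """Send the question and decode the JSON reply.

    The generous timeout is an assumption: a single question may trigger
    several LLM and tool calls before the agent answers.
    """
    with urllib.request.urlopen(build_ask_request(question, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Once the stack is running, `ask("Which products are most volatile?")` sends the request; the shape of the returned JSON depends on the API's response model, which isn't documented here.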
To run locally without Docker instead (Python 3.10+):

```bash
pip install -r requirements.txt
python main.py
```

The agent starts an interactive chat session in the terminal.
Run the test scripts individually:

```bash
python tests/test_data_loader.py
python tests/test_feature_engineering.py
python tests/test_model.py
python tests/test_visualizations.py
python tests/test_tools.py
python tests/test_agent.py
```

Example terminal session:

```text
📊 You: What data do we have?
🤖 Agent: Here's what we're working with:
- 5 years of daily sales data (2013-2017)
- 10 stores × 50 items = 500 time series
- 913,000 total records
- Average daily sales: 52 units per item

📊 You: Train a forecasting model
🤖 Agent: Model trained successfully!
- MAE: 5.93 units (forecast off by ~6 units on average)
- MAPE: 13.0% (acceptable for retail forecasting)

📊 You: Which products are most volatile?
🤖 Agent: Store 6, Item 5 is the most unpredictable (CV: 0.373).
Item 5 appears across multiple stores in the top volatile list,
suggesting inherently unstable demand patterns.

📊 You: What if demand for item 5 spikes 50%?
🤖 Agent: The model can't handle it without intervention.
- 89% of days would be understocked
- ~700 unit shortfall over 3 months
Recommendation: increase safety stock by 60-70%.

📊 You: What do you know about item 4?
🤖 Agent: [RAG retrieval] Item 4 is a punnet of strawberries (400g). Here are the key characteristics:
Product Details:
- Type: Fresh strawberries in 400g punnets.
```
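The volatility answer in the session ranks store-item pairs by coefficient of variation (CV). A sketch of one plausible computation, assuming CV = standard deviation / mean of daily sales per store-item pair over the full history; `rank_volatility` is hypothetical, and the definition used by `find_volatile` may differ, so this is not expected to reproduce the 0.373 figure.

```python
import pandas as pd


def rank_volatility(df: pd.DataFrame, top_n: int = 5) -> pd.DataFrame:
    """Rank store-item series by coefficient of variation of daily sales (std / mean)."""
    stats = df.groupby(["store", "item"])["sales"].agg(["mean", "std"])  # std uses ddof=1
    stats["cv"] = stats["std"] / stats["mean"]  # a zero-mean series would yield inf/NaN
    return stats.sort_values("cv", ascending=False).head(top_n)


# Toy data: store 1 sells a flat 10 units/day, store 2 swings between 5 and 15.
sample = pd.DataFrame({
    "store": [1] * 4 + [2] * 4,
    "item": [5] * 8,
    "sales": [10, 10, 10, 10, 5, 15, 5, 15],
})
print(rank_volatility(sample))
```

CV is scale-free, so a low-volume item with erratic sales ranks above a high-volume item whose absolute swings are larger but proportionally smaller.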
License: MIT