AndreTeixeira08/live_forecasting_model


Project II – Predictive Model for Stock Management

Objective

Forecast weekly product sales to optimize stock replenishment for four selected stores in Istanbul. The aim is to reduce stockouts and overstocking using time series models and engineered features that reflect business and temporal dynamics.


Architecture (Microsoft Fabric)

The project was developed using a Lakehouse architecture in Microsoft Fabric, structured into:

  • Bronze Layer: Raw data ingestion and conversion to Delta format
  • Silver Layer: Data cleaning, normalization, and enrichment
  • Gold Layer: Dimensional modeling using a star schema
  • ML Area: Feature engineering and machine learning models
  • Power BI: Visualization of results and reporting

Project Stages

1. Data Pipeline

Bronze Layer

  • Notebook: 01_bronze_transform.ipynb
  • Converted raw data files into Delta format in Projeto_II_Bronze_

Silver Layer

  • Notebook: 02_silver_cleaning.ipynb
  • Applied data cleaning: handled nulls, standardized formats, fixed inconsistent codes (e.g., store/city codes)

Gold Layer

  • Notebook: 03_gold_modeling.ipynb
  • Created a star schema:
    • dim_stores
    • dim_products
    • dim_dates (with holiday/weekend flags)
    • fact_sales
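As an illustration of the dim_dates flags, here is a minimal sketch in plain Python. It is not the code from 03_gold_modeling.ipynb: the column names other than the weekend/holiday flags, the date range, and the HOLIDAYS set are assumptions made for the example.

```python
from datetime import date, timedelta

# Hypothetical holiday set; a real pipeline would load an official calendar.
HOLIDAYS = {date(2019, 1, 1), date(2019, 4, 23)}

def build_dim_dates(start, end, holidays=HOLIDAYS):
    """Return one row (dict) per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        iso_year, iso_week, _ = current.isocalendar()
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # surrogate key, e.g. 20190101
            "date": current,
            "iso_year": iso_year,
            "iso_week": iso_week,
            "is_weekend": current.weekday() >= 5,  # Saturday = 5, Sunday = 6
            "is_holiday": current in holidays,
        })
        current += timedelta(days=1)
    return rows

dim_dates = build_dim_dates(date(2019, 1, 1), date(2019, 1, 7))
```

A fact_sales row would then reference date_key instead of carrying the calendar attributes itself.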

2. Exploratory Data Analysis (EDA)

Notebooks:

  • 04_eda_full_dataset.ipynb
  • 05_eda_define_poc.ipynb
  • 06_eda_initial_insights_poc.ipynb

Key insights:

  • Defined the Proof of Concept (PoC) with 4 Istanbul stores (one per store type: ST01 to ST04)
  • Focused on products in the categories hierarchy1_id ∈ {H00, H01, H02, H03}
  • Detected seasonality, stockout patterns, outliers, and temporal trends
  • Identified missing data patterns and selected relevant variables for modeling

3. Feature Engineering

Notebook: 07_feature_engineering.ipynb

Engineered variables:

  • weekly_sales – average product sales per week
  • avg_stock – weekly average available stock
  • avg_price – average price during the week
  • promo_bin_*_rate – proportion of time in promotions
  • lag_sales_1w, lag_sales_2w, lag_sales_3w – previous weeks' sales to capture autoregressive effects
  • is_weekend, is_holiday – calendar-based flags
  • num_obs – number of transactions per week

These features were used as input for time series and regression models.
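The weekly aggregation and lag features listed above could be derived roughly as follows. This is a plain-Python sketch, not the code from 07_feature_engineering.ipynb: the input row layout and sample values are assumptions, and num_obs counts input rows per week here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily rows: (product_id, day, units_sold, stock, price)
daily = [
    ("P1", date(2019, 1, 7), 4, 10, 2.5),
    ("P1", date(2019, 1, 8), 6, 8, 2.5),
    ("P1", date(2019, 1, 14), 3, 5, 3.0),
    ("P1", date(2019, 1, 21), 9, 12, 3.0),
]

def weekly_features(rows):
    """Aggregate daily rows to ISO weeks, then add 1- to 3-week sales lags."""
    buckets = defaultdict(list)
    for product_id, day, sales, stock, price in rows:
        iso = day.isocalendar()
        buckets[(product_id, iso[0], iso[1])].append((sales, stock, price))

    weekly = []
    for (product_id, year, week), obs in sorted(buckets.items()):
        n = len(obs)
        weekly.append({
            "product_id": product_id, "year": year, "week": week,
            "weekly_sales": sum(o[0] for o in obs) / n,
            "avg_stock": sum(o[1] for o in obs) / n,
            "avg_price": sum(o[2] for o in obs) / n,
            "num_obs": n,
        })

    # Lags look back k rows within the same product, which assumes every
    # week is present; gaps would need to be filled before this step.
    history = defaultdict(list)
    for row in weekly:
        past = history[row["product_id"]]
        for k in (1, 2, 3):
            row[f"lag_sales_{k}w"] = past[-k] if len(past) >= k else None
        past.append(row["weekly_sales"])
    return weekly

features = weekly_features(daily)
```

The first weeks of each product have no lag values (None), so they are typically dropped or imputed before model fitting.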


4. Predictive Models

Notebooks:

  • 08_model_arima.ipynb
  • 09_model_sarima.ipynb
  • 10_model_sarimax.ipynb
  • 11_model_linear_regression.ipynb

Models used:

  • ARIMA – baseline model without seasonality or exogenous variables
  • SARIMA – includes weekly seasonality
  • ARIMAX/SARIMAX – adds exogenous variables (e.g., avg_stock, lag_sales_1w)
  • Linear Regression – serves as benchmark using manually engineered features

Models were tested by product_id across the selected stores and hierarchies. Evaluation results were logged using Microsoft Fabric’s ML Experiments.
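To make the regression benchmark concrete, the sketch below fits a one-feature least-squares model on lag_sales_1w with a chronological train/test split. It is an illustration only: the sales values are made up, the function name fit_simple_ols is hypothetical, and the project's notebook may use more features and a library implementation.

```python
def fit_simple_ols(x, y):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical weekly sales for one product, oldest first.
weekly_sales = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0]

# Pair each week with the previous week's sales (lag_sales_1w).
lag_1w = weekly_sales[:-1]
target = weekly_sales[1:]

# Chronological split: fit on the earliest weeks, hold out the last two.
split = len(target) - 2
intercept, slope = fit_simple_ols(lag_1w[:split], target[:split])
predictions = [intercept + slope * x for x in lag_1w[split:]]
```

Splitting by time rather than at random keeps future weeks out of the training set, which matters for any of the models listed above.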


5. Reports and Outputs

Power BI Reports:

  • project_II_final_report.pdf – Final model forecasts and performance

Project II Presentation:

  • project_II_presentation.pdf – Final project presentation

Output files:

  • Weekly forecasts and model evaluation results exported to .csv for further analysis

Suggested GitHub Folder Structure

project_II_stock_forecasting/
│
├── notebooks/
│   ├── 01_bronze_transform.ipynb
│   ├── 02_silver_cleaning.ipynb
│   ├── 03_gold_modeling.ipynb
│   ├── 04_eda_full_dataset.ipynb
│   ├── 05_eda_define_poc.ipynb
│   ├── 06_eda_initial_insights_poc.ipynb
│   ├── 07_feature_engineering.ipynb
│   ├── 08_model_arima.ipynb
│   ├── 09_model_sarima.ipynb
│   ├── 10_model_sarimax.ipynb
│   └── 11_model_linear_regression.ipynb
│
├── reports/
│   ├── project_II_presentation.pdf
│   └── project_II_final_report.pdf
│
├── outputs/
│   └── README.txt # Output files generated by the models
│
├── data/
│   └── README.md  # How to access Fabric data (without uploading raw files)
│
├── requirements.txt
├── .gitignore
└── README.md

Model Evaluation

Models were evaluated using:

  • MAE – Mean Absolute Error
  • AIC / BIC – Model fit comparison for ARIMA-type models

Evaluation was done at the product level, grouped by store and hierarchy (store_id, hierarchy1_id).
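For reference, the metrics can be written out as below. MAE is the mean of absolute forecast errors; AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of fitted parameters, n the number of observations, and L the maximized likelihood. The record layout and values are hypothetical and only illustrate grouping by (store_id, hierarchy1_id).

```python
import math
from collections import defaultdict

def mae(actual, predicted):
    """Mean Absolute Error over paired actual/predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: penalizes parameters more as n grows."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical forecast records: (store_id, hierarchy1_id, actual, predicted)
records = [
    ("S0001", "H00", 10.0, 12.0),
    ("S0001", "H00", 8.0, 7.0),
    ("S0002", "H01", 5.0, 5.5),
]

def mae_by_group(rows):
    """Compute MAE separately for each (store_id, hierarchy1_id) pair."""
    grouped = defaultdict(lambda: ([], []))
    for store_id, hierarchy1_id, actual, predicted in rows:
        actuals, preds = grouped[(store_id, hierarchy1_id)]
        actuals.append(actual)
        preds.append(predicted)
    return {key: mae(a, p) for key, (a, p) in grouped.items()}

scores = mae_by_group(records)
```

AIC and BIC are only comparable between models fitted to the same series, so they suit ranking ARIMA variants for one product, while MAE allows comparison with the regression benchmark.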


Future Roadmap

  1. Demand-Instead-of-Sales Target

    • Detect stock = 0 situations and impute latent demand so the model learns true customer interest, not shelf availability.
  2. Marketplace-Seller Segmentation

    • Enrich the dataset with third-party marketplace sellers.
    • Cluster sellers via K-Means (or HDBSCAN for variable density) to uncover behaviour-based segments.
  3. Gradient-Boosting Experiments

    • Benchmark LightGBM / XGBoost against SARIMAX and Lasso to capture non-linear feature interactions.
    • Use SHAP values for interpretability.
  4. Continuous Forecasting Pipeline

    • Containerise the model and schedule weekly retraining + scoring (GitHub Actions → Azure Functions).
    • Push fresh forecasts to the replenishment_table and trigger Power BI refresh via REST API.
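Roadmap item 1 could start from something as simple as the sketch below, which replaces sales in zero-stock weeks with the mean of the most recent in-stock weeks. This is one naive imputation rule chosen for illustration, not a method the project has committed to; the function name, window size, and sample values are assumptions.

```python
def impute_latent_demand(weekly_sales, avg_stock, window=2):
    """Estimate demand: keep sales when stock was available, otherwise
    use the mean of the last `window` in-stock weeks seen so far."""
    demand = []
    in_stock_history = []
    for sales, stock in zip(weekly_sales, avg_stock):
        if stock > 0:
            demand.append(sales)
            in_stock_history.append(sales)
        elif in_stock_history:
            recent = in_stock_history[-window:]
            demand.append(sum(recent) / len(recent))
        else:
            # No in-stock history yet: keep the observed value.
            demand.append(sales)
    return demand

# Hypothetical series: the third week is a stockout.
sales = [10.0, 12.0, 0.0, 11.0]
stock = [5.0, 4.0, 0.0, 6.0]
demand = impute_latent_demand(sales, stock)
```

Training on the imputed series instead of raw sales keeps the model from learning that demand drops to zero whenever the shelf is empty.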

Team & Acknowledgments

Developed as part of the Postgraduate Program in Analytics & Data Science.

  • André Teixeira – Feature Engineering & Time Series Modeling
  • Patrícia Pereira – Exploratory Data Analysis
  • Rodrigo Diogo – Linear Regression Time Series Modeling
  • Vitor Meirelles – Lakehouse Architecture (Microsoft Fabric)

About

Forecasting weekly product demand using ARIMA, SARIMAX, and regression models. Built with Microsoft Fabric Lakehouse architecture to improve stock management and minimize stockouts.
