Forecast weekly product sales to optimize stock replenishment for four selected stores in Istanbul. The aim is to reduce stockouts and overstocking using time series models and engineered features that reflect business and temporal dynamics.
The project was developed using a Lakehouse architecture in Microsoft Fabric, structured into:
- Bronze Layer: Raw data ingestion and conversion to Delta format
- Silver Layer: Data cleaning, normalization, and enrichment
- Gold Layer: Dimensional modeling using a star schema
- ML Area: Feature engineering and machine learning models
- Power BI: Visualization of results and reporting
- Notebook: `01_bronze_transform.ipynb` – converted raw data files into Delta format in `Projeto_II_Bronze_`
- Notebook: `02_silver_cleaning.ipynb` – applied data cleaning: handled nulls, standardized formats, fixed inconsistent codes (e.g., store/city codes)
- Notebook: `03_gold_modeling.ipynb` – created a star schema:
  - `dim_stores`
  - `dim_products`
  - `dim_dates` (with holiday/weekend flags)
  - `fact_sales`
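To show how the star schema supports analysis, here is a minimal pandas sketch of a fact-to-dimension join and aggregation. Table and column names follow the schema above; the sample rows are invented for illustration:

```python
import pandas as pd

# Minimal stand-ins for the gold-layer tables (invented sample rows)
dim_dates = pd.DataFrame({
    "date_id": [1, 2],
    "date": pd.to_datetime(["2021-05-01", "2021-05-03"]),
    "is_weekend": [True, False],
    "is_holiday": [False, False],
})
dim_stores = pd.DataFrame({"store_id": ["ST01"], "city": ["Istanbul"]})
fact_sales = pd.DataFrame({
    "date_id": [1, 2],
    "store_id": ["ST01", "ST01"],
    "product_id": ["P001", "P001"],
    "sales": [10, 4],
})

# A typical star-schema query: join the fact table to its dimensions,
# then aggregate by a dimension attribute
weekend_sales = (
    fact_sales
    .merge(dim_dates, on="date_id")
    .merge(dim_stores, on="store_id")
    .groupby("is_weekend")["sales"]
    .sum()
)
print(weekend_sales)
```

Keeping descriptive attributes in the dimension tables is what lets downstream notebooks and Power BI slice sales by calendar flags without duplicating that logic.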
Notebooks: `04_eda_full_dataset.ipynb`, `05_eda_define_poc.ipynb`, `06_eda_initial_insights_poc.ipynb`
Key insights:
- Defined the Proof of Concept (PoC) with 4 Istanbul stores (one per store type: ST01 to ST04)
- Focused on products from categories `hierarchy1_id` ∈ [H00, H01, H02, H03]
- Detected seasonality, stockout patterns, outliers, and temporal trends
- Identified missing data patterns and selected relevant variables for modeling
Notebook: `07_feature_engineering.ipynb`
Engineered variables:
- `weekly_sales` – average product sales per week
- `avg_stock` – weekly average available stock
- `avg_price` – average price during the week
- `promo_bin_*_rate` – proportion of time in promotion
- `lag_sales_1w`, `lag_sales_2w`, `lag_sales_3w` – previous weeks' sales to capture autoregressive effects
- `is_weekend`, `is_holiday` – calendar-based flags
- `num_obs` – number of transactions per week
These features were used as input for time series and regression models.
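The lag features above can be sketched in a few lines of pandas. The weekly grouping and column names here are simplified assumptions about the underlying table; the sales values are invented:

```python
import pandas as pd

# Invented weekly sales for one product at one store
df = pd.DataFrame({
    "week": pd.period_range("2021-01-04", periods=6, freq="W"),
    "weekly_sales": [12.0, 15.0, 9.0, 14.0, 11.0, 13.0],
})

# Lag features capture autoregressive effects: last weeks' sales
for k in (1, 2, 3):
    df[f"lag_sales_{k}w"] = df["weekly_sales"].shift(k)

# The first rows have incomplete lags and cannot be used for training
train = df.dropna().reset_index(drop=True)
print(train[["weekly_sales", "lag_sales_1w", "lag_sales_2w", "lag_sales_3w"]])
```

In the real pipeline the `shift` would be applied per `(store_id, product_id)` group so lags never leak across series.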
Notebooks: `08_model_arima.ipynb`, `09_model_sarima.ipynb`, `10_model_sarimax.ipynb`, `11_model_linear_regression.ipynb`
Models used:
- ARIMA – baseline model without seasonality or exogenous variables
- SARIMA – includes weekly seasonality
- ARIMAX/SARIMAX – adds exogenous variables (e.g., `avg_stock`, `lag_sales_1w`)
- Linear Regression – serves as a benchmark using manually engineered features
Models were tested by product_id across the selected stores and hierarchies. Evaluation results were logged using Microsoft Fabric’s ML Experiments.
Power BI Reports:
- `project_II_final_report.pdf` – final model forecasts and performance
- `project_II_presentation.pdf` – final project presentation
Output files:
- Weekly forecasts and model evaluation results exported to `.csv` for further analysis
```
project_II_stock_forecasting/
│
├── notebooks/
│   ├── 01_bronze_transform.ipynb
│   ├── 02_silver_cleaning.ipynb
│   ├── 03_gold_modeling.ipynb
│   ├── 04_eda_full_dataset.ipynb
│   ├── 05_eda_define_poc.ipynb
│   ├── 06_eda_initial_insights_poc.ipynb
│   ├── 07_feature_engineering.ipynb
│   ├── 08_model_arima.ipynb
│   ├── 09_model_sarima.ipynb
│   ├── 10_model_sarimax.ipynb
│   └── 11_model_linear_regression.ipynb
│
├── reports/
│   ├── project_II_presentation.pdf
│   └── project_II_final_report.pdf
│
├── outputs/
│   └── README.txt   # Output files generated by the models
│
├── data/
│   └── README.md    # How to access Fabric data (without uploading raw files)
│
├── requirements.txt
├── .gitignore
└── README.md
```
Models were evaluated using:
- MAE – Mean Absolute Error
- AIC / BIC – Model fit comparison for ARIMA-type models
Evaluation was done at the product level, grouped by store and hierarchy (`store_id`, `hierarchy1_id`).
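For reference, the MAE metric and an AIC-based model comparison can be sketched as follows (the forecast values and AIC scores are invented):

```python
import numpy as np

def mae(actual, predicted):
    """Mean Absolute Error: average magnitude of forecast errors."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs(actual - predicted))

actual = [12, 15, 9, 14]
predicted = [11, 16, 10, 13]
print("MAE:", mae(actual, predicted))  # → 1.0

# AIC/BIC: lower is better; used to compare ARIMA-type fits on the
# same series (invented scores for illustration)
candidates = {"ARIMA": 412.3, "SARIMA": 398.7, "SARIMAX": 391.2}
best = min(candidates, key=candidates.get)
print("Best by AIC:", best)
```

MAE is comparable across models on the same series but not across products with different sales scales, which is why evaluation is grouped per `store_id` and `hierarchy1_id`.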
- Demand-Instead-of-Sales Target
  - Detect `stock = 0` situations and impute latent demand so the model learns true customer interest, not shelf availability.
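One possible sketch of this idea in pandas, where stockout weeks are masked and a latent-demand estimate is interpolated from neighbouring weeks (the masking and linear-interpolation strategy is an assumption, not the final design):

```python
import pandas as pd

# Invented weekly series: sales drop to 0 only because stock ran out
df = pd.DataFrame({
    "weekly_sales": [10.0, 12.0, 0.0, 0.0, 11.0],
    "avg_stock":    [40.0, 35.0, 0.0, 0.0, 30.0],
})

# Zero sales during a stockout reflects shelf availability, not demand:
# mask those weeks, then interpolate a latent-demand estimate
stockout = df["avg_stock"] == 0
df["demand"] = df["weekly_sales"].mask(stockout).interpolate()
print(df)
```

Training on `demand` instead of `weekly_sales` keeps the model from learning that stockout weeks mean "no customer interest".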
- Marketplace-Seller Segmentation
  - Enrich the dataset with third-party marketplace sellers.
  - Cluster sellers via K-Means (or HDBSCAN for variable density) to uncover behaviour-based segments.
- Gradient-Boosting Experiments
  - Benchmark LightGBM / XGBoost against SARIMAX and Lasso to capture non-linear feature interactions.
  - Use SHAP values for interpretability.
- Continuous Forecasting Pipeline
  - Containerise the model and schedule weekly retraining + scoring (GitHub Actions → Azure Functions).
  - Push fresh forecasts to the `replenishment_table` and trigger a Power BI refresh via the REST API.
Developed as part of the Postgraduate Program in Analytics & Data Science.
- André Teixeira – Feature Engineering & Time Series Modeling
- Patrícia Pereira – Exploratory Data Analysis
- Rodrigo Diogo – Linear Regression & Time Series Modeling
- Vitor Meirelles – Lakehouse Architecture in Microsoft Fabric