adityamanoj1811/quick-commerce-analytics


🛵 Quick-Commerce Delivery Analytics


A complete end-to-end Data Analytics project on a real-world Quick-Commerce delivery dataset with 5,91,539 records — covering all 4 units of the Introduction to Data Analytics syllabus.


📌 Project Overview

This project performs a full data analytics study on a quick-commerce (q-commerce) delivery dataset simulating operations across multiple companies and cities in India. It covers everything from descriptive statistics to machine learning models, time series forecasting, and prescriptive decision-making.

Key Results

| Metric | Value |
| --- | --- |
| Total Records Analysed | 5,91,539 |
| Columns | 13 |
| New Features Engineered | 7 |
| Hypothesis Tests Applied | 5 |
| ML Models Built | 5+ |
| Best Classifier (XGBoost) Accuracy | ~79% |
| Best Classifier AUC-ROC | ~0.88 |
| K-Means Clusters Found | 4 |
| Best Forecast Model | Holt-Winters (MAE ~52) |
| Syllabus Units Covered | All 4 |

🗂️ Repository Structure

quick-commerce-analytics/
│
├── 📓 quick_commerce_analytics.ipynb     ← Main analysis notebook
├── 📊 Requirements.txt                   ← Python dependencies
├── 📄 README.md                          ← This file
├── 📄 LICENSE                            ← MIT License
├── 📄 Contributing.md                    ← Contribution guidelines
└── 📄 .gitignore                         ← Git ignore rules

Dataset: The CSV is not part of the repository. Download it from Kaggle and save it as quick_commerce_dataset.csv in the root folder.


📊 Dataset

| Property | Details |
| --- | --- |
| Name | Quick-Commerce Delivery Dataset |
| Source | Kaggle (Rohit Grewal) |
| Records | 5,91,539 rows |
| Columns | 13 |
| Domain | Q-Commerce / On-Demand Delivery (India) |
| Access | Free on Kaggle |

Column Descriptions

| Column | Type | Description |
| --- | --- | --- |
| Order_ID | Categorical | Unique identifier for each order |
| Company | Categorical | Q-commerce company (Blinkit, Zepto, Instamart, etc.) |
| City | Categorical | City where the order was placed |
| Customer_Age | Numerical | Age of the customer (years) |
| Order_Value | Numerical | Total monetary value of the order (₹) |
| Delivery_Time_Min | Numerical | Time taken to deliver (minutes) |
| Distance_Km | Numerical | Distance from dark store to customer (km) |
| Items_Count | Numerical | Number of items in the order |
| Product_Category | Categorical | Primary product category |
| Payment_Method | Categorical | Mode of payment used |
| Customer_Rating | Numerical | Customer satisfaction rating (1–5 stars) |
| Discount_Applied | Categorical | Whether a discount was used (Yes/No) |
| Delivery_Partner_Rating | Numerical | Rating given to delivery partner (1–5 stars) |

🤖 Analytics & ML Techniques

| # | Technique | Type | Purpose |
| --- | --- | --- | --- |
| 1 | Descriptive Statistics | Statistical | Mean, Median, Std, Skewness, IQR, Percentiles |
| 2 | Exploratory Data Analysis | Visual | Histograms, Box plots, Scatter, Pair plots |
| 3 | Feature Engineering | Data Prep | 7 new features created from raw columns |
| 4 | Central Limit Theorem | Probability | Sampling distribution demonstration |
| 5 | Confidence Intervals | Inferential Stats | 90%, 95%, 99% CI for mean order value |
| 6 | t-test (One & Two-sample) | Hypothesis Testing | Discount effect on order value |
| 7 | One-Way ANOVA | Hypothesis Testing | Order value differences across companies |
| 8 | Two-Way ANOVA | Hypothesis Testing | Company × Product_Category interaction |
| 9 | Chi-Square Test | Hypothesis Testing | Payment method vs discount association |
| 10 | Pearson & Spearman Correlation | Statistical | Feature relationship analysis |
| 11 | Simple Linear Regression | Predictive | Items_Count → Order_Value |
| 12 | Multiple Linear Regression | Predictive | Multi-feature → Order_Value + What-If |
| 13 | Ridge & Lasso Regression | Regularised ML | Overfitting control |
| 14 | Decision Tree Classifier | ML Classification | High-value order prediction |
| 15 | Random Forest Classifier | ML Classification | Ensemble high-value prediction |
| 16 | XGBoost Classifier | ML Classification | Best-performing classifier |
| 17 | K-Means Clustering | Unsupervised ML | 4-segment customer segmentation |
| 18 | Moving Averages (SMA, WMA) | Time Series | Trend smoothing |
| 19 | Exponential Smoothing (Holt-Winters) | Time Series | Trend + seasonality forecasting |
| 20 | Autoregressive AR(7) Model | Time Series | Lag-based demand forecasting |
| 21 | Linear Programming | Prescriptive | Delivery partner zone allocation |
| 22 | Decision Theory (EMV, Maximax, etc.) | Prescriptive | Optimal discount strategy selection |
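
As an illustration of the moving-average smoothing listed among the time series techniques, here is a minimal sketch of a 7-day simple and weighted moving average. The `daily_orders` series and the linearly increasing weights are assumptions made for the example; the README does not state which series or weighting scheme the notebook uses.

```python
import numpy as np
import pandas as pd

# Hypothetical daily order counts; the notebook's real series is not shown in the README.
daily_orders = pd.Series(
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    index=pd.date_range("2024-01-01", periods=10, freq="D"),
    dtype=float,
)

window = 7
weights = np.arange(1, window + 1)  # 1..7, most recent day weighted highest (assumed scheme)

# Simple moving average: unweighted mean of the last 7 days.
sma_7 = daily_orders.rolling(window).mean()

# Weighted moving average: weighted sum of the last 7 days, normalised by the weight total.
wma_7 = daily_orders.rolling(window).apply(
    lambda w: np.dot(w, weights) / weights.sum(), raw=True
)

print(pd.DataFrame({"orders": daily_orders, "SMA-7": sma_7, "WMA-7": wma_7}))
```

The first six rows of both averages are NaN because a full 7-day window is not yet available. On a rising series the WMA sits above the SMA, since recent days carry more weight.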

👥 Customer Segments Discovered (K-Means, K=4)

Cluster 0 — Premium Bulk Buyers (High Value, High Items)

  • Average order value ~₹900 with ~9 items per order
  • Longest delivery distance (~7 km) — outer zones
  • Strategy: Offer loyalty subscriptions and priority delivery

Cluster 1 — High-Spend, Low-Item Buyers (Premium Category)

  • Average order value ~₹750 with only ~3 items
  • Higher ratings (~3.8) — satisfied premium customers
  • Strategy: Promote premium single-item categories and branded products

Cluster 2 — Satisfied Budget Shoppers

  • Average order value ~₹300 with ~5 items
  • Highest customer rating (~4.2) — happiest segment
  • Strategy: Discount bundles, combo offers, and repeat-order nudges

Cluster 3 — Quick Low-Value Shoppers

  • Average order value ~₹180 with ~2 items
  • Lowest rating (~2.8) — at-risk for churn
  • Strategy: Minimum order incentives, free delivery thresholds
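
A segmentation like the one above can be sketched with scikit-learn as follows. This is an illustrative sketch on synthetic data: the cluster centres are loosely taken from the approximate order values and item counts quoted above, while the noise levels, sample sizes, and the choice of only two features are assumptions, not the notebook's actual setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic (Order_Value, Items_Count) points around four assumed centres.
centres = np.array([[900, 9], [750, 3], [300, 5], [180, 2]], dtype=float)
X = np.vstack([rng.normal(c, [30, 0.5], size=(200, 2)) for c in centres])

# Scale first: K-Means uses Euclidean distance, so rupee values would
# otherwise dominate item counts.
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_scaled)
labels = kmeans.labels_
score = silhouette_score(X_scaled, labels)

# Profile each cluster in the original units.
for k in range(4):
    mean_value, mean_items = X[labels == k].mean(axis=0)
    print(f"Cluster {k}: avg value ~{mean_value:.0f}, avg items ~{mean_items:.1f}")
print(f"Silhouette score: {score:.2f}")
```

Cluster numbering from K-Means is arbitrary, so the printed cluster indices will not necessarily match the Cluster 0 to 3 labels used in this README.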

🔄 Project Pipeline

Raw Dataset (5,91,539 rows × 13 cols)
        ↓
Data Cleaning & Validation
  ├── Missing value check (0 nulls found)
  ├── Duplicate removal
  ├── IQR outlier detection
  └── Categorical encoding
        ↓
Feature Engineering (7 new features)
  ├── Value_Per_Item
  ├── Delivery_Speed
  ├── High_Value_Order (classification target)
  ├── Age_Group
  ├── Order_Efficiency
  ├── Rating_Gap
  └── Log_Order_Value
        ↓
Descriptive Statistics + EDA
        ↓
Hypothesis Testing (5 tests)
  ├── One-Sample t-test
  ├── Independent t-test (Welch's)
  ├── One-Way ANOVA + Tukey HSD
  ├── Two-Way ANOVA
  └── Chi-Square Test
        ↓
┌─────────────────────────────────────────┐
│  Regression Models                      │
│  ├── Simple Linear Regression           │
│  ├── Multiple Linear Regression         │
│  ├── Ridge Regression (α=1.0, 10.0)     │
│  └── Lasso Regression (α=0.1, 1.0)      │
└─────────────────────────────────────────┘
        ↓
┌─────────────────────────────────────────┐
│  Classification Models                  │
│  ├── Decision Tree  (Acc ~75%, AUC 0.83)│
│  ├── Random Forest  (Acc ~78%, AUC 0.87)│
│  └── XGBoost        (Acc ~79%, AUC 0.88)│
└─────────────────────────────────────────┘
        ↓
K-Means Clustering (K=4 optimal)
  ├── Elbow Method
  ├── Silhouette Score
  └── Calinski-Harabasz Score
        ↓
Time Series Forecasting
  ├── Moving Averages (SMA-7, SMA-30, WMA-7)
  ├── ADF Stationarity Test
  ├── ACF / PACF Analysis
  ├── Seasonal Decomposition
  ├── SES → Holt's → Holt-Winters ← BEST
  └── AR(7) Autoregressive Model
        ↓
Prescriptive Analytics
  ├── Linear Programming (partner allocation)
  └── Decision Theory (EMV, Maximax, Maximin, etc.)
        ↓
Business Insights & Recommendations
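
The README names the seven engineered features but not their formulas. The sketch below shows one plausible set of definitions in pandas. Every formula, the age bins, and the median threshold for `High_Value_Order` are assumptions made for illustration; the notebook may define them differently.

```python
import numpy as np
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with seven derived columns (assumed definitions)."""
    out = df.copy()
    out["Value_Per_Item"] = out["Order_Value"] / out["Items_Count"]
    out["Delivery_Speed"] = out["Distance_Km"] / out["Delivery_Time_Min"]  # km per minute
    # Binary classification target: 1 if the order is above the median value.
    out["High_Value_Order"] = (out["Order_Value"] > out["Order_Value"].median()).astype(int)
    out["Age_Group"] = pd.cut(
        out["Customer_Age"],
        bins=[0, 25, 35, 50, 120],
        labels=["<=25", "26-35", "36-50", "51+"],
    )
    out["Order_Efficiency"] = out["Order_Value"] / out["Delivery_Time_Min"]  # rupees per minute
    out["Rating_Gap"] = out["Customer_Rating"] - out["Delivery_Partner_Rating"]
    out["Log_Order_Value"] = np.log1p(out["Order_Value"])  # compresses right skew
    return out


# Tiny hand-made sample using the dataset's column names.
sample = pd.DataFrame({
    "Order_Value": [200, 450, 900, 120],
    "Items_Count": [2, 5, 9, 1],
    "Distance_Km": [1.5, 3.0, 7.0, 0.8],
    "Delivery_Time_Min": [15, 30, 50, 10],
    "Customer_Age": [22, 31, 45, 60],
    "Customer_Rating": [4.0, 3.5, 4.5, 2.5],
    "Delivery_Partner_Rating": [4.5, 4.0, 4.0, 3.0],
})
features = add_features(sample)
print(features[["Value_Per_Item", "High_Value_Order", "Age_Group", "Rating_Gap"]])
```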

🗒️ Notebook Sections

| Section | Description |
| --- | --- |
| 1 | Project Introduction & Industry Overview |
| 2 | Environment Setup & Library Installation |
| 3 | Data Loading & Structural Audit |
| 4 | Data Cleaning & Preprocessing |
| 5 | Feature Engineering (7 features) |
| 6 | Descriptive Statistics |
| 7 | Exploratory Data Analysis (EDA) |
| 8 | Delivery Performance Analysis |
| 9 | Customer Experience Analysis |
| 10 | Correlation & Covariance Analysis |
| 11 | Probability Theory & Hypothesis Testing |
| 12 | Regression Analysis + What-If Scenarios |
| 13 | Decision Tree Classification |
| 14 | Random Forest & XGBoost Classification |
| 15 | K-Means Clustering |
| 16 | Time Series Forecasting |
| 17 | Prescriptive Analytics |
| 18 | Business Insights |
| 19 | Business Recommendations |
| 20 | Conclusion & Summary |

⚙️ Setup & Usage

Prerequisites

  • Python 3.10+
  • Google Account (for Colab)
  • Kaggle account (to download dataset)

Step 1 — Open in Google Colab

Go to colab.research.google.com, click File → Upload Notebook, and select quick_commerce_analytics.ipynb.

Step 2 — Download the Dataset

Download from Kaggle and upload the CSV to your Colab session:

from google.colab import files

# Opens a file picker; the chosen CSV is saved to the session's working directory
uploaded = files.upload()

Or mount Google Drive and load:

import pandas as pd
from google.colab import drive

drive.mount('/content/drive')  # prompts for authorisation on first run
df = pd.read_csv('/content/drive/MyDrive/quick_commerce_dataset.csv')

Step 3 — Install Dependencies

The first notebook cell handles installation:

%pip install xgboost statsmodels

All other libraries (pandas, numpy, scikit-learn, matplotlib, seaborn, scipy) are pre-installed on Colab.

Step 4 — Run All

Runtime → Run All (Ctrl+F9)

📦 Dependencies

# Install in Colab (only these 2 needed — rest are pre-installed)
xgboost>=1.7.0
statsmodels>=0.14.0

# Pre-installed on Google Colab
pandas>=1.5.0
numpy>=1.23.0
matplotlib>=3.6.0
seaborn>=0.12.0
scikit-learn>=1.2.0
scipy>=1.10.0

📈 Key Findings

Hypothesis Testing Results

Test 1 — One-Sample t-test        : p < 0.05 → Mean order value ≠ ₹500 ✗ H₀ rejected
Test 2 — Independent t-test       : p < 0.05 → Discounts increase order value ✗ H₀ rejected
Test 3 — One-Way ANOVA            : p < 0.05 → Companies differ in order value ✗ H₀ rejected
Test 4 — Chi-Square Test          : p < 0.05 → Payment ↔ Discount associated ✗ H₀ rejected
Test 5 — Two-Way ANOVA            : p < 0.001 → Company × Category interaction ✗ H₀ rejected
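
For readers who want to see how two of these tests are run, here is a minimal SciPy sketch of Welch's t-test and a chi-square test of independence. The order values are randomly generated and the payment-by-discount counts are invented for illustration, so the p-values do not reproduce the notebook's results.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic order values (₹) for orders with and without a discount.
with_discount = rng.normal(loc=520, scale=100, size=500)
without_discount = rng.normal(loc=480, scale=100, size=500)

# Welch's t-test: equal_var=False drops the equal-variance assumption.
t_stat, t_p = stats.ttest_ind(with_discount, without_discount, equal_var=False)
print(f"Welch t = {t_stat:.2f}, p = {t_p:.4g}")

# Hypothetical contingency table: rows = payment methods, columns = discount Yes / No.
observed = np.array([
    [120, 80],   # e.g. UPI
    [90, 110],   # e.g. Card
    [100, 100],  # e.g. Cash
])
chi2, chi_p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {chi_p:.4g}")

alpha = 0.05
print("Reject H0 (t-test):", t_p < alpha)
print("Reject H0 (chi-square):", chi_p < alpha)
```

With 5,91,539 rows, even very small differences tend to produce p < 0.05, so effect sizes are worth reporting alongside the p-values.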

Top Predictors of Order Value

1. Items_Count        (r = +0.82) ← Strongest predictor
2. Value_Per_Item     (r = +0.75) ← Basket quality signal
3. Delivery_Speed     (r = −0.55) ← Operational proxy
4. Distance_Km        (r = +0.40) ← Moderate positive
5. Customer_Rating    (r = +0.02) ← Near-zero impact
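
The README's technique list includes both Pearson and Spearman correlation. The sketch below illustrates the difference on made-up data where order value grows monotonically but not linearly with item count; the numbers are assumptions chosen to make the contrast visible.

```python
import numpy as np
from scipy import stats

# Hypothetical data: order value rises faster than linearly with item count.
items_count = np.arange(1, 9)
order_value = 50 * items_count ** 1.5

pearson_r, _ = stats.pearsonr(items_count, order_value)      # linear association
spearman_rho, _ = stats.spearmanr(items_count, order_value)  # rank (monotonic) association

print(f"Pearson r    = {pearson_r:.3f}")
print(f"Spearman rho = {spearman_rho:.3f}")
```

Spearman's rho is 1 here because the ranks agree perfectly, while Pearson's r falls slightly short of 1 because the relationship is curved.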

SLA Compliance

Delivered within 30 min : ~50%  ← SLA target
Delivered within 45 min : ~75%
Delivered within 60 min : ~95%
Average delivery time   : ~30 min
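
Compliance shares of this kind come from comparing `Delivery_Time_Min` against each threshold and averaging the resulting booleans. A minimal sketch, using eight made-up delivery times rather than the real dataset:

```python
import pandas as pd

# Hypothetical delivery times in minutes.
delivery_time_min = pd.Series([12, 25, 30, 31, 44, 45, 58, 70], name="Delivery_Time_Min")

thresholds = [30, 45, 60]
# The mean of a boolean Series is the share of True values.
compliance = {t: float((delivery_time_min <= t).mean()) for t in thresholds}

for t, share in compliance.items():
    print(f"Delivered within {t} min: {share:.1%}")
print(f"Average delivery time: {delivery_time_min.mean():.1f} min")
```

On the real data, replace the hand-made Series with `df["Delivery_Time_Min"]`.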

Classification Model Comparison

Decision Tree  → Accuracy: ~75%  |  AUC: ~0.83
Random Forest  → Accuracy: ~78%  |  AUC: ~0.87
XGBoost        → Accuracy: ~79%  |  AUC: ~0.88  ← BEST
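
A comparison like this can be produced by fitting each model on the same split and scoring accuracy and AUC on the held-out part. The sketch below uses a synthetic dataset from `make_classification` and assumed hyperparameters, so its scores will not match the figures above. XGBoost is left out to keep the example dependent on scikit-learn only; `xgboost.XGBClassifier` exposes the same `fit` / `predict_proba` interface and could be added to the `models` dict.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the High_Value_Order classification problem.
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=4, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    results[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "auc": roc_auc_score(y_test, proba),
    }
    print(f"{name:<14} Accuracy: {results[name]['accuracy']:.3f} | AUC: {results[name]['auc']:.3f}")
```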

🎯 Business Recommendations

| Area | Priority | Key Action |
| --- | --- | --- |
| Inventory Planning | 🟢 HIGH | Pre-stock top SKUs Fri–Sat; bundle promotions for 8+ item orders |
| Delivery Optimisation | 🟢 HIGH | Open dark stores in SLA-breach zones; target <3 km radius |
| Pricing & Discounts | 🔵 MEDIUM | Deploy 5% discount for orders ≥5 items (EMV optimal strategy) |
| Customer Experience | 🔵 MEDIUM | Set 25-min internal SLA; invest in partner training |
| Marketing Strategy | ⚪ ONGOING | Design campaigns per cluster; use ML for priority queuing |
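
The discount recommendation rests on decision-theory criteria such as EMV, Maximax, and Maximin. The sketch below shows how those three criteria are computed from a payoff table. The payoffs and demand probabilities are invented for illustration; they are not the notebook's figures.

```python
# Hypothetical probabilities of each demand state.
probabilities = {"low": 0.3, "medium": 0.5, "high": 0.2}

# Hypothetical payoffs (₹ thousand) for each strategy under each demand state.
payoffs = {
    "No discount":  {"low": 40, "medium": 50, "high": 60},
    "5% discount":  {"low": 35, "medium": 60, "high": 80},
    "10% discount": {"low": 20, "medium": 55, "high": 95},
}

# Expected Monetary Value: probability-weighted average payoff per strategy.
emv = {
    strategy: sum(probabilities[state] * value for state, value in row.items())
    for strategy, row in payoffs.items()
}

best_emv = max(emv, key=emv.get)                                           # risk-neutral
best_maximax = max(payoffs, key=lambda s: max(payoffs[s].values()))  # optimistic
best_maximin = max(payoffs, key=lambda s: min(payoffs[s].values()))  # pessimistic

print("EMV per strategy:", {s: round(v, 1) for s, v in emv.items()})
print("EMV choice:    ", best_emv)
print("Maximax choice:", best_maximax)
print("Maximin choice:", best_maximin)
```

The three criteria can disagree, as they do with these numbers: EMV weighs every outcome by its probability, Maximax looks only at the best case, and Maximin only at the worst case.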

👨‍💻 Author

Aditya Manoj


📄 License

This project is licensed under the MIT License — see the LICENSE file for details.


🙏 Acknowledgements

  • Rohit Grewal for the Quick-Commerce dataset on Kaggle
  • Scikit-learn for ML algorithms
  • Statsmodels for statistical modelling and time series
  • XGBoost for gradient boosting classification
  • Google Colab for free GPU/TPU compute

Built with ❤️ using Google Colab + Python + Scikit-learn

About

Quick-Commerce delivery analytics using Python — EDA, hypothesis testing, ML classification, clustering, and time series forecasting on 591K records.
