A portfolio of data analysis, visualization, and machine learning projects built with Power BI, Excel, Python, and R.
Stack: Power BI · Excel · Python (pandas, scikit-learn, XGBoost, seaborn) · R (tidymodels, ggplot2)
Tools: Power BI, Python
Data: Sales records (2017–2020) from CSV and Excel files — orders, pricing, product costs, sales territories.
Business objective: Replace static management reports with a self-service dashboard that gives the sales and marketing teams direct visibility into profitability by geography, product category, and customer segment — and identify which customer groups drive the most value.
What's in it:
- Interactive Power BI dashboard covering profitability by country, revenue and profit trends, top 10 customers by revenue, and profit breakdown by category with conditional formatting.
- Customer segmentation with KMeans clustering (Python) based on revenue, profit, AOV, profit margin, and customer lifecycle. Segment profiles integrated back into Power BI for behavioral insights (loyal high-spenders, one-time buyers, low-value shoppers).
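
The segmentation step can be sketched roughly as follows. This is a minimal illustration, assuming scikit-learn's `KMeans` on standardized per-customer features; the synthetic customer table, the column names, the `segment_customers` helper, and the choice of three clusters are all assumptions made for the example, not the project's actual data or settings.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic per-customer table (illustrative values, not the project data).
rng = np.random.default_rng(42)
n = 300
customers = pd.DataFrame({
    "revenue": rng.gamma(shape=2.0, scale=500.0, size=n),
    "orders": rng.integers(1, 20, size=n),
    "lifecycle_days": rng.integers(1, 1200, size=n),
})
customers["profit"] = customers["revenue"] * rng.uniform(0.05, 0.40, size=n)
customers["aov"] = customers["revenue"] / customers["orders"]  # average order value
customers["profit_margin"] = customers["profit"] / customers["revenue"]

# Hypothetical feature names mirroring the inputs listed above.
FEATURES = ["revenue", "profit", "aov", "profit_margin", "lifecycle_days"]


def segment_customers(df, n_clusters=3, seed=0):
    """Return a copy of df with a 'segment' label from KMeans on scaled features."""
    # Standardize first so large-valued columns (revenue) do not dominate distances.
    scaled = StandardScaler().fit_transform(df[FEATURES])
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return df.assign(segment=model.fit_predict(scaled))


segmented = segment_customers(customers)

# Per-segment averages: the kind of profile table that could be exported for Power BI.
profiles = segmented.groupby("segment")[FEATURES].mean().round(2)
print(profiles)
```

Cluster numbers carry no meaning on their own; names such as "loyal high-spenders" would be assigned after inspecting the profile table.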
Origin: Built as the capstone project for the 15 Days of Power BI Bootcamp (Ligency / Udemy), completed within the challenge period. Extended post-course with KMeans customer segmentation in Python.
Tools: Power BI
Data: U.S. Energy Information Administration (EIA) — electricity prices across residential, commercial, industrial, and transportation sectors from 2002 onward.
Business objective: Provide a navigable view of 20+ years of electricity pricing across U.S. states and sectors to support energy cost benchmarking, policy evaluation, and procurement decisions.
What's in it:
- Price trends by state and provider, with drillthrough-enabled state-level analysis.
- Price trend forecasting, plus KPIs for the highest- and lowest-price states.
Tools: Power BI, Excel, Python
Data: Kaggle dataset — firearm attributes including age, weight, muzzle velocity, max range, and price. My notebook.
Business objective: Identify which physical and performance characteristics drive firearm prices, and build a predictive model that can estimate market value from product specifications.
What's in it:
- Power BI: Key metrics dashboard (total revenue, average price), price trends by age and weight, muzzle velocity vs. max range scatter plot, performance gauges.
- Excel: Data preprocessing, pivot tables, price distribution, and correlation analysis.
- Python: Descriptive statistics, box plots, heatmaps, pairplots. Regression models for price prediction, evaluated with R² and RMSE, plus feature importance analysis.
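
As a rough illustration of the evaluation described in the Python bullet, the sketch below fits a regressor and reports R², RMSE, and feature importances. It assumes a scikit-learn `RandomForestRegressor`, which may differ from the models in the notebook; the data is synthetic and the feature names are placeholders borrowed from the dataset description.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset: four features scaled to [0, 1].
rng = np.random.default_rng(0)
n = 500
feature_names = ["age", "weight", "muzzle_velocity", "max_range"]
X = rng.uniform(0.0, 1.0, size=(n, 4))

# Assumed relationship for the demo: price depends mostly on muzzle_velocity,
# slightly on age, and not at all on the other two features.
y = 200 + 100 * X[:, 0] + 800 * X[:, 2] + rng.normal(0, 20, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Held-out metrics: R² (share of variance explained) and RMSE (in price units).
r2 = r2_score(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))

# Impurity-based importances; they sum to 1 across features.
importances = dict(zip(feature_names, model.feature_importances_))

print(f"R2={r2:.3f}  RMSE={rmse:.1f}")
for name, score in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name:16s} {score:.3f}")
```

Impurity-based importances tend to favor high-cardinality features, so permutation importance is a common cross-check when the ranking matters.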
Tools: Excel, Python
Data: Kaggle dataset — 15,000 records with biological measures: age, weight, height, heart rate, body temperature, workout duration.
Business objective: Understand which biological and workout parameters most strongly predict calorie expenditure, and evaluate how well different ML models can estimate it from those inputs.
What's in it:
- Excel: Descriptive stats (mean, median, std, skewness, kurtosis, CoV), outlier detection via IQR, pivot tables with slicers by gender, age group, and BMI category.
- Python: Feature engineering (BMI, Calories per Minute, Intensity Score, Temperature Deviation, age and BMI categories), EDA (correlation heatmap, pairplot, boxplots). Regression models: Linear Regression (baseline), Random Forest, and XGBoost, evaluated with MAE, RMSE, and R², plus feature importance and partial dependence plots.
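
Two of the steps above, derived features and IQR-based outlier detection, can be sketched in pandas as follows. The column names, the toy rows, and the helper functions are hypothetical. The BMI category cut-offs are the standard WHO thresholds (18.5, 25, 30), assumed here because the project's own bins are not stated. Intensity Score and Temperature Deviation are left out because their definitions are not given.

```python
import pandas as pd

# Toy rows with hypothetical column names (not the Kaggle dataset).
df = pd.DataFrame({
    "weight_kg": [60.0, 72.0, 85.0, 95.0, 68.0, 77.0],
    "height_cm": [165.0, 180.0, 175.0, 170.0, 172.0, 178.0],
    "duration_min": [10.0, 20.0, 15.0, 25.0, 30.0, 120.0],
    "calories": [50.0, 110.0, 90.0, 160.0, 170.0, 700.0],
})


def add_features(frame):
    """Add BMI, calories per minute, and a BMI category column."""
    out = frame.copy()
    out["bmi"] = out["weight_kg"] / (out["height_cm"] / 100.0) ** 2
    out["calories_per_min"] = out["calories"] / out["duration_min"]
    # Left-closed bins: [0, 18.5), [18.5, 25), [25, 30), [30, inf).
    out["bmi_category"] = pd.cut(
        out["bmi"],
        bins=[0, 18.5, 25, 30, float("inf")],
        labels=["Underweight", "Normal", "Overweight", "Obese"],
        right=False,
    )
    return out


def iqr_outliers(series, k=1.5):
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


featured = add_features(df)
mask = iqr_outliers(featured["duration_min"])

print(featured[["bmi", "bmi_category", "calories_per_min"]].round(2))
print(featured.loc[mask, "duration_min"])  # flags the 120-minute session
```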
Extended: The topic was later developed into a production ML pipeline and REST API — ml-training-intensity-prediction.
Tools: Python, R
Business objective: Demonstrate equivalent ML workflows in Python and R on a controlled classification problem, with a focus on comparing implementation patterns across both ecosystems.
What's in it:
- iris_rf_python.ipynb: full ML pipeline with EDA, train/test split, Random Forest classifier, confusion matrix and accuracy evaluation.
- iris_rf_r.Rmd: equivalent analysis in R with tidymodels, ready to knit to PDF or HTML.
- Side-by-side comparison of Python and R workflows for the same task.
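
The Python side of this workflow might look roughly like the sketch below. It assumes scikit-learn's bundled Iris data and a 70/30 stratified split; the split ratio, seed, and number of trees are illustrative choices, not necessarily those used in the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Iris ships with scikit-learn: 150 samples, 4 features, 3 balanced classes.
X, y = load_iris(return_X_y=True)

# Stratify so each class keeps the same share in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class

print(f"accuracy: {acc:.3f}")
print(cm)
```

In tidymodels the same steps would typically map to `initial_split()`, a `rand_forest()` model specification, and `conf_mat()`, which is what makes the two notebooks easy to compare side by side.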
License: MIT