Data Analysis Projects

A portfolio of data analysis, visualization, and machine learning projects built with Power BI, Excel, Python, and R.

Stack: Power BI · Excel · Python (pandas, scikit-learn, XGBoost, seaborn) · R (tidymodels, ggplot2)

Projects

1. Dataline Bike Company — Sales and Profit Analysis

Tools: Power BI, Python

Data: Sales records (2017–2020) from CSV and Excel files — orders, pricing, product costs, sales territories.

Business objective: Replace static management reports with a self-service dashboard that gives the sales and marketing teams direct visibility into profitability by geography, product category, and customer segment — and identify which customer groups drive the most value.

What's in it:

Interactive Power BI dashboard covering profitability by country, revenue and profit trends, top 10 customers by revenue, and profit breakdown by category with conditional formatting.
Customer segmentation with KMeans clustering (Python) based on revenue, profit, average order value (AOV), profit margin, and customer lifecycle. Segment profiles are integrated back into Power BI for behavioral insights (loyal high-spenders, one-time buyers, low-value shoppers).
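A minimal sketch of the segmentation step, assuming order-level data with hypothetical columns `customer_id`, `order_id`, `order_date`, `revenue`, and `cost` (the project's real column names and cluster count are not given here). Each customer is summarized by the five features listed above, the features are standardized so no single scale dominates the distance metric, and scikit-learn's `KMeans` assigns a segment label. The sample orders are toy values for illustration only.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def customer_features(orders: pd.DataFrame) -> pd.DataFrame:
    """Aggregate order lines into one feature row per customer (hypothetical schema)."""
    orders = orders.assign(profit=orders["revenue"] - orders["cost"])
    g = orders.groupby("customer_id")
    feats = pd.DataFrame({
        "revenue": g["revenue"].sum(),
        "profit": g["profit"].sum(),
        "n_orders": g["order_id"].nunique(),
        # Customer lifecycle: days between first and last order.
        "lifecycle_days": (g["order_date"].max() - g["order_date"].min()).dt.days,
    })
    feats["aov"] = feats["revenue"] / feats["n_orders"]  # average order value
    feats["margin"] = feats["profit"] / feats["revenue"]  # profit margin
    return feats.drop(columns="n_orders")


def segment(feats: pd.DataFrame, k: int = 3, seed: int = 0) -> pd.DataFrame:
    """Standardize features, then label each customer with a KMeans cluster id."""
    X = StandardScaler().fit_transform(feats)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    return feats.assign(segment=km.fit_predict(X))


# Toy orders (not real Dataline data).
orders = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "C", "C", "D", "E", "E", "F"],
    "order_id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "order_date": pd.to_datetime([
        "2019-01-05", "2019-06-10", "2020-03-01", "2019-02-14", "2018-07-01",
        "2020-11-20", "2017-09-09", "2019-04-04", "2019-04-30", "2020-01-15",
    ]),
    "revenue": [3400.0, 2100.0, 2900.0, 540.0, 1200.0, 1800.0, 35.0, 700.0, 820.0, 60.0],
    "cost": [2000.0, 1300.0, 1700.0, 400.0, 700.0, 1100.0, 13.0, 500.0, 560.0, 25.0],
})

feats = customer_features(orders)
segmented = segment(feats)
print(segmented)
```

In practice the number of clusters would be chosen by inspecting something like the elbow curve or silhouette scores rather than fixed at 3; the segment ids are arbitrary and only gain names (for example "loyal high-spenders") after profiling each cluster's feature averages.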

Origin: Built as the capstone project for the 15 Days of Power BI Bootcamp (Ligency / Udemy), completed within the challenge period. Extended post-course with KMeans customer segmentation in Python.

2. U.S. Electricity Prices — State and Sector Analysis

Tools: Power BI

Data: U.S. Energy Information Administration (EIA) — electricity prices across residential, commercial, industrial, and transportation sectors from 2002 onward.

Business objective: Provide a navigable view of 20+ years of electricity pricing across U.S. states and sectors to support energy cost benchmarking, policy evaluation, and procurement decisions.

What's in it:

Price trends by state and provider, with drillthrough-enabled state-level analysis.
Forecasts of future price trends, plus KPIs highlighting the highest- and lowest-price states.
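The dashboard itself is built in Power BI, but the logic behind the highest/lowest-price KPIs can be sketched in pandas. This assumes a long-format table with hypothetical columns `state`, `sector`, `year`, and `price_cents_kwh`; the values below are made-up toy numbers, not EIA figures. It uses a simple unweighted mean per state, which is a simplification compared with a sales-weighted average.

```python
import pandas as pd


def price_kpis(prices: pd.DataFrame, sector: str, year: int) -> dict:
    """Return the highest- and lowest-price states for one sector and year."""
    subset = prices[(prices["sector"] == sector) & (prices["year"] == year)]
    # Unweighted mean of the monthly prices for each state.
    by_state = subset.groupby("state")["price_cents_kwh"].mean()
    return {
        "highest_state": by_state.idxmax(),
        "highest_price": by_state.max(),
        "lowest_state": by_state.idxmin(),
        "lowest_price": by_state.min(),
    }


# Toy data with placeholder state names (not real EIA values).
prices = pd.DataFrame({
    "state": ["State A", "State A", "State B", "State B", "State C", "State C"],
    "sector": ["residential"] * 6,
    "year": [2020] * 6,
    "month": [1, 2, 1, 2, 1, 2],
    "price_cents_kwh": [12.0, 13.0, 20.0, 22.0, 9.0, 10.0],
})

kpis = price_kpis(prices, sector="residential", year=2020)
print(kpis)
```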

3. Gun Price Analysis and Prediction

Tools: Power BI, Excel, Python

Data: Kaggle dataset — firearm attributes including age, weight, muzzle velocity, max range, and price. My notebook.

Business objective: Identify which physical and performance characteristics drive firearm prices, and build a predictive model that can estimate market value from product specifications.

What's in it:

Power BI: Key metrics dashboard (total revenue, average price), price trends by age and weight, muzzle velocity vs. max range scatter plot, performance gauges.
Excel: Data preprocessing, pivot tables, price distribution, and correlation analysis.
Python: Descriptive statistics, box plots, heatmaps, and pairplots. Regression models for price prediction, evaluated with R² and RMSE, plus feature importance analysis.
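A sketch of the regression evaluation loop. The README does not list which regression models the notebook uses, so this assumes a linear baseline and a random forest. It trains them on synthetic data that only mimics the dataset's column names; the price formula and value ranges are invented for illustration and say nothing about the real Kaggle data. RMSE is computed as the square root of `mean_squared_error`, and feature importance comes from the forest's impurity-based `feature_importances_`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 400

# Synthetic stand-in for the firearm attributes (NOT the real dataset).
X = pd.DataFrame({
    "age": rng.uniform(0, 50, n),
    "weight": rng.uniform(1, 10, n),
    "muzzle_velocity": rng.uniform(300, 1000, n),
    "max_range": rng.uniform(100, 2000, n),
})
# Invented price formula; "weight" deliberately has no effect.
y = (500 + 2.0 * X["muzzle_velocity"] + 0.5 * X["max_range"]
     - 10 * X["age"] + rng.normal(0, 50, n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)


def evaluate(model):
    """Fit on the training split and return (R², RMSE) on the test split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    return r2_score(y_test, pred), rmse


models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
results = {name: evaluate(m) for name, m in models.items()}

# Impurity-based importances from the fitted forest, largest first.
importance = pd.Series(
    models["random_forest"].feature_importances_, index=X.columns
).sort_values(ascending=False)

print(results)
print(importance)
```

Impurity-based importances can favor high-cardinality features; scikit-learn's `permutation_importance` on the test split is a common cross-check.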

4. Calories Burnt Prediction

Tools: Excel, Python

Data: Kaggle dataset — 15,000 records with biological measures: age, weight, height, heart rate, body temperature, workout duration.

Business objective: Understand which biological and workout parameters most strongly predict calorie expenditure, and evaluate how well different ML models can estimate it from those inputs.

What's in it:

Excel: Descriptive stats (mean, median, std, skewness, kurtosis, CoV), outlier detection via IQR, pivot tables with slicers by gender, age group, and BMI category.
Python: Feature engineering (BMI, Calories per Minute, Intensity Score, Temperature Deviation, age and BMI categories), EDA (correlation heatmap, pairplot, boxplots). Regression models: Linear Regression (baseline), Random Forest, XGBoost — evaluated with MAE, RMSE, R², feature importance, and Partial Dependence Plot.
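A sketch of the feature-engineering and IQR outlier steps in pandas. The column names are hypothetical, and the README does not define Intensity Score, the temperature baseline, or the age bins, so the formula `heart_rate * duration / 100`, the 37.0 °C baseline, and the age cut points below are assumptions. The BMI categories use the standard adult cut-offs (18.5, 25, 30). The IQR rule flags values outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR].

```python
import pandas as pd


def engineer(df: pd.DataFrame) -> pd.DataFrame:
    """Add derived features; column names and some formulas are assumptions."""
    out = df.copy()
    height_m = out["height_cm"] / 100
    out["bmi"] = out["weight_kg"] / height_m ** 2
    out["calories_per_min"] = out["calories"] / out["duration_min"]
    # Hypothetical intensity score: heart rate scaled by workout duration.
    out["intensity_score"] = out["heart_rate"] * out["duration_min"] / 100
    # Assumed baseline of 37.0 °C for normal body temperature.
    out["temp_deviation"] = out["body_temp_c"] - 37.0
    out["bmi_category"] = pd.cut(
        out["bmi"], bins=[0, 18.5, 25, 30, float("inf")],
        labels=["Underweight", "Normal", "Overweight", "Obese"], right=False)
    # Hypothetical age bins.
    out["age_group"] = pd.cut(
        out["age"], bins=[0, 30, 45, 60, 120],
        labels=["<30", "30-44", "45-59", "60+"], right=False)
    return out


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside the Tukey fences."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


# Toy rows for illustration (not taken from the Kaggle dataset).
df = pd.DataFrame({
    "age": [25, 40, 58, 67],
    "height_cm": [180.0, 165.0, 170.0, 175.0],
    "weight_kg": [72.0, 50.0, 80.0, 100.0],
    "duration_min": [30.0, 20.0, 10.0, 25.0],
    "heart_rate": [110.0, 95.0, 85.0, 105.0],
    "body_temp_c": [40.5, 39.8, 38.9, 40.2],
    "calories": [210.0, 80.0, 35.0, 150.0],
})
out = engineer(df)
print(out[["bmi", "bmi_category", "age_group", "calories_per_min"]])
```

Because Calories per Minute is derived from the target, it suits EDA but would leak the answer if fed to the regression models as an input.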

Extended: The topic was later developed into a production ML pipeline and REST API — ml-training-intensity-prediction.

5. Iris — Random Forest Classification (Python vs R)

Tools: Python, R

Business objective: Demonstrate equivalent ML workflows in Python and R on a controlled classification problem, with a focus on comparing implementation patterns across both ecosystems.

What's in it:

iris_rf_python.ipynb — full ML pipeline: EDA, train/test split, Random Forest classifier, confusion matrix and accuracy evaluation.
iris_rf_r.Rmd — equivalent analysis in R with tidymodels, ready to knit to PDF or HTML.
Side-by-side comparison of Python and R workflows for the same task.
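The Python side of the workflow can be sketched in a few lines with scikit-learn's bundled Iris data. The 80/20 stratified split, 100 trees, and random seed here are assumptions; the notebook's exact settings are not stated in this README.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Stratify so each species keeps its proportion in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

acc = accuracy_score(y_test, pred)
cm = confusion_matrix(y_test, pred)  # rows: true class, columns: predicted
print(f"accuracy = {acc:.3f}")
print(cm)
```

On the R side, the tidymodels counterparts are roughly `initial_split()` (rsample), `rand_forest()` (parsnip), and `conf_mat()` (yardstick).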

License

MIT
