draprar/data-analysis-projects

Data Analysis Projects

A portfolio of data analysis, visualization, and machine learning projects built with Power BI, Excel, Python, and R.

Stack: Power BI · Excel · Python (pandas, scikit-learn, XGBoost, seaborn) · R (tidymodels, ggplot2)


Projects

1. Dataline Bike Company — Sales and Profit Analysis

Tools: Power BI, Python

Data: Sales records (2017–2020) from CSV and Excel files — orders, pricing, product costs, sales territories.

Business objective: Replace static management reports with a self-service dashboard that gives the sales and marketing teams direct visibility into profitability by geography, product category, and customer segment — and identify which customer groups drive the most value.

What's in it:

  • Interactive Power BI dashboard covering profitability by country, revenue and profit trends, top 10 customers by revenue, and profit breakdown by category with conditional formatting.
  • Customer segmentation with KMeans clustering (Python) based on revenue, profit, average order value (AOV), profit margin, and customer lifecycle. Segment profiles are integrated back into Power BI for behavioral insights (loyal high-spenders, one-time buyers, low-value shoppers).
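
As a rough illustration of the per-customer inputs listed above, the sketch below aggregates order rows into revenue, profit, AOV, profit margin, and a lifecycle measure. It is not the project's code: the field names, the sample rows, and the reading of "customer lifecycle" as days between first and last order are all assumptions made for the example. Features like these would normally be scaled before clustering.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order rows; field names are illustrative, not the project's schema.
orders = [
    {"customer": "A", "date": date(2020, 1, 1), "revenue": 100.0, "profit": 30.0},
    {"customer": "A", "date": date(2020, 1, 31), "revenue": 300.0, "profit": 90.0},
    {"customer": "B", "date": date(2020, 3, 15), "revenue": 50.0, "profit": 5.0},
]


def customer_features(rows):
    """Aggregate order rows into one feature dict per customer."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["customer"]].append(row)

    features = {}
    for customer, items in grouped.items():
        revenue = sum(r["revenue"] for r in items)
        profit = sum(r["profit"] for r in items)
        dates = [r["date"] for r in items]
        features[customer] = {
            "revenue": revenue,
            "profit": profit,
            # Average order value: total revenue divided by number of orders.
            "aov": revenue / len(items),
            # Guard against customers with zero recorded revenue.
            "profit_margin": profit / revenue if revenue else 0.0,
            # Assumption: lifecycle = days between first and last order.
            "lifecycle_days": (max(dates) - min(dates)).days,
        }
    return features


features = customer_features(orders)
```

Each customer ends up as one row of numeric features, which is the shape a clustering step such as KMeans expects.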

Origin: Built as the capstone project for the 15 Days of Power BI Bootcamp (Ligency / Udemy), completed within the challenge period. Extended post-course with KMeans customer segmentation in Python.


2. U.S. Electricity Prices — State and Sector Analysis

Tools: Power BI

Data: U.S. Energy Information Administration (EIA) — electricity prices across residential, commercial, industrial, and transportation sectors from 2002 onward.

Business objective: Provide a navigable view of 20+ years of electricity pricing across U.S. states and sectors to support energy cost benchmarking, policy evaluation, and procurement decisions.

What's in it:

  • Price trends by state and provider, with drillthrough-enabled state-level analysis.
  • Price-trend forecasts and KPIs for the highest- and lowest-priced states.

3. Gun Price Analysis and Prediction

Tools: Power BI, Excel, Python

Data: Kaggle dataset — firearm attributes including age, weight, muzzle velocity, max range, and price. My notebook.

Business objective: Identify which physical and performance characteristics drive firearm prices, and build a predictive model that can estimate market value from product specifications.

What's in it:

  • Power BI: Key metrics dashboard (total revenue, average price), price trends by age and weight, muzzle velocity vs. max range scatter plot, performance gauges.
  • Excel: Data preprocessing, pivot tables, price distribution, and correlation analysis.
  • Python: Descriptive statistics, box plots, heatmaps, pairplots. Regression models for price prediction, evaluated with R² and RMSE, plus feature importance analysis.
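
For reference, the two evaluation metrics named above can be written out in a few lines. This is a minimal standard-library sketch of the usual definitions; the notebook most likely uses library helpers instead, and the sample numbers are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up targets and predictions, not values from the dataset.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]

model_rmse = rmse(actual, predicted)
model_r2 = r_squared(actual, predicted)
```

RMSE is expressed in the same units as the target (here, price), while R² is unitless, which is why the two are often reported together.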

4. Calories Burnt Prediction

Tools: Excel, Python

Data: Kaggle dataset — 15,000 records with biological measures: age, weight, height, heart rate, body temperature, workout duration.

Business objective: Understand which biological and workout parameters most strongly predict calorie expenditure, and evaluate how well different ML models can estimate it from those inputs.

What's in it:

  • Excel: Descriptive stats (mean, median, std, skewness, kurtosis, CoV), outlier detection via IQR, pivot tables with slicers by gender, age group, and BMI category.
  • Python: Feature engineering (BMI, Calories per Minute, Intensity Score, Temperature Deviation, age and BMI categories), EDA (correlation heatmap, pairplot, boxplots). Regression models: Linear Regression (baseline), Random Forest, and XGBoost, evaluated with MAE, RMSE, and R², and interpreted with feature importance and partial dependence plots.
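
Two of the steps above are simple enough to sketch directly: the BMI and calories-per-minute features, and IQR-based outlier detection. The version below is only an illustration. It assumes height in centimetres, weight in kilograms, and duration in minutes; it uses the common 1.5 × IQR fences and the "inclusive" quartile method, which may differ from the conventions used in the workbook and notebook. Intensity Score and Temperature Deviation are left out because their definitions are not given here.

```python
from statistics import quantiles


def bmi(weight_kg, height_cm):
    """Body mass index: weight in kg divided by height in metres squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def calories_per_minute(calories, duration_min):
    """Average calorie burn rate over a workout."""
    return calories / duration_min


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # Assumption: inclusive quartiles; other quartile conventions shift the fences.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up sample values, not taken from the dataset.
sample_bmi = bmi(70, 175)
sample_rate = calories_per_minute(150, 30)
flagged = iqr_outliers([10, 12, 11, 13, 12, 11, 10, 50])
```

The fence multiplier `k` is exposed as a parameter so a stricter or looser rule can be tried without changing the function.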

Extended: The topic was later developed into a production ML pipeline and REST API — ml-training-intensity-prediction.


5. Iris — Random Forest Classification (Python vs R)

Tools: Python, R

Business objective: Demonstrate equivalent ML workflows in Python and R on a controlled classification problem, with a focus on comparing implementation patterns across both ecosystems.

What's in it:

  • iris_rf_python.ipynb — full ML pipeline: EDA, train/test split, Random Forest classifier, confusion matrix and accuracy evaluation.
  • iris_rf_r.Rmd — equivalent analysis in R with tidymodels, ready to knit to PDF or HTML.
  • Side-by-side comparison of Python and R workflows for the same task.
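
The evaluation step shared by both workflows, a confusion matrix plus accuracy, reduces to counting label pairs. The sketch below spells that out with the standard library only. It is not taken from either notebook, which presumably use their ecosystems' built-in helpers, and the predictions are invented for the example.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal, i.e. predicted correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


labels = ["setosa", "versicolor", "virginica"]
# Invented labels and predictions, for illustration only.
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

matrix = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(matrix)
```

Off-diagonal cells show which classes get confused with each other, which accuracy alone does not reveal.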

License

MIT
