This project predicts the selling price of a used car from features such as manufacturing year, brand, mileage, fuel type, and transmission. It began as a personal project inspired by CampusX's tutorial; I took the core idea further by exploring different models and techniques to improve prediction accuracy and understand the factors affecting car prices.
The goal is to create a reliable regression model that can accurately estimate car prices using historical data. This kind of tool can help sellers set competitive prices and help buyers assess fair market value.
The dataset includes details such as:
- Year of manufacture
- Car brand/model
- Mileage driven
- Fuel type (Petrol/Diesel/CNG)
- Transmission (Manual/Automatic)
- Engine capacity
- Seller type
- Number of previous owners
- Current selling price
This information was preprocessed to remove irrelevant columns, handle missing values, and convert categorical variables into numeric formats suitable for machine learning.
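The three preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the column names (`listing_id`, `km_driven`, and so on), the sample rows, and the choice of median imputation and one-hot encoding are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample rows; the column names and values are assumptions,
# not the project's actual schema or data.
raw = pd.DataFrame({
    "listing_id": [101, 102, 103, 104],
    "year": [2015, 2018, 2012, 2020],
    "km_driven": [60000.0, float("nan"), 120000.0, 15000.0],
    "fuel": ["Petrol", "Diesel", "CNG", "Petrol"],
    "transmission": ["Manual", "Automatic", "Manual", "Manual"],
    "selling_price": [350000, 650000, 150000, 800000],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Drop an identifier column that carries no predictive signal.
    df = df.drop(columns=["listing_id"])
    # Fill missing mileage with the column median (one possible strategy).
    df["km_driven"] = df["km_driven"].fillna(df["km_driven"].median())
    # One-hot encode the categorical columns into 0/1 indicator columns.
    return pd.get_dummies(df, columns=["fuel", "transmission"], dtype=int)


clean = preprocess(raw)
print(clean.columns.tolist())
```

One-hot encoding is used here because fuel type and transmission have no natural ordering; label encoding would be an alternative for tree-based models.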
- Older cars and higher mileage strongly correlate with lower prices.
- Fuel type and transmission impact resale value: diesel and manual cars generally depreciate faster.
- Popular brands like Maruti, Hyundai, and Honda retain value longer than others.
- Outliers in the price and mileage distributions were treated to reduce model distortion.
Visualizations such as scatter plots, box plots, and heatmaps were used to understand these relationships.
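The outlier treatment mentioned in the list above is not specified further. One common option is to clip values that fall outside the interquartile-range (IQR) fences, sketched below. The `clip_iqr` helper, the multiplier `k=1.5`, and the sample prices are illustrative assumptions, not the method actually used in this project.

```python
import numpy as np


def clip_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    arr = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(arr, [25, 75])
    iqr = q3 - q1
    # Values beyond the fences are pulled back to the fence, not dropped.
    return np.clip(arr, q1 - k * iqr, q3 + k * iqr)


# Made-up prices with one extreme value at the end.
prices = [2.5, 3.0, 3.2, 3.8, 4.1, 4.5, 60.0]
clipped = clip_iqr(prices)
print(clipped)
```

Clipping keeps every row in the dataset, which matters when the data is small; dropping the rows entirely is the other common choice.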
Multiple models were tested and compared:
- Linear Regression: Used as a baseline model.
- Decision Tree Regressor: Captured nonlinear relationships better.
- Random Forest Regressor: Delivered the highest accuracy and lowest error.
The Random Forest model was chosen as the final model based on its superior performance in predicting prices and handling feature interactions.
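A comparison along these lines can be set up with scikit-learn as sketched below. The data here is synthetic (two made-up features, `age` and `km`, with a made-up price formula), and the hyperparameters are assumptions, so the scores it prints say nothing about the real dataset; the point is only the structure of the comparison.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: price decays nonlinearly with age and falls with mileage.
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(0, 15, n)    # years
km = rng.uniform(5, 200, n)    # thousands of km
price = 10 * np.exp(-0.12 * age) - 0.01 * km + rng.normal(0, 0.3, n)

X = np.column_stack([age, km])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

# The three model families compared in this project (hyperparameters assumed).
models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: R^2 = {score:.3f}")
```

Scoring every model on the same held-out split keeps the comparison fair; cross-validation would give a more stable ranking on a small dataset.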
- R² Score (Accuracy): ~0.85
- MAE (Mean Absolute Error): ~₹1,200
- RMSE (Root Mean Squared Error): ~₹1,800
These results show that the model can make reasonably accurate predictions based on the features provided.
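For reference, the three metrics above follow from their standard definitions, which the sketch below implements directly. The `regression_metrics` helper and the sample values are illustrative and unrelated to the project's actual predictions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, MAE, and RMSE for paired true/predicted values (hypothetical helper)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # MAE: average magnitude of the errors.
    mae = sum(abs(e) for e in errors) / n
    # RMSE: square root of the average squared error; penalizes large misses more.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    # R^2: 1 minus the ratio of residual variance to total variance.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    return {"r2": r2, "mae": mae, "rmse": rmse}


# Made-up values for illustration only.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 8.0])
print(metrics)
```

MAE and RMSE are in the same units as the price, while R² is unitless. RMSE is always at least as large as MAE, and a wide gap between the two suggests a few large errors.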
- The importance of feature engineering in improving model accuracy.
- How different regression algorithms perform on real-world datasets.
- Visual storytelling with data through EDA.
- Building an end-to-end pipeline for prediction and analysis.
- Improve predictions using advanced algorithms like XGBoost.
- Build a web app for live predictions using Streamlit or Flask.
- Add more real-world features like car condition, insurance status, or photos.
- Expand dataset with listings from car resale platforms for better accuracy.