Forest CoverType Classification 🌲🌲

This project explores the Forest CoverType dataset to predict the type of forest cover from cartographic variables such as elevation, aspect, slope, soil type, and more.

It’s a multi-class classification problem with 7 cover type classes. Using ensemble methods, this project investigates how class-imbalance handling and model choice affect predictive performance.

📂 Dataset

Source: Forest CoverType Dataset – Kaggle
Classes: 7 types of forest cover.
Characteristics:
- Categorical features (e.g., soil type) are already one-hot encoded, so no raw categorical columns remain.
- No duplicates or missing values.
- Imbalanced classes: types 1 and 2 are majority classes, while 4 and 5 are underrepresented.
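
A minimal sketch of the checks behind these notes, assuming the Kaggle CSV uses the column names `Cover_Type`, `Soil_Type*` and `Wilderness_Area*` (adjust if the notebook's names differ). It counts missing values and duplicate rows, reports each cover type's share of the rows, and checks that every row has exactly one active soil-type and wilderness-area column, as one-hot encoding implies. The function name `summarize_covtype` is hypothetical.

```python
import pandas as pd


def summarize_covtype(df: pd.DataFrame, target: str = "Cover_Type") -> dict:
    """Sanity checks: nulls, duplicates, class shares, one-hot groups."""
    # Assumed column prefixes for the one-hot groups.
    soil_cols = [c for c in df.columns if c.startswith("Soil_Type")]
    wild_cols = [c for c in df.columns if c.startswith("Wilderness_Area")]

    def one_hot_ok(cols):
        # A one-hot group should have exactly one active column per row.
        return bool((df[cols].sum(axis=1) == 1).all()) if cols else None

    return {
        "n_missing": int(df.isna().sum().sum()),
        "n_duplicates": int(df.duplicated().sum()),
        # Fraction of rows per cover type, largest first (reveals imbalance).
        "class_share": df[target].value_counts(normalize=True).round(3).to_dict(),
        "soil_one_hot_ok": one_hot_ok(soil_cols),
        "wilderness_one_hot_ok": one_hot_ok(wild_cols),
    }
```

Usage: summarize_covtype(pd.read_csv("covtype.csv")).
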

⚙️ Project Workflow

Data Preprocessing
- Verified dataset cleanliness (no nulls, no duplicates).
- Confirmed one-hot encoding of categorical variables.
- Addressed class imbalance using SMOTE oversampling.
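
SMOTE creates synthetic minority samples by interpolating between a minority point and one of its k nearest same-class neighbours. The NumPy sketch below illustrates that idea only: it uses a brute-force distance matrix, so it suits small inputs, and on the full dataset a library implementation such as imbalanced-learn's `SMOTE` is the practical choice. The function `smote` and its parameters are illustrative names, not the notebook's code.

```python
import numpy as np


def smote(X, y, k=5, seed=0):
    """Oversample every non-majority class up to the majority-class count.

    Each synthetic point lies on the segment between a random minority
    sample and one of its k nearest neighbours from the same class.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, n in zip(classes, counts):
        need = target - n
        if need == 0:
            continue  # majority class: nothing to add
        Xc = X[y == cls]
        if len(Xc) < 2:
            raise ValueError(f"class {cls} needs at least 2 samples")
        kk = min(k, len(Xc) - 1)
        # Brute-force squared distances within the class; ignore self-matches.
        d = ((Xc[:, None, :] - Xc[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(d, np.inf)
        neighbours = np.argsort(d, axis=1)[:, :kk]
        # Pick a base sample, one of its neighbours, and a random gap in [0, 1).
        base = rng.integers(0, len(Xc), size=need)
        nb = neighbours[base, rng.integers(0, kk, size=need)]
        gap = rng.random((need, 1))
        X_parts.append(Xc[base] + gap * (Xc[nb] - Xc[base]))
        y_parts.append(np.full(need, cls, dtype=y.dtype))
    return np.vstack(X_parts), np.concatenate(y_parts)
```

Interpolation produces fractional values in binary one-hot columns such as soil type; imbalanced-learn also offers `SMOTENC` for data with categorical features.
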
Modeling
- Random Forest Classifier
  - Evaluated with confusion matrix, classification report, and feature importance visualization.
  - Validated with stratified cross-validation.
- XGBoost Classifier
  - Applied the same evaluation (confusion matrix, classification report, feature importances).
  - Validated with stratified cross-validation.
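
The stratified cross-validation step can be sketched with scikit-learn's `StratifiedKFold` and `cross_val_score`. The README does not say which F1 average was used, so macro-F1 (every cover type weighted equally) is an assumption here, as are the synthetic stand-in data, the fold count, the hyperparameters and the helper name `stratified_cv_f1`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import LabelEncoder


def stratified_cv_f1(model, X, y, n_splits=5, seed=42):
    """Return mean and std of macro-averaged F1 over stratified folds."""
    # Each fold keeps the class proportions of y, so rare cover types
    # appear in every validation split.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")
    return scores.mean(), scores.std()


# Illustrative stand-in for covtype: 7 imbalanced classes labelled 1..7.
X, y0 = make_classification(
    n_samples=2000, n_features=20, n_informative=10, n_classes=7,
    weights=[0.36, 0.48, 0.06, 0.01, 0.02, 0.03, 0.04], random_state=0,
)
y = y0 + 1

rf = RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=-1)
rf_mean, rf_std = stratified_cv_f1(rf, X, y)
print(f"Random Forest macro-F1: {rf_mean:.3f} +/- {rf_std:.3f}")

# XGBoost's scikit-learn wrapper expects class labels 0..K-1, so the
# 1..7 cover types are re-encoded before fitting an XGBClassifier.
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)
```

To cross-validate XGBoost the same way, pass `xgboost.XGBClassifier(...)` together with `y_encoded`. If SMOTE is applied to the whole dataset before splitting, synthetic points built from validation rows leak into training and inflate CV scores; wrapping both steps in `imblearn.pipeline.make_pipeline(SMOTE(...), RandomForestClassifier(...))` and passing that pipeline to `stratified_cv_f1` resamples only the training folds.
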
Evaluation Metrics
- Confusion Matrix
- Classification Report (Precision, Recall, F1-score)
- Stratified Cross-Validation (average F1 across folds)
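
The confusion matrix and classification report can be produced with scikit-learn's `confusion_matrix` and `classification_report`. In this sketch, the helper name `evaluate` is hypothetical. Per-class recall is read off the confusion-matrix diagonal because, with imbalanced classes, overall accuracy can stay high even when minority types such as 4 and 5 are mostly misclassified.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix


def evaluate(y_true, y_pred):
    """Confusion matrix, text report and per-class recall for cover-type labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    labels = np.unique(np.concatenate([y_true, y_pred]))
    # Rows are true classes, columns are predicted classes.
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    report = classification_report(
        y_true, y_pred, labels=labels, digits=3, zero_division=0
    )
    # Recall per class = correct predictions / true samples of that class.
    support = cm.sum(axis=1)
    recall = np.divide(
        cm.diagonal(), support,
        out=np.zeros(len(labels)), where=support > 0,
    )
    return cm, report, dict(zip(labels.tolist(), recall.tolist()))
```

Typical use: cm, report, recall = evaluate(y_test, model.predict(X_test)), after fitting on a stratified train/test split.
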

📊 Results

Random Forest delivered the strongest performance, with an average CV F1 of 0.93, making it the better of the two models on this dataset.
XGBoost performed reasonably well, with an average CV F1 of 0.8855, roughly 0.04 below Random Forest.
Feature importance analysis highlighted which environmental and soil-related factors most influenced forest cover classification.
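
A sketch of how a feature-importance ranking can be read from a fitted Random Forest via its `feature_importances_` attribute. Summing the one-hot `Soil_Type*` and `Wilderness_Area*` columns into single groups is an added assumption, not something this README describes; it makes soil-related factors comparable with continuous variables such as elevation. The helper names and the synthetic demo data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def ranked_importances(model, feature_names, top=10):
    """Top-`top` impurity-based importances of a fitted tree ensemble."""
    imp = pd.Series(model.feature_importances_, index=feature_names)
    return imp.sort_values(ascending=False).head(top)


def grouped_importances(model, feature_names,
                        prefixes=("Soil_Type", "Wilderness_Area")):
    """Sum importances of one-hot columns sharing a prefix (e.g. all Soil_Type*)."""
    imp = pd.Series(model.feature_importances_, index=feature_names)
    groups = [next((p for p in prefixes if name.startswith(p)), name)
              for name in feature_names]
    return imp.groupby(groups).sum().sort_values(ascending=False)


# Illustrative demo: only "Elevation" drives the label.
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "Elevation": rng.random(n),
    "Slope": rng.random(n),
    "Soil_Type1": rng.integers(0, 2, n),
    "Soil_Type2": rng.integers(0, 2, n),
})
labels = (demo["Elevation"] > 0.5).astype(int) + 1
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(demo, labels)
print(ranked_importances(rf, demo.columns, top=3))
print(grouped_importances(rf, demo.columns))
```

Impurity-based importances can favour continuous, high-cardinality features; scikit-learn's `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.
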

🔑 Key Takeaways

Class imbalance had to be addressed for a fairer evaluation, and SMOTE oversampling proved useful.
Random Forest outperformed XGBoost on this dataset, suggesting it copes well with wide, imbalanced tabular data like this.
Feature importance visualization provided ecological insights into which variables play the largest role in forest cover classification.

Repository files:
- README.md
- covtype.csv
- forest-covertype-classification.ipynb