im-saif/Forest-Covertype-Classification
Forest CoverType Classification 🌲🌲

This project explores the Forest CoverType dataset to predict the type of forest cover from cartographic variables such as elevation, aspect, slope, soil type, and more.

It’s a multi-class classification problem with 7 cover type classes. Using ensemble methods, this project investigates how class-imbalance handling and model choice impact predictive performance.


📂 Dataset

  • Source: Forest CoverType Dataset – Kaggle
  • Classes: 7 types of forest cover.
  • Characteristics:
    • Categorical variables arrive already one-hot encoded; no raw categorical columns remain.
    • No duplicates or missing values.
    • Imbalanced classes: types 1 and 2 are majority classes, while 4 and 5 are underrepresented.
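
The imbalance described above is easy to verify with a value count. This sketch uses a small toy Series standing in for the real `Cover_Type` column (the Kaggle CSV is not bundled here), so the counts are illustrative, not the dataset's actual proportions:

```python
import pandas as pd

# Toy stand-in for the Cover_Type column: classes 1 and 2 dominate,
# while 4 and 5 are rare (mirroring the imbalance described above).
cover_type = pd.Series([1] * 40 + [2] * 35 + [3] * 10 +
                       [4] * 2 + [5] * 3 + [6] * 5 + [7] * 5)

counts = cover_type.value_counts().sort_index()
proportions = counts / counts.sum()
print(proportions.round(2))
```

On the real data, a skew like this is what motivates the SMOTE oversampling step in the workflow below.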

⚙️ Project Workflow

  1. Data Preprocessing

    • Verified dataset cleanliness (no nulls, no duplicates).
    • Confirmed one-hot encoding of categorical variables.
    • Addressed class imbalance using SMOTE oversampling.
  2. Modeling

    • Random Forest Classifier
      • Evaluated with confusion matrix, classification report, and feature importance visualization.
      • Validated with stratified cross-validation.
    • XGBoost Classifier
      • Applied the same evaluation (confusion matrix, classification report, feature importances).
      • Validated with stratified cross-validation.
  3. Evaluation Metrics

    • Confusion Matrix
    • Classification Report (Precision, Recall, F1-score)
    • Stratified Cross-Validation
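
The oversample-then-cross-validate loop above can be sketched as follows. This is a minimal illustration on synthetic data, not the project's actual pipeline: it uses plain random oversampling via `sklearn.utils.resample` as a stand-in for SMOTE (imblearn's `SMOTE` would slot into the same spot), and a Random Forest with stratified 5-fold CV scored by macro F1:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.utils import resample

# Synthetic imbalanced multi-class data standing in for the CoverType features.
X, y = make_classification(n_samples=600, n_features=12, n_informative=8,
                           n_classes=3, weights=[0.7, 0.2, 0.1],
                           random_state=42)

# Random oversampling of every class up to the majority count
# (a simple stand-in for SMOTE, which synthesizes new minority samples).
classes, counts = np.unique(y, return_counts=True)
target = counts.max()
parts = [resample(X[y == c], y[y == c], replace=True,
                  n_samples=target, random_state=42) for c in classes]
X_bal = np.vstack([p[0] for p in parts])
y_bal = np.concatenate([p[1] for p in parts])

# Stratified 5-fold cross-validation with macro F1, as in the workflow above.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
scores = cross_val_score(clf, X_bal, y_bal, cv=cv, scoring="f1_macro")
print(f"mean CV F1: {scores.mean():.3f}")
```

One caveat worth noting: oversampling before splitting lets duplicated minority samples leak across folds and inflate CV scores; a stricter setup applies SMOTE inside each training fold only (e.g. via an imblearn `Pipeline`).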

📊 Results

  • Random Forest delivered the strongest performance with an average CV F1 of 0.93, making it the best fit for this dataset.
  • XGBoost performed reasonably well with an average CV F1 of 0.8855, though slightly weaker than Random Forest.
  • Feature importance analysis highlighted which environmental and soil-related factors most influenced forest cover classification.
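
The feature-importance analysis mentioned above boils down to ranking the fitted model's `feature_importances_` attribute. A minimal sketch on synthetic data, with hypothetical feature names standing in for the cartographic variables:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical names standing in for a few of the cartographic variables.
names = ["Elevation", "Aspect", "Slope", "Hillshade_9am", "Hillshade_Noon"]

X, y = make_classification(n_samples=400, n_features=5, n_informative=3,
                           n_classes=3, n_clusters_per_class=1,
                           random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Rank features by impurity-based importance (what feature_importances_
# reports for tree ensembles); the values sum to 1.
order = np.argsort(clf.feature_importances_)[::-1]
for i in order:
    print(f"{names[i]:<15} {clf.feature_importances_[i]:.3f}")
```

On the real dataset, Elevation typically ranks near the top, which matches the ecological reading in the takeaways below; the same ranking works unchanged for XGBoost's sklearn wrapper.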

🔑 Key Takeaways

  • Class imbalance needed addressing for fairer evaluation — SMOTE oversampling proved useful.
  • Random Forest outperformed XGBoost on this dataset, showcasing its robustness for high-dimensional, imbalanced data.
  • Feature importance visualization provided ecological insights into which variables play the largest role in forest cover classification.
