Skip to content

markknoffler/KNN_for_image_segmentation_dataset

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 

Repository files navigation

#KNN Classification on the Image Segmentation Dataset This project implements the K-Nearest Neighbors (KNN) algorithm to perform a classification task on the Image Segmentation dataset obtained from the UCI Machine Learning Repository. The dataset consists of multiple features derived from images, and the goal is to classify these into specific segments.

Key features of this project:

Data Preprocessing: Explored the effect of scaling the dataset using StandardScaler from the sklearn library. Observed that scaling introduced minor degradation in model performance compared to using raw data. This could be due to the feature-specific nature of KNN, which relies on distance metrics sensitive to scaling, potentially disrupting the dataset's inherent structure. Visualization: Created histograms for each feature and grouped classes in pairs to understand the distribution better. Model Implementation: KNN model implemented using the scikit-learn library. Evaluation metrics included precision, recall, F1-score, and accuracy. Performance: Without scaling: Achieved an overall accuracy of 94%, with slight variations in class-specific metrics. With scaling: Accuracy remained the same but showed minor performance drops in precision and recall for certain classes. The results highlight the subtle yet critical role preprocessing techniques play in classification models, especially for distance-based algorithms like KNN.

The KNN algorithm calculates distances between points in the feature space. While scaling helps normalize the range of values, making features comparable, it can sometimes disrupt patterns in datasets where feature magnitudes are naturally meaningful. In this case:

The features might already be on scales conducive to classification, meaning scaling altered their relative importance. KNN assumes neighborhoods defined by Euclidean distances. Scaling might inadvertently expand or compress the distances, especially if features naturally vary in their contribution to classification.

Features Dataset: Image Segmentation dataset from UCI Repository. Preprocessing: Raw feature analysis and scaling using StandardScaler. Histograms plotted for visualization of feature distributions. Model: Implemented KNN for classification using scikit-learn. Evaluation: Metrics: Precision, Recall, F1-score, and Accuracy.

How to Run Clone the repository: git clone https://github.com/markknoffler/KNN_for_image_segmentation_dataset.git cd knn-image-segmentation

Install dependencies: pip install -r requirements.txt

Execute the project: python src/knn_classifier.py

Results Performance without Scaling: Accuracy: 94% Macro Average (Precision/Recall/F1): 94%

Performance with Scaling: Accuracy: 94% Macro Average (Precision/Recall/F1): 94%

Insights Impact of Scaling: Scaling slightly reduced performance for certain classes despite improving feature comparability. Feature distributions might have been optimal for raw data. Class-Specific Observations: Classes 0 and 5 consistently achieved 100% precision and recall. Class 2 showed performance variation, possibly due to overlapping feature distributions.

About

This project uses the K-Nearest Neighbors (KNN) algorithm to classify instances from the Image Segmentation dataset. The dataset contains labeled image segments, with features derived from image characteristics. This implementation explores preprocessing techniques and evaluates their impact on model performance.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors