This repository contains an implementation of the CLARANS (Clustering Large Applications based on RANdomized Search) algorithm, applied to customer segmentation using the Mall Customers dataset.
CLARANS is an efficient partitioning-based clustering algorithm designed for mining large spatial datasets. It improves upon traditional algorithms like PAM (Partitioning Around Medoids) and CLARA by using a randomized search approach to explore the "graph" of possible medoid sets.
- Randomized Search: Efficiently explores the search space by randomly sampling neighbors.
- Custom Implementation: Built from scratch using Python and NumPy.
- Comparison: Includes a baseline comparison with the k-Means algorithm.
- Evaluation Metrics: Uses Silhouette Score and Davies-Bouldin Index to assess clustering quality.
- Visualization: Detailed plots showing original data distributions and final clustering results.
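Both evaluation metrics have short definitions. As a dependency-light sketch, the hypothetical helpers `silhouette` and `davies_bouldin` below implement them directly in NumPy with Euclidean distance; the notebook itself may rely on `sklearn.metrics.silhouette_score` and `sklearn.metrics.davies_bouldin_score` instead. A higher silhouette (at most 1) and a lower Davies-Bouldin index (at least 0) indicate better-separated clusters.

```python
import numpy as np


def silhouette(X: np.ndarray, labels: np.ndarray) -> float:
    """Mean silhouette coefficient (Euclidean). Requires at least two clusters.

    For point i: a = mean distance to the other members of its cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters get s = 0.
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # singleton cluster: silhouette defined as 0
        a = D[i, own].sum() / (own.sum() - 1)  # D[i, i] = 0, so divide by size - 1
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())


def davies_bouldin(X: np.ndarray, labels: np.ndarray) -> float:
    """Davies-Bouldin index (Euclidean): mean over clusters of the worst (S_i + S_j) / M_ij."""
    clusters = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in clusters])
    # S_i: average distance from the members of cluster i to its centroid
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[idx], axis=1).mean()
        for idx, c in enumerate(clusters)
    ])
    # M_ij: distance between centroids i and j
    sep = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = (scatter[:, None] + scatter[None, :]) / sep
    np.fill_diagonal(ratios, -np.inf)  # ignore the i == j pairs
    return float(ratios.max(axis=1).mean())
```

As a quick check, four points forming two obvious pairs, `[[0, 0], [0, 1], [10, 0], [10, 1]]`, score a silhouette of about 0.90 and a Davies-Bouldin index of 0.1 under the natural labeling `[0, 0, 1, 1]`, while the crossed labeling `[0, 1, 0, 1]` drives the silhouette negative.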
The project uses the Mall Customers Dataset, which is a popular dataset for customer segmentation.
- Features used: Annual Income (k$) and Spending Score (1-100).
- Preprocessing: The data is normalized to ensure both features carry equal weight in the distance calculations.
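The README does not say which normalization scheme the notebook applies. The sketch below assumes per-column min-max scaling to [0, 1] (z-score standardization is an equally common choice); `min_max_normalize` is a hypothetical helper and the sample rows are illustrative values, not records taken from the dataset.

```python
import numpy as np


def min_max_normalize(X: np.ndarray) -> np.ndarray:
    """Rescale each column of X to [0, 1] (assumed scheme, not confirmed by the notebook)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant columns
    return (X - lo) / span


# With pandas, the feature matrix could be loaded as:
#   pd.read_csv("Mall_Customers.csv")[["Annual Income (k$)", "Spending Score (1-100)"]].to_numpy()
# Illustrative rows: [Annual Income (k$), Spending Score (1-100)]
X = np.array([[15, 39], [15, 81], [137, 83], [78, 1]])
Xn = min_max_normalize(X)
print(Xn.round(3))
```

After scaling, a one-unit difference in either column means "the full observed range of that feature", so neither income nor spending score dominates the Euclidean distances simply because of its units.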
To run the analysis, you will need the following Python libraries:
- numpy
- pandas
- matplotlib
- Ensure `Mall_Customers.csv` is in the same directory as the notebook.
- Open and run the `clarans_implementation.ipynb` Jupyter notebook.
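CLARANS treats the clustering problem as a search through a graph in which each node represents a set of k medoids, and two nodes are neighbors if their medoid sets differ by exactly one object. The search is controlled by two parameters: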
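- `numlocal`: The number of local minima to search for, i.e. the number of random restarts.
- `maxneighbor`: The maximum number of randomly chosen neighbors examined from the current node; if none of them lowers the cost, that node is accepted as a local minimum.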
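To get a sense of scale, a graph of this kind has $\binom{n}{k}$ nodes, and every node has $k(n-k)$ neighbors (each of the $k$ medoids can be swapped for any of the $n-k$ non-medoids). The worked numbers below assume the standard 200-row Mall Customers file and an illustrative $k = 5$ (the number of clusters used in the notebook is not stated here), together with the parameter heuristic reported by Ng and Han: `numlocal` $= 2$ and `maxneighbor` equal to 1.25% of $k(n-k)$, but at least 250.

$$
\binom{200}{5} = 2{,}535{,}650{,}040 \text{ nodes}, \qquad k(n-k) = 5 \times 195 = 975 \text{ neighbors per node}
$$

$$
\text{maxneighbor} = \max\bigl(250,\ 0.0125 \times 975\bigr) = \max(250,\ 12.19) = 250
$$

PAM evaluates all 975 swaps at every step, whereas CLARANS samples at most 250 random neighbors of the current node before accepting it as a local minimum.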
By moving from the current node to a randomly chosen neighbor only when that neighbor lowers the total cost, and restarting from a new random node for each local minimum, CLARANS strikes a balance between the thoroughness of PAM and the efficiency of CLARA.
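The search loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the function names `clarans` and `total_cost`, the defaults `numlocal=2` and `maxneighbor=250`, and the cost (sum of Euclidean distances from each point to its nearest medoid) are choices made here. For simplicity it recomputes the full cost for every neighbor instead of computing the incremental swap cost described in the paper.

```python
import numpy as np


def total_cost(X: np.ndarray, medoids: np.ndarray) -> float:
    """Sum of Euclidean distances from each point to its nearest medoid."""
    d = np.linalg.norm(X[:, None, :] - X[medoids][None, :, :], axis=-1)
    return float(d.min(axis=1).sum())


def clarans(X: np.ndarray, k: int, numlocal: int = 2, maxneighbor: int = 250, seed=None):
    """Randomized medoid search in the spirit of CLARANS (sketch).

    Returns (medoid indices, labels, cost) of the best local minimum found.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    best_medoids, best_cost = None, np.inf
    for _ in range(numlocal):
        # Start from a random node: k distinct points chosen as medoids.
        current = rng.choice(n, size=k, replace=False)
        current_cost = total_cost(X, current)
        j = 0
        while j < maxneighbor:
            # Random neighbor: swap one random medoid for one random non-medoid.
            neighbor = current.copy()
            non_medoids = np.setdiff1d(np.arange(n), current)
            neighbor[rng.integers(k)] = rng.choice(non_medoids)
            neighbor_cost = total_cost(X, neighbor)
            if neighbor_cost < current_cost:
                current, current_cost = neighbor, neighbor_cost
                j = 0  # moved to a better node: reset the neighbor counter
            else:
                j += 1
        # `current` is now a local minimum; keep it if it beats the best so far.
        if current_cost < best_cost:
            best_medoids, best_cost = current, current_cost
    # Assign every point to its nearest medoid.
    d = np.linalg.norm(X[:, None, :] - X[best_medoids][None, :, :], axis=-1)
    return best_medoids, d.argmin(axis=1), best_cost
```

On a normalized feature matrix `X`, a call such as `medoids, labels, cost = clarans(X, k=5, seed=42)` returns the indices of the chosen medoids, one cluster label per customer, and the total cost; `k=5` is only an example value. Raising `maxneighbor` or `numlocal` trades run time for a better chance of reaching a low-cost configuration.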
- Ng, R. T., & Han, J. (2002). CLARANS: A Method for Clustering Objects for Spatial Data Mining. IEEE Transactions on Knowledge and Data Engineering.


