Skip to content

Segmenting with Mixed Type Data - A Case Study Using K-Medoids on Subscription Data

Notifications You must be signed in to change notification settings

DiegoUsaiUK/K_Medoid_Clustering

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 

Repository files navigation

K_Medoid_Clustering

In this project I revisit clustering, one of my favourite analytic methods, to explore and analyse a real-world dataset that included a mix of categorical and numerical feature. This required a different approach from the classical K-means algorithm that cannot be no directly applied to categorical data.

Instead, I used the K-medoids algorithm, also known as PAM (Partitioning Around Medoids), that has the advantage of working on distances other than numerical and lends itself well to analyse mixed-type data.

The silhouette coefficient helped to establish the optimal number of clusters, whilst t-SNE ( t-distributed stochastic neighbour embedding), a dimensionality reduction technique akin Principal Component Analysis and UMAP, unveiled good separation between clusters as well as closeness of elements within clusters, confirming the segmentation relevance.

Finally, I condensed the insight generated from the analysis into a number of actionable and data-driven recommendations that, applied correctly, could help improve product sign up.