Agglomerative Clustering Analysis on gene expression dataset

This is a solution notebook for an assignment question given in a graduate Data Mining course. Code blocks are accompanied by analysis where required.
Dataset link: https://archive.ics.uci.edu/ml/datasets/gene+expression+cancer+RNA-Seq
Broadly, the following steps have been performed in this solution notebook:

  • Minimal preprocessing on the dataset
  • Explained why agglomerative clustering is more widely used than divisive clustering
  • Visualization of the given class labels using t-SNE
  • Ran agglomerative clustering using the following linkages {single, complete, group average, minimum variance}.
    • Compared the clustering performance both visually and empirically on the dataset.
    • Reported the best results on various cluster validity indices.
  • The assumptions above and the flow of work follow the questions asked in the assignment.
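
The linkage comparison listed above could be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: it assumes scikit-learn, uses synthetic `make_blobs` data as a stand-in for the RNA-Seq matrix, and picks `n_clusters=5`, the silhouette score, and the adjusted Rand index as example settings and validity indices. In scikit-learn's naming, `average` corresponds to group average linkage and `ward` to minimum variance linkage.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Synthetic stand-in for the gene expression matrix (assumption: the
# notebook itself loads the UCI RNA-Seq dataset instead).
X, y_true = make_blobs(n_samples=150, centers=5, n_features=20, random_state=0)

# scikit-learn names: "average" = group average, "ward" = minimum variance.
linkages = ["single", "complete", "average", "ward"]

results = {}
for linkage in linkages:
    # n_clusters=5 is an illustrative choice matching the synthetic data.
    model = AgglomerativeClustering(n_clusters=5, linkage=linkage)
    labels = model.fit_predict(X)
    results[linkage] = {
        # Internal index: uses only the data and the predicted labels.
        "silhouette": silhouette_score(X, labels),
        # External index: compares predicted labels with the given classes.
        "ari": adjusted_rand_score(y_true, labels),
    }

for linkage, scores in results.items():
    print(f"{linkage:>8}: silhouette={scores['silhouette']:.3f}, "
          f"ARI={scores['ari']:.3f}")
```

Reporting an internal index alongside an external one shows both how compact the clusters are and how well they agree with the given class labels.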