Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 95e6d09, commit 1dfb7fa
Showing 2 changed files with 99 additions and 12 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
@@ -0,0 +1,67 @@
%%%------------- Hierarchical Clustering
% Step 1: Treat each data point as its own cluster.
% Step 2: Merge the two closest clusters into one.
% Step 3: Repeat Step 2 until only one cluster remains.
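
% Illustrative sketch, not part of the original commit: the three steps on
% a made-up set of five 2-D points, merging by minimum point-to-point
% distance. All toy_* names and values are hypothetical; pdist and
% squareform come from the same toolbox as linkage.
toy_points = [0 0; 0 1; 5 5; 5 6; 10 0];
toy_dist = squareform(pdist(toy_points));           % pairwise point distances
toy_clusters = num2cell(1:size(toy_points, 1));     % Step 1: one cluster per point
while numel(toy_clusters) > 1                       % Step 3: stop at one cluster
    best_d = inf;
    best_pair = [1 2];
    for p = 1:numel(toy_clusters) - 1
        for q = p + 1:numel(toy_clusters)
            block = toy_dist(toy_clusters{p}, toy_clusters{q});
            if min(block(:)) < best_d
                best_d = min(block(:));
                best_pair = [p q];
            end
        end
    end
    % Step 2: merge the two closest clusters
    fprintf('merge [%s] + [%s] at distance %.3f\n', ...
        num2str(toy_clusters{best_pair(1)}), num2str(toy_clusters{best_pair(2)}), best_d);
    toy_clusters{best_pair(1)} = [toy_clusters{best_pair(1)}, toy_clusters{best_pair(2)}];
    toy_clusters(best_pair(2)) = [];
end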
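
% Approaches for measuring the distance between two clusters
% Single link:   minimum distance between any pair of points, one from each cluster
% Complete link: maximum distance between any pair of points, one from each cluster
% Average link:  average distance over all such pairs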
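
% Illustrative sketch, not part of the original commit: the three distances
% for two made-up clusters. grp_a and grp_b are hypothetical; pdist2 is in
% the same toolbox as linkage.
grp_a = [0 0; 0 1];
grp_b = [3 0; 4 1];
pair_dist = pdist2(grp_a, grp_b);        % every point of grp_a against every point of grp_b
single_link   = min(pair_dist(:));       % 3 here: (0,0) to (3,0)
complete_link = max(pair_dist(:));       % the farthest pair, (0,0) to (4,1)
average_link  = mean(pair_dist(:));      % mean over all four pairs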

% Types of hierarchical clustering: agglomerative (bottom-up), divisive (top-down)

% Import the dataset (fullfile builds the path with the platform's separator)
data = readtable(fullfile('Datasets', 'Mall_Customers.csv'));

% Count missing values per column
missings = sum(ismissing(data));

% Plot each variable in its own figure to check for outliers
% (without a new figure the second plot replaces the first)
figure;
IncomePlot = plot(data.AnnualIncome);
figure;
SpendingPlot = plot(data.SpendingScore);

% Perform feature scaling (standardization: zero mean, unit standard deviation)
stand_income = (data.AnnualIncome - mean(data.AnnualIncome)) / std(data.AnnualIncome);
data.AnnualIncome = stand_income;

stand_spending = (data.SpendingScore - mean(data.SpendingScore)) / std(data.SpendingScore);
data.SpendingScore = stand_spending;
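
% Sketch, not part of the original commit: zscore (same toolbox as linkage)
% applies the same (x - mean) / std scaling in one call. toy_income is a
% made-up vector, not the dataset.
toy_income = [15; 16; 17; 18; 19; 120];
toy_manual = (toy_income - mean(toy_income)) / std(toy_income);
toy_scaled = zscore(toy_income);
assert(max(abs(toy_scaled - toy_manual)) < 1e-12);  % the two results agree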

% Select the clustering features (columns 4 and 5 of the table)
selected_data = data(:,4:5);

% linkage expects a numeric matrix, so convert the table to an array
arrayed_data = table2array(selected_data);
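
% Select linkage method: Ward (merges the pair of clusters that gives the
% smallest increase in within-cluster sum of squares)
z = linkage(arrayed_data, 'ward');

% Create dendrogram in its own figure (by default MATLAB collapses the tree
% to at most 30 leaf nodes; dendrogram(z, 0) displays the complete tree)
figure;
dendrogram(z);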

% Determine thresholds for the optimal number of clusters using the
% inconsistency of the links
inc = inconsistent(z,7); % depth 7: each link is compared with the links less than 7 levels below it
% inc has 4 columns, one row per link:
% Col1 is the mean height of all links included in the calculation
% Col2 is the standard deviation of those link heights
% Col3 is the number of links included in the calculation
% Col4 is the inconsistency coefficient
[a,b]= max(inc(:,4)); % a: largest inconsistency coefficient, b: row (link) where it occurs
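
% Sketch, not part of the original commit: cut the tree into flat clusters
% so that every observation gets a label. n_clusters = 6 only mirrors the
% six-entry legend in the commented-out visualization below; it is an
% assumption, not a tuned value.
n_clusters = 6;
idx = cluster(z, 'maxclust', n_clusters);           % label 1..n_clusters per row
% Hierarchical clustering yields no centers, so take each cluster's mean
C = zeros(n_clusters, size(arrayed_data, 2));
for c = 1:n_clusters
    C(c, :) = mean(arrayed_data(idx == c, :), 1);
end
% Alternative cut based on the inconsistency coefficient (the cutoff 1.2 is
% an arbitrary placeholder):
% idx = cluster(z, 'cutoff', 1.2, 'depth', 7);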


%------- Visualization
% This block needs idx (cluster label per row) and C (one center per
% cluster) in the workspace before it is uncommented.
% data = arrayed_data;
% figure,
%
% gscatter(data(:,1),data(:,2),idx);
% hold on
%
% for i=1:6
%     scatter(C(i,1),C(i,2),96,'black','filled');
% end
%
% legend({'Cluster 1', 'Cluster 2', 'Cluster 3', 'Cluster 4', 'Cluster 5','Cluster 6' })
% xlabel('Annual Income');
% ylabel('Spending Score');
% hold off