Commit

Updated
ozlemkorpe committed Aug 10, 2020
1 parent 95e6d09 commit 1dfb7fa
Showing 2 changed files with 99 additions and 12 deletions.
67 changes: 67 additions & 0 deletions Hierarchical_Clustering_Guide.asv
@@ -0,0 +1,67 @@
%%%------------- Hierarchical Clustering
% Step 1: Consider each data point as a single cluster.
% Step 2: Combine the two closest clusters and make them one cluster.
% Step 3: Repeatedly combine clusters until there is only one cluster.

% Approaches for finding closest clusters
% Single Link: Min distance
% Complete Link: Max distance
% Average: Average distance
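
% Illustrative sketch (an addition, not part of the original commit): the three
% linkage distances listed above, computed for two small made-up clusters.
% The names clusterA, clusterB and pairwise are hypothetical, and pdist2
% assumes the Statistics and Machine Learning Toolbox is installed.
clusterA = [0 0; 0 1];
clusterB = [3 0; 4 0];
pairwise = pdist2(clusterA, clusterB); % Euclidean distance for every cross-cluster pair
single_link = min(pairwise(:));        % closest pair of points
complete_link = max(pairwise(:));      % farthest pair of points
average_link = mean(pairwise(:));      % mean over all cross-cluster pairs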

% Types of Hierarchical Clustering: Agglomerative, Divisive

% Import the dataset
data = readtable(fullfile('Datasets','Mall_Customers.csv')); % fullfile keeps the path portable across operating systems

%Check for missing values
missings = sum(ismissing(data));

% Plot variables to check for outliers (one figure each, so the second plot
% does not overwrite the first)
figure; IncomePlot = plot(data.AnnualIncome);
figure; SpendingPlot = plot(data.SpendingScore);

% Perform Feature Scaling (Standardization Method)
stand_income = (data.AnnualIncome - mean(data.AnnualIncome)) / std(data.AnnualIncome);
data.AnnualIncome = stand_income;

stand_spending = (data.SpendingScore - mean(data.SpendingScore)) / std(data.SpendingScore);
data.SpendingScore = stand_spending;
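
% Illustrative check (an addition, not part of the original commit): after
% standardization a column should have mean 0 and standard deviation 1, up to
% rounding. The vector demo_income is made up for this sketch.
demo_income = [15; 40; 60; 85; 120];
demo_scaled = (demo_income - mean(demo_income)) / std(demo_income);
demo_mean = mean(demo_scaled); % close to 0
demo_std = std(demo_scaled);   % close to 1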

% Select columns for clustering
selected_data = data(:,4:5);

%Data must be an array to be used in clustering algorithm
arrayed_data = table2array(selected_data);

% Select linkage method
z = linkage(arrayed_data, 'ward');

% Create dendrogram
dendrogram(z);

% Determine thresholds for the optimal number of clusters using link
% inconsistencies
i = inconsistent(z,7); % depth 7: include links up to 7 levels below each link
% i has 4 columns
% Col1: mean height of all links included in the calculation
% Col2: standard deviation of the heights of all links included in the calculation
% Col3: number of links included in the calculation
% Col4: inconsistency coefficient
[a,b] = max(i(:,4)); % a is the max inconsistency coefficient, b is its row index


%------- Visualization
% data = arrayed_data;
% figure,
%
% gscatter(data(:,1),data(:,2),idx);
% hold on
%
% for i=1:6
% scatter(C(i,1),C(i,2),96,'black','filled');
% end
%
% legend({'Cluster 1', 'Cluster 2', 'Cluster 3', 'Cluster 4', 'Cluster 5','Cluster 6' })
% xlabel('Annual Income');
% ylabel('Spending Score');
% hold off
44 changes: 32 additions & 12 deletions Hierarchical_Clustering_Guide.m
@@ -33,20 +33,40 @@
%Data must be an array to be used in clustering algorithm
arrayed_data = table2array(selected_data);

% Select linkage method
z = linkage(arrayed_data, 'ward');

% Create dendrogram
dendrogram(z);

%------- Visualization
data = arrayed_data;
figure,
% Determine thresholds for the optimal number of clusters using link
% inconsistencies
i = inconsistent(z,7); % depth 7: include links up to 7 levels below each link
% i has 4 columns
% Col1: mean height of all links included in the calculation
% Col2: standard deviation of the heights of all links included in the calculation
% Col3: number of links included in the calculation
% Col4: inconsistency coefficient
[a,b] = max(i(:,4)); % a is the max inconsistency coefficient, b is its row index
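
% Illustrative check (an addition, not part of the original commit): column 4
% should equal (link height - column 1) / column 2, and 0 where column 2 is 0.
% The toy points and all inc_* names are hypothetical.
inc_points = [0 0; 0 1; 5 0; 5 1; 20 20];
inc_z = linkage(inc_points, 'ward');
inc_y = inconsistent(inc_z, 7);
inc_manual = zeros(size(inc_y,1), 1);
inc_has_std = inc_y(:,2) > 0; % skip links whose standard deviation is 0
inc_manual(inc_has_std) = (inc_z(inc_has_std,3) - inc_y(inc_has_std,1)) ./ inc_y(inc_has_std,2);
inc_gap = max(abs(inc_manual - inc_y(:,4))); % expected to be close to 0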

gscatter(data(:,1),data(:,2),idx);
hold on
% Perform clustering
% Set the threshold slightly below the height of the link with the max
% inconsistency coefficient, and cut the tree by height ('distance' criterion)
C= cluster(z,'cutoff', z(b,3)-0.1, 'Criterion', 'distance');
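
% Illustrative sketch (an addition, not part of the original commit): cluster
% returns one label per data row rather than centroids, so the labels can be
% used directly as a grouping variable. The toy points, the cutoff of 3 and
% all cut_* names are hypothetical.
cut_points = [0 0; 0 1; 5 0; 5 1; 20 20];
cut_z = linkage(cut_points, 'ward');
cut_labels = cluster(cut_z, 'cutoff', 3, 'Criterion', 'distance'); % cut the tree at height 3
cut_count = numel(unique(cut_labels)); % number of clusters formed by the cut
% gscatter(cut_points(:,1), cut_points(:,2), cut_labels); % optional: color points by label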

for i=1:6
scatter(C(i,1),C(i,2),96,'black','filled');
end

legend({'Cluster 1', 'Cluster 2', 'Cluster 3', 'Cluster 4', 'Cluster 5','Cluster 6' })
xlabel('Annual Income');
ylabel('Spending Score');
hold off
%------- Visualization
% data = arrayed_data;
% figure,
%
% gscatter(data(:,1),data(:,2),idx);
% hold on
%
% for i=1:6
% scatter(C(i,1),C(i,2),96,'black','filled');
% end
%
% legend({'Cluster 1', 'Cluster 2', 'Cluster 3', 'Cluster 4', 'Cluster 5','Cluster 6' })
% xlabel('Annual Income');
% ylabel('Spending Score');
% hold off
