This repository has been archived by the owner on Aug 26, 2022. It is now read-only.

Commit

modified: cluster.py
freesinger committed Jan 6, 2019
1 parent 2a0118b commit 6c61d6a
Showing 11 changed files with 36 additions and 6 deletions.
Binary file modified __pycache__/cluster.cpython-36.pyc
Binary file not shown.
25 changes: 25 additions & 0 deletions cluster.py
@@ -48,3 +48,28 @@ def classify(self, taginfo, srt_dens, min_num, maxid):
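```python
            if taginfo[i] == -1:
                taginfo[i] = taginfo[min_num[i]]
        return taginfo

    def analysis(self, centers, taginfo, distance, maxid):
        '''
        Plot, for each cluster, the distance of every member point to the
        cluster center and save one figure per cluster under ./images/.
        :rtype: None
        '''
        num_centers = len(centers)
        # (point, label) pairs ordered by cluster label
        tmp = sorted(taginfo.items(), key=lambda k: k[1])
        # dvid_numbers[i - 1] holds the point ids assigned to cluster i
        dvid_numbers = list()
        for i in range(1, num_centers + 1):
            cluster_i = list()
            for pair in tmp:
                if pair[1] == i:
                    cluster_i.append(pair[0])
            dvid_numbers.append(cluster_i)

        for i in range(1, num_centers + 1):
            cur_set = dvid_numbers[i - 1]
            # distance of each member point, looked up with the key (j, i)
            d = list(distance[(j, i)] for j in cur_set)
            # area chart: x = point number, y = distance to center
            plt.stackplot(cur_set, d)
            plt.xlabel('Point Number')
            plt.ylabel('Distance to Center')
            plt.title('Cluster No.{}'.format(i))
            # no extension given, so matplotlib uses its default format (PNG by default)
            plt.savefig('./images/Cluster{}'.format(i))
            plt.close()
```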
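The grouping step in `analysis` rescans every `(point, label)` pair once per cluster, which costs O(k·n) for k clusters and n points. A minimal sketch of an equivalent single-pass grouping, assuming `taginfo` maps point ids to integer labels in `1..num_centers` (as the loop above implies); `group_by_label` is a hypothetical helper, not part of the repository:

```python
from typing import Dict, List


def group_by_label(taginfo: Dict[int, int], num_centers: int) -> List[List[int]]:
    """Hypothetical helper: point ids grouped by cluster label 1..num_centers."""
    buckets: List[List[int]] = [[] for _ in range(num_centers)]
    # sorted() is stable, so sorting by label never reorders points that share a
    # label; one pass over taginfo in insertion order gives the same buckets.
    for point, label in taginfo.items():
        if 1 <= label <= num_centers:  # other labels (e.g. -1) are ignored, as in the original loop
            buckets[label - 1].append(point)
    return buckets


print(group_by_label({0: 2, 1: 1, 2: 2, 3: 1, 4: 3}, 3))  # [[1, 3], [0, 2], [4]]
```

The result can replace `dvid_numbers` directly, since `buckets[i - 1]` holds the members of cluster `i` in the same order.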

Binary file added images/Cluster1.png
Binary file added images/Cluster2.png
Binary file added images/Cluster3.png
Binary file added images/Cluster4.png
Binary file added images/Cluster5.png
Binary file added images/Cluster6.png
10 changes: 7 additions & 3 deletions others/report.md
@@ -50,9 +50,7 @@

#### 2.2.2 Entropy

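For a POP set $\{\varphi_1,\varphi_2,\dots,\varphi_n\}$, define the entropy of the data field as $H=-\sum_{i=1}^{n}\frac{\varphi_i}{Z}\log\frac{\varphi_i}{Z}$. The entropy measures the disorder of the data field, so we look for the value of $\sigma$ that minimizes $H$. The figure below shows how $H$ varies with $\sigma$: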

![entropy](../images/entropy.png)
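A minimal sketch of choosing $\sigma$ by minimizing the entropy $H$ defined above with a simple grid search. It assumes a Gaussian-style potential $\varphi_i=\sum_{j} e^{-(d_{ij}/\sigma)^2}$ summed over all points and the normalization $Z=\sum_i \varphi_i$; neither is defined in this excerpt, and `potentials`, `entropy` and `best_sigma` are hypothetical names. The natural logarithm is used; the base of the logarithm does not change which $\sigma$ minimizes $H$.

```python
import math
from typing import List, Sequence


def entropy(phi: Sequence[float]) -> float:
    """H = -sum_i (phi_i / Z) * log(phi_i / Z), assuming Z = sum_i phi_i."""
    z = sum(phi)
    h = 0.0
    for v in phi:
        p = v / z
        if p > 0:  # treat 0 * log 0 as 0
            h -= p * math.log(p)
    return h


def potentials(dist: Sequence[Sequence[float]], sigma: float) -> List[float]:
    """Assumed potential: phi_i = sum_j exp(-(d_ij / sigma)^2), including j == i."""
    n = len(dist)
    return [sum(math.exp(-(dist[i][j] / sigma) ** 2) for j in range(n)) for i in range(n)]


def best_sigma(dist: Sequence[Sequence[float]], candidates: Sequence[float]) -> float:
    """Grid search: the candidate sigma whose potentials give the smallest H."""
    return min(candidates, key=lambda s: entropy(potentials(dist, s)))


# two well-separated groups of 1-D points
pts = [0.0, 0.1, 0.2, 5.0, 5.1]
dist = [[abs(a - b) for b in pts] for a in pts]
print(best_sigma(dist, [0.001, 1.0, 1000.0]))
```

Very small and very large $\sigma$ both make the potentials nearly uniform, which pushes $H$ toward its maximum $\log n$; the minimizing $\sigma$ therefore lies in between, matching the U-shaped trend in the figure.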

@@ -114,6 +112,12 @@ def classify(self, taginfo, srt_dens, min_num, maxid):

![cluster_gausse](../images/cluster_gausse.png)

#### 2.3.2 Clustering Results

![Cluster6](../images/Cluster6.png)

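The earlier experiments found six cluster centers in total. We made a simple visualization of the point assignments for each of the six clusters: the x-axis is the point index and the y-axis is the point's distance to its cluster center. Because there are many points, we use an area chart; the figure above shows the result for the sixth cluster.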
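The per-cluster series described above (point index against distance to the cluster center) can be assembled as in the following minimal sketch. It is not the repository's code: it assumes each cluster label `k` (starting at 1) maps to a center point id through a hypothetical `centers` list, and that distances are looked up in a dict keyed by `(point, point)` pairs; `cluster_series` is a hypothetical helper.

```python
from typing import Dict, List, Sequence, Tuple


def cluster_series(
    labels: Dict[int, int],               # point id -> cluster label (1..k)
    centers: Sequence[int],               # hypothetical: centers[k - 1] is cluster k's center point id
    dist: Dict[Tuple[int, int], float],   # hypothetical: distance between two point ids
) -> List[Tuple[List[int], List[float]]]:
    """For each cluster: (member point ids sorted ascending, distance of each to its center)."""
    series = []
    for k, center in enumerate(centers, start=1):
        members = sorted(p for p, lab in labels.items() if lab == k)
        series.append((members, [dist[(p, center)] for p in members]))
    return series
```

Each `(xs, ds)` pair can then be drawn as an area chart, for example with matplotlib's `plt.stackplot(xs, ds)`; sorting the point ids keeps the x-axis monotonic, which an area chart needs to render cleanly.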

## 3. Summary

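Because the definition of the distance was not known, we did not plot the six clusters. The clustering algorithm described in the paper only covers the selection of cluster centers; building on that, we read the paper's supplementary material, completed the algorithm for the clustering (assignment) step, and optimized the choice of the cutoff distance. Further work could examine the cluster boundaries and separate outliers from points lying where clusters overlap.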
Binary file modified report.pdf
Binary file not shown.
7 changes: 4 additions & 3 deletions setup.py
@@ -20,9 +20,10 @@ def main():
```python
    center, tag = clust.locate_center(refer_info, maxid, threshold)
    taginfo = clust.classify(tag, sort_dst, min_num, maxid)
    print('Clustering done!')
    # print(taginfo)
    # gauss = solution.Guasse(dist, maxid, threshold)

    # show each cluster's results
    clust.analysis(center, taginfo, dist, maxid)

    # show cluster distribution info
    temp = sorted(taginfo.items(), key=lambda k: k[1])
    y, x = zip(*temp)
```
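`zip(*temp)` transposes the label-sorted `(point, label)` pairs into two tuples, so `y` holds point ids and `x` holds cluster labels in the same order. A small sketch of that unpacking on made-up data, plus a per-cluster size count with `collections.Counter` as one hypothetical way to summarize the distribution:

```python
from collections import Counter

taginfo = {0: 2, 1: 1, 2: 2, 3: 1, 4: 3}  # made-up data: point id -> cluster label
temp = sorted(taginfo.items(), key=lambda k: k[1])
y, x = zip(*temp)  # y: point ids, x: labels, both ordered by label
sizes = Counter(taginfo.values())  # cluster label -> number of points in that cluster
print(y, x, sorted(sizes.items()))
```

Because `sorted` is stable, points that share a label keep their original order in `y`.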