Clustering.jl

This package provides a set of algorithms for data clustering.

Installation

Pkg.add("Clustering")

Algorithms

Currently working algorithms:

Kmeans
Affinity Propagation

To be available:

K medoids
DP means
ISO Data

Documentation

K-means

Interfaces:

 # perform K-means (centers are updated inplace)
result = kmeans!(x, centers, opts) 

 # perform K-means based on a given set of inital centers
result = kmeans(x, init_centers, opts)  

 # perform K-means to get K centers
result = kmeans(x, k, opts) 
result = kmeans(x, k)

All these methods return an instance of KmeansResult, it is defined as

type KmeansResult{T<:FloatingPoint}
    centers::Matrix{T}         # cluster centers (d x k)
    assignments::Vector{Int}   # assignments (n)
    costs::Vector{T}           # costs of the resultant assignments (n)
    counts::Vector{Int}        # number of samples assigned to each cluster (k)
    cweights::Vector{T}        # cluster weights (k)
    total_cost::Float64        # total cost (i.e. objective) (k)
    iterations::Int            # number of elapsed iterations 
    converged::Bool            # whether the procedure converged
end

Options:

Note: options are specified using keyword arguments.

name	descrption	default value
max_iters	maximum number of iterations	100
tol	tolerable objv change at convergence	1.0e-6
weights	sample weights (a vector or nothing)	nothing
display	verbosity (`:none`, `:final`, or `:iter`)	`:iter`

Example

x = rand(100, 10000)   # a set of 10000 samples (each of dimension 100)

k = 50  # the number of clustering
result = kmeans(x, k; max_iter=50, display=:iter)

Affinity Propagation

Affinity Propagation is an algorithm that uses loopy belief propagation to run MAP inference to identify some exemplars. Unlike kmeans, the exemplars are chosen from the original samples. After the algorithm returns, every sample will be assigned to one of the exemplars.

The input of the algorithm is a similarity matrix S. Unlike kmeans, you don't need to (and cannot) specify the number of clusters. But the diagonal values of S will affect how many clusters you will get at the end. Specifically, S[i,j] could be interpreted as the tendency of assigning point i to point j (when j is an exemplar). So

S need NOT to be symmetric
S[i,i] represents the willingness of assigned point i to itself. So generally larger diagonal values for S means more clusters. For example, if S[i,i]==max(S) for all i, then every point will be an exemplar itself.

Usually, assigning the diagonal of S to be the median of all the rest entries could lead to reasonable results.

Interfaces:

result = affinity_propagation(S, opts)

where the following options could be specified using keyword arguments

max_iter::Integer = 500,    # max number of iterations
n_stop_check::Integer = 10, # stop if exemplars not changed for this number of iterations
damp::FloatingPoint = 0.5,  # damping factor for message updating, 0 means no damping
display::Symbol = :iter     # whether progress is shown

the returning value is a struct that looks like this:

type AffinityPropagationResult
    exemplar_index ::Vector{Int} # index for exemplars (centers)
    assignments    ::Vector{Int} # assignments for each point
    iterations     ::Int         # number of iterations executed
    converged      ::Bool        # converged or not
end

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
src		src
test		test
LICENSE.md		LICENSE.md
README.md		README.md
REQUIRE		REQUIRE

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Clustering.jl

Installation

Algorithms

Documentation

K-means

Example

Affinity Propagation

About

Releases

Packages

Languages

License

cosinequanon/Clustering.jl

Folders and files

Latest commit

History

Repository files navigation

Clustering.jl

Installation

Algorithms

Documentation

K-means

Example

Affinity Propagation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages