Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write a k-medians clustering algorithm #142

Open
Boarders opened this issue Jun 12, 2019 · 1 comment
Open

Write a k-medians clustering algorithm #142

Boarders opened this issue Jun 12, 2019 · 1 comment

Comments

@Boarders
Copy link
Collaborator

Boarders commented Jun 12, 2019

Currently our clustering pass uses the algorithms in AI.Clustering.Hierarchical. As it stands there is no easy way to make use of the k-means algorithm within this library.

We should write a k-medians algorithm which is linear time and has good convergence properties (it should also expose a good amount of parallelism). This will entail writing a k-medians function for each of our different character types i.e. functions that looks something like:

kMedian :: (Foldable t, Traversable t) => t CharacterType -> CharacterType

In the case of a discrete character this will function will look like an inclusion-exclusion formula and we should be able to write similar things for other character types.

This would then be added as a Clustering option (currently asking for median clustering gives an error explaining that this is not yet implemented).

@Boarders
Copy link
Collaborator Author

Boarders commented Jul 10, 2019

This commit gives a version of K-medians using essentially Lloyds's algorithm for refinement and RGC (centroids of random subsamples) for initialisation. It is abstract enough that it would work as both an implementation of k-medians or k-means. This also exposes some parallelism in how assignment is done for each cluster, but finding the correct way to chunk the input will require experimentation.

Still left on this issue is to define a k-medians function on CharacterSequences.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants