Code and Visualizations of NGPCA for clustering (in high-dimensional data spaces)

NicoMigenda/NGPCA-Clustering

NGPCA: Clustering high-dimensional and non-stationary data streams

Local PCA Clustering for streaming and high-dimensional data distributions. This repo serves to reproduce the results from the publication: Adaptive local Principal Component Analysis improves the clustering of high-dimensional data. Cite as: Nico Migenda, Ralf Möller, Wolfram Schenck, "Adaptive local Principal Component Analysis improves the clustering of high-dimensional data", Pattern Recognition, Volume 146, 2024, https://doi.org/10.1016/j.patcog.2023.110030

View NGPCA: Neural Gas Principal Component Analysis on File Exchange

Quick start

Get started by downloading the latest release.

What's included

Within the download you'll find the following directories and files:

Download contents
  |-- Example_dynamic.m
  |-- Example_dynamic.mlx
  |-- Example_stationary.m
  |-- Example_stationary.mlx
  |-- README.md
  |-- Results
  |   `-- gif
  |       |-- dynamic.gif
  |       `-- s1.gif
  |-- data
  |   |-- rls.mat
  |   |-- s1.mat
  |   `-- vortex.m
  `-- ngpca
      |-- NGPCA.m
      |-- drawunits.m
      |-- eforrlsa.m
      |-- init.m
      |-- normalizedmi.m
      |-- plot_ellipse.m
      |-- potentialFunction.m
      |-- update.m
      |-- validate_CI.m
      `-- validate_NMI_DU.m

Getting Started

The latest release contains all files needed to directly run the algorithm:

1. Open either Example_dynamic.m or Example_stationary.m in MATLAB, or alternatively use the provided live script versions (.mlx)
2. Running the scripts automatically performs NGPCA clustering with standard settings, on the S1 data set (stationary example) or on the ring-line-square + vortex data set (dynamic example)

Optional:

  1. Change the default settings, or pass optional parameters when creating the ngpca object or when starting the training process
  2. Train the model directly on a full data set using the fit_multiple() function or build your own training loops with fit_single()
  3. Visualize the clustering results with the draw() function
  4. Calculate validation metrics (CI, NMI, DU) by providing ground truth and cluster shape information
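
To illustrate what the NMI metric from item 4 measures, here is a minimal standalone sketch of normalized mutual information between a ground-truth labeling and a predicted cluster assignment. It is not the repository's normalizedmi.m. The label vectors `truth` and `pred` are made-up example values, and the arithmetic-mean normalization `2*I/(Ha+Hb)` is an assumption; the repository may use a different variant.

```matlab
% Hypothetical example labels (not taken from the repository's data sets)
truth = [1 1 1 2 2 2 3 3 3];
pred  = [2 2 2 3 3 3 1 1 1];   % same partition, only the label names differ

% Map arbitrary label values to consecutive integers 1..K
[~, ~, a] = unique(truth(:));
[~, ~, b] = unique(pred(:));
n = numel(a);

% Joint distribution from the contingency table (rows: truth, columns: pred)
Pab = accumarray([a b], 1) / n;
Pa  = sum(Pab, 2);   % marginal distribution of the ground-truth labels
Pb  = sum(Pab, 1);   % marginal distribution of the predicted labels

% Mutual information; empty cells are skipped to avoid log(0)
PaPb = Pa * Pb;
nz   = Pab > 0;
I    = sum(Pab(nz) .* log(Pab(nz) ./ PaPb(nz)));

% Entropies of both labelings
Ha = -sum(Pa(Pa > 0) .* log(Pa(Pa > 0)));
Hb = -sum(Pb(Pb > 0) .* log(Pb(Pb > 0)));

% Arithmetic-mean normalization (an assumption; other variants exist).
% Undefined (NaN) if both labelings consist of a single cluster.
nmi = 2 * I / (Ha + Hb);
fprintf('NMI = %.4f\n', nmi);
```

Because NMI compares partitions rather than label names, the permuted labels in this example still give a score of 1, while unrelated labelings score close to 0.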

Visualizations

The following visualizations represent the learning process on selected data sets of the standard clustering benchmark database. For all data sets the default settings are used.

Stationary example: Data set S1

[Animation: Results/gif/s1.gif]

Non-stationary example: Ring-Line-Square and Vortex

[Animation: Results/gif/dynamic.gif]

Creators

Nico Migenda, Center for Applied Data Science Gütersloh, Bielefeld University of Applied Sciences and Arts, Germany

Ralf Möller, Computer Engineering Group, Faculty of Technology, Bielefeld University, Germany

Wolfram Schenck, Center for Applied Data Science Gütersloh, Bielefeld University of Applied Sciences and Arts, Germany
