-
Notifications
You must be signed in to change notification settings - Fork 116
Density based partitioning
This page describes density-based partitioning algorithm in more details.
At the beginning of the partitioning procedure, the data set is considered as a large box full of points. The partitioning algorithm splits this box into 2 parts along its longest dimension (you can change the number of splits by setting PartitioningSettings.numberOfSplits property). Each of these 2 boxes are split again, and so on. This process is repeated until any of the following conditions is met:
- Number of points in the box becomes less than or equal to PartitioningSettings.numberOfPointsInBox property (50,000 by default);
- Maximum number of levels is reached (specified in PartitioningSettings.numberOfLevels property, 10 by default)
- The shortest side of a box becomes smaller than 2 * epsilon (epsilon is a parameter of the DBSCAN algorithm)
To make this process faster, the whole hierarchy of boxes is precomputed. Then each point in the data set is examined to check which boxes it falls into. Finally, each box is examined to check whether it should be selected as a partition. So, the whole partitioning process requires only a single pass through the data set.