GridPartitioning fails with odd number of samples to select #156

marco-2023 · 2023-08-10T17:21:44Z

@Ali-Tehrani I am going through the Jupyter Notebook tutorial, and the GridPartitioning methods fail when used to select an odd number of samples. I traced the error to line 335 of the partition module, which calls the compute_diversity function (diversity module) to compute the diversity of each bin, passing an array of the bin's elements as the only argument. The problem is that compute_diversity uses the hypersphere_overlap_of_subset method by default (line 281 of the diversity module), and that method needs two arguments (the data of the subset and the full data), which compute_diversity cannot provide.

Would one option be to pass a different diversity function as an argument to compute_diversity on line 335 of the partition module?
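To make the failure mode concrete, here is a minimal sketch of the dispatch pattern that compute_diversity uses according to the traceback later in this thread (`func_dict[div_type](features)`). The names `spread_of_subset`, `overlap_of_subset`, and `compute_diversity_sketch`, and their formulas, are hypothetical stand-ins, not DiverseSelector's code. The sketch only shows that a two-argument function registered in a one-argument dispatch table raises a TypeError, and that either choosing a one-argument div_type or binding the extra argument with functools.partial avoids it.

```python
from functools import partial

import numpy as np


# Hypothetical stand-ins for DiverseSelector's diversity functions; the names
# and formulas are illustrative assumptions, not the library's implementations.
def spread_of_subset(features):
    """One-argument measure: mean per-feature standard deviation of the subset."""
    return float(np.mean(np.std(features, axis=0)))


def overlap_of_subset(features, x):
    """Two-argument measure: needs the subset AND the full dataset `x`."""
    # Illustrative only: subset spread relative to the spread of the full data.
    return spread_of_subset(features) / spread_of_subset(x)


FUNC_DICT = {"spread": spread_of_subset, "overlap": overlap_of_subset}


def compute_diversity_sketch(features, div_type="overlap"):
    # Mirrors the failing pattern: every registered function gets ONE argument.
    if div_type in FUNC_DICT:
        return FUNC_DICT[div_type](features)
    raise ValueError(f"Diversity type {div_type} not supported.")


rng = np.random.default_rng(0)
data = rng.normal(size=(100, 2))  # plays the role of the full dataset
subset = data[:10]                # plays the role of one bin's samples

try:
    compute_diversity_sketch(subset)  # default "overlap" is missing `x`
except TypeError as err:
    print(f"TypeError: {err}")

# Workaround 1: choose a diversity type that needs only the subset.
print(compute_diversity_sketch(subset, div_type="spread"))

# Workaround 2: bind the full dataset so the function fits the one-argument call.
bound_overlap = partial(overlap_of_subset, x=data)
print(bound_overlap(subset))
```

Workaround 1 corresponds to changing the default diversity type; workaround 2 keeps the two-argument measure but requires the caller to have the full dataset at hand.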


FanwangM · 2023-08-11T00:11:58Z

For some history on this issue, see #134.

marco-2023 · 2023-08-11T17:48:04Z

Thanks, @FanwangM, I see now that it will be taken care of.

FarnazH · 2023-09-09T01:10:38Z

@marco-2023, is this an issue? If so, can you please share a code snippet to show this failure?

marco-2023 · 2023-09-11T16:12:50Z

Yes, it is. The problem occurs whenever compute_diversity is called on line 335, not only when selecting an odd number of samples. Below is an example that triggers the error.

from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances
import matplotlib.pyplot as plt
import numpy as np
from DiverseSelector import GridPartitioning

# Generate synthetic data with make_blobs: 100 samples, 2 features, 1 cluster
coords, class_labels = make_blobs(n_samples=100, n_features=2, centers=1, random_state=42)

# Select 13 diverse data points from the dataset (100 points drawn from a single
# Gaussian cluster).
selector = GridPartitioning(2, "equisized_independent")
selected_ids1 = selector.select(coords, size=13)

The result is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 13
     10 # Selecting 13 diverse data points from the first dataset (100 points  uniformly  distributed in one
     11 # cluster). 
     12 selector = GridPartitioning(2,"equisized_independent")
---> 13 selected_ids1 = selector.select(coords, size=13)

File /mnt/Data/Work/Ayers/QC-Devs/DiverseSelector/DiverseSelector/methods/base.py:65, in SelectionBase.select(self, arr, size, labels)
     60     raise ValueError(
     61         f"Size of subset {size} cannot be larger than number of samples {len(arr)}."
     62     )
     64 if labels is None:
---> 65     return self.select_from_cluster(arr, size)
     67 # compute the number of samples (i.e. population or pop) in each cluster
     68 unique_labels = np.unique(labels)

File /mnt/Data/Work/Ayers/QC-Devs/DiverseSelector/DiverseSelector/methods/partition.py:335, in GridPartitioning.select_from_cluster(self, arr, num_selected, cluster_ids)
    333 diversity = []
    334 for bin_idx, bin_list in bins.items():
--> 335     diversity.append((compute_diversity(arr[bin_list]), bin_idx))
    336 diversity.sort(reverse=True)
    337 for _, bin_idx in diversity[:num_needed]:

File /mnt/Data/Work/Ayers/QC-Devs/DiverseSelector/DiverseSelector/diversity.py:77, in compute_diversity(features, div_type)
...
---> 77     return func_dict[div_type](features)
     78 else:
     79     raise ValueError(f"Diversity type {div_type} not supported.")

TypeError: hypersphere_overlap_of_subset() missing 1 required positional argument: 'x'

@FanwangM pointed out in a comment on #134 that this problem should be fixed by merging #138. In #138, the default method for compute_diversity changes to "entropy", which only needs the array of samples to compute the diversity and is therefore compatible with how GridPartitioning calls the function.
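The loop shown in the traceback (partition.py lines 333-337) scores each bin with a one-argument call, sorts the scores in descending order, and keeps the most diverse bins, which is why a default diversity type that needs only the sample array avoids the crash. Below is a minimal sketch of that pattern, assuming equal-width grid cells and using mean pairwise Euclidean distance as a stand-in measure (it is not the library's "entropy"); `grid_bins`, `rank_bins`, `mean_pairwise_distance`, and `numb_bins_axis` are hypothetical names.

```python
from collections import defaultdict

import numpy as np


def mean_pairwise_distance(features):
    """Hypothetical one-argument diversity stand-in (not the library's 'entropy')."""
    if len(features) < 2:
        return 0.0
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    n = len(features)
    # Average over the n * (n - 1) off-diagonal pairs only.
    return float(dists.sum() / (n * (n - 1)))


def grid_bins(arr, numb_bins_axis):
    """Assign each sample to an equal-width grid cell; keys are cell-index tuples.

    Assumes every feature has a nonzero range (otherwise the cell width is 0).
    """
    mins, maxs = arr.min(axis=0), arr.max(axis=0)
    width = (maxs - mins) / numb_bins_axis
    # Clip so points on the upper edge fall into the last cell.
    idx = np.clip(((arr - mins) / width).astype(int), 0, numb_bins_axis - 1)
    bins = defaultdict(list)
    for sample, cell in enumerate(map(tuple, idx)):
        bins[cell].append(sample)
    return bins


def rank_bins(arr, bins, num_needed, diversity_fn=mean_pairwise_distance):
    # Same shape as the traceback's loop: score each bin with a ONE-argument
    # function, sort descending, and keep the `num_needed` most diverse bins.
    diversity = [(diversity_fn(arr[bin_list]), bin_idx) for bin_idx, bin_list in bins.items()]
    diversity.sort(reverse=True)
    return [bin_idx for _, bin_idx in diversity[:num_needed]]


rng = np.random.default_rng(42)
coords = rng.normal(size=(100, 2))
bins = grid_bins(coords, numb_bins_axis=2)
print(rank_bins(coords, bins, num_needed=3))
```

Because `diversity_fn` is called with a single argument, a two-argument measure such as hypersphere_overlap_of_subset would have to be adapted (for example by binding the full dataset) before it could be used here.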

FanwangM · 2023-09-12T06:16:08Z

I will work on this issue later today by resolving the merge conflicts. Thanks for providing the detailed error information, which refreshed my memory of this problem, @marco-2023.

FanwangM · 2023-09-16T04:38:30Z

#138 is merged.

FarnazH · 2023-10-27T02:47:58Z

This problem is fixed, so I will close this issue. Please reopen it or comment if something is still wrong.

Ali-Tehrani mentioned this issue on Sep 13, 2023, in "Refactor and Add methods in Grid Partitioning" #162 (Merged).

FarnazH closed this as completed Oct 27, 2023
