GridPartitioning fails with odd number of samples to select #156
Comments
Some history on this issue can be traced in #134.
Thanks, @FanwangM, I see now that it will be taken care of.
@marco-2023, is this an issue? If so, can you please share a code snippet to show this failure?
Yes, it is. This problem is present whenever it happens that the function
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances
import matplotlib.pyplot as plt
import numpy as np
from DiverseSelector import GridPartitioning
# Generate synthetic data using make_blobs 100 samples, 2 features, 1 cluster
coords, class_labels = make_blobs(n_samples=100, n_features=2, centers=1, random_state=42)
# Selecting 13 diverse data points from the first dataset (100 points uniformly distributed in one
# cluster).
selector = GridPartitioning(2,"equisized_independent")
selected_ids1 = selector.select(coords, size=13)
The result is:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[2], line 13
10 # Selecting 13 diverse data points from the first dataset (100 points uniformly distributed in one
11 # cluster).
12 selector = GridPartitioning(2,"equisized_independent")
---> 13 selected_ids1 = selector.select(coords, size=13)
File /mnt/Data/Work/Ayers/QC-Devs/DiverseSelector/DiverseSelector/methods/base.py:65, in SelectionBase.select(self, arr, size, labels)
60 raise ValueError(
61 f"Size of subset {size} cannot be larger than number of samples {len(arr)}."
62 )
64 if labels is None:
---> 65 return self.select_from_cluster(arr, size)
67 # compute the number of samples (i.e. population or pop) in each cluster
68 unique_labels = np.unique(labels)
File /mnt/Data/Work/Ayers/QC-Devs/DiverseSelector/DiverseSelector/methods/partition.py:335, in GridPartitioning.select_from_cluster(self, arr, num_selected, cluster_ids)
333 diversity = []
334 for bin_idx, bin_list in bins.items():
--> 335 diversity.append((compute_diversity(arr[bin_list]), bin_idx))
336 diversity.sort(reverse=True)
337 for _, bin_idx in diversity[:num_needed]:
File /mnt/Data/Work/Ayers/QC-Devs/DiverseSelector/DiverseSelector/diversity.py:77, in compute_diversity(features, div_type)
...
---> 77 return func_dict[div_type](features)
78 else:
79 raise ValueError(f"Diversity type {div_type} not supported.")
TypeError: hypersphere_overlap_of_subset() missing 1 required positional argument: 'x'
@FanwangM pointed out in #134 (comment) that this problem should be fixed by merging #138. Here the default method for
I will work on this issue later today by resolving the merge conflicts. Thanks for the detailed error information, which refreshed my memory of this problem. @marco-2023
#138 is merged.
This problem is fixed, so I will close this issue. Please re-open or comment if something is still wrong.
@Ali-Tehrani I am going through the Jupyter Notebook (tutorial) and the GridPartitioning methods fail when used to select an odd number of samples. I traced the cause of the error to line 335 of the partition module. It calls the compute_diversity function (diversity module) to compute the diversity of the bins (an array of the elements of the bin is passed as the only argument). The problem is in the compute_diversity function, which by default uses the hypersphere_overlap_of_subset method (line 281 of the diversity module). This method needs two arguments (the data of the set and the total data), which compute_diversity cannot provide.
I don't know if an option would be to change the diversity function that is passed as an argument to compute_diversity in line 335 of the partition module?
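The failure mode described above can be reproduced without the library. The following is a minimal sketch, not the DiverseSelector implementation: `compute_diversity_sketch`, `FUNC_DICT`, `mean_pairwise_distance`, and `relative_spread` are hypothetical stand-ins, and the parameter names are assumptions chosen only to mirror the error message. It shows a name-based dispatcher that always passes one argument, a default measure that needs two, and how naming a one-argument measure avoids the error.

```python
import math


def mean_pairwise_distance(features):
    """One-argument measure: mean Euclidean distance over all pairs of points."""
    pairs = [
        math.dist(features[i], features[j])
        for i in range(len(features))
        for j in range(i + 1, len(features))
    ]
    return sum(pairs) / len(pairs) if pairs else 0.0


def relative_spread(library, x):
    """Two-argument measure (hypothetical): spread of subset x relative to the library."""
    full = mean_pairwise_distance(library)
    return mean_pairwise_distance(x) / full if full else 0.0


# Hypothetical registry, mirroring the func_dict lookup seen in the traceback.
FUNC_DICT = {
    "mean_distance": mean_pairwise_distance,
    "relative_spread": relative_spread,
}


def compute_diversity_sketch(features, div_type="relative_spread"):
    """Dispatch by name, always passing exactly one argument."""
    if div_type in FUNC_DICT:
        return FUNC_DICT[div_type](features)
    raise ValueError(f"Diversity type {div_type} not supported.")


bin_points = [(0.0, 0.0), (3.0, 4.0)]

# The default measure needs two arguments, so the one-argument call fails.
try:
    compute_diversity_sketch(bin_points)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Naming a one-argument measure explicitly avoids the failure.
fixed_value = compute_diversity_sketch(bin_points, div_type="mean_distance")

print(error_message)
print(fixed_value)
```

Under these assumptions, the suggestion in the report amounts to the second call: choosing a diversity measure at the call site that works with the bin contents alone. Whether that matches the fix merged in #138 is not confirmed by this thread.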