Fancy indexing is not supported #47

mikejiang · 2018-01-03T22:45:31Z

@jreadey, I think you were already aware of this, since you mentioned that coordinate lists (dset[(x,y,z),:]) are not supported yet (not sure whether you actually meant lists). But I consider it one of the most used selection types, and worth the effort to add.

ds_local[1,[1,3,5]]
Out[64]: array([ 0.,  0.,  0.], dtype=float32)
ds_remote[1,[1,3,5]]
Traceback (most recent call last):
  File "/home/wjiang2/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-65-bc569229dc91>", line 1, in <module>
    ds_remote[1,[1,3,5]]
  File "/home/wjiang2/.local/lib/python3.6/site-packages/h5pyd/_hl/dataset.py", line 796, in __getitem__
    raise ValueError("selection type not supported")
ValueError: selection type not supported
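Above, h5pyd raises "selection type not supported" for a key that combines an integer with an index list. As an illustration only, here is a hypothetical helper (`classify_selection`, not part of h5py or h5pyd) that sorts an h5py-style key into a plain hyperslab, a fancy selection, or something unsupported. It assumes the restrictions the h5py documentation lists for fancy indexing (one index list per selection, given in increasing order). It also rejects duplicates and ignores Ellipsis, NumPy arrays, and boolean masks.

```python
from typing import Any


def classify_selection(key: Any) -> str:
    """Hypothetical helper: classify an h5py-style index key.

    Returns "hyperslab" for keys made only of ints and slices, "fancy" for
    keys with exactly one strictly increasing list of ints, and
    "unsupported" for anything else.
    """
    if not isinstance(key, tuple):
        key = (key,)
    lists = [k for k in key if isinstance(k, list)]
    others = [k for k in key if not isinstance(k, list)]
    # Every non-list element must be a plain int or a slice.
    if not all(isinstance(k, (int, slice)) for k in others):
        return "unsupported"
    if not lists:
        return "hyperslab"
    if len(lists) > 1:
        return "unsupported"  # only one index list per selection
    idx = lists[0]
    if not idx or not all(isinstance(i, int) for i in idx):
        return "unsupported"
    # Require strictly increasing coordinates (no reordering, no duplicates).
    if any(b <= a for a, b in zip(idx, idx[1:])):
        return "unsupported"
    return "fancy"
```

With this helper, `classify_selection((1, [1, 3, 5]))` (the key used above) returns `"fancy"`, while `classify_selection((1, slice(None)))` returns `"hyperslab"`.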


jananzhu · 2021-04-16T00:45:20Z

Hi @jreadey, we're interested in using fancy indexing with HSDS datasets as well. Initially, we tried using the equivalent point selection as suggested in #48 as a workaround, but are finding that the performance is poor relative to a hyperslab selection on a superset of the point selection once you get past the scale of 10k points.
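The workaround being compared here (reading one hyperslab that covers every requested point and discarding the rest client-side) can be sketched as follows. `read_via_bounding_slab` is a hypothetical helper. It assumes a single row index plus a list of column indices, and it works on anything with h5py-style slicing. In the test, a NumPy array stands in for the dataset.

```python
import numpy as np


def read_via_bounding_slab(dset, row, cols):
    """Read dset[row, cols] with one contiguous read, then subset locally.

    Hypothetical helper: `dset` is anything supporting h5py-style slicing
    (an h5py/h5pyd Dataset or a NumPy array).
    """
    cols = np.asarray(cols)
    lo, hi = int(cols.min()), int(cols.max())
    slab = dset[row, lo:hi + 1]  # one hyperslab covering every requested column
    return slab[cols - lo]       # pick the requested columns in memory
```

The trade-off is one request instead of one per point, at the cost of transferring every element between the smallest and largest index. It pays off when the requested indices are dense within that range.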

Just curious to see if there's any update on what it would take to implement fancy indexing at this point. I was looking through the RESTful HDF5 white paper and noticed that the Dataset POST spec has a section that mentions "set-theoretical combinations of hyperslabs", but no detailed example request is given. It's also not mentioned in the h5serv documentation, so I'm wondering if it made it into the final version of the spec.
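"Set-theoretical combinations of hyperslabs" means building a selection from several rectangular regions with union, intersection, or difference. The HDF5 library's `H5Sselect_hyperslab` works this way, with operators such as `H5S_SELECT_OR` and `H5S_SELECT_AND`. The NumPy sketch below illustrates only those set semantics with boolean masks. It is not the request format of the RESTful HDF5 spec, and `hyperslab_mask` is a hypothetical helper.

```python
import numpy as np


def hyperslab_mask(shape, *slices):
    """Boolean mask that is True inside one hyperslab given as slice objects."""
    mask = np.zeros(shape, dtype=bool)
    mask[slices] = True
    return mask


shape = (6, 6)
a = hyperslab_mask(shape, slice(0, 3), slice(0, 3))  # rows 0-2, cols 0-2
b = hyperslab_mask(shape, slice(2, 5), slice(2, 5))  # rows 2-4, cols 2-4

union = a | b          # analogous to combining with H5S_SELECT_OR
intersection = a & b   # analogous to combining with H5S_SELECT_AND
difference = a & ~b    # elements selected by a but not by b
```

Applying a combined mask to an array (`data[union]`) returns the selected elements as a flat 1-D array, which is why such selections behave more like point selections than like a single hyperslab.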

jreadey · 2021-04-16T22:38:50Z

Yes, I've been meaning to get to this...
I think a fairly simple extension to the dataset GET API should work (basically passing in the h5py index as parameters).
The h5py docs have a warning that performance could be sub-optimal, but I'd want to make a related HSDS update to do the fancy selection efficiently.
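Passing the h5py index as parameters means serializing the key into the request. Here is a minimal sketch that assumes a bracketed text form such as `[4:6,[2,5,9]]`. Both `format_select` and that exact syntax are illustrative assumptions, not the documented HSDS query format.

```python
def format_select(key):
    """Hypothetical encoder: turn an h5py-style index tuple into a string.

    Ints stay as-is, slices become start:stop[:step], and a list becomes a
    bracketed, comma-separated list. The format is an assumption for
    illustration only.
    """
    if not isinstance(key, tuple):
        key = (key,)
    parts = []
    for k in key:
        if isinstance(k, slice):
            start = "" if k.start is None else str(k.start)
            stop = "" if k.stop is None else str(k.stop)
            part = f"{start}:{stop}"
            if k.step not in (None, 1):
                part += f":{k.step}"
            parts.append(part)
        elif isinstance(k, list):
            parts.append("[" + ",".join(str(int(i)) for i in k) + "]")
        else:
            parts.append(str(int(k)))
    return "[" + ",".join(parts) + "]"
```

Because an index list is spelled out element by element, the encoded string grows linearly with the number of indices. That matters when the string travels in a URL query.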

You are looking to handle this type of indexing:

dset[4:6, [2,5,9]]

correct?
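For reference, this is what NumPy returns for the same key on an in-memory array. The 10 x 10 stand-in is an assumption for illustration. The slice keeps its axis and the list picks columns, giving a 2 x 3 result. Stacking the three single-column reads gives the same array.

```python
import numpy as np

# Stand-in for a 10 x 10 dataset; each value encodes row * 10 + col.
data = np.arange(100).reshape(10, 10)

sel = data[4:6, [2, 5, 9]]  # rows 4-5, columns 2, 5 and 9

# Equivalent result built from one plain selection per column.
stacked = np.column_stack([data[4:6, c] for c in (2, 5, 9)])
```

Here `sel` is `[[42, 45, 49], [52, 55, 59]]`.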

Also, I see that h5py recently added support for Multi-Block selection: https://docs.h5py.org/en/stable/high/dataset.html#multi-block-selection. Is that of interest as well?
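The h5py documentation describes `MultiBlockSlice` in terms of `start`, `count`, `stride` and `block`. To make those parameters concrete, this pure-Python sketch lists the 1-D indices such a selection covers: `count` blocks of `block` consecutive elements, with block i starting at `start + i * stride`. `multiblock_indices` is a hypothetical helper, not an h5py call.

```python
def multiblock_indices(start, count, stride, block):
    """List the 1-D indices covered by `count` blocks of `block` elements,
    where block i starts at start + i * stride."""
    # This sketch rejects overlapping blocks rather than deduplicating them.
    if count > 1 and block > stride:
        raise ValueError("blocks would overlap: block must be <= stride")
    return [start + i * stride + j for i in range(count) for j in range(block)]
```

For example, `multiblock_indices(1, 3, 4, 2)` gives `[1, 2, 5, 6, 9, 10]`: three blocks of two elements each, starting every four elements.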

jananzhu · 2021-04-19T19:35:06Z

That's right. We don't have a specific need for multi-block selection at the moment, just fancy indexing as you've described.

To clarify, is it possible to make a fancy indexing request (albeit less efficiently) via the current dataset value GET API? I'm wondering if we could test this out with a modified h5pyd client or if an HSDS update would be required.

jreadey · 2021-04-27T15:30:03Z

Yes, you can just do multiple regular selections.
E.g., instead of dset[4:6, [2,5,9]], do the following:

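import numpy as np
cols = [2, 5, 9]
arr = np.zeros((2, len(cols)), dtype=dset.dtype)
for j, col in enumerate(cols):
    arr[:, j] = dset[4:6, col]  # one regular selection per column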

It's unfortunate that h5pyd doesn't support asynchronous requests, which would let you issue all the fetches without waiting for each response, but this should work until we have the fancy selection going.
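Without asynchronous requests in h5pyd, one way to overlap the per-column round trips is a thread pool, because each read spends most of its time waiting on the network. This sketch rests on an unverified assumption: that concurrent reads on the same h5pyd dataset object are safe. `fetch_columns_concurrently` is a hypothetical helper. The test uses a NumPy array as a stand-in dataset.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def fetch_columns_concurrently(dset, rows, cols, max_workers=4):
    """Fetch dset[rows, c] for each c in cols on worker threads and
    assemble the columns into one array.

    Assumption (unverified for h5pyd): concurrent reads on `dset` are safe.
    """
    def read_col(c):
        return dset[rows, c]  # one regular (non-fancy) selection per column

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        columns = list(pool.map(read_col, cols))  # map preserves input order
    return np.column_stack(columns)
```

`pool.map` returns results in the order of `cols`, so the assembled array has the same column order as the equivalent fancy selection `dset[rows, cols]`.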

jreadey · 2021-12-13T16:22:02Z

I believe I have this feature working now. Code changes are in the fancyindx branch of both hsds and h5pyd.
I'll be doing some additional testing and evaluation and then merge into master later this week.

jreadey · 2021-12-21T17:18:37Z

The feature is now merged into master and available on PyPI (version 0.9.2).

Here's a simple performance test comparing fancy indexing against iterating through a set of columns: https://gist.github.com/jreadey/bd75c469559f03596bd2d274dfb5a315. In my testing, fancy indexing was ~8x faster (running with 4 HSDS nodes).
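The linked gist is the actual benchmark. As a generic, hypothetical harness in the same spirit (not the gist's code, and with no results implied), the two approaches can be timed side by side with `time.perf_counter`. `compare` accepts any dataset with h5py-style slicing.

```python
import time

import numpy as np


def time_call(fn, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` calls of fn()."""
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best


def compare(dset, rows, cols, repeat=3):
    """Time one fancy selection against a loop of per-column reads."""
    def fancy():
        return dset[rows, list(cols)]  # single fancy-index request

    def per_column():
        return np.column_stack([dset[rows, c] for c in cols])  # one request per column

    return {"fancy": time_call(fancy, repeat),
            "per_column": time_call(per_column, repeat)}
```

Taking the best of several runs reduces noise from network jitter. Any speedup measured this way depends on the deployment (for example, the number of HSDS nodes).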

jreadey closed this as completed Dec 21, 2021

jananzhu mentioned this issue Jan 11, 2022 in: Fancy indexing index list length is limited by GET query size #113 (Closed)
