DataLoader performance is bad? #521
Comments
What is going on with the tcm loading! Okay, thanks for testing, will check this ASAP.
It looks like this performance issue could be related to these: https://forum.hdfgroup.org/t/performance-reading-data-with-non-contiguous-selection/8979 and h5py/h5py#1597. We're going to try the same workaround in
Where do we do indexed reading in the data loader other than in
The idx read appears to have been a red herring. I found a factor of ~3 speedup for that tcm read step (really: building the entry list) in data_loader and another factor of ~2 in lgdo.table.get_dataframe. However, the overall read is still a factor of ~3 slower than Patrick's low-level read:
A lot of the complexity in data_loader comes from handling multi-level cuts and other generalizations, but I hope it can be sped up more with some refactoring. It looks like there is still a lot going on in the inner loops.
Nice! Performance of the LGDO conversion methods is also a topic for legend-exp/legend-pydataobj#30 by @MoritzNeuberger. Did anyone try profiling the code to spot straightforward bottlenecks?
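On the profiling question, here is a minimal sketch of how one could look for hot spots with the standard-library `cProfile` and `pstats` modules. The functions `build_entry_list` and `load_data` are hypothetical stand-ins, not the actual DataLoader code. To try it on the real loader, replace them with the real load call.

```python
import cProfile
import io
import pstats


def build_entry_list(n):
    """Hypothetical stand-in for a slow loading step (not the real DataLoader code)."""
    entries = []
    for i in range(n):
        entries.append((i, i % 7))
    return entries


def load_data(n=50_000):
    """Hypothetical top-level load call that wraps the slow step."""
    entries = build_entry_list(n)
    return sum(channel for _, channel in entries)


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return its result and a cumulative-time report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the outermost expensive calls come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = profile_call(load_data)
    print(report)
```

Sorting by cumulative time shows which high-level step dominates. Sorting by `"tottime"` instead would point at the inner loops themselves.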
At the analysis workshop, it was argued that the slowness of the DataLoader was only due to the IO speed. When I loaded the data with "low-level routines", I never felt it was that slow.
To test this quantitatively, I wrote a quick script to compare low-level loading with DataLoader loading.
The script is:
First, I would ask whether you can spot any unfair treatment of one routine or the other.
Both routines should:
The result of the script on LNGS is: