LH5Store.read() performance #71

Closed
gipert opened this issue Feb 8, 2024 · 0 comments · Fixed by #100
Labels
lh5 HDF5 I/O performance Code performance

gipert commented Feb 8, 2024

I just time-profiled LH5Store.read() while reading skimmed files from p03-p08. The results are attached.

As you can see, reading the datasets with h5py accounts for only about 50% of the total time, and I am wondering if we can reduce the rest. Relevant excerpts from the profiling:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   319        66       5295.0     80.2      4.2          if not h5f or name not in h5f:
   320                                                       msg = f"'{name}' not in {h5f.filename}"
   321                                                       raise KeyError(msg)
...
   337        66      11320.5    171.5      9.0          datatype = h5f[name].attrs["datatype"]
...
   338        66       9917.8    150.3      7.9          datatype, shape, elements = parse_datatype(datatype)
...
   763        59       7936.4    134.5      6.3              ds_n_rows = h5f[name].shape[0]
...
   818        58      62259.3   1073.4     49.4                  nda = h5f[name][source_sel]
...
   837        58       7681.7    132.4      6.1              attrs = h5f[name].attrs
...
   840        58       7633.8    131.6      6.1                      return Array(nda=nda, attrs=attrs), n_rows_to_read
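
To put a number on the "only 50%" remark, here is a quick tally of the excerpt above. It is a rough sketch, not part of the original profiling run: it assumes all listed rows belong to the same profiled function, so that their "% Time" values share a denominator. The short descriptions in the third column are my own labels, not profiler output.

```python
# Rows copied from the profiling excerpt: (line number, % time, label).
# The labels are informal descriptions added for readability (assumption).
rows = [
    (319, 4.2, "'name in h5f' check"),
    (337, 9.0, "read 'datatype' attribute"),
    (338, 7.9, "parse_datatype()"),
    (763, 6.3, "dataset shape lookup"),
    (818, 49.4, "actual dataset read"),
    (837, 6.1, "attrs lookup"),
    (840, 6.1, "Array construction and return"),
]

READ_LINE = 818  # the h5f[name][source_sel] line

# Split the listed percentages into the real read and everything around it.
read_pct = sum(pct for line, pct, _ in rows if line == READ_LINE)
overhead_pct = sum(pct for line, pct, _ in rows if line != READ_LINE)
unlisted_pct = 100 - read_pct - overhead_pct

print(f"dataset read:        {read_pct:.1f}%")
print(f"listed overhead:     {overhead_pct:.1f}%")
print(f"lines not shown:     {unlisted_pct:.1f}%")
```

Under that assumption, the six lines around the read add up to roughly 40% of the time, and four of them (337, 763, 818, 837) index h5f[name] again.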

To reproduce

Install line_profiler:

pip install line_profiler

test.py:

import glob

from lgdo import lh5

# Read the "skm" object from all skimmed-tier files in the current directory
store = lh5.LH5Store()
store.read("skm", glob.glob("*-tier_skm.lh5"))

Run profiling:

kernprof -l test.py

Dump results:

python -m line_profiler -rmt "test.py.lprof"
@gipert gipert added performance Code performance lh5 HDF5 I/O labels Feb 8, 2024
@gipert gipert linked a pull request Aug 12, 2024 that will close this issue