`LH5Store.read()` performance #71

gipert · 2024-02-08T11:32:51Z

I just time-profiled `LH5Store.read()` while reading skimmed files from p03-p08. The results are attached.

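As you can see, the actual dataset read with h5py (line 818) accounts for only about 50% of the time. The rest goes to existence checks, attribute lookups, datatype parsing and object construction, and I am wondering if we can improve this. Relevant excerpts from the profiling: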

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   319        66       5295.0     80.2      4.2          if not h5f or name not in h5f:
   320                                                       msg = f"'{name}' not in {h5f.filename}"
   321                                                       raise KeyError(msg)
...
   337        66      11320.5    171.5      9.0          datatype = h5f[name].attrs["datatype"]
...
   338        66       9917.8    150.3      7.9          datatype, shape, elements = parse_datatype(datatype)
...
   763        59       7936.4    134.5      6.3              ds_n_rows = h5f[name].shape[0]
...
   818        58      62259.3   1073.4     49.4                  nda = h5f[name][source_sel]
...
   837        58       7681.7    132.4      6.1              attrs = h5f[name].attrs
...
   840        58       7633.8    131.6      6.1                      return Array(nda=nda, attrs=attrs), n_rows_to_read
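
To put a number on the overhead, the excerpt can be tallied with a few lines of Python. This is only a rough sketch: the rows are copied from the excerpt above, treating line 818 as "the read" and every other listed line as "overhead" is my own grouping, and the excerpt does not cover the full profile, so the percentages do not add up to 100.

```python
# Rows copied from the profiling excerpt: (line number, hits, total time, % time).
ROWS = [
    (319, 66, 5295.0, 4.2),
    (337, 66, 11320.5, 9.0),
    (338, 66, 9917.8, 7.9),
    (763, 59, 7936.4, 6.3),
    (818, 58, 62259.3, 49.4),
    (837, 58, 7681.7, 6.1),
    (840, 58, 7633.8, 6.1),
]

# Assumption: line 818 (nda = h5f[name][source_sel]) is the only actual data read.
READ_LINE = 818

read_pct = sum(pct for line, _, _, pct in ROWS if line == READ_LINE)
overhead_pct = sum(pct for line, _, _, pct in ROWS if line != READ_LINE)

read_time = sum(t for line, _, t, _ in ROWS if line == READ_LINE)
overhead_time = sum(t for line, _, t, _ in ROWS if line != READ_LINE)

# Overhead expressed as a fraction of the time spent in the read itself.
ratio = overhead_time / read_time

print(f"read: {read_pct:.1f}%  listed overhead: {overhead_pct:.1f}%")
print(f"overhead / read time ratio: {ratio:.2f}")
```

With this grouping, the listed non-read lines cost roughly four fifths of what the read itself costs.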

To reproduce

Install line_profiler:

pip install line_profiler

test.py:

import glob

from lgdo import lh5

# Read "skm" from every skim-tier file in the current directory.
store = lh5.LH5Store()
store.read("skm", glob.glob("*-tier_skm.lh5"))

Run profiling:

kernprof -l test.py

Dump results:

python -m line_profiler -rmt "test.py.lprof"

gipert added the performance (Code performance) and lh5 (HDF5 I/O) labels Feb 8, 2024

lvarriano mentioned this issue Mar 21, 2024

Increase read speed by x20-100 for most data #78

Closed

gipert linked a pull request Aug 12, 2024 that will close this issue

I/O performance improvements #100

Merged

gipert closed this as completed in #100 Sep 10, 2024
