Can I use fitsio to loop quickly over 20k small fits-file? #335

Open
Nestak2 opened this issue Nov 9, 2021 · 1 comment

Nestak2 commented Nov 9, 2021

Hi, I need to extract information from a few columns in ~20k different FITS files. Each file is relatively small, ~0.2 MB. So far I have been doing this with a loop and astropy, like this:

import numpy as np
from astropy.io import fits

data = []
# fits_files_list holds the paths to the ~20k FITS files
for file_name in fits_files_list:
    with fits.open(file_name, memmap=False) as hdulist:
        lam = np.around(10**hdulist[1].data['loglam'], 4)
        flux = np.around(hdulist[1].data['flux'], 4)
        z = np.around(hdulist[2].data['z'], 4)
    data.append([lam, flux, z])

For the 20k FITS files this takes ~2.5 hours, and from time to time I need to loop through the files again for other reasons. So I wanted to cut down that time, and I tried fitsio like this:

import numpy as np
import fitsio

data = []
for file_name in fits_files_list[:300]:
    # use a context manager so each file is closed after reading
    with fitsio.FITS(file_name) as hdulist:
        lam = np.around(10**hdulist[1]['loglam'][:], 4)
        flux = np.around(hdulist[1]['flux'][:], 4)
        z = np.around(hdulist[2]['z'][:], 4)
    data.append([lam, flux, z])

But unfortunately it doesn't give me much of a time improvement, if any. So my question is: can I speed up the looping with fitsio? Do you know of other packages that would help? Or can I change my algorithm to make it run faster, e.g. somehow vectorize the loop? Or is there software to quickly stack 20k FITS files into a single FITS file (TOPCAT has no function that does this for more than 2 files)? Thanks for any ideas and comments!

@esheldon (Owner) commented

It might be good to profile this, to see if it is limited by reading from disk.
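
For illustration only, a minimal sketch of that profiling step using Python's built-in cProfile; 'example.fits' is a placeholder for one of the real files:

import cProfile

import fitsio
import numpy as np

def read_one(file_name):
    # extract the same three columns as in the loop above
    with fitsio.FITS(file_name) as hdulist:
        lam = np.around(10**hdulist[1]['loglam'][:], 4)
        flux = np.around(hdulist[1]['flux'][:], 4)
        z = np.around(hdulist[2]['z'][:], 4)
    return [lam, flux, z]

# if most of the cumulative time lands in low-level read calls,
# the loop is I/O bound rather than limited by fitsio itself
cProfile.run("read_one('example.fits')", sort='cumulative')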

If it is read limited, then the best way to speed it up would be to run multiple jobs on different machines and combine the results afterward.
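
As a single-machine variant of that suggestion (an assumption beyond the comment above: it only helps if several processes can read concurrently without saturating the disk), the jobs-plus-combine step might look like this with concurrent.futures; the worker count of 8 is arbitrary:

from concurrent.futures import ProcessPoolExecutor

import fitsio
import numpy as np

def read_one(file_name):
    # same per-file extraction as the serial loop
    with fitsio.FITS(file_name) as hdulist:
        lam = np.around(10**hdulist[1]['loglam'][:], 4)
        flux = np.around(hdulist[1]['flux'][:], 4)
        z = np.around(hdulist[2]['z'][:], 4)
    return [lam, flux, z]

if __name__ == '__main__':
    # fits_files_list is the same list of ~20k paths as above;
    # pool.map returns results in input order, matching the serial loop
    with ProcessPoolExecutor(max_workers=8) as pool:
        data = list(pool.map(read_one, fits_files_list, chunksize=200))

On a single local disk this may gain little, which is the point of the comment above, but on SSD or network storage parallel readers often do help.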
