FFCV doesn't work for large dataset #389
Comments
I applied this fix and that part of the code now finishes immediately. What other changes are needed to support the large-dataset regime? Also, the length of my dataset can be very large in some cases, up to 100 million frames. I wonder if there is a code bottleneck there as well for FFCV.
I made the dataset length much smaller and I'm now able to construct my dataloader. However, I get a segfault immediately when I try to access it; I presume this is due to the memmap. Is there a suggestion for how to make this whole setup work with the large dataset? The segfault happens right after the pdb breakpoint.
I'm finding that both the initial beton creation and the initial dataloader load require the full dataset to fit in memory; I get OOM errors otherwise. This happens even with os_cache=False.
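For reference, a minimal sketch of the loader configuration being discussed; the path, batch size, and worker count are placeholders, and `OrderOption.QUASI_RANDOM` is the traversal order FFCV pairs with `os_cache=False` for datasets that don't fit in RAM:

```python
# Hedged sketch: loading a large .beton without trying to cache it all in RAM.
from ffcv.loader import Loader, OrderOption

loader = Loader(
    "/data/large_dataset.beton",     # placeholder path
    batch_size=64,
    num_workers=8,
    order=OrderOption.QUASI_RANDOM,  # avoids the fully in-memory shuffle of RANDOM ordering
    os_cache=False,                  # stream pages from disk instead of caching the whole file
)
```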
I got things to work by changing num_workers; I guess FFCV is doing something internally that blows up memory on Loader(...) when num_workers > 0. I still have not tested whether things work during the beton-creation stage; I was getting OOM there unless I had more memory than the dataset size. I was using 60 workers.
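In case it helps with the OOM at beton-creation time, here is a hedged sketch of the write side with far fewer writer processes. The dataset class, field shapes, path, and the chunksize value are assumptions for illustration, not the setup from this report:

```python
# Hedged sketch: writing a large indexed dataset to .beton with a modest number
# of writer processes and small chunks, so peak memory stays well below the
# dataset size.
import numpy as np
from ffcv.writer import DatasetWriter
from ffcv.fields import NDArrayField, IntField

class FrameDataset:
    """Placeholder indexed dataset: returns (frame, label) per index."""
    def __len__(self):
        return 1_000  # stand-in; the real dataset may be up to ~100M frames

    def __getitem__(self, idx):
        frame = np.zeros((3, 64, 64), dtype=np.float32)  # stand-in for a real frame
        return frame, idx % 10

writer = DatasetWriter(
    "/data/large_dataset.beton",  # placeholder path
    {
        "frame": NDArrayField(dtype=np.dtype("float32"), shape=(3, 64, 64)),
        "label": IntField(),
    },
    num_workers=8,  # far fewer than 60; each writer process buffers samples
)
# chunksize keeps the per-worker buffer small (assumed default-style argument)
writer.from_indexed_dataset(FrameDataset(), chunksize=100)
```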
I am trying to load a 600GB dataset.
It froze for an hour on np.fromfile in ffcv/ffcv/reader.py, line 70, before I gave up and cancelled it.
I tried to fix this by using np.memmap.
The first time I did this, for some reason the subsequent code blew my 262GB beton file up to 6.2TB.
I need to recreate the beton now and retry with the memmap opened in read-only mode, to see if I can get this working. Otherwise, any tips?
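For what it's worth, a minimal sketch of the kind of substitution being attempted (not FFCV's actual reader.py internals; the function name, dtype, offset, and count are placeholders). Opening the memmap read-only also avoids any chance of the mapping resizing or rewriting the file on disk:

```python
# Hedged sketch of replacing an eager np.fromfile read with a lazy memory map.
import numpy as np

def read_table(path, dtype, offset, count):
    # Eager read (original approach): allocates the whole table in RAM and can
    # stall for a very large .beton file.
    # with open(path, "rb") as f:
    #     f.seek(offset)
    #     return np.fromfile(f, dtype=dtype, count=count)

    # Lazy read: mode="r" is strictly read-only, so the underlying file is never
    # resized or modified (a writable mode such as "w+" creates/overwrites the
    # file, which could explain a .beton suddenly growing on disk).
    return np.memmap(path, dtype=dtype, mode="r", offset=offset, shape=(count,))
```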