Out-of-memory when mksquashfs'ing 200M files #238

nh2 · 2023-04-01T19:06:25Z

Hi,

I'm having trouble finding concrete information on whether squashfs is designed to handle packing and unpacking large amounts of files with low/constant RAM usage.

I ran mksquashfs on directory with 200 million files, around 20 TB total size.

I used flags -no-duplicates -no-hardlinks; mksquashfs version 4.5.1 (2022/03/17) on Linux x86_64.
It OOM'ed with 53 GB resident memory usage.

Should mksquashfs handle this? If yes, I guess the OOM should be considered a bug.

Otherwise, I'd put it as a feature request, as it would be very nice to have a tool that can handle this.

The text was updated successfully, but these errors were encountered:

plougher · 2023-04-05T02:14:42Z

This is an interesting request. Back in the early days of Squashfs (from 2002 to about 2006), Mksquashfs did one pass over the source filesystem creating the Squashfs filesystem as it went. This did not require caching any of the source filesystem and so it was very light on memory use.

Unfortunately adding features such as real inode numbers, hard-link support (including inode nlinks) and "." and ".." directories (the first two versions of Squashfs didn't have any of these) requires fully scanning the source filesystem to build an in-memory representation.

This takes memory and so unfortunately 53 GB is probably correct for around 200 million files, and so this is expected and not a bug.

But if someone was happy to forgo hard-link detection and advanced support such as pseudo files and actions, it may be possible to reduce the in-memory representation and move more to the original single pass in a "memory light mode".

I'll add it to the list of enhancements, and see if priorities allow it to be looked at for the next release.

nh2 · 2023-04-08T09:54:01Z

if someone was happy to forgo hard-link detection and advanced support such as pseudo files and actions, it may be possible

@plougher Yes, that's exactly what I'm after.

This is also why I tried with -no-duplicates -no-hardlinks, as those features already sound like they need O(number of files) memory.

squashfs is sometimes recommended as a good alternative to tar/zip that supports modern compression and random access.

Constant memory use seems to be the most critical thing missing to truly replace them.

SvanteRichter · 2024-12-09T13:47:47Z

Hello!

I'm trying something similar, creating an archive with 271m (271799653 to be exact) small files. The total size is only around 300GB though. Many of them (maybe 70%) are duplicates, and if I'd guess the size of the final archive it'd be around 60-80GB.

I tried doing this on a machine with 128GB RAM and 128GB swap, but it fails at the final step (after the progressbar has gone to 100%) with "FATAL ERROR: Out of memory (write_inode_lookup_table)". Memory usage seems to be close to 100%, but swap usage seems to be lower (I don't have exact figures). Not sure if it matters but the files are gzipped protocolbuffers containing openstreetmap data.

The flags I used where -quiet -noD -comp zstd -Xcompression-level 1 -fstime 0 -all-time 0 -no-xattrs -all-root -info.

Is there any way I can reduce memory usage enough to make it complete? Or some way to estimate how much RAM I'd need to make it pass? I'm currently rerunning with no compression as a test but that will probably take 3-4 days to see if it fails again.

Thanks!

plougher · 2024-12-10T10:59:16Z

If Mksquashfs got to the write_inode_lookup_table function then Mksquashfs almost finished writing the filesystem.

If you don't want to export the filesystem via NFS (most people don't), then you can skip where it failed by adding -no-exports to the command line.

Also if you have 128GB RAM then Mksquashfs will by default use 32GB for buffers (25%). You can safely reduce that to 8GB (with a possible small lost of performance) using the -mem option.

So adding

-no-exports -mem 8GB

Should hopefully make it not run out of memory.

SvanteRichter · 2024-12-14T12:14:19Z

Thanks, that worked great! If anyone is interested this was the memory curve after applying your suggested flags:

plougher self-assigned this Apr 4, 2023

plougher added the enhancement label Apr 4, 2023

plougher added this to the Undecided milestone Apr 4, 2023

nh2 mentioned this issue Jun 23, 2024

Document memory behaviour and give tips for dealing with many files mhx/dwarfs#226

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Out-of-memory when mksquashfs'ing 200M files #238

Out-of-memory when mksquashfs'ing 200M files #238

nh2 commented Apr 1, 2023

plougher commented Apr 5, 2023

nh2 commented Apr 8, 2023

SvanteRichter commented Dec 9, 2024 •

edited

Loading

plougher commented Dec 10, 2024 •

edited

Loading

SvanteRichter commented Dec 14, 2024

Out-of-memory when mksquashfs'ing 200M files #238

Out-of-memory when mksquashfs'ing 200M files #238

Comments

nh2 commented Apr 1, 2023

plougher commented Apr 5, 2023

nh2 commented Apr 8, 2023

SvanteRichter commented Dec 9, 2024 • edited Loading

plougher commented Dec 10, 2024 • edited Loading

SvanteRichter commented Dec 14, 2024

SvanteRichter commented Dec 9, 2024 •

edited

Loading

plougher commented Dec 10, 2024 •

edited

Loading