Tools for verifying files with hash-list manifests.
To verify that my backups restore without errors, I was using this classic Bash one-liner:
find . -type f -exec md5sum {} + | LC_ALL=C sort -k2
Eventually the limitations of this approach became burdensome, so I created a one-to-one Python implementation. When run without arguments, the command:
md5ls create
produces output identical to the Bash command (on all tested systems).
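For readers curious what a one-to-one port involves, here is a minimal sketch of the same pipeline in Python. It is not the code md5ls actually uses; the function names `md5_of_file` and `manifest_lines` are hypothetical, and it ignores the filename escaping that md5sum applies to names containing backslashes or newlines.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_lines(root="."):
    """Build md5sum-style lines for regular files under root, sorted by path bytes."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # find -type f skips symlinks and special files, so do the same here.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append(("./" + rel, md5_of_file(full)))
    # LC_ALL=C sort compares raw bytes, so sort on the encoded path.
    entries.sort(key=lambda entry: os.fsencode(entry[0]))
    # md5sum separates digest and path with two spaces.
    return [f"{digest}  {path}" for path, digest in entries]
```

Calling `print("\n".join(manifest_lines(".")))` from inside a directory should then produce the same kind of listing as the one-liner above.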
The project has since grown to include quality-of-life improvements and additional features.
Even if all you need is the behavior of the original Bash command, these features make the Python version worth using:
Use multithreading to greatly improve performance:
md5ls create -j 8
Replace 8 with the number of CPU cores available to you. In my testing, just 6 threads can produce a 10x performance improvement over the Bash version.
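As a rough illustration of why threads help here, the sketch below hashes files through a thread pool. It is an assumption about the general technique, not a description of how `-j` is implemented in md5ls; `hash_files` and `md5_of_file` are hypothetical names. Threads are a reasonable fit because CPython's hashlib releases the global interpreter lock while hashing larger buffers, and much of the time is spent waiting on disk reads anyway.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_files(paths, jobs=8):
    """Hash many files concurrently and return a {path: digest} mapping."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # pool.map yields results in the same order as the input paths.
        digests = list(pool.map(md5_of_file, paths))
    return dict(zip(paths, digests))
```

The `jobs` parameter plays the role of the number passed to `-j`; the speedup you see will depend on your storage and file sizes.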
The Bash version requires that you cd into the directory you are generating a manifest for, since find . must use the current working directory as its root to get consistent relative filepaths in the output. The -r option allows you to run the command from anywhere by specifying the directory:
md5ls create -r /path/dir/folder/
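The idea behind a root option can be sketched as computing every path relative to the chosen root instead of the current working directory. The helper below is a hypothetical illustration (the name `relative_entries` is not from md5ls) and makes no claim about how `-r` is actually implemented.

```python
from pathlib import Path


def relative_entries(root):
    """Yield './'-prefixed paths of regular files under root, independent of the cwd."""
    root = Path(root).resolve()
    for path in root.rglob("*"):
        if path.is_file() and not path.is_symlink():
            # relative_to strips the root, so the output is the same
            # no matter which directory the script is started from.
            yield "./" + path.relative_to(root).as_posix()
```

Because the paths never depend on the working directory, two runs against the same folder from different locations list the same entries.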
Use the -o flag to generate a consistent manifest on all* systems:
md5ls create -o /folder/file.out
The file will always have Unix-style line endings and Unix-style folder separators, even when run on Windows. This allows easy diff comparisons between two manifests, even across different platforms.
*Tested on Windows 10 and 11, and Ubuntu 22.04.
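For anyone wondering how a platform-independent file can be produced from Python, here is a minimal sketch under stated assumptions. The helpers `to_posix` and `write_manifest` are hypothetical and are not taken from md5ls; they only show the two standard-library features that make this possible.

```python
from pathlib import PureWindowsPath


def to_posix(path_text):
    """Convert a Windows-style relative path to forward-slash form."""
    return PureWindowsPath(path_text).as_posix()


def write_manifest(lines, out_path):
    """Write manifest lines with LF endings on every platform."""
    # newline="\n" turns off the translation that would emit CRLF on Windows.
    with open(out_path, "w", encoding="utf-8", newline="\n") as handle:
        for line in lines:
            handle.write(line + "\n")
```

With both steps applied, a manifest written on Windows and one written on Linux for the same files should be byte-for-byte comparable.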
You can use a basic diff command to compare two manifests:
diff file1.out file2.out
but the output can be hard to read, especially if there are many differences.
Instead, I've created md5ls diff to produce more human-readable output.
By default, the output adds headings and sorts changes into sections:
md5ls diff file1.out file2.out
Use -s to generate only a summary of changes, without the full list of lines:
md5ls diff file1.out file2.out -s
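To show what sorting changes into sections can look like, here is a small sketch that compares two manifests by path. The section names `added`, `removed`, and `changed` are assumptions made for illustration, as are the function names; the real md5ls diff output may be organized differently. The parser assumes md5sum's text-mode format of a digest, two spaces, then the path.

```python
def parse_manifest(text):
    """Map path -> digest from md5sum-style lines ('<digest>  <path>')."""
    entries = {}
    for line in text.split("\n"):
        if not line.strip():
            continue  # skip blank lines, including the one after a trailing newline
        digest, _sep, path = line.partition("  ")
        entries[path] = digest
    return entries


def compare_manifests(old_text, new_text):
    """Group differences between two manifests into hypothetical sections."""
    old, new = parse_manifest(old_text), parse_manifest(new_text)
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(path for path in shared if old[path] != new[path]),
    }
```

A summary mode would then only need the length of each list instead of its contents.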
Navigate to a good temporary directory of your choice:
cd ~
Clone this repository:
git clone https://github.com/slbelden/md5ls.py.git
Change directory into the folder git just downloaded:
cd md5ls.py
Install (substitute pipx for pip if your system complains):
pip install .
Manage your PATH on your own (good luck), then use:
md5ls
Get basic usage help with -h:
md5ls -h
and subcommand help in the same manner:
md5ls create -h