Crash when running --blob-callback on blobs larger than ~600,000,000 bytes #616
I have tried running git-filter-repo with
The error in your file is
This message comes from the following git code:
and hashfile_truncate() in particular is this code:
It's not clear whether it's ftruncate() or lseek() that is returning an error in your case, but the fact that you have files of 600 MB or more suggests you might be getting close to either a 2 GB or 4 GB limit. If that's true, switching platforms might help, because the size of long in C differs between platforms (for example, it is 32 bits on 64-bit Windows but 64 bits on 64-bit Linux). Or maybe some code somewhere uses int/long/unsigned/unsigned long instead of off_t and size_t. But I don't have an easy way to reproduce this. Could you report what platform you are on and try a few other OSes? If it still fails, could you find a way to reproduce it that others can duplicate?
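The 2 GB / 4 GB hypothesis can be illustrated with a small, hypothetical Python sketch; it is not git's code, and whether this is what actually happens inside fast-import is unconfirmed. `ctypes` reports the width of C `long` on the current platform, the made-up helper `as_int32` shows what happens to a file offset between 2 GiB and 4 GiB when it is kept in a 32-bit signed integer, and the equally made-up `truncate_like_hashfile` mimics an ftruncate() followed by lseek() to show that a negative length is rejected with `EINVAL`, the errno whose message is "Invalid argument".

```python
import ctypes
import errno
import os
import tempfile

GiB = 1024 ** 3

def as_int32(value: int) -> int:
    """Keep the low 32 bits of value and reinterpret them as a signed 32-bit int,
    mimicking an offset stored in a 32-bit C integer (hypothetical helper)."""
    return ctypes.c_int32(value).value  # ctypes does no overflow checking

def truncate_like_hashfile(fd: int, offset: int) -> None:
    """Hypothetical Python analogue of ftruncate() followed by lseek().
    Raises OSError if either step fails."""
    os.ftruncate(fd, offset)
    if os.lseek(fd, offset, os.SEEK_SET) != offset:
        raise OSError(errno.EIO, "lseek returned an unexpected offset")

if __name__ == "__main__":
    # C long is 4 bytes on 64-bit Windows (LLP64), 8 bytes on 64-bit Linux/macOS (LP64).
    print("sizeof(long) here:", ctypes.sizeof(ctypes.c_long))

    real_offset = 3 * GiB              # an offset between 2 GiB and 4 GiB
    mangled = as_int32(real_offset)    # wraps to -1073741824
    print(f"{real_offset:,} kept in 32 bits -> {mangled:,}")

    with tempfile.TemporaryFile() as f:
        try:
            truncate_like_hashfile(f.fileno(), mangled)
        except OSError as e:
            # A negative length is rejected; errno is EINVAL ("Invalid argument").
            print("truncate failed:", errno.errorcode.get(e.errno, e.errno), e.strerror)
```

An unsigned 32-bit type would instead survive up to 4 GiB and then silently wrap (for example, 5 GiB becomes 1 GiB), which is why both limits are worth considering.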
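The error occurs on 64-bit Windows with git version 2.47.1.windows.1 and git-filter-repo version a40bce548d2c. Here is a script that reproduces it:

```python
import subprocess, shutil, sys

# Usage: python <this script> <git-filter-repo command>
# WARNING: deletes any existing .git directory in the current directory.
shutil.rmtree(".git", ignore_errors=True)
subprocess.run(["git", "init"])

# Five commits, each with a 1 KiB file and a 650 MiB file whose contents
# differ between commits only in the first byte.
for i in range(5):
    with open("small", "wb") as out:
        out.write(bytes([i]))
        out.truncate(1024)  # pad with zero bytes up to 1 KiB
    with open("large", "wb") as out:
        out.write(bytes([i]))
        out.truncate(650 * 1024 * 1024)  # pad with zero bytes up to 650 MiB
    subprocess.run(["git", "add", "small", "large"])
    subprocess.run(["git", "commit", "-m", "commit %d" % i])

# Rewrite history; the callback only modifies the 1 KiB blobs,
# so the 650 MiB blobs pass through unchanged.
subprocess.run([
    sys.argv[1],
    "--force",
    "--blob-callback", """
if len(blob.data) == 1024:
    blob.data = b"Test" + blob.data[4:]
"""])
```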
I tried my script in WSL and the error did not occur. I was then able to filter my repo on WSL without issues. Thanks for the suggestion.
Hello, I'm trying to convert certain files in my repository from one format to another. I wrote some python code to accomplish this and am passing it to git-filter-repo's --blob-callback argument.
This seems to work for a few thousand commits, but then fast-import crashes with the message
fatal: cannot truncate pack to skip duplicate: Invalid argument
and writes a fast_import_crash file. I've tried this multiple times: with different callbacks, with the repository filtered to different paths, and with filtering by path first and then applying my blob callback in a second run of git-filter-repo. The exact blob it stops at differs between runs, but it always crashes at a blob that is much larger than the others, above 600,000,000 bytes. Perhaps this is a known or intentional limitation of git fast-import or git-filter-repo.
I have attached one such fast_import_crash file, but I have removed file and branch names, as this is a company repository.
fast_import_crash_30556.zip
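To see which blobs in a repository are large enough to fall into the range where this crash was reported, a minimal sketch like the following lists every reachable blob above a size cutoff. It relies only on standard git plumbing (`git rev-list --objects --all` piped into `git cat-file --batch-check`, with the `%(rest)` atom carrying the path through); the helper names and the 600,000,000-byte cutoff are illustrative assumptions, not part of git-filter-repo, whose own `--analyze` option produces similar size reports. Each blob appears once, with one of the paths it was stored under.

```python
import subprocess
import sys

THRESHOLD = 600_000_000  # bytes; roughly where the crashes in this report start

def parse_batch_check(output: str, threshold: int):
    """Parse lines of '%(objecttype) %(objectname) %(objectsize) %(rest)' and
    return (size, oid, path) for blobs larger than threshold, largest first."""
    found = []
    for line in output.splitlines():
        parts = line.split(" ", 3)  # the path (rest) may itself contain spaces
        if len(parts) < 3 or parts[0] != "blob":
            continue
        size = int(parts[2])
        if size > threshold:
            path = parts[3] if len(parts) == 4 else ""
            found.append((size, parts[1], path))
    return sorted(found, reverse=True)

def list_large_blobs(repo: str = ".", threshold: int = THRESHOLD):
    # Every object reachable from any ref, with a path for trees and blobs.
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True).stdout
    # Ask cat-file for type and size; %(rest) echoes back the path.
    batch = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True).stdout
    return parse_batch_check(batch, threshold)

if __name__ == "__main__":
    repo = sys.argv[1] if len(sys.argv) > 1 else "."
    for size, oid, path in list_large_blobs(repo):
        print(f"{size:>14,}  {oid}  {path}")
```

Run it from inside the repository, or pass the repository path as the first argument.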