Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding archive entry paths #3638

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Prev Previous commit
Next Next commit
updated decompressor file naming
joeleonjr committed Nov 21, 2024
commit 6c1feea1ef5e7f5dab8f49f0fbe0b80c518fff37
5 changes: 3 additions & 2 deletions pkg/handlers/archive.go
Original file line number Diff line number Diff line change
@@ -153,8 +153,9 @@ func (h *archiveHandler) openArchive(
}
defer rdr.Close()

// Note: We're limited in our ability to add file names to the archiveEntryPath here, as the decompressor doesn't have access to a fileName value. Instead, we're adding a generic string to indicate that the file is decompressed. This could be improved.
return h.openArchive(ctx, append(archiveEntryPaths, "(decompressed "+reader.format.Name()+" file)"), rdr, dataOrErrChan)
// Note: We're limited in our ability to add file names to the archiveEntryPath here, as the decompressor doesn't have access to a fileName value.
// We add a empty string so we can keep track of the archive depth.
return h.openArchive(ctx, append(archiveEntryPaths, ""), rdr, dataOrErrChan)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we ever end up stringifying the slice of path parts, an empty string isn't going to maximally visually obvious as another level of depth, compared to like a ? or something. (Compare: some/path///file.txt to e.g. some/path/?/?/file.txt) What do you think of using a non-empty marker like ? instead?

Copy link
Contributor Author

@joeleonjr joeleonjr Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 36bdef0 fixes an oversight from an earlier version and is relevant to this discussion.

The main change is using file.NameInArchive instead of file.Name(). The difference is file.NameInArchive contains the relative path inside the archive that is being extracted. This does a few things:

  1. Previously, I mistakenly treated all files during an extraction operation as if they were in a flat directory and didn't preserve the actual directory structure. I didn't realize the file.Name() operation just grabbed the filename, not the path. That's fixed and we now have complete relative file paths in all supported archives.
  2. This change makes it abundantly clear how a user would navigate an archive to get to the actual file. For example: if you have an archive file containing another archive named archive.tar.gz with a file named secret.txt, it would return this in the archive entry path archive.tar.gz/archive/secret.txt. When you manually double-click to unarchive archive.tar.gz it tosses all of the contents into a new folder named archive and then you'd see secret.txt. It's super clean and clear. This method still uses the "" to track depth during decompression, but in the filepath.Join() operation, empty strings are ignored, which is exactly what we want to reconstruct the relative file path. If we change to add ? or anything else, we'd need to write a custom function to strip those out during the filepath.Join(). Since we're able to keep accurate track of depth and relative file path, I'd suggest just leaving as is.
  3. One other note: I only updated the error logs to include file.NameInArchive b/c I didn't want to bloat our non-error logs with the entire file path. This means the log at the very beginning of the extractorHandler function only has file.Name(). Not sure if that was the correct call or not.

case archiver.Extractor:
err := archive.Extract(ctx, reader, nil, h.extractorHandler(archiveEntryPaths, dataOrErrChan))
if err != nil {