Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing a read-only projection of the conserve archive for Windows #240

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

WolverinDEV
Copy link
Contributor

@WolverinDEV WolverinDEV commented Dec 29, 2023

Hey,
concluding my latest comment on the discussions #237 I open this PR.

Motivation

Right now it is not possible to view files which have been backupped with conserve without restoring a partial or complete versions of that particular backup. Fully restoring a specific version is mandatory in case of a complete data loss (e.g. corrupted hard drive) but inconvenient when only a few files need to be restored especially if these files are contained in different subtrees. Therefore having a feature which allows the user to easily access, view and copy only specific folders or files for specific versions is required.

Implementation

As discussed in the discussion linked above, Windows does ship an optional feature called Windows Projected File System (ProjFS).

The Windows Projected File System (ProjFS) allows a user-mode application called a "provider" to project hierarchical data from a backing data store into the file system, making it appear as files and directories in the file system.

Leveraging the ProjFS we're able to give the user a visual (and explorable) representation of all folder and files contained within the conserve backup. Leveraging the possibility of a virtual file system, the user is also able to easily view different versions of the backupped file system.

In the past few days I've implemented a rust wrapper around the Windows ProjFS API.
More details can be found here: https://github.com/WolverinDEV/windows-projfs

Pending Tasks

This PR is still in draft as the following tasks need to be completed before this feature can be shipped:

  • Improve band entry iteration for directory listings
  • Implement correct file metadata & attributes
  • Write tests for mounting & utilities

@@ -64,6 +64,9 @@ indoc = "2.0"
uzers = "0.11"
nix = { version = "0.27", features = ["fs", "user"] }

[target.'cfg(windows)'.dependencies]
windows-projfs = { version = "0.1.6", features = ["dynamic-import"] }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this pull in a lot of stuff? Maybe it should be optional even on Windows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "pull in a lot of stuff"?
Do you mean memory wise, dependency wise, file wise, etc?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant mostly a lot of new dependencies and therefore build time; hopefully it's not having any runtime effect unless it's used. Perhaps it's fine by default.

/// Make an iterator that returns hunks of entries from this index.
pub fn iter_available_hunks(self) -> IndexHunkIter {
let _span = debug_span!("iter_hunks", ?self.transport).entered();
let hunks = self.hunks_available().expect("hunks available"); // TODO: Don't panic
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just make this return a Result?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment & behaviour originates from the previous code, written by you :)
https://github.com/sourcefrog/conserve/blob/main/src/index.rs#L310

BTW, thanks for the review.

}

/// Make an iterator that returns hunks of entries from this index.
pub fn iter_available_hunks(self) -> IndexHunkIter {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the change from iter_hunks to iter_available_hunks really semantic? They seem fairly similar..?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh this is actually a relict of the previous versions where I needed to pass a custom hunk iter to the iter_hunks method. Even tough, this is not functionally required any more I think the overall API semantics of IndexRead with iter_hunks and iter_available_hunks are improved.

src/mount.rs Outdated
sync::{Arc, Mutex},
time::Duration,
};
use tracing::{error, info, trace};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please order imports as paragraphs of

  1. std
  2. dependencies
  3. this crate

src/mount.rs Outdated
};
}

struct VoidMonitor;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this under src/monitor/: it doesn't seem specific to projfs.

src/mount.rs Outdated
Apath, Archive, BandId, BandSelectionPolicy, IndexEntry, IndexRead, Kind, Result, StoredTree,
};

macro_rules! static_dir {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is (currently) used only once so perhaps it's no worth having as a macro? It could just be a function, or even just inline?

src/mount.rs Outdated
.map(move |hunk_index| {
let mut index = index.duplicate();
let entries = index.read_hunk(hunk_index)?;
let meta_info = if let Some(entries) = entries {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to read and then discard every hunk, and I wonder if it would be better to cache them?

If the tree is very large that might use a lot of memory, but it seems like you at least might as well put them in an LRU?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caching the hunk contents for the current implementation of the HunkHelper itself does not make sense, as it never needs any of the hunk contents.
The moment we actually access the hunk contents later on via load_hunk_contents the results are getting cached.

Another logical issue with using an cache at this stage is, that the HunkHelper loads every hunk once, therefore we do not gain any knowledge about which hunk is most likely be loaded again.

src/mount.rs Outdated
}
});

let hunk_index = match hunk_index {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this could do with a comment about the interpretation of Err from binary_search

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's actually a relict where I only compared the start path and not the start and end path, therefore never encountering an actual match.

/// This has several side effects:
/// - Depending on the implementation of the decompressor, duplicate might not be a cheap option.
/// - Every read index has its own unique read stats, therefore the clone does not inherit the read stats.
pub(crate) fn duplicate(&self) -> Self {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the decompressor would ever be expensive: it's basically a buffer?

Maybe this could just be impl Clone?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, that the argument of not using Clone based complexity of cloning the decoder in itself wouldn't be enough. But we got two side effects here, which I've mentioned in the doc-comment in the code above seem severe enough that I would argue it's not actually creating a clone.

src/mount.rs Outdated
}

impl HunkHelper {
pub fn from_index(index: &IndexRead) -> Result<Self> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conserve handles the edge case where an index is incomplete due to an interrupted backup, by resuming at the same apath in the previous index. This can happen repeatedly if the previous index is also incomplete, until we either find a complete index or run out of indexes. As a result, just reading the index hunks for a single index is not enough to read incomplete backups correctly.

This is handled today by stitch.rs so this code should probably either build on or merge into that.

The way this is handled in the API today is admittedly not really ideal... but I do feel like this is important for the goal of recovering data even from interrupted or damaged backups...

(This is probably the most important non-stylistic comment in this review.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I was aware of that, and it's a good catch!

As you have the ability to select which backup you want to inspect, I argue that you should only see files which have been processed therefore falling back to the next best band would be a violation of this expectation.
Under that premiss, unfinished backups shall only be partial.

Let me know of your thoughts :)

@sourcefrog
Copy link
Owner

sourcefrog commented Jan 1, 2024 via email

@WolverinDEV
Copy link
Contributor Author

I get what you mean.
windows-projfs does only depend on 5 dependencies (log, parking_lot, thiserror, windows, and libloading).
As far as I know are none of them known for causing build breakage.
The windows crate is statically generated before being deployed to crate.io therefore no hidden build requirements.

The only change is I use functionality of the latest Rust stable release, hence the CI pipeline edit to 1.75.

@sourcefrog
Copy link
Owner

I'll come back and go line by line, but just as a general comment I would be delighted to merge this, but I would want to have some more tests, including hopefully an integration test that mounts an archive on Windows and checks that you can read from it.

@WolverinDEV
Copy link
Contributor Author

I'll come back and go line by line, but just as a general comment I would be delighted to merge this, but I would want to have some more tests, including hopefully an integration test that mounts an archive on Windows and checks that you can read from it.

Hey,
yes including tests is one of the bullet points which is missing before draft completion.
Personally I've already added tests to windows-projfs but as the exam phase approaches, I got less and less time...

@WolverinDEV
Copy link
Contributor Author

Hey ho,
I've successfully implemented tests :)
You're free to review now and may merge.

@WolverinDEV WolverinDEV requested a review from sourcefrog August 1, 2024 11:37
@sourcefrog sourcefrog marked this pull request as ready for review August 7, 2024 20:25
@sourcefrog
Copy link
Owner

Ack, I'll look.

@sourcefrog
Copy link
Owner

Thanks for adding tests! It does look like they are currently failing in CI: https://github.com/sourcefrog/conserve/actions/runs/10197526482/job/28210539695?pr=240

@WolverinDEV
Copy link
Contributor Author

Thanks for adding tests! It does look like they are currently failing in CI: https://github.com/sourcefrog/conserve/actions/runs/10197526482/job/28210539695?pr=240

Yes, I had the same issue with the projfs tests.
I narrowed the issue down to a resource limitation.
The tests work flawlessly when reducing the number of Rust test threads (e.g., https://github.com/WolverinDEV/windows-projfs/blob/master/.github/workflows/rust.yml#L54).

I'm not quite sure what implications this might have for all the other conserve tests.
Maybe execute them separately?

Copy link
Owner

@sourcefrog sourcefrog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is probably a good idea to merge. I am hesitating a bit because I don't regularly use Windows and so I won't give it any ad-hoc manual testing myself. However, I can see how it would be a useful feature there, and since there are at least some CI tests there is some protection against any major breakage.

This PR is also pointing to how the core APIs could be refactored to support caching or faster random access, but I wouldn't want to make this PR enormous to do all of that in here.

@@ -64,6 +64,9 @@ indoc = "2.0"
uzers = "0.11"
nix = { version = "0.27", features = ["fs", "user"] }

[target.'cfg(windows)'.dependencies]
windows-projfs = { version = "0.1.6", features = ["dynamic-import"] }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant mostly a lot of new dependencies and therefore build time; hopefully it's not having any runtime effect unless it's used. Perhaps it's fine by default.

src/bin/conserve.rs Show resolved Hide resolved
src/hunk_index.rs Show resolved Hide resolved
src/mount/projfs.rs Show resolved Hide resolved
@sourcefrog
Copy link
Owner

It looks like projfs adds minimal dependencies so I don't think there's any need to make it optional at build time.

I've changed the option name and help to be IMO easier to understand and to be clear this is probably projfs-specific behavior.

I wonder how the caching behavior will interact with the contents of latest changing over time. I'll look.

I tried this out a bit on Windows. It seems like if I run it without cleanup, exit, and then run it again, I get an error that failed to mark projection root: The object manager encountered a reparse point while retrieving an object (0x8007112B). I wonder if this means we should always clean up, or perhaps there's some way to take over ownership of an existing reparse point.

@sourcefrog
Copy link
Owner

sourcefrog commented Oct 23, 2024

Possibly the error on restarting is connected to this:

The provider only needs to create the virtualization root once for each virtualization instance. Once a root has been created, its associated instance can be repeatedly started and stopped without recreating the root.

I'm not sure though if the windows-projfs Rust wrapper exposes that functionality yet.

@sourcefrog
Copy link
Owner

Oh, I just remembered this is your crate.

From the docs, but not yet experimenting, maybe it needs a way to resume virtualizing by calling PrjMarkDirectoryAsPlaceholder on the first run, and then just PrjStartVirtualizing on later runs. Possibly that could just be automatic if the first one calls with ERROR_REPARSE_POINT_ENCOUNTERED?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants