Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for in-place restores. #1094

Open
wants to merge 15 commits into
base: 4.x
Choose a base branch
from
Open

Conversation

mattl-netflix
Copy link
Contributor

6369b6a 2024-05-02 | Cease stopping Cassandra in the restore process and reveal the ability to restore to a staging area and then import or move the data depending on whether Cassandra is running.
4ece8d3 2024-05-01 | Move getBackupDirectories to BackupRestoreUtil and change its signature to cease requiring an IConfiguration but rather just take the data directory path as a String.
6b3d45a 2024-04-10 | Cease wiping the data directory in advance of restore. It is counter-intuitive and will be empty in practice in most cases anyway.
f10289f 2024-04-10 | Consolidate restore status API.
4d6d07f 2024-03-09 | Remove backup racs as that configuration is not used in practice and is not relevant to whether a restore should be attempted.
aa3dffe 2024-03-09 | Remove redundant vertical whitespace.
8c23800 2024-03-09 | Remove redundant logging.
646973d 2024-03-09 | Remove all non-javadoc comments that don't provide warnings to future maintainers.
8d18f51 2024-03-09 | Remove waitForCompletion parameter to download method which is always false in practice. Remove final modifier on static method. Make static property an instance variable instead. Use parameterized constructor call in place of two lines.
ce6c7c1 2024-03-09 | Remove unused code from our restore tooling.

…hs when fetching incrementals. The iterators are always fully materialized. Also, remove the now-redundant method from BackupRestoreUtil that merely wrapped the MetaProxy call.
… false in practice.

Remove final modifier on static method.
Make static property an instance variable instead.
Use parameterized constructor call in place of two lines.
…is not relevant to whether a restore should be attempted.
…intuitive and will be empty in practice in most cases anyway.
…re to cease requiring an IConfiguration but rather just take the data directory path as a String.
…y to restore to a staging area and then import or move the data depending on whether Cassandra is running.
@mattl-netflix mattl-netflix force-pushed the feature/in_place_restore_4 branch from 6369b6a to 9557275 Compare May 5, 2024 20:11
String restoreFileName =
PATH_JOINER.join(dataDir, keyspace, columnFamily, indexDir, fileName);
return_ = new File(restoreFileName);
return_ =
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we are in here, consider renaming this variable, its not idiomatic java

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

String dataDirectory, String monitoringFolder) throws IOException {
ImmutableSet.Builder<Path> backupPaths = ImmutableSet.builder();
Path dataPath = Paths.get(dataDirectory);
if (Files.exists(dataPath) && Files.isDirectory(dataPath))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Files.newDirectoryStream(keyspaceDirPath, Files::isDirectory)) {
for (Path columnfamilyDirPath : keyspaceStream) {
Path backupDirPath =
Paths.get(columnfamilyDirPath.toString(), monitoringFolder);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you can use columnFamilyDirPath.resolve(monitoringFolder) here instead of converting back and forth between string and path. Reference: https://docs.oracle.com/javase/8/docs/api/index.html?java/nio/file/Path.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

for (Path columnfamilyDirPath : keyspaceStream) {
Path backupDirPath =
Paths.get(columnfamilyDirPath.toString(), monitoringFolder);
if (Files.exists(backupDirPath) && Files.isDirectory(backupDirPath)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here that the File.exists call is redundant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if (instanceState.getRestoreStatus() != null
&& instanceState.getRestoreStatus().getStatus() != null
&& instanceState.getRestoreStatus().getStatus() == Status.STARTED) {
if (instanceState.isRestoring()) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this cleanup 🙌

failedImports.addAll(importData(keyspace, table, tableDir.toString()));
}
} else {
recursiveMove(Paths.get(srcDir), Paths.get(configuration.getDataFileLocation()));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the move is non-atomic I can see a weird edge case where we start Cassandra while the move is happening (and we have like 10,000+ files). Should we have some sort of "fence" that tells the subsystem that starts Cassandra it needs to wait while this happens?

if (Files.isRegularFile(path)) {
Files.move(path, destination.resolve(path.getFileName()));
} else if (Files.isDirectory(path)) {
recursiveMove(path, destination.resolve(path.getFileName()));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If am reading this right, if the source path we are currently iterating over is a directory we recurse by moving the files in that directory into [destination/directory name]. Cant we just move the entire directory instead of recursing then?

} else if (Files.isDirectory(path)) {
recursiveMove(path, destination.resolve(path.getFileName()));
} else {
throw new IOException("Failed determining type of inode is " + path);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this error or just skip/log? Likely an edge case but do we really want to fail if thats the case?


// Cleanup local data
File dataDir = new File(config.getDataFileLocation());
if (dataDir.exists() && dataDir.isDirectory()) FileUtils.cleanDirectory(dataDir);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question about this: I know we don't need to stop cass anymore because we use import but I was surprised to see we no longer clean the directory. Is this for the case where we want to restore a subset of the data but keep whats there? Do we still need the option to clean the directory for some restores?

"config.doesCassandraStartManually() is set to True, hence Cassandra needs to be started manually ...");
List<String> failedImports = cassOps.importAll(config.getRestoreDataLocation());
if (!failedImports.isEmpty()) {
instanceState.endRestore(Status.FAILED, LocalDateTime.now());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to use the list of failed imports somewhere to log what failed and be able to resume? instead of starting over from the beginning?

Also if we only need the boolean why return the list? I would rather us use the list but if we are not going to I mean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants