Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Robinhood Sync Acceleration for PanFS #108

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

elliswilson
Copy link

This pull request contains changes made this summer by employees at Panasas to provide support akin to what already exists for Lustre to accelerate updates to the Robinhood DB for the Panasas filesystem. Unlike in Lustre, which leverages it's changelog to keep the Robinhood DB up-to-date, in this change we use the PanFS snapshot delta utility to compute differences between successive snapshots, track which was the last snapshot we successfully updated from, and update when a newer snapshot is available using just the computed differences.

In the interest of de-risking existing users of Lustre or just the POSIX version of Robinhood that does a full tree walk, we compile out almost all of our code unless --enable-panfs is passed to configure.

Detailed changes include the following:

  • Adjust configure with new --enable-panfs build flag
  • Add new panfs configuration section to robinhood config file
  • Parse configuration section appropriately via panfs_config.c
  • Adjust code in rbh_misc.c and fs_scan.c to use Panasas object ID's in lieu of inodes to avoid confusion (as snapshots do not have the same inode even if nothing changed). These Panasas object ID's are available via xattrs.
  • In rbh_daemon.c added all of the following:
  • New PanFS-specific CLI arguments
  • All needed functions that can directly modify mysql to reflect the changes we observe between snapshots as reported by pan_snap_delta
  • Functionality to actually identify if new snapshots exist and run pan_snap_delta against that and the last successfully updated snapshot
  • Processing engine to filter and sort the output from pan_snap_delta into a format useful for inputting to the mysql functions to update the robinhood db

Please let me know if there are any issues with the code in this request and I'll be glad to make modifications to alleviate such concerns. Thanks!

Soheil Khadirsharbiyani and others added 2 commits August 23, 2019 18:32
Change-Id: I88c3deb32f977452cbcdb9b0cc3521f1cbc58eaa
Panasas pan_snap_delta update functionality 6

See merge request panasas/robinhood!1
@@ -2174,11 +2183,42 @@ int path2id(const char *path, entry_id_t *id, const struct stat *st)
path, strerror(-rc));
return rc;
}
#ifdef _PANFS
st = &stn;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer you include this large piece of code in a subfonction.

@@ -464,10 +468,12 @@ static int TerminateScan(int scan_complete, time_t end)
g_strfreev(cmd);
}
}

#ifdef _PANFS
signal_scan_finished();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why a scan with PanFS would require a different behavior here.

@@ -795,6 +801,35 @@ static int stat_entry(const char *path, const char *name, int parentfd,
if (parentfd != -1) {
if (fstatat(parentfd, name, inode, AT_SYMLINK_NOFOLLOW) == -1)
return -errno;
#ifdef _PANFS
ssize_t vallentmp;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer this code is included to a subfunction here, to avoid overloading the code.

@@ -810,6 +845,37 @@ static int stat_entry(const char *path, const char *name, int parentfd,
if (lstat(path, inode) == -1)
return -errno;

#ifdef _PANFS
else{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subfunction

@@ -1207,6 +1273,36 @@ static int push_dir_list(robinhood_task_t *parent_task)
return rc;
}

#ifdef _PANFS
size_t vallentmp;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subfunction

@@ -1246,6 +1342,36 @@ static int process_one_task(robinhood_task_t *p_task,
"Error accessing filesystem: exiting");
Exit(1);
}
#ifdef _PANFS
ssize_t vallentmp;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subfunction

Copy link
Member

@tl-cea tl-cea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!
Overall, I would prefer PanFS specific code to be located in specific functions and just have thinks like:
#ifdef _PANFS
rc = PanFS_xxxx();
#endif

rather than large pieces of code included in the main function

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants