Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RDF] Add option to change default basket size in RDataFrame Snapshot #17579

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

Aditya-138-12
Copy link

This Pull request:

Introduces a new option in RSnapshotOptions.hxx to allow users to configure the default basket size of new branches when using RDataFrame::Snapshot. Modify SetBranchesHelper in ActionHelpers.hxx to honor this option.

Changes or fixes:

  1. Add a fBasketSize field to RSnapshotOptions to store the user-defined basket size.
  2. Update SetBranchesHelper to apply the specified basket size when creating output.
  3. Ensure backward compatibility by maintaining the current default behavior if no custom size is set.
    :)

Checklist:

  • [✅] tested changes locally
  • [❌] updated the docs (if necessary)

This PR fixes #17418

Copy link
Member

@vepadulano vepadulano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much @Aditya-138-12 for this contribution! I feel like we're already going in the right direction. I have a few introductory inline comments. Together with those, I would highly apppreciate if you could add a test for the new feature, checking that the output TTree after a Snapshot has the requested basket size. You can take inspiration from some of the tests already written in e.g. dataframe_snapshot.cxx.

tree/dataframe/inc/ROOT/RSnapshotOptions.hxx Outdated Show resolved Hide resolved
tree/dataframe/inc/ROOT/RSnapshotOptions.hxx Outdated Show resolved Hide resolved
tree/dataframe/inc/ROOT/RDF/ActionHelpers.hxx Outdated Show resolved Hide resolved
tree/dataframe/inc/ROOT/RDF/ActionHelpers.hxx Outdated Show resolved Hide resolved
…f int, 2. Passing only basketSize in the SetBranchesHelper function instead of Complete options object, 3. Added some more inline comments.
tree/dataframe/inc/ROOT/RSnapshotOptions.hxx Outdated Show resolved Hide resolved
tree/dataframe/inc/ROOT/RSnapshotOptions.hxx Outdated Show resolved Hide resolved
tree/dataframe/inc/ROOT/RDF/ActionHelpers.hxx Outdated Show resolved Hide resolved
tree/dataframe/inc/ROOT/RDF/ActionHelpers.hxx Outdated Show resolved Hide resolved
tree/dataframe/inc/ROOT/RDF/ActionHelpers.hxx Outdated Show resolved Hide resolved
Copy link
Member

@vepadulano vepadulano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the prompt interaction! We are still missing:

  • Adapting the other overload of SetBranchesHelper
  • Some tests

@Aditya-138-12
Copy link
Author

Thanks for the prompt interaction! We are still missing:

  • Adapting the other overload of SetBranchesHelper
  • Some tests

Thank you so much for your guidance! I really appreciate your time and feedback. I’m currently working on the tests, but I wanted to make sure I fully understand your point about 'adapting the other overload of SetBranchesHelper.' Could you kindly clarify what exactly needs to be adapted? I’d be grateful for any additional details!

@vepadulano
Copy link
Member

Could you kindly clarify what exactly needs to be adapted?

There is another function template of SetBranchesHelper starting at line 1336 of ActionHelpers.hxx. Currently in this PR you are only modifying the overload responsible for writing out branches storing collections instead, which is the one starting at 1399.

@@ -45,6 +47,7 @@ struct RSnapshotOptions {
bool fLazy = false; ///< Do not start the event loop when Snapshot is called
bool fOverwriteIfExists = false; ///< If fMode is "UPDATE", overwrite object in output file if it already exists
bool fVector2RVec = true; ///< If set to true will convert std::vector columns to RVec when saving to disk
std::optional<int> fBasketSize {}; /// Set a custom basket size option. For more details, see https://root.cern/manual/trees/#baskets-clusters-and-the-tree-header
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::optional<int> fBasketSize {}; /// Set a custom basket size option. For more details, see https://root.cern/manual/trees/#baskets-clusters-and-the-tree-header
std::optional<int> fBasketSize {}; ///< Set a custom basket size option. For more details, see https://root.cern/manual/trees/#baskets-clusters-and-the-tree-header

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add option to change default basket size in RDataFrame Snapshot
3 participants