Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ntuple] update write option defaults #16877

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jblomer
Copy link
Contributor

@jblomer jblomer commented Nov 10, 2024

Consistently use MiB (units based on power of 2).

@jblomer jblomer added this to the 6.34.00 milestone Nov 10, 2024
@jblomer jblomer self-assigned this Nov 10, 2024
Copy link

github-actions bot commented Nov 10, 2024

Test Results

    19 files      19 suites   3d 23h 4m 31s ⏱️
 2 664 tests  2 664 ✅ 0 💤 0 ❌
48 706 runs  48 706 ✅ 0 💤 0 ❌

Results for commit e572958.

♻️ This comment has been updated with latest results.

Consistently use MiB (units based on power of 2).
Copy link
Member

@vepadulano vepadulano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but tutorial-v7-ntuple-ntpl013_staged is timing out on Windows, I would maybe restart the CI run and check that it was just a hiccup.

The default can be changed by the `RNTupleWriteOptions`.
The default should work well in the majority of cases.
In general, larger clusters provide room for more and larger pages and should improve compression ratio and speed.
However, clusters also need to be buffered during write and (partially) during read,
so larger clusters increase the memory footprint.

A second option in `RNTupleWriteOptions` specifies the maximum uncompressed cluster size.
The default is 1 GiB.
The default is 10x the default cluster target size, i.e. ~1.2 GiB.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not

Suggested change
The default is 10x the default cluster target size, i.e. ~1.2 GiB.
The default is 8x the default cluster target size, i.e. 1.0 GiB.

?
Or why the factor 10?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it was before a factor ~10 (100 MB --> 1 GiB)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the reason for choosing 10 something else than 'get to a round number'? If it is just to get to a round number we should switch to 8, if it is something else we should explain it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the largest compression factor we have seen in an experiment EDM so far was MiniAOD with a compression factor >9.

@jblomer jblomer removed this from the 6.34.00 milestone Nov 13, 2024
@jblomer
Copy link
Contributor Author

jblomer commented Nov 13, 2024

Since we have not yet updated measurements confirming that the new settings don't cause a performance degradation with the AGC benchmark, we remove the 6.34 milestone. We should settle for the final default settings before releasing the RNTuple v1 AGC dataset.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants