Need more checks in RDataFrame #17485
Comments
Dear @acampove, thank you for reaching out! Regarding part 1 of your issue, the check has already been put in place for ROOT 6.34; see root/tree/dataframe/src/RLoopManager.cxx, lines 1236 to 1237, at commit 5358492.
However, something seems to go wrong on the Python side: I can reproduce a segfault on my machine and need to investigate. Regarding part 2 of the issue, I don't understand the problem. The fact that the two files have a different number of entries should not matter. RDataFrame users process billions of entries scattered across many files, and there is no requirement anywhere that every file contain the same number of entries. Could you provide a more detailed, self-contained reproducer?
Hi @vepadulano, thanks for your reply. Regarding part 2, there seems to be a misunderstanding. The problem is not that the files have a different number of entries; it is that the files can have different branches, i.e. the first one can have 20 branches and the second one 22. Regarding the first issue: every time you implement a feature, such as the check you mentioned above, there must be a test for both the C++ and the Python side. A user like me running the code should not be the first to find out that it breaks. This has to be tested so that we never see it happen. Cheers.
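To illustrate the kind of preliminary check being asked for, here is a minimal sketch, assuming the branch names of each input file have already been collected (for example by iterating over each tree's GetListOfBranches() and calling GetName()). The names find_branch_mismatches and require_same_branches are hypothetical helpers, not part of the RDataFrame API, and this is not how ROOT itself validates inputs.

```python
from typing import Dict, Iterable, Set


def find_branch_mismatches(branches_by_file: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Hypothetical pre-flight check: for every file, report the branches that
    appear in at least one other input file but are missing from this one."""
    branch_sets = {path: set(names) for path, names in branches_by_file.items()}
    # Union of all branch names seen across the inputs (empty if no files).
    all_branches: Set[str] = set().union(*branch_sets.values())
    # Keep only the files that lack at least one branch.
    return {
        path: all_branches - names
        for path, names in branch_sets.items()
        if all_branches - names
    }


def require_same_branches(branches_by_file: Dict[str, Iterable[str]]) -> None:
    """Raise a readable error if the input files do not share the same branches."""
    mismatches = find_branch_mismatches(branches_by_file)
    if mismatches:
        details = "; ".join(
            f"{path} lacks {sorted(missing)}"
            for path, missing in sorted(mismatches.items())
        )
        raise ValueError(f"Input files do not share the same branches: {details}")
```

Comparing each file against the union of all branch names, rather than against the first file only, means every file with a missing branch is reported regardless of the order of the inputs.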
Dear @acampove, thank you for your reply.
I understand, but I still do not see how the failure could happen. In principle, if you're using say column
Thank you for your input. I completely agree that the experience should be the same regardless of how the user approaches the ROOT interfaces. The project uses powerful dynamic C++-Python bindings, so this kind of error handling is usually transparent, but things can occasionally go wrong.
I was able to slim down part 1 of this issue into a simple reproducer that does not rely on RDataFrame. You can track its status in the linked issue.
Dear @acampove, the underlying Python issue has been fixed, and #17581 introduces a dedicated test for the situation you reported in part 1 of this issue. If you can, I would greatly appreciate a reproducer for part 2 so that I can make sure your requirements are fully satisfied.
Explain what you would like to see improved and how.
Hi,
I get a crash. That should never happen; I would expect an error message telling me that the list of files is empty.
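On the user side, a minimal guard for the empty-list case could look like the following sketch. It assumes the input files arrive as a Python list of paths before being handed to ROOT.RDataFrame; require_input_files is a hypothetical helper, not part of ROOT, and it does not reproduce the internal check that ROOT 6.34 adds.

```python
from typing import List, Sequence


def require_input_files(file_paths: Sequence[str]) -> List[str]:
    """Hypothetical user-side guard: fail early with a readable message
    instead of letting an empty input list reach RDataFrame."""
    # Drop empty strings, e.g. left over from splitting a comma-separated config value.
    paths = [path for path in file_paths if path]
    if not paths:
        raise ValueError("The list of input files is empty; nothing to process.")
    return paths
```

The validated list is what would then be passed on to the dataframe constructor, so the failure surfaces as a Python exception with a clear message rather than as a crash.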
and I try to save the dataframe with Snapshot, I see a failure. It seems that, in my particular case, the two files have a different number of entries. It took me 3 hours to figure this out. You need a preliminary check for cases like this; please implement it.
ROOT version
6.34
Installation method
mamba
Operating system
alma9
Additional context
No response