[DF] Extend parsing capabilities of CSV data source #15045

Open: wants to merge 5 commits into base: master

jblomer
Contributor

@jblomer jblomer commented Mar 25, 2024

This Pull request:

Adds more advanced options to parse CSV files:

  • Left/right trimming
  • Skipping of a given number of header/footer lines
  • Comment character to skip lines / line remainders
  • Impose column names

It brings the CSV data source closer to the CSV parsing capabilities of pandas and should, in many cases, spare users from writing custom text-file parsing code.
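
To illustrate how the four options listed above interact, here is a minimal, stdlib-only sketch. It is not the RCsvDS implementation: the names `CsvSketchOptions`, `CsvSketchTable`, `TrimField`, `SplitLine` and `ParseCsvSketch` are hypothetical, as are the simplifying assumptions that fields are never quoted, that only spaces and tabs are trimmed, and that lines left empty after comment removal are dropped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical option bundle for this sketch; not the fields of RCsvDS::ROptions.
struct CsvSketchOptions {
   bool leftTrim = false;                // strip leading spaces/tabs of each field
   bool rightTrim = false;               // strip trailing spaces/tabs of each field
   std::size_t skipFirstNLines = 0;      // lines to ignore at the top of the input
   std::size_t skipLastNLines = 0;       // lines to ignore at the bottom of the input
   char comment = '\0';                  // '\0' disables comment handling
   char delimiter = ',';
   std::vector<std::string> columnNames; // if non-empty, imposed; no header line is read
};

struct CsvSketchTable {
   std::vector<std::string> columnNames;
   std::vector<std::vector<std::string>> rows;
};

std::string TrimField(const std::string &field, bool left, bool right)
{
   const std::string whitespace = " \t";
   std::size_t begin = 0;
   std::size_t end = field.size();
   if (left) {
      while (begin < end && whitespace.find(field[begin]) != std::string::npos)
         ++begin;
   }
   if (right) {
      while (end > begin && whitespace.find(field[end - 1]) != std::string::npos)
         --end;
   }
   return field.substr(begin, end - begin);
}

// Splits on the delimiter only; quoted fields are deliberately not handled.
std::vector<std::string> SplitLine(const std::string &line, const CsvSketchOptions &opts)
{
   std::vector<std::string> fields;
   std::size_t start = 0;
   while (true) {
      const std::size_t pos = line.find(opts.delimiter, start);
      const std::size_t len = (pos == std::string::npos) ? std::string::npos : pos - start;
      fields.push_back(TrimField(line.substr(start, len), opts.leftTrim, opts.rightTrim));
      if (pos == std::string::npos)
         break;
      start = pos + 1;
   }
   return fields;
}

CsvSketchTable ParseCsvSketch(const std::vector<std::string> &lines, const CsvSketchOptions &opts)
{
   CsvSketchTable table;
   table.columnNames = opts.columnNames;
   bool needHeader = table.columnNames.empty();

   // Clamp the skipped ranges so that they never overlap or run past the input.
   const std::size_t first = std::min(opts.skipFirstNLines, lines.size());
   const std::size_t last = lines.size() - std::min(opts.skipLastNLines, lines.size() - first);
   assert(first <= last);

   for (std::size_t i = first; i < last; ++i) {
      std::string line = lines[i];
      if (opts.comment != '\0') {
         // Drop the comment character and the remainder of the line.
         const std::size_t pos = line.find(opts.comment);
         if (pos != std::string::npos)
            line.erase(pos);
      }
      if (line.find_first_not_of(" \t") == std::string::npos)
         continue; // nothing left on this line
      std::vector<std::string> fields = SplitLine(line, opts);
      if (needHeader) {
         table.columnNames = fields;
         needHeader = false;
      } else {
         table.rows.push_back(fields);
      }
    }
   return table;
}
```

In this sketch, header/footer skipping is applied first, then comment removal, then splitting and trimming. When `columnNames` is set, the first remaining line is treated as data rather than as a header. Whether RCsvDS applies the options in the same order is not stated in this PR description.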

Checklist:

  • tested changes locally
  • add tutorial
  • add pythonization for FromCSV that uses named arguments for all the new options in RCsvDS::ROptions
  • update release notes
  • update RCsvDS class documentation

@phsft-bot
Collaborator

Starting build on ROOT-performance-centos8-multicore/soversion, ROOT-ubuntu2204/nortcxxmod, ROOT-ubuntu2004/python3, mac12arm/cxx20, windows10/default
How to customize builds

github-actions bot commented Mar 25, 2024

Test Results

    18 files      18 suites   4d 0h 2m 50s ⏱️
 2 666 tests  2 666 ✅ 0 💤 0 ❌
46 176 runs  46 176 ✅ 0 💤 0 ❌

Results for commit 7d0c948.

♻️ This comment has been updated with latest results.

A new ROptions struct bundles all the different user-options for parsing
the CSV file.
Newly supported options:
  - Left/right trimming
  - Skipping of a given number of header/footer lines
  - Comment character to skip lines / line remainders
  - Impose column names

Brings the CSV datasource closer to the Pandas dataframe capabilities.
@phsft-bot
Collaborator

Starting build on ROOT-performance-centos8-multicore/soversion, ROOT-ubuntu2204/nortcxxmod, ROOT-ubuntu2004/python3, mac12arm/cxx20, windows10/default
How to customize builds

@phsft-bot
Collaborator

Build failed on windows10/default.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Failing tests:

@dpiparo
Member

dpiparo commented Apr 3, 2024

This PR looks great. @jblomer would you prefer I wait for all the items in the checklist to be addressed to review it or should I proceed?

@jblomer
Contributor Author

jblomer commented Apr 9, 2024

Thank you, @dpiparo. Let me address the PyROOT issue first, which I haven't understood yet.

@dpiparo dpiparo closed this Nov 12, 2024
@dpiparo dpiparo reopened this Nov 12, 2024