fast array extraction #7227
Open
alex-hh wants to merge 17 commits into huggingface:main from alex-hh:fast-array-extraction
+208
−36
17 commits
a181015 fast array extraction (alex-hh)
426178e add array 1d feature (alex-hh)
303c4e2 fast struct extraction by invoking extension type to_pylist (alex-hh)
0be0895 also use to_pylist for list array (alex-hh)
deee87e improve struct extraction (alex-hh)
ac5a46d handle structs and lists of arrays (alex-hh)
a89ef52 restore arrow array to numpy to numpy extractor (alex-hh)
7f1e217 fix failing array tests (alex-hh)
abbb59a test cast array xd to features fix (alex-hh)
c39c4bc test array write (alex-hh)
67f65b5 formatting (alex-hh)
97f0f19 fix a couple more test cases (alex-hh)
0f37d05 fix writing struct arrays (alex-hh)
7f8c00c handle null rows in struct array (alex-hh)
0d80abc handle field name inference (alex-hh)
0c01621 formatting (alex-hh)
8fc6cc6 revert arrow dataset change (alex-hh)
you can also check if it's a `list` or `large_list` type
I checked that lists of ArrayExtensionType features do call ArrayExtensionArray.to_pylist(). That didn't seem to be the case for structs, which is the main performance issue there.
Not sure about large_list, though.
cool! maybe also check list of struct of ArrayExtensionType, but no big deal, we can fix that rare case later (large_list is also rare)
The list of struct case might require an ArrayExtensionScalar, or something similar with an as_py method that returns a numpy object.
That seems like it could be useful, but I have no idea whether it is possible, or how best to do it if so.
Unless you know how to do this, could we leave it as an issue?
maybe just add a TODO comment about it for now?