Jan 18, 2025: This week(s) in DataFusion #14179

Open
alamb opened this issue Jan 18, 2025 · 9 comments
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Jan 18, 2025

Introduction

This ticket is my weekly-ish summary of interesting things happening in DataFusion. Note this is not a complete list (it is what I remember / can find). Please leave comments on this ticket about things that I may have missed or you think should get wider attention by the community. Follow on to #13970

Reminder: find new content (and please post some!) on the Concepts, Readings, Events page

Community Highlights

Releases!

Performance

DataFusion's core value proposition is great performance without having to re-implement it yourself

Quality

sqlite test suite

Bug Fixes

DataFusion is in the "we are finding all the corner case bugs now" phase of its life and people are now bashing them down

Cleanups 🧹

Now that we have a large, useful codebase, it is also important to keep it neat and tidy, so we spend a non-trivial amount of time there too.

Features

Inline documentation macros

Substrait!

External Sort (aka really large memory) improvements

@2010YOUY01 and @Lordworms are beginning to work on improving out-of-core sorting. There are several great PRs up and outstanding:

Dev Containers

Also, thanks to

Looking to get more involved? Please help review code! 🎣

DataFusion has a long history of community members contributing in all aspects of the project. Reviewing PRs is an especially great way to get introduced to the project, help the community and grow your own knowledge -- researching and understanding the code enough to review PRs also often inspires additional ideas for improvements.

We have docs about reviews. The TL;DR: check for test coverage, whether the change is understandable and well documented, and whether the code can be improved. When you think the PR looks good to merge, try @ mentioning one of the committers.

Help wanted

  • I would love to see the community offer additional help testing and triaging bugs, helping to make DataFusion a more stable foundation for building systems

Please feel free to leave your own comments on this ticket if you are looking for help

Community

Upcoming meetups:

  • Help schedule some!

alamb added the enhancement (New feature or request) label Jan 18, 2025
alamb pinned this issue Jan 18, 2025

@alamb
Contributor Author

alamb commented Jan 21, 2025

@2010YOUY01 became a committer: https://lists.apache.org/thread/b7df0bpzyzzcg6ph50swx7jw0b5dks75 🎉

@alamb
Contributor Author

alamb commented Jan 22, 2025

@mbrobbel has a nice proposal to add extension types in arrow-rs (which would potentially help ExtensionTypes in DataFusion). Would appreciate any feedback:

@alamb
Contributor Author

alamb commented Jan 22, 2025

@alamb
Contributor Author

alamb commented Jan 23, 2025

@edmondop mentions

@alamb
Contributor Author

alamb commented Jan 23, 2025

This is a neat "zero lake" idea (DataFusion via WASM in browser)

https://gh-sparkling-cherry-6975.fly.dev/

@alamb
Contributor Author

alamb commented Jan 23, 2025

@mertak-synnada has a PR that nicely refactors the data source code but may be a non-trivial API change for downstream users

@alamb
Contributor Author

alamb commented Jan 23, 2025

This is a pretty neat looking dataframe library built on DataFusion:

@alamb
Contributor Author

alamb commented Jan 25, 2025

If anyone else is interested in helping improve build times, @waynexia is starting to organize a project:

@adriangb
Contributor

Does this mean that some joins and window clauses now get pushed down to the TableProvider?
