
Track the maximum partition size in dataset types #145

Open
facundominguez opened this issue Sep 11, 2018 · 0 comments

facundominguez commented Sep 11, 2018

Spark programs scale as long as the partition sizes of their inputs are bounded. This issue is to explore whether it would be possible to track, in the type of a dataset, the maximum size of its partitions. That way, the type of an algorithm could ensure that it doesn't grow the partitions, or doesn't grow them beyond some constant factor of the input's partition sizes.

This could also be a nice application for Liquid Haskell.
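One way such a bound might be expressed in plain GHC (short of Liquid Haskell refinements) is with a type-level natural indexing the dataset type. The following is a minimal sketch, not an existing API; all names (`BoundedDataset`, `mapBounded`, `fromPartitions`) are hypothetical:

```haskell
{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}

import GHC.TypeLits (Nat, KnownNat, natVal)
import Data.Proxy (Proxy (..))

-- | A dataset whose partitions each hold at most @n@ elements.
-- The bound @n@ lives only at the type level.
newtype BoundedDataset (n :: Nat) a =
  BoundedDataset { partitions :: [[a]] }

-- | Mapping is element-wise, so it cannot grow any partition:
-- the bound @n@ is preserved in the result type.
mapBounded :: (a -> b) -> BoundedDataset n a -> BoundedDataset n b
mapBounded f (BoundedDataset ps) = BoundedDataset (map (map f) ps)

-- | Smart constructor: check the bound once at runtime, then
-- carry it in the type from there on.
fromPartitions :: forall n a. KnownNat n
               => [[a]] -> Maybe (BoundedDataset n a)
fromPartitions ps
  | all withinBound ps = Just (BoundedDataset ps)
  | otherwise          = Nothing
  where
    withinBound p =
      fromIntegral (length p) <= natVal (Proxy :: Proxy n)
```

Operations that can grow partitions (e.g. a group-by) would instead have to return a dataset with a larger, or unknown, bound in its type, which is where refinement types could add precision.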

@facundominguez facundominguez changed the title Track the maximum partition size in dataset sizes Track the maximum partition size in dataset types Sep 11, 2018