Add view support to the Rest Catalog #818

ndrluis · 2024-06-14T15:46:44Z

Feature Request / Improvement

Reference: https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml

sungwy · 2024-06-15T22:37:14Z

Thank you for raising this @ndrluis 💯 I will add this as a 0.8.0 milestone for now

shiv-io · 2024-10-20T18:33:48Z

I would love to take a first stab at this. @kevinjqliu, could you assign it to me? Edit: here's a PR for view_exists: #1242. Thanks!

corleyma · 2024-10-21T23:44:03Z

I am really curious about how Load View should work, given that currently only SQL representations of views are supported and I don't think we have an in-process SQL engine that can convert SQL into an iceberg scan plan (yet/at all?).

@shiv-io did you already have some thoughts there?

ndrluis · 2024-10-22T01:30:47Z

Following what @danielcweeks said in this email, I believe we could discuss and experiment with SQLGlot to create support for other dialects. However, to support load views, we likely need to rely on a query engine. I'm not sure if there is a query engine in the Python ecosystem that would make sense to support, but I feel that we could use Apache DataFusion through the iceberg-rust implementation or the Python bindings.

sungwy · 2024-10-22T12:54:54Z

That's an interesting question, @corleyma. The way I see it, PyIceberg is a language library that tries to remain open to any Python-based query engine that wants to use its functions to process Iceberg tables. So I think the first step in introducing view support in PyIceberg would be for us to fetch the view representations from the REST Catalog endpoint and serve them to any query engines that want to integrate with it (like Daft).

I agree with @ndrluis though, that it would be cool to leverage projects like DataFusion to improve the way we load, slice and dice the tables in PyIceberg.
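To make the "fetch the view representations from the REST Catalog endpoint" step concrete, here is a minimal sketch of how the view resource path could be built and how a HEAD response might be interpreted for an existence check. The helper names `view_path` and `view_exists_from_status` are hypothetical and not part of PyIceberg; the path layout, the `0x1F` namespace separator, and the 204/404 status codes reflect my reading of the OpenAPI document linked at the top of this issue and should be verified against it.

```python
from urllib.parse import quote

# Assumption: multi-level namespaces are joined with the 0x1F unit separator
# before URL-encoding, as the REST catalog OpenAPI document describes.
NAMESPACE_SEPARATOR = "\x1f"


def view_path(prefix: str, namespace: tuple[str, ...], view: str) -> str:
    """Build the path of a view resource (hypothetical helper)."""
    encoded_namespace = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    # The prefix segment is optional; skip it when the catalog has none.
    base = f"/v1/{quote(prefix, safe='')}" if prefix else "/v1"
    return f"{base}/namespaces/{encoded_namespace}/views/{quote(view, safe='')}"


def view_exists_from_status(status_code: int) -> bool:
    """Interpret the status code of a HEAD request on the view path."""
    if status_code in (200, 204):
        return True
    if status_code == 404:
        return False
    # Anything else (auth failure, server error) is not an answer to "exists?".
    raise RuntimeError(f"Unexpected status from catalog: {status_code}")


print(view_path("", ("analytics", "daily"), "top_users"))
```

A GET on the same path would be the natural starting point for load_view; the sketch deliberately makes no network call so it stays independent of any HTTP client.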

corleyma · 2024-10-22T14:51:13Z

I agree with @sungwy that the primary goal of PyIceberg should be to make it possible for query engines to interface with Iceberg tables and views.

Nonetheless, it would be ideal to have some out-of-the-box way to get a scan of a view (a PyArrow Dataset-like object would be best, but returning a Table/RecordBatchReader, as the current table scan functionality does, is a fine endpoint). This provides an easy path for integrating with other tools (like Polars) that currently support PyIceberg tables, and it benefits more operational uses of PyIceberg, e.g. being able to easily preview view contents.

I think DataFusion (either via Python bindings or via iceberg-rust) would be a great way to accomplish this goal. Since (I think?) PyIceberg is much further along in implementing the Iceberg SDK than iceberg-rust, it would be interesting if PyIceberg could use DataFusion directly, but I suspect you need some custom Rust code no matter what?

shiv-io · 2024-10-22T16:53:04Z

I'm fairly new to the Iceberg ecosystem -- thanks for the insightful discussion, looks like I have some reading to do before I can weigh in.

load_view aside though, I'd love to work on the other view features if contributions towards this issue are being accepted.

corleyma · 2024-10-22T20:08:13Z

@shiv-io It should still be possible to do load_view without supporting any scanning functionality yet, and like @sungwy says, that is likely a necessary precursor for other query engines anyway.

Look at how load_table works today: we return a Table model with all the metadata about the table, and this model exposes functionality for data scans, etc. So load_view would start by returning a model with all the metadata about the view (as specified in the spec), and then we can look at adding some DataFusion-based scan functionality in subsequent iterations.
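As a rough illustration of "returning a model with all the metadata about the view", the sketch below parses a view metadata payload into plain dataclasses and looks up the SQL text for a given dialect. The class and method names (`ViewMetadata`, `ViewVersion`, `SQLRepresentation`, `sql_for`) are hypothetical and are not PyIceberg's API; the field names follow my understanding of the Iceberg view spec, only a subset of fields is modeled, and the example payload is made up.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SQLRepresentation:
    sql: str
    dialect: str


@dataclass(frozen=True)
class ViewVersion:
    version_id: int
    schema_id: int
    timestamp_ms: int
    representations: tuple[SQLRepresentation, ...]


@dataclass(frozen=True)
class ViewMetadata:
    view_uuid: str
    location: str
    current_version_id: int
    versions: tuple[ViewVersion, ...]

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "ViewMetadata":
        """Parse a metadata dict; only SQL representations are kept."""
        versions = tuple(
            ViewVersion(
                version_id=v["version-id"],
                schema_id=v["schema-id"],
                timestamp_ms=v["timestamp-ms"],
                representations=tuple(
                    SQLRepresentation(sql=r["sql"], dialect=r["dialect"])
                    for r in v["representations"]
                    if r.get("type") == "sql"
                ),
            )
            for v in raw["versions"]
        )
        return cls(raw["view-uuid"], raw["location"], raw["current-version-id"], versions)

    def current_version(self) -> ViewVersion:
        for version in self.versions:
            if version.version_id == self.current_version_id:
                return version
        raise ValueError(f"No version with id {self.current_version_id}")

    def sql_for(self, dialect: str) -> Optional[str]:
        """Return the current version's SQL for a dialect, or None."""
        for rep in self.current_version().representations:
            if rep.dialect.lower() == dialect.lower():
                return rep.sql
        return None


# Made-up payload for illustration only.
EXAMPLE_METADATA = {
    "view-uuid": "00000000-0000-0000-0000-000000000000",
    "format-version": 1,
    "location": "s3://bucket/warehouse/default.db/event_agg",
    "current-version-id": 1,
    "versions": [
        {
            "version-id": 1,
            "schema-id": 1,
            "timestamp-ms": 1573518431292,
            "representations": [
                {"type": "sql", "sql": "SELECT COUNT(1) AS event_count FROM events", "dialect": "spark"}
            ],
        }
    ],
}

view = ViewMetadata.from_dict(EXAMPLE_METADATA)
print(view.sql_for("spark"))
```

A model like this is enough to print a view definition or hand it to an engine; scanning the view would still require something that can plan and execute the SQL.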

kevinjqliu · 2024-10-27T23:34:31Z

> Look at how load_table works today: we return a Table model with all the metadata about the table, and this model exposes functionality for data scans, etc. So load_view would start with returning a model with all the metadata about the view (as specified in the spec), and then we can look at trying to add some DataFusion-based scan functionality in subsequent iterations.

+1, I think it's a good idea to separate accessing Iceberg views from using them. The ability to read an Iceberg view is great for general view operations. Even printing out the view definition would be a great feature to have.

Connecting the view with an external engine can be a separate story.

sungwy added this to the PyIceberg 0.8.0 release milestone Jun 15, 2024

kevinjqliu added the good first issue Good for newcomers label Aug 7, 2024

shiv-io mentioned this issue Oct 20, 2024 in "Add view_exists method to REST Catalog #1242" (open, 2 tasks)

kevinjqliu modified the milestones: PyIceberg 0.8.0 release, PyIceberg 0.9.0 release Oct 30, 2024
