-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support to append delete type data file #798
base: main
Are you sure you want to change the base?
Conversation
Sorry, I think I have some misunderstanding here since the action which support to append data file and delete file(position delete file, equality delete file) is RowDelta.🤔 So I guess what we need is right:
|
I noticed that MergingSnapshotProducer work like the MergeAppend, it will merge manifest when commit. Is that necessary for RowDelta?🤔 Can we implement RowDelta works like FastAppend but it can append data file and delete type file first here? I trying a demo using this way and it can query using spark. @Fokko |
Thanks @ZENOTME for this pr! Adding support for deletion is a complicated process, which involves conflict detection, commit retries. I would suggest to have sth like you mention built first like ManifestMergerManager, ManifestFilterManager, MergingSnapshot appender first. |
I want to make sure whether fast append, merge append, append data file, append delete file is orthogonal. For now, in iceberg-python and iceberg-java:
Is it possible to have action like combing fast append and append delete file and data file? In fact, I try this in practice and it can be read by spark. And for this action, I think we don't need the ManifestMergerManager.🤔 |
The reason we can have a fast append action in iceberg is that append only data file always have no conflict with other actions. Remember that iceberg supports concurrent write, so for update/deletion actions, we need to detect conflction. It works in simple case doesn't mean it's right. |
Do you mean that for appending only the data file, we can always retry by using the original data file, but for delete data file, the related data may be deleted already, so we should use something like ManifestMergerManager to merge and discard the useless delete file. I think for now we have conflicting detect using uuid, but we don't have the retry.
|
This PR support to support to append delete type data file