Support for MergeAppend #736

Open
Fokko opened this issue Nov 28, 2024 · 1 comment

Fokko commented Nov 28, 2024

The so-called fast-appends were added in #349.

It would be good to also consider adding merge-commits.

With the fast-append, a new manifest is written out and added to the manifest-list, as mentioned in the spec. As the name suggests, this is the fastest way of appending new data, and it minimizes the chance of conflicts. It also works pretty well when a commit runs into a conflict, since only the manifest has to be rewritten in that case. The biggest drawback is that you create many manifests, which adds overhead in the long run (more calls to the object store than needed).

The merge-commit takes an existing manifest, adds the new entries to it, and replaces the old manifest in the manifest-list.
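
To make the difference concrete, here is a minimal sketch of what the two strategies do to the manifest-list. The `Manifest` struct and the `fast_append` / `merge_append` functions are hypothetical stand-ins for illustration, not the iceberg-rust API, and picking the last manifest as the merge target is an assumption made only to keep the example small.

```rust
/// Simplified stand-in for a manifest file: an id plus the data files it tracks.
#[derive(Debug, Clone, PartialEq)]
struct Manifest {
    id: u32,
    data_files: Vec<String>,
}

/// Fast append: write one new manifest and add it to the manifest-list.
/// Existing manifests are left untouched.
fn fast_append(manifest_list: &[Manifest], new_files: Vec<String>, next_id: u32) -> Vec<Manifest> {
    let mut out = manifest_list.to_vec();
    out.push(Manifest { id: next_id, data_files: new_files });
    out
}

/// Merge append: take an existing manifest (here: the last one, an assumption),
/// add the new entries to it, and replace the old manifest in the manifest-list.
/// Manifests are immutable files, so the merged result gets a new id.
fn merge_append(manifest_list: &[Manifest], new_files: Vec<String>, next_id: u32) -> Vec<Manifest> {
    match manifest_list.split_last() {
        // Nothing to merge into yet: fall back to a plain append.
        None => fast_append(manifest_list, new_files, next_id),
        Some((last, rest)) => {
            let mut merged = last.data_files.clone();
            merged.extend(new_files);
            let mut out = rest.to_vec();
            out.push(Manifest { id: next_id, data_files: merged });
            out
        }
    }
}
```

Starting from a manifest-list with one manifest, `fast_append` returns two manifests while `merge_append` still returns one, which now holds both the old and the new data files.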

Having too few manifests is not good because it limits parallelization, but having too many adds a lot of overhead in terms of networking and parsing. The thresholds are configurable and have some reasonable defaults:

[image: the configurable thresholds and their defaults]
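
As a rough sketch of how such thresholds could drive the merge decision: the `MergeConfig` struct and `plan_bins` function below are hypothetical names. The defaults are assumed from the Apache Iceberg table properties `commit.manifest.target-size-bytes` (8 MB) and `commit.manifest.min-count-to-merge` (100), not read from the image above. The greedy packing is a simplification and not necessarily how any existing implementation groups manifests.

```rust
/// Hypothetical configuration holding the two merge thresholds.
struct MergeConfig {
    /// Upper bound for the size of a merged manifest, in bytes.
    target_size_bytes: u64,
    /// Do not merge at all while there are fewer manifests than this.
    min_count_to_merge: usize,
}

impl Default for MergeConfig {
    fn default() -> Self {
        // Assumed defaults: 8 MB target size, merge once there are 100 manifests.
        MergeConfig {
            target_size_bytes: 8 * 1024 * 1024,
            min_count_to_merge: 100,
        }
    }
}

/// Groups manifest sizes into bins. Each bin would become one manifest.
/// Below the count threshold every manifest stays in its own bin (no merge).
fn plan_bins(sizes: &[u64], cfg: &MergeConfig) -> Vec<Vec<u64>> {
    if sizes.len() < cfg.min_count_to_merge {
        return sizes.iter().map(|s| vec![*s]).collect();
    }
    let mut bins: Vec<Vec<u64>> = Vec::new();
    let mut current: Vec<u64> = Vec::new();
    let mut current_size = 0u64;
    for &size in sizes {
        // Close the current bin when adding this manifest would exceed the target.
        if !current.is_empty() && current_size + size > cfg.target_size_bytes {
            bins.push(std::mem::take(&mut current));
            current_size = 0;
        }
        current.push(size);
        current_size += size;
    }
    if !current.is_empty() {
        bins.push(current);
    }
    bins
}
```

A larger target size produces fewer, bigger manifests (less overhead, less parallelism), and a higher minimum count delays merging so that most commits stay as cheap as a fast-append.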

The goal of this issue is to add MergeAppendAction next to FastAppendAction. This is not a trivial task since there are some caveats:

  • Each manifest is bound to a certain partition strategy, meaning that the partition-spec-id is stored in the Avro header, so all manifests that are merged together must have the same partition-spec-id.
  • When rewriting the existing manifests, the ADDED status must be changed to EXISTING, and the sequence numbers must be tracked correctly.
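
Both caveats can be sketched in a few lines. The types and function names below (`Manifest`, `Entry`, `group_by_spec`, `carry_over`) are hypothetical and heavily simplified, not the iceberg-rust types. The sketch assumes the spec's sequence number inheritance, where an ADDED entry may leave its sequence number unset and inherit it from the manifest. The DELETED status is left out for brevity.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Existing,
    Added,
}

#[derive(Debug, Clone, PartialEq)]
struct Entry {
    status: Status,
    /// None means: inherit the sequence number of the manifest.
    sequence_number: Option<i64>,
    path: String,
}

struct Manifest {
    partition_spec_id: i32,
    /// Sequence number of the snapshot that added this manifest.
    sequence_number: i64,
    entries: Vec<Entry>,
}

/// Caveat 1: only manifests with the same partition-spec-id may be merged,
/// so group the merge candidates by spec id first.
fn group_by_spec(manifests: &[Manifest]) -> BTreeMap<i32, Vec<&Manifest>> {
    let mut groups: BTreeMap<i32, Vec<&Manifest>> = BTreeMap::new();
    for m in manifests {
        groups.entry(m.partition_spec_id).or_default().push(m);
    }
    groups
}

/// Caveat 2: when entries of an old manifest are rewritten into a merged one,
/// ADDED becomes EXISTING and an inherited sequence number is written out
/// explicitly, because the merged manifest belongs to a newer snapshot.
fn carry_over(manifest: &Manifest) -> Vec<Entry> {
    manifest
        .entries
        .iter()
        .map(|e| {
            let mut e = e.clone();
            if e.status == Status::Added {
                e.status = Status::Existing;
            }
            e.sequence_number = e.sequence_number.or(Some(manifest.sequence_number));
            e
        })
        .collect()
}
```

Forgetting the second step would make files from older snapshots look as if they were added by the merging commit, and would assign them the wrong sequence number.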

ZENOTME commented Nov 28, 2024

I'm working on the merge append. I found that the manifest writer needs to be refined first in #738 so that the code can be cleaner.

Fokko changed the title from "Support for merge-commits" to "Support for MergeAppend" on Nov 28, 2024