[Fix](cloud) Remove pending delete bitmap's lock_id check when commit txn in MS #46841

Status: Open. Wants to merge 1 commit into base: master.

@bobhan1 (Contributor) commented Jan 12, 2025

What problem does this PR solve?

Related PR: #46039

Problem Summary:

#46039 added a defensive check to commit_txn in MS. The check verifies that the lock_id of the pending delete bitmaps on every tablet involved in the txn matches the current txn's lock_id. However, this check can fail spuriously (a false negative) in the following scenario:

  1. A heavy schema change begins and adds a shadow index to the table.
  2. Txn A loads data into the base index and the shadow index.
  3. Txn A writes its pending delete bitmaps to MS, covering tablets of both the base index and the shadow index.
  4. Txn A fails to remove its pending delete bitmaps for some reason (e.g. commit_txn() fails because a value is too large).
  5. Txn B loads data into the base index and the shadow index.
  6. The schema change fails for some reason, and the shadow index is removed from the table.
  7. Txn B sends delete bitmap calculation tasks to BE. These tasks do not involve the tablets under the shadow index because those tablets have already been dropped, so the pending delete bitmaps on those tablets still belong to txn A.
  8. Txn B commits on MS, finds that the lock_id of the pending delete bitmaps on the tablets under the shadow index does not match its own, and fails.
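The scenario can be replayed with a small model of the state MS keeps. This is a sketch, not the meta service code: `ToyMetaService`, `write_pending`, the lock ids `"A"`/`"B"` and the tablet ids are all hypothetical, and the model assumes that writing a pending delete bitmap simply overwrites the owner recorded for that tablet.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyMetaService:
    # Hypothetical model: tablet_id -> lock_id of the txn whose pending
    # delete bitmap is currently recorded for that tablet.
    pending_lock_id: Dict[int, str] = field(default_factory=dict)

    def write_pending(self, lock_id: str, tablets: List[int]) -> None:
        # Assumption: writing a pending delete bitmap overwrites the previous owner.
        for tablet in tablets:
            self.pending_lock_id[tablet] = lock_id


BASE_TABLETS = [1, 2]        # hypothetical tablets of the base index
SHADOW_TABLETS = [101, 102]  # hypothetical tablets of the shadow index


def replay_scenario() -> List[int]:
    """Replay steps 1-8 and return the tablets whose lock_id does not match txn B."""
    ms = ToyMetaService()
    # Steps 1-3: txn A writes pending delete bitmaps on base and shadow tablets.
    ms.write_pending("A", BASE_TABLETS + SHADOW_TABLETS)
    # Step 4: txn A never cleans them up, so nothing is removed here.
    # Steps 5-7: the shadow index is dropped, so txn B's delete bitmap
    # calculation (and its pending bitmaps) only reaches the base tablets.
    ms.write_pending("B", BASE_TABLETS)
    # Step 8: txn B's commit still covers every tablet it loaded into.
    txn_b_tablets = BASE_TABLETS + SHADOW_TABLETS
    return [t for t in txn_b_tablets if ms.pending_lock_id.get(t) != "B"]


print(replay_scenario())  # prints [101, 102]
```

Only the dropped shadow tablets still carry txn A's lock_id, which is exactly the set a strict check would reject.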

The check is meaningless for these dropped tablets, so this PR removes the mandatory check to avoid the false negative and logs a warning instead to help locate problems.
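The control-flow change can be sketched as a single function with a switch between the two behaviors. `check_pending_lock_ids` and its `strict` flag are hypothetical names used only for illustration; the actual check lives in commit_txn in MS and is not reproduced here. The sketch also assumes that tablets with no recorded pending delete bitmap are skipped.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("toy_meta_service")


def check_pending_lock_ids(pending_lock_id: Dict[int, str],
                           txn_tablets: List[int],
                           lock_id: str,
                           strict: bool) -> bool:
    """Return True if the commit may proceed (hypothetical helper)."""
    # Assumption: tablets with no recorded pending bitmap are skipped.
    mismatched = [t for t in txn_tablets
                  if t in pending_lock_id and pending_lock_id[t] != lock_id]
    if not mismatched:
        return True
    if strict:
        # Behavior added by #46039: any mismatch fails the commit.
        return False
    # Behavior after this PR: commit anyway, but log the mismatch.
    logger.warning("pending delete bitmap lock_id mismatch: txn lock_id=%s, tablets=%s",
                   lock_id, mismatched)
    return True


state = {1: "B", 2: "B", 101: "A", 102: "A"}
tablets = [1, 2, 101, 102]
print(check_pending_lock_ids(state, tablets, "B", strict=True))   # prints False
print(check_pending_lock_ids(state, tablets, "B", strict=False))  # prints True and logs a warning
```

With `strict=True` txn B is rejected because of the stale entries on tablets 101 and 102; with `strict=False` it commits and the warning records which tablets were affected.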

Cases will be added later.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen (Contributor) commented:

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@bobhan1 marked this pull request as ready for review January 12, 2025 08:40

@bobhan1 (Contributor, Author) commented Jan 12, 2025

run buildall
