Extended balance_root to handle root underflow case #621

Open: wants to merge 5 commits into base: main

krishvishal
Contributor

Balance root now handles the root underflow case:

  • The root underflows when it is empty. This is a special case.
  • We try to copy the root's child into the root.
  • If that succeeds, we delete the child and update the pointers.
  • If it fails because the child can't fit inside the root, we return early and handle the child overflow in balance_leaf.
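
The steps above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: the `Page` struct, its `capacity` field, and the `balance_shallower` function are hypothetical names, not limbo's actual types. Giving the root a smaller `capacity` than the child stands in for the case where the child cannot fit inside the root.

```rust
use std::collections::HashMap;

/// Toy page (hypothetical): a list of cell payloads, an optional right-most
/// child pointer, and the number of payload bytes the page can hold.
#[derive(Debug, Clone, PartialEq)]
struct Page {
    cells: Vec<Vec<u8>>,
    rightmost_child: Option<u32>,
    capacity: usize,
}

impl Page {
    /// Total payload bytes currently stored in the page.
    fn used(&self) -> usize {
        self.cells.iter().map(|c| c.len()).sum()
    }
}

/// Returns true if the child was folded into the root, false if nothing
/// changed (root not empty, no child, or the child does not fit).
fn balance_shallower(pages: &mut HashMap<u32, Page>, root_id: u32) -> bool {
    let root = &pages[&root_id];
    // Only applies when the root has no cells and a single right-most child.
    let child_id = match (root.cells.is_empty(), root.rightmost_child) {
        (true, Some(id)) => id,
        _ => return false,
    };
    let root_capacity = root.capacity;
    if pages[&child_id].used() > root_capacity {
        // Child cannot fit inside the root: return early, tree unchanged.
        return false;
    }
    // Copy the child's contents into the root, then drop the child page.
    let child = pages.remove(&child_id).expect("child page exists");
    let root = pages.get_mut(&root_id).expect("root page exists");
    root.cells = child.cells;
    root.rightmost_child = child.rightmost_child;
    true
}
```

When the function returns false because of the size check, the caller is expected to balance the child by other means, as the last bullet describes.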

krishvishal marked this pull request as ready for review January 5, 2025 14:38
… trying to convert overflow pages to PageType variant

I have changed the places that use the `page_type` function to use `maybe_page_type`
@krishvishal
Contributor Author

I'm not sure why the test is failing only on Windows. I fixed the same error on my Ubuntu machine. This is the only check that's failing.

// In SQLite, we want to leave some space for future cell pointers
// Each cell pointer takes 2 bytes
const CELL_POINTER_SIZE: usize = 2;
const ESTIMATED_FUTURE_CELLS: usize = 4; // Leave room for a few more cells
Collaborator

Does this come from somewhere, e.g. the SQLite source code? If so, it would be good to leave a link in a comment.

Contributor Author

I've removed this function to keep this PR scoped to handling just the root. To check for root overflow, we only need to check whether the page contains any overflow cells. SQLite does this here: https://github.com/sqlite/sqlite/blob/c2e400af042bdd7d21e159a41fcf34c05398044c/src/btree.c#L9059-L9060
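
As a minimal sketch of the check described above, assuming a page keeps the cells that did not fit in a separate list: `PageContents` and `overflow_cells` here are hypothetical stand-ins, not necessarily limbo's real definitions.

```rust
/// Hypothetical stand-in for a b-tree page: cells that fit live in the page
/// buffer (not modelled here); cells that did not fit wait in `overflow_cells`.
struct PageContents {
    overflow_cells: Vec<Vec<u8>>,
}

/// The page counts as overflowing if any cell is parked in `overflow_cells`.
fn is_overflow(page: &PageContents) -> bool {
    !page.overflow_cells.is_empty()
}
```

With this shape, no space estimate for future cell pointers is needed to decide whether the root overflowed.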

}
}

// Helper function to handle page 1's special case
Collaborator

This should at least describe what the special case is and why it occurs

Collaborator

I would rename the function to hint at what the special case is

Contributor Author

This is not mentioned in SQLite. The overflow case is almost copied from the previous balance_root implementation, which handled only the overflow case.

balance_deeper in SQLite handles the root overflow case, but it works a bit differently. It allocates a new child page, makes it the right-child page, and then copies the root's contents into the child page. balance_nonroot then handles the child page overflow by splitting the page.

You can find balance_deeper docs and implementation here: https://github.com/sqlite/sqlite/blob/master/src/btree.c#L8940-L9004
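
The balance_deeper behaviour described above can be sketched on a toy model. This is an illustration only, not SQLite's or limbo's code: the `Page` struct, the `next_free_id` parameter (standing in for page allocation), and the function signature are all assumptions.

```rust
use std::collections::HashMap;

/// Toy page (hypothetical): cell payloads plus an optional right-most child.
#[derive(Debug, Default, Clone, PartialEq)]
struct Page {
    cells: Vec<Vec<u8>>,
    rightmost_child: Option<u32>,
}

/// Moves the root's contents into a newly allocated child page and leaves the
/// root empty, with the new child as its right-most pointer.
/// `next_free_id` stands in for whatever the pager would allocate.
/// Returns the new child's page id.
fn balance_deeper(pages: &mut HashMap<u32, Page>, root_id: u32, next_free_id: u32) -> u32 {
    let root = pages.get_mut(&root_id).expect("root page exists");
    // `take` moves everything out and leaves an empty default page in the root.
    let child = std::mem::take(root);
    root.rightmost_child = Some(next_free_id);
    pages.insert(next_free_id, child);
    next_free_id
}
```

After this step the tree is one level deeper and the overflowing content lives in the child, which a non-root balancing routine can then split.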

Collaborator

pereman2 left a comment

Why do we handle root underflow? balance_root tries to map 1:1 to the balance_deeper implementation, which balances an overflowing root. Underflow should be managed generically for all cases. Furthermore, can a root page even underflow? Isn't it possible to create a btree page that, once you do the first insertion, sees itself as underfull?

@krishvishal
Contributor Author

@pereman2 I've looked at both of the references below and think it is better to handle the root underflow case in balance_root itself.

I think the comment from [1] describes the root underflow case.

    /* The root page of the b-tree now contains no cells. The only sibling
    ** page is the right-child of the parent. Copy the contents of the
    ** child page into the parent, decreasing the overall height of the
    ** b-tree structure by one. This is described as the "balance-shallower"
    ** sub-algorithm in some documentation.
    **
    ** If this is an auto-vacuum database, the call to copyNodeContent()
    ** sets all pointer-map entries corresponding to database image pages
    ** for which the pointer is stored within the content being copied.
    **
    ** It is critical that the child page be defragmented before being
    ** copied into the parent, because if the parent is page 1 then it will
    ** by smaller than the child due to the database header, and so all the
    ** free space needs to be up front.


[1] https://github.com/sqlite/sqlite/blob/master/src/btree.c#L8867-L8881
[2] https://www.sqlite.org/btreemodule.html#:~:text=4.2.5.2%20Balance%20Shallower

self.usable_space(),
);
let buf = child_contents.as_ptr();
cells.push(buf[start..start + len].to_vec());
Collaborator

Can't we insert_into_cell here instead of cloning vecs?

let new_root_buf = new_root_page_contents.as_ptr();
new_root_buf[0..DATABASE_HEADER_SIZE]
.copy_from_slice(&current_root_buf[0..DATABASE_HEADER_SIZE]);
if self.is_underflow(contents) {
Collaborator

maybe do:

if self.is_overflow(contents) {
    ...
} else if self.is_underflow(contents) {
    ...
} else {
    unreachable!("balance_root was called where we didn't have any overflow or underflow")
}

Comment on lines 1268 to 1271
let child_page = self
.pager
.read_page(child_page_id as usize)
.expect("This shouldn't have happened, child page not found");
Collaborator

Better to propagate errors

Suggested change
- let child_page = self
-     .pager
-     .read_page(child_page_id as usize)
-     .expect("This shouldn't have happened, child page not found");
+ let child_page = self
+     .pager
+     .read_page(child_page_id as usize)?;

And I think we are not handling the case where the child page is not loaded/locked?
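
To illustrate why `?` is preferable to `.expect(...)` here, a self-contained sketch follows. The `Pager`, `PagerError`, and `child_page_len` names are hypothetical and only mimic the shape of the call in the suggestion; limbo's real pager API may differ.

```rust
/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
enum PagerError {
    PageNotFound(usize),
}

/// Hypothetical pager that knows how many pages exist (ids start at 1).
struct Pager {
    page_count: usize,
}

impl Pager {
    fn read_page(&self, id: usize) -> Result<Vec<u8>, PagerError> {
        if (1..=self.page_count).contains(&id) {
            Ok(vec![0u8; 16]) // dummy page contents
        } else {
            Err(PagerError::PageNotFound(id))
        }
    }
}

/// With `?`, a missing child page becomes an `Err` the caller can handle,
/// instead of a panic inside the balancing code.
fn child_page_len(pager: &Pager, child_page_id: u32) -> Result<usize, PagerError> {
    let child_page = pager.read_page(child_page_id as usize)?;
    Ok(child_page.len())
}
```

The enclosing function has to return a `Result` for `?` to compile, so adopting the suggestion also means changing the signature of the function that reads the child page.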

return;
}

let grandchild_ptr = child_contents.rightmost_pointer();
Collaborator

Rename grandchild_ptr -> child_rightmost_pointer? There could be a lot of grandchildren, so this feels more exact.

Collaborator

Moreover, if we are balancing the root because it doesn't have any more cells, why do we expect the child not to be a leaf page? If you take the rightmost pointer, it means the child is an interior page, and therefore the tree is not balanced.

@@ -1285,8 +1400,7 @@ impl BTreeCursor {

// setup overflow page
let contents = page.get().contents.as_mut().unwrap();
let buf = contents.as_ptr();
buf.fill(0);
contents.write_u32(0, 0); // Initialize next overflow page pointer to 0
Collaborator

Why 0? Can we use one of the constants? This 0 means PAGE_HEADER_OFFSET_PAGE_TYPE; are you sure this is what this 0 write means?

Contributor Author

Yes, what I did is wrong. Will fix it.

Contributor Author

@pereman2 can you please tell me what's happening here?: https://github.com/sqlite/sqlite/blob/c2e400af042bdd7d21e159a41fcf34c05398044c/src/btree.c#L7166-L7172

This is my understanding:

// Writes the page number of the new overflow page (pgnoOvfl) to pPrior. For the first overflow page, pPrior points to the end of the local cell content.
put4byte(pPrior, pgnoOvfl);

// ignore this
releasePage(pToRelease);
pToRelease = pOvfl;

// Points pPrior to the start of the new overflow page's data area
pPrior = pOvfl->aData;

// Writes 0 at the start of the overflow page, which shows it's the last overflow page
put4byte(pPrior, 0);

// Sets pPayload to point past the 4-byte pointer. This is where the actual data will be stored in the overflow page
pPayload = &pOvfl->aData[4];

// Calculates remaining space for data in this overflow page. Usable space minus 4 bytes used for the next-page pointer
spaceLeft = pBt->usableSize - 4;

Where do we do the equivalent of put4byte(pPrior, 0); in limbo?
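
For reference, the overflow-page layout discussed in the annotated snippet above (a 4-byte big-endian next-page pointer at offset 0, with 0 marking the last page, followed by payload) can be sketched in Rust as follows. The function names are hypothetical, and treating the whole slice as the usable page size is an assumption of the sketch; this is not limbo's `allocate_overflow_page()`.

```rust
/// Initialise an overflow page: zero it, then write the 4-byte big-endian
/// "next overflow page" pointer at offset 0 (0 means last page in the chain).
/// Returns how many payload bytes fit after the pointer.
/// Assumes `page` is at least 4 bytes long and is entirely usable space.
fn init_overflow_page(page: &mut [u8], next_page: u32) -> usize {
    page.fill(0);
    page[0..4].copy_from_slice(&next_page.to_be_bytes());
    page.len() - 4
}

/// Rough analogue of put4byte(pPrior, pgnoOvfl): store the new overflow page
/// number in the 4 bytes at the prior location (the end of the local cell
/// content, or the start of the previous overflow page).
fn link_overflow(prior: &mut [u8], new_page_no: u32) {
    prior[0..4].copy_from_slice(&new_page_no.to_be_bytes());
}
```

In this sketch the write at offset 0 is the next-page pointer of an overflow page, which is a different thing from the page-type byte at offset 0 of a b-tree page header.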

1. Added an unreachable!() to the if-else under/overflow checker.
2. Directly inserted to cells using `insert_into_cell()` instead of cloning vecs into an external vector
3. Propagated errors using `?` instead of `expect` and handled the case where child page is not loaded/locked.
4. Restored `allocate_overflow_page()` function to previous state.
@krishvishal
Contributor Author

@pereman2 @jussisaurio I've added the following changes, can you please take a look?

  1. Added an unreachable!() to the if-else under/overflow checker.
  2. Directly inserted to cells using insert_into_cell() instead of cloning vecs into an external vector
  3. Propagated errors using ? instead of expect and handled the case where child page is not loaded/locked.
  4. Restored allocate_overflow_page() function to previous state.
