Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TieredStorage] Have HotStorageWriter::write_account() return Vec<StoredAccountInfo> #34929

Merged
merged 4 commits into from
Feb 1, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
98 changes: 67 additions & 31 deletions accounts-db/src/tiered_storage/hot.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

use {
crate::{
account_storage::meta::StoredAccountMeta,
account_storage::meta::{StoredAccountInfo, StoredAccountMeta},
accounts_file::MatchAccountOwnerError,
accounts_hash::AccountHash,
rent_collector::RENT_EXEMPT_RENT_EPOCH,
Expand Down Expand Up @@ -543,14 +543,15 @@ impl HotStorageWriter {
&self,
accounts: &StorableAccountsWithHashesAndWriteVersions<'a, 'b, T, U, V>,
skip: usize,
) -> TieredStorageResult<()> {
) -> TieredStorageResult<Vec<StoredAccountInfo>> {
let mut footer = new_hot_footer();
let mut index = vec![];
let mut owners_table = OwnersTable::default();
let mut cursor = 0;

// writing accounts blocks
let len = accounts.accounts.len();
let mut stored_infos = Vec::with_capacity(len - skip);
brooksprumo marked this conversation as resolved.
Show resolved Hide resolved
for i in skip..len {
let (account, address, account_hash, _write_version) = accounts.get(i);
let index_entry = AccountIndexWriterEntry {
Expand All @@ -574,14 +575,30 @@ impl HotStorageWriter {
})
.unwrap_or((0, &OWNER_NO_OWNER, &[], false, None, None));
let owner_offset = owners_table.insert(owner);
cursor += self.write_account(
let stored_size = self.write_account(
lamports,
owner_offset,
data,
executable,
rent_epoch,
account_hash,
)?;
cursor += stored_size;

stored_infos.push(StoredAccountInfo {
// Here we pass the IndexOffset as the get_account() API
// takes IndexOffset. Given the account address is also
// maintained outside the TieredStorage, a potential optimization
// is to store AccountOffset instead, which can further save
// one jump from the index block to the accounts block.
offset: index.len(),
// The size here might be slightly bigger than the actual
// storage size of one account as one owner address could be
// shared by multiple accounts.
size: stored_size
+ footer.index_block_format.entry_size::<HotAccountOffset>()
+ std::mem::size_of::<Pubkey>(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah... very interesting...

We may want to tweak this API. It currently assumes that the total appended size is the same as summing up all the accounts appended. For Tiered Storage that's not the case.

Since we know all the accounts, and once we write then there will be no other writes, it means we know the total/final/complete size of the storage file. It'd probably be best to have this function return a tuple: a list of the stored account offsets, and a total size stored.

That would still work with AppendVec, and then would correctly set the size for Tiered Storage.

Since the size of the file in AccountStorageEntry::alive_bytes is used by shrink, I think we want to ensure we accurately compute the size here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to tweak this API. It currently assumes that the total appended size is the same as summing up all the accounts appended. For Tiered Storage that's not the case.

Yep, I think there should exist a better API. But if I remember it correctly, an estimation won't make accounts-db panic based on my previous run joining pop-net and main-net a while back.

It'd probably be best to have this function return a tuple: a list of the stored account offsets, and a total size stored.

I think the size of each entry is used to estimate the saving of a shrink when selecting which file / account to shrink in accounts-db. So I think it does want the information for each individual account, but no need to be super accurate.

Since the size of the file in AccountStorageEntry::alive_bytes is used by shrink, I think we want to ensure we accurately compute the size here.

If I remember it correctly, accounts-db won't panic when hot-storage only provide an estimation based on my previous run on mainnet / popnet with the hot storage (a while back).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the size of each entry is used to estimate the saving of a shrink when selecting which file / account to shrink in accounts-db. So I think it does want the information for each individual account, but no need to be super accurate.

I think we should fix the API first, and then return the correct size here. IIRC the current append vec code uses the size when shrinking/reclaiming and creating new append vecs, and I believe there are new assumptions about having zero extra size in the append vec files. This was not always the case.

Option (2) would be to use this PR as-is, and create a GH issue to address fixing the API and updating this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Let's do option 2. Will create an issue as a follow-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double-checked the accounts_db code.

When removing an account in AccountStorageEntry, it will maintain the alive_bytes by subtracting the stored_size. In that case, I think it's safer to not include the owner here in TieredStorage otherwise the alive_bytes will go negative.

Will update the PR to use an estimate that will no larger than the actual stored size.

Copy link
Contributor Author

@yhchiang-sol yhchiang-sol Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR to consider only stored size directly contributed by the account (i.e., mainly account meta, data, and address).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the API, the stored size of each account is still used when computing the estimated storage size saving when shrinking a specific accounts file (or append-vec). So I think the API should be good for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue created: #35041

});
index.push(index_entry);
}
footer.account_entry_count = (len - skip) as u32;
Expand Down Expand Up @@ -611,7 +628,7 @@ impl HotStorageWriter {

footer.write_footer_block(&self.storage)?;

Ok(())
Ok(stored_infos)
}
}

Expand Down Expand Up @@ -1281,6 +1298,37 @@ pub mod tests {
(stored_meta, AccountSharedData::from(account))
}

fn verify_account(
stored_meta: &StoredAccountMeta<'_>,
account: Option<&impl ReadableAccount>,
address: &Pubkey,
account_hash: &AccountHash,
) {
let (lamports, owner, data, executable, account_hash) = account
.map(|acc| {
(
acc.lamports(),
acc.owner(),
acc.data(),
acc.executable(),
// only persist rent_epoch for those non-rent-exempt accounts
brooksprumo marked this conversation as resolved.
Show resolved Hide resolved
Some(*account_hash),
)
})
.unwrap_or((0, &OWNER_NO_OWNER, &[], false, None));

assert_eq!(stored_meta.lamports(), lamports);
assert_eq!(stored_meta.data().len(), data.len());
assert_eq!(stored_meta.data(), data);
assert_eq!(stored_meta.executable(), executable);
assert_eq!(stored_meta.owner(), owner);
assert_eq!(stored_meta.pubkey(), address);
assert_eq!(
*stored_meta.hash(),
account_hash.unwrap_or(AccountHash(Hash::default()))
);
}

#[test]
fn test_write_account_and_index_blocks() {
let account_data_sizes = &[
Expand Down Expand Up @@ -1317,11 +1365,10 @@ pub mod tests {

let temp_dir = TempDir::new().unwrap();
let path = temp_dir.path().join("test_write_account_and_index_blocks");

{
let stored_infos = {
let writer = HotStorageWriter::new(&path).unwrap();
writer.write_accounts(&storable_accounts, 0).unwrap();
}
writer.write_accounts(&storable_accounts, 0).unwrap()
};

let hot_storage = HotStorageReader::new_from_path(&path).unwrap();

Expand All @@ -1334,29 +1381,7 @@ pub mod tests {
.unwrap();

let (account, address, account_hash, _write_version) = storable_accounts.get(i);
let (lamports, owner, data, executable, account_hash) = account
.map(|acc| {
(
acc.lamports(),
acc.owner(),
acc.data(),
acc.executable(),
// only persist rent_epoch for those rent-paying accounts
Some(*account_hash),
)
})
.unwrap_or((0, &OWNER_NO_OWNER, &[], false, None));

assert_eq!(stored_meta.lamports(), lamports);
assert_eq!(stored_meta.data().len(), data.len());
assert_eq!(stored_meta.data(), data);
assert_eq!(stored_meta.executable(), executable);
assert_eq!(stored_meta.owner(), owner);
assert_eq!(stored_meta.pubkey(), address);
assert_eq!(
*stored_meta.hash(),
account_hash.unwrap_or(AccountHash(Hash::default()))
);
verify_account(&stored_meta, account, address, account_hash);

assert_eq!(i + 1, next);
}
Expand All @@ -1366,5 +1391,16 @@ pub mod tests {
hot_storage.get_account(IndexOffset(num_accounts as u32)),
Ok(None)
);

for stored_info in stored_infos {
let (stored_meta, _) = hot_storage
.get_account(IndexOffset(stored_info.offset as u32))
.unwrap()
.unwrap();

let (account, address, account_hash, _write_version) =
storable_accounts.get(stored_info.offset);
verify_account(&stored_meta, account, address, account_hash);
}
}
}
Loading