svm: allow conflicting transactions in entries #3146

Open
wants to merge 7 commits into base: master

Conversation

@2501babe commented Oct 11, 2024

Problem

simd83 intends to relax entry-level constraints, namely that a transaction which takes a write lock on an account cannot be batched with any other transaction which takes a read or write lock on it

before the locking code can be changed, however, svm must be able to support such batches. presently, if one transaction were to write to an account, and a subsequent transaction read from or wrote to that account, svm would not pass the updated account state to the second transaction; the second transaction would get the stale state from accounts-db, which is not updated until after all transaction execution finishes

Summary of Changes

improve the account loader to cache intermediate account states after each transaction execution, and pass that updated state to any subsequent transaction which uses it. rework the transaction batch processor so that fee-payer validation and account loading for each transaction happen after the previous transaction executes

we accomplish this using a just-in-time account loading strategy. we previously experimented with a batched loading strategy, but this had some fundamental drawbacks related to transaction size accounting

transaction batch processing proceeds as follows (a rough code sketch follows the list):

  • create program cache for batch
  • create account loader
  • for each transaction:
    • validate fee-payer and nonce
    • load transaction accounts
    • if executable, execute transaction
    • if successful, update local program cache and update account loader with all writable accounts; if failed, or if fee-only, update nonce and fee-payer; if non-processable, do nothing
  • update global program cache from local program cache, record timings
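a rough sketch of this loop in code (all type and function names here are illustrative placeholders, not the actual svm api; update_accounts_for_executed_tx and update_accounts_for_failed_tx are the helpers discussed further down):

fn process_batch_sketch(
    txs: &[impl SVMMessage],                    // placeholder bound, mirroring the trait used elsewhere
    account_loader: &mut AccountLoader,         // owns the batch-local account cache
    program_cache: &mut ProgramCacheForTxBatch, // batch-local program cache
) {
    for tx in txs {
        // fee-payer and nonce are validated against the freshest cached state,
        // so a fee-payer funded earlier in the same batch is already visible here
        let Ok(details) = validate_fee_payer_and_nonce(account_loader, tx) else {
            continue; // non-processable: nothing is updated
        };

        // accounts are loaded just-in-time and may come from the account cache,
        // reflecting writes made by earlier transactions in the batch
        match load_and_execute(account_loader, program_cache, tx, details) {
            Processed::Executed(result) => {
                // feed all writable accounts back into the account cache;
                // modified programs are merged into the batch-local program cache
                account_loader.update_accounts_for_executed_tx(tx, &result);
            }
            Processed::FeesOnly(fees) => {
                // failed or fee-only: only the nonce and fee-payer roll forward
                account_loader.update_accounts_for_failed_tx(tx, &fees);
            }
        }
    }
    // the caller merges the batch-local program cache into the global cache afterward
}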

account loading now happens for one transaction at a time, interleaved with transaction execution, but the high-level flow for an individual transaction is largely unchanged. the crucial difference is that we now maintain a cache of account states local to svm, in addition to the program cache. this allows us to transparently provide transactions with updated account states that previous transactions produced. as a side effect, it should improve performance for batches where multiple transactions read from the same account, since these accounts only need to be fetched from accounts-db once

this change does not affect consensus rules. it updates svm to handle blocks that have write contention, but no such blocks can be produced, because account locking remains unchanged

there are also several consensus-affecting bugs in the old account loader related to the program cache and to transaction data size (see #3045). these have been preserved: the new account loader should in all cases produce identical execution results. we intend to fix the program cache in a subsequent (feature-gated) pr, and to present a simd to explicitly define transaction data size semantics

all code changes are contained in the first commit, for ease of review. the large majority of this pr (all its other commits) are tests (this is presently not true but could be made true again easily)

Comment on lines 98 to 107
#[derive(Debug, Clone)]
pub(crate) struct AccountLoader<'a, CB: TransactionProcessingCallback> {
    pub program_cache: ProgramCacheForTxBatch,
    account_overrides: Option<&'a AccountOverrides>,
    account_cache: HashMap<Pubkey, AccountCacheItem>,
    callbacks: &'a CB,
}
Member Author

as-written this struct needs to be passed around mutably. account_overrides and callbacks are never mutated, but account_cache must be mutated by AccountLoader functions, and program_cache by external functions. if youd rather we use an interior mutability pattern for these, please let me know what you suggest

Comment on lines +273 to +302
fn update_accounts_for_successful_tx(
    &mut self,
    message: &impl SVMMessage,
    transaction_accounts: &[TransactionAccount],
) {
    for (i, (address, account)) in (0..message.account_keys().len()).zip(transaction_accounts) {
        if !message.is_writable(i) {
            continue;
        }

        // Accounts that are invoked and also not passed as an instruction
        // account to a program don't need to be stored because it's assumed
        // to be impossible for a committable transaction to modify an
        // invoked account if said account isn't passed to some program.
        if message.is_invoked(i) && !message.is_instruction_account(i) {
            continue;
        }

        self.update_account(address, account);
    }
}
@2501babe (Member Author), Oct 17, 2024

this and the two related functions update_accounts_for_failed_tx and update_accounts_for_executed_tx are stripped-down versions of code from account_saver.rs. we talked before about possibly modularizing this code so it isnt duplicated, but i couldnt find a reasonable way to do that because the account saver version expects to move the execution result and to read and write several vectors of whole-transaction state at its deepest levels

the only confusing thing the accounts saver did tho was that .filter(|m| m.is_writable(i) || { !m.is_invoked(i) && m.is_instruction_account(i) }) thing, but i changed it to be more straightforward (also in monorepo account saver), so maybe duplicating the code is fine

@2501babe force-pushed the 20241010_svmconflicts_attempt4 branch from 4394f5a to 18141ec (October 17, 2024 21:45)
Comment on lines 484 to 583
let nonces_are_equal = account_loader
    .load_account(
        advanced_nonce_info.address(),
        AccountUsagePattern::WritableInstruction,
    )
    .and_then(|loaded_nonce| {
        let current_nonce_account = &loaded_nonce.account;
        system_program::check_id(current_nonce_account.owner()).then_some(())?;
        StateMut::<NonceVersions>::state(current_nonce_account).ok()
    })
    .and_then(
        |current_nonce_versions| match current_nonce_versions.state() {
            NonceState::Initialized(ref current_nonce_data) => {
                Some(&current_nonce_data.durable_nonce == durable_nonce)
            }
            _ => None,
        },
    );
Member Author

flagging this for special attention because it is quite tricky. i wrote more integration tests for this one block than any other specific thing

Comment on lines 145 to 170
} else if let Some(program) = is_invisible
    .then_some(())
    .and_then(|_| self.program_cache.find(account_key))
{
    // NOTE if this account could be found in the program cache, we must
    // substitute its size here, to perserve an oddity with transaction size
    // calculations. However, we CANNOT skip the accounts cache check,
    // in case the account was non-executable and modified in-batch.
    // When we properly check if program cache entries are executable,
    // we can go to the program cache first and remove this branch, because
    // executable accounts are immutable.
    Some(LoadedTransactionAccount {
        loaded_size: program.account_size,
        account: account_shared_data_from_program(&program),
        rent_collected: 0,
    })
} else {
@2501babe (Member Author), Oct 18, 2024

this (checking the account cache first, but overriding with the program cache if appropriate) is the magic that makes the new loader behave exactly like the old loader for the program cache and transaction data size edge cases tested by #3045. it also makes transactions that modify and then use programs in the same entry behave the same as if those transactions were in different entries in the same slot, a set of edge cases that are even more subtle

to restate all known issues (except for loader size, which is irrelevant):

  1. accounts from the program cache are assumed to be marked executable even if theyre not
  2. transaction data size is different depending on whether an invoked program is an instruction account
  3. when programs are deployed, they immediately become "executable" in the same entry despite a delayed visibility tombstone, because program cache tombstones are not checked by the account loader. they fail at execution because the bpf loader checks the slot. there is nothing to fix in this pr because this behavior is identical for programs deployed in the same slot in different entries
  4. when programs are closed or resized, subsequent transactions that read programdata must see the changed state
  5. when programs are closed or resized, subsequent transactions that invoke those programs must see the updated data size from the program cache. this is NOT related to simd83. nothing in account locking stops you from making a two-transaction entry that closes a program and then calls it: the first transaction takes a read lock on program and write lock on programdata, the second transaction takes a read lock on program. the transaction fails anyway, but the data size must be correct because whether it fails during loading or execution affects consensus

i verified by hand that 5 behaves the same with this pr, but it isnt feasible to test because it depends entirely on state internal to load_and_execute_sanitized_transactions: the per-batch program cache is (correctly) not merged back into the global cache, and i didnt want to majorly refactor that function in this pr just for tests. tests that cover 1 and 2 are sufficient to demonstrate that the program cache is prioritized over the account cache for non-instruction accounts

as an aside, 1 and 5 cannot be fixed with a feature gate without a simd before fee-only transactions are enabled, because there is no unambiguous "correct" behavior. as it stands, executable programs stay executable forever, even after they are closed. but if we checked program cache tombstones, we would treat closed programs as non-executable. however, once fee-only transactions are enabled, we dont need a simd or a feature gate, because both paths lead to the same result, so fixing it is just cleaning up the implementation

@Huzaifa696

Hi Hana,
I just did an overview of this PR and was curious to know if the motivation behind JIT loading of accounts is primarily performance or uniformity of design (i.e. all execution related stuff done on tx-to-tx basis which includes fee_validation, sanity checks, loading and actual execution).

Also, do you know if this and related PRs are slated for the 2.1.x release?

@2501babe marked this pull request as ready for review (October 18, 2024 13:37)
@2501babe (Member Author)

@Huzaifa696 i originally wrote a batched loader (#2702) which was fully compliant with simd83, but ultimately decided the batched approach was unworkable for two major reasons:

  • there are a number of interesting edge cases with program cache usage and transaction data size accounting (svm: test account loader edge cases #3045) that must be preserved because they affect consensus and we dont want to feature-gate the loader. i wrote the batched loader to comply with them, but the batched loader was basically a complete rewrite. its much safer to do it this way because a lot of the relevant code stays the same, particularly program loader retrieval in the program indices loop
  • the fundamental flaw of batched loading is that you only find out if a transaction has exceeded its size limit after you already loaded all its accounts. transaction data size cannot be evaluated during batched loading because account sizes may change due to dealloc or realloc. we also dont want to say that transactions must be valid at the start of a batch because then transaction execution still depends on what entry its in. but if you dont validate during loading, transactions that pay no fees can pull over a gigabyte of data out of accounts-db. i planned to tackle this by having batched loading track all transaction sizes, abort loading accounts only requested by invalid transactions, and then resume loading if a presumed-invalid transaction had become valid, but that would be halfway to jit loading with three times the complexity

this pr is definitely targeting 2.1 but i dont know about the rest of simd83, ill have to find out from releng when the 2.1 cutoff date is expected to be

@2501babe force-pushed the 20241010_svmconflicts_attempt4 branch from 18141ec to 813b353 (October 18, 2024 15:38)
@apfitzge left a comment

like the direction, left a few comments. need to do another pass on the edge cases for nonce and programs though

pub(crate) enum AccountUsagePattern {
Writable,
Instruction,
WritableInstruction,

this seems functionally the same as Writable. is there somewhere that the distinction matters that I am missing?

Member Author

the two cases we care about are "writable vs read-only" and "writable or instruction vs neither," so writable vs writable instruction doesnt matter but writable instruction vs instruction does. i can cut this down to three variants tho, i was thinking too much about the two bools i was replacing (the reason i made this enum at all was to not have to worry about the order of two bools passed into functions)

@apfitzge, Oct 22, 2024

I think let's cut it down to 3, because otherwise it's less obvious from reading the code (see my comment below on fee-payer) that it is okay to mark it as Writable even though it may be used in an instruction.
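For reference, a minimal sketch of the three-variant form being discussed (the ReadOnlyInvisible variant and the is_invisible_read naming are taken from later snippets in this thread; the helper methods are illustrative, not the code in this PR):

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub(crate) enum AccountUsagePattern {
    // writable, whether or not it is also an instruction account (covers the fee-payer)
    Writable,
    // read-only instruction account
    Instruction,
    // read-only and never passed to a program as an instruction account
    ReadOnlyInvisible,
}

impl AccountUsagePattern {
    pub(crate) fn is_writable(self) -> bool {
        matches!(self, Self::Writable)
    }

    pub(crate) fn is_invisible_read(self) -> bool {
        matches!(self, Self::ReadOnlyInvisible)
    }
}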

// This is the same as if it was used in different batches in the same slot
// If the nonce account was closed in the batch, we error as if the blockhash didn't validate
// We must validate the account in case it was reopened, either as a normal system account, or a fake nonce account
if let Some(ref advanced_nonce_info) = advanced_nonce {

this block seems like it may benefit from being its own function. That may make it much simpler to just test all the branches?

Member Author

youre right, i made it its own function. incidentally, do you think the caller should pass through the blockhash instead of the durable nonce, and the nonce validation function should hash it? the current way, every batch does a sha256sum once, the other way we would do it once per nonce transaction, but presumably most batches dont have any nonce transactions at all

asking first since changing the signature of validate_transaction_fee_payer changes a lot of tests so i would want to squash the history back to one code and n test commits. incidentally i also noticed that function doesnt need &self anymore so ill remove that too

Self {
program_cache,
account_overrides,
account_cache: HashMap::new(),

let's start with some reasonable capacity so we re-allocate as little as possible.
maybe 256. it's not going to be perfect but can help us avoid allocating several times per batch.

Member Author

i was waffling on whether to do this since i wasnt sure the reallocs are actually that expensive, but i can just get a max capacity from the total number of account keys on the transactions

i wasnt sure the reallocs are actually that expensive

Re-allocs are often the worst for performance, because not only do we need to do an allocation and a deallocation (2 possible yields to os scheduler) we then also need to copy all the data from one to another.

This is per-batch, so it's not as bad as doing an allocation per transaction - but we should still avoid it the best we can.

// We must compare the full account, not just check if lamports are 0.
// Otherwise we might erroneously hide rent-delinquent read-only accounts.
if cache_item.account == AccountSharedData::default() {

AccountSharedData::default() will allocate the Arc; we should have some is_default_account fn that makes the comparisons without allocating

Maybe outside the scope of this PR.

Member Author

for now im changing this to use accounts_equal(). the code works fine with ==, but when i was researching Arc i learned that theyre compared using pointer equality, and the reason why this code works is because rustc optimizes empty vecs to point to the same location. i assume this would never change but its better to be sure

i could pr to add is_default() to AccountSharedData separately from this. neither pr would block the other, if this pr goes in first we can just change it after
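for illustration, a field-by-field version of such a check might look like this (is_default_account is not an existing api, just a sketch; it mirrors a comparison against AccountSharedData::default() without constructing the default account):

use solana_sdk::{
    account::{AccountSharedData, ReadableAccount},
    pubkey::Pubkey,
};

// True exactly when the account matches AccountSharedData::default(), checked
// field by field so no default account needs to be allocated for the comparison.
fn is_default_account(account: &AccountSharedData) -> bool {
    account.lamports() == 0
        && account.data().is_empty()
        && account.owner() == &Pubkey::default()
        && !account.executable()
        && account.rent_epoch() == u64::default()
}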

@2501babe force-pushed the 20241010_svmconflicts_attempt4 branch 3 times, most recently from 8287561 to 2364af1 (October 22, 2024 19:46)
@2501babe (Member Author)

@Huzaifa696 we will be targeting 2.2, after this lands ill review your work on account locks and follow up with you

@jstarry commented Oct 25, 2024

Adding @brooksprumo to review the account inspection code changes

Comment on lines +1176 to +1186
// batch 2:
// * successful transfer to a previously unfunded account
// * successful transfer using the new account as a fee-payer in the same batch

great to test here, commenting a quick thought.
Likely we will rarely see this case in practice. Before scheduling, the scheduler will do a quick fee-payer check and will drop the transaction there before sending it to workers.

@2501babe force-pushed the 20241010_svmconflicts_attempt4 branch 2 times, most recently from 4628fc0 to 3067871 (October 30, 2024 09:40)
@2501babe (Member Author)

ok, ive addressed everything in code review so far. everything except the program cache size calculations should be pretty straightforward. the sizes needed quite a few more tests tho because there are a lot of peculiar cases

@2501babe force-pushed the 20241010_svmconflicts_attempt4 branch from 3067871 to 4a0e5ba (October 30, 2024 09:46)
Comment on lines +150 to +153
} else if let Some(program) = (is_invisible && is_owned_by_loader)
.then_some(())
.and_then(|_| self.program_cache.find(account_key))
{
Member Author

is_owned_by_loader is the fix to the issue we have been discussing

I think there's another bug here. We only add new entries to the program cache during batch processing if a loader account is closed, upgraded, deployed, or extended. In the case where one transaction creates a new loader-owned account which is not deployed, it should still be added to the program cache as a tombstone entry to preserve the program cache loaded account data size behavior. Otherwise, a later transaction with an invisible read to that recently created loader account would load the actual account data size rather than 0 bytes because there wasn't an entry in the program cache.

Comment on lines +544 to +545
let next_durable_nonce = DurableNonce::from_blockhash(environment_blockhash);

Member Author

i moved the hashing in here, previously we did it once per batch. it means we calculate the hash multiple times for batches with multiple nonce transactions, but this is probably much more efficient since almost all batches will be exclusively blockhash transactions

Fine to do this as a follow-up, but is there any reason we cannot or should not store this value on the TransactionProcessingEnvironment struct itself?

Comment on lines +2254 to +2261
fn bpf_loader_buffer(enable_fee_only_transactions: bool) -> Vec<SvmTestEntry> {
    let mut test_entries = vec![];
    let mut common_test_entry = SvmTestEntry::default();
    if enable_fee_only_transactions {
        common_test_entry
            .enabled_features
            .push(feature_set::enable_transaction_loading_failure_fees::id());
    }
Member Author

most new tests for the discussed case are here

Comment on lines +1583 to +1597
// currently, the account loader retrieves read-only non-instruction accounts from the program cache
// it creates a mock AccountSharedData with the executable flag set to true
// however, it does not check whether these accounts are actually executable before doing so
// this affects consensus: a transaction that uses a cached non-executable program executes and fails
// but if the transaction gets the program from accounts-db, it will be dropped during account loading
// this test enforces the current behavior, so that future account loader changes do not break consensus
//
// account cache has its own code path that accesses program cache, so we test hit and miss
// we also test bpf_loader and bpf_loader_upgradeable, because accounts owned by the latter can behave strangely
// all cases should produce the same results
#[test_matrix([bpf_loader::id(), bpf_loader_upgradeable::id()], [false, true])]
fn test_load_transaction_accounts_program_account_executable_bypass(
program_owner: Pubkey,
clear_account_cache: bool,
) {
Member Author

also more coverage for that case here, the code is the same but the test matrix ensures we hit all codepaths with both a normal loader and the upgradeable loader. there isnt actually any difference in behavior between loaderv3 buffers and anything else tho. the program cache pulls in empty accounts owned by the loaders too

Comment on lines +302 to +305
.flat_map(|tx| tx.account_keys().iter())
.sorted()
.dedup()
.count();

we could just over-allocate here using account keys lengths, instead of doing an intermediate allocation for the sorting and deduping.

that will definitely give us enough room, and even in our worst expected case (for validator) of 64 txs using 64 accts it's only 4096 pubkeys.
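A sketch of that over-allocation, assuming the batch is a slice of sanitized transactions (the helper name is illustrative): summing the per-transaction key counts gives an upper bound on distinct accounts that can be passed to HashMap::with_capacity without the intermediate sort and dedup.

use solana_sdk::transaction::SanitizedTransaction;

// Upper bound on distinct accounts in the batch: sum the raw key counts and
// accept some over-allocation when transactions share accounts.
fn account_capacity_upper_bound(txs: &[SanitizedTransaction]) -> usize {
    txs.iter()
        .map(|tx| tx.message().account_keys().len())
        .sum()
}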

Ok(tx_details)
})
.and_then(|tx_details| {
// Now validate the fee-payer for all transactions.

comment seems misleading here, this is validating a single transaction.

@2501babe force-pushed the 20241010_svmconflicts_attempt4 branch from dd1548a to 9ec457b (October 30, 2024 15:24)
Comment on lines +154 to +157
// NOTE if this account could be found in the program cache, we must
// substitute its size here, to perserve an oddity with transaction size
// calculations. However, we CANNOT skip the accounts cache check,
// in case the account was non-executable and modified in-batch.

nit:

Suggested change
- // NOTE if this account could be found in the program cache, we must
- // substitute its size here, to perserve an oddity with transaction size
- // calculations. However, we CANNOT skip the accounts cache check,
- // in case the account was non-executable and modified in-batch.
+ // NOTE if this account could be found in the program cache, we must
+ // substitute its size here to preserve an oddity with transaction size
+ // calculations. However, we CANNOT skip the accounts cache check,
+ // in case the account was non-executable and modified in-batch.

@@ -95,6 +97,260 @@ pub struct FeesOnlyTransaction {
pub fee_details: FeeDetails,
}

#[cfg_attr(feature = "dev-context-only-utils", derive(Clone))]
pub(crate) struct AccountLoader<'a, CB: TransactionProcessingCallback> {
pub program_cache: ProgramCacheForTxBatch,

nit: vis can be reduced to pub(crate)

usage_pattern: AccountUsagePattern,
) -> Option<LoadedTransactionAccount> {
let is_writable = usage_pattern == AccountUsagePattern::Writable;
let is_invisible = usage_pattern == AccountUsagePattern::ReadOnlyInvisible;

nit: how about is_invisible_read?

@jstarry left a comment

After reviewing this for a while I am pretty nervous about cache bugs popping up in this new approach with its new layer of caching. Even if we find all of the bugs now, I feel like there's a pretty high risk of new bugs being introduced. I feel like it might be better if the SVM entrypoint only handles one transaction at a time

Comment on lines +584 to +589
// NOTE this will be changed to a `load_account()` call in a PR immediately following this one
// we want to stop pushing loaders on the accounts vec, but tests need to change if we do that
// and this PR is complex enough that we want to code review any new behaviors separately
if let Some(owner_account) =
account_loader.callbacks.get_account_shared_data(owner_id)
{

Since we're not using load_account here we have to be sure that we aren't loading an account that was modified earlier in the batch. Since we are only ever fetching owner accounts for executable programs, these owner accounts are necessarily either the native loader or one of the bpf loaders and none of these are able to be modified so this is safe from my understanding

@apfitzge

I feel like it might be better if the SVM entrypoint only handles one transaction at a time

If we do this we need all batches of transactions executed to be of size 1 as well; this really hurts overall performance because of the way much of our current execution pipeline is designed.

We've talked about this separately before and I think you were there(?). Long term it may make sense for us to move to a streaming model (not batched), but right now there's a bunch of overhead because of the way our processing pipeline has been implemented/designed. Just using simple transfers as an example (since we have a bench for it)

test bench_execute_batch_full_batch                        ... bench:     378,039.07 ns/iter (+/- 28,068.65)
test bench_execute_batch_full_batch_disable_tx_cost_update ... bench:     365,141.14 ns/iter (+/- 20,371.96)
test bench_execute_batch_half_batch                        ... bench:     389,184.90 ns/iter (+/- 16,216.87)
test bench_execute_batch_half_batch_disable_tx_cost_update ... bench:     376,606.51 ns/iter (+/- 36,538.17)
test bench_execute_batch_unbatched                         ... bench:     819,354.70 ns/iter (+/- 41,285.52)
test bench_execute_batch_unbatched_disable_tx_cost_update  ... bench:     817,550.00 ns/iter (+/- 115,051.66)

running the same number of txs individually instead of in batches of 32 or 64 is more than 2x slower. This obviously doesn't translate directly to mnb load, which we can currently handle in an unbatched approach, but at this point it seems premature (to me) to get rid of the batching, given the overhead we observe.

@2501babe (Member Author) commented Oct 31, 2024

i agree the cache interactions are worrying, which is why i wanted to feature-gate this

i think the account cache approach is, in and of itself, conceptually sound. the difficulty all comes down to preserving existing bugs with how program cache misuse interacts with reader/writer and writer/writer batches. these fall under two broad, and partially overlapping, categories:

  • account loader does not check program cache tombstones
  • program cache reports incorrect sizes for a variety of accounts

the first class includes the case where we need to execute non-executable accounts, and is fixed by fee-only transactions because loading failures (other than the fee-payer) and execution failures become identical from the point of view of consensus. the second class is intended to be fixed by simd186

once those features are live then the caches are effectively independent. we always get instruction accounts from the accounts cache, we always get non-instruction programs from the program cache, and the sizes always agree so there is no need for escape hatches where we go to one rather than the other

in other words this is only risky if we were to relax account locks before activating those two features. this isnt to say "that means we can merge this even if we arent 100% sure it will be correct since none of the difficult code paths will be hit anyway." the goal is to merge something correct. im just defending the account cache approach over single-batching

i think if we decide we cant merge this pr, we should either:

  • consider simd83 blocked by the fee-only transaction and transaction data size features
  • put the account cache loader behind a feature gate and handle tombstones correctly in it. with a feature gate, we dont need to worry that some of the transaction data size totals change, or that certain execution failures turn into loading failures. we can implement simd186 at our leisure without blocking simd83 from moving forward

if we go that route i believe we can split this into two prs (which i didnt realize before, since it was actually impossible in the previous batched account loader pr and in early states of this pr):

  • all the changes to the transaction processing loop (validate fees and nonces before each transaction load, load before each transaction execution) plus the AccountLoader struct which encapsulates account loading, with no account cache. no feature gate
  • add the account cache. this is feature-gated. perhaps we could do this in the same pr as we relax account locks to have only one feature gate for simd83

@jstarry commented Oct 31, 2024

I think my initial reaction was too strong.. I think the accounts cache makes a lot of conceptual sense too. I'd like to change/clarify my suggestion not as getting rid of batching entirely, but just inside the SVM (starting at the entrypoint function load_and_execute_sanitized_transactions). Inside load_and_execute_sanitized_transactions it doesn't seem like there's a lot of benefit of batching anyways, could be wrong. The block_processor bench you pointed out does transaction checks and commits which would still probably benefit from batching and would require an accounts cache still.

So what I'm suggesting is basically what @2501babe is saying with

all the changes to the transaction processing loop (validate fees and nonces before each transaction load, load before each transaction execution) ..

but also setup a new program cache before each transaction as well.

Making the SVM entrypoint operate on a single transaction at a time will more clearly delineate responsibilities. It's weird to me that the SVM has to internally update the passed program cache to prepare for executing the next transaction in the passed batch. This should all happen before the SVM is invoked which is basically what is happening in master before this PR. This PR is forcing the SVM to be smarter than it should be by trying to reuse the program cache for the entire batch.

So the tx pipeline can still do some batched tx checks before execution (and batched commit afterwards) but anything requiring account state should be done without batching because otherwise previous transactions with conflicting write locks could interfere with later transactions. Does that feel like a reasonable way of looking at the problem?

@jstarry commented Oct 31, 2024

Oh and there's another looming issue with SIMD 83 and the status cache.. Right now we check if transactions are in the status cache before processing a locked batch and then after processing that batch, we update the status cache before the next batch. This only works because we know that if two transactions have the same message hash, they both must have at least one writable account and therefore could not be in the same batch together. If we allow conflicting write locks in batches, this mechanism breaks and we could allow duplicate transactions to be included in a single batch. So we probably need a batch-level status cache too.

EDIT: this is actually probably out of scope for this PR since it's outside of the SVM

@2501babe (Member Author) commented Oct 31, 2024

i made a draft pr that has all the transaction processing and account loader changes in this one, except the account cache: #3404. consider it illustrative until we decide how we want to handle the simd83-relevant changes, but its nice we can actually break this in half. depending on the approach we take we could merge that one and then do the trickier work separately

but also setup a new program cache before each transaction as well

starry and i discussed on slack and we think this could work on a technical level but has unknown performance implications. this pr was doable without benches because it performs strictly fewer account loads in all cases, but if we set up a per-tx instead of a per-batch program cache it will load a lot more. if what brooks says is right, that only the first accounts-db access of an account in a block is expensive, this could be fine. but we need to write good svm benches before we can consider this

It's weird to me that the SVM has to internally update the passed program cache to prepare for executing the next transaction in the passed batch. This should all happen before the SVM is invoked which is basically what is happening in master before this PR.

mentioned on slack that it already does this in master but i wasnt sure why, because it isnt passed to anything or returned. digging more into it, it looks like this:

pub type ProgramRuntimeEnvironment = Arc<BuiltinProgram<InvokeContext<'static>>>;

means they share the same backing storage, hence why the call to evict_using_2s_random_selection() doesnt need to accept the batch-local cache as a parameter to use it for updates

EDIT: this is actually probably out of scope for this PR since it's outside of the SVM

right, svm has no concept of duplicate transactions at all. i havent looked at this code yet but i believe we would just need to check a batch has no duplicate message hashes at the same time we check the status cache. since the message contains the blockhash these can still be considered duplicates
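
a batch-level duplicate check along those lines could be as simple as collecting message hashes into a set (sketch only; the function name is hypothetical, and in practice this would sit alongside the status cache check, outside svm):

use std::collections::HashSet;
use solana_sdk::hash::Hash;

// Returns true if any two transactions in the batch share a message hash,
// i.e. the batch contains a duplicate transaction.
fn batch_contains_duplicates<'a>(message_hashes: impl IntoIterator<Item = &'a Hash>) -> bool {
    let mut seen = HashSet::new();
    message_hashes.into_iter().any(|hash| !seen.insert(hash))
}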

@apfitzge

Oh and there's another looming issue with SIMD 83 and the status cache

Yeah for sure. We need to dedup transactions but it should be done much before SVM and even before we do the status_cache checks.
