svm: allow conflicting transactions in entries #3146

Open
wants to merge 7 commits into base: master

Conversation

@2501babe commented Oct 11, 2024

Problem

simd83 intends to relax entry-level constraints, namely that a transaction which takes a write lock on an account cannot be batched with any other transaction which takes a read or write lock on it

before the locking code can be changed, however, svm must be able to support such batches. presently, if one transaction were to write to an account, and a subsequent transaction read from or wrote to that account, svm would not pass the updated account state to the second transaction; the second transaction would get the stale state from accounts-db, which is not updated until after all transaction execution finishes

Summary of Changes

improve the account loader to cache intermediate account states after each transaction execution, and pass that updated state to any subsequent transaction which uses it. rework the transaction batch processor so that fee-payer validation and account loading for each transaction happen after the previous transaction executes

we accomplish this using a just-in-time account loading strategy. we previously experimented with a batched loading strategy, but this had some fundamental drawbacks related to transaction size accounting

transaction batch processing proceeds as follows (a rough code sketch follows the list):

  • create program cache for batch
  • create account loader
  • for each transaction:
    • validate fee-payer and nonce
    • load transaction accounts
    • if executable, execute transaction
    • if successful, update local program cache and update account loader with all writable accounts; if failed, or if fee-only, update nonce and fee-payer; if non-processable, do nothing
  • update global program cache from local program cache, record timings
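a rough sketch of this loop in code (all type and function names here are illustrative placeholders, not the actual svm api; update_accounts_for_executed_tx and update_accounts_for_failed_tx are the helpers discussed further down):

fn process_batch_sketch(
    txs: &[impl SVMMessage],                    // placeholder bound, mirroring the trait used elsewhere
    account_loader: &mut AccountLoader,         // owns the batch-local account cache
    program_cache: &mut ProgramCacheForTxBatch, // batch-local program cache
) {
    for tx in txs {
        // fee-payer and nonce are validated against the freshest cached state,
        // so a fee-payer funded earlier in the same batch is already visible here
        let Ok(details) = validate_fee_payer_and_nonce(account_loader, tx) else {
            continue; // non-processable: nothing is updated
        };

        // accounts are loaded just-in-time and may come from the account cache,
        // reflecting writes made by earlier transactions in the batch
        match load_and_execute(account_loader, program_cache, tx, details) {
            Processed::Executed(result) => {
                // feed all writable accounts back into the account cache;
                // modified programs are merged into the batch-local program cache
                account_loader.update_accounts_for_executed_tx(tx, &result);
            }
            Processed::FeesOnly(fees) => {
                // failed or fee-only: only the nonce and fee-payer roll forward
                account_loader.update_accounts_for_failed_tx(tx, &fees);
            }
        }
    }
    // the caller merges the batch-local program cache into the global cache afterward
}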

account loading now happens for one transaction at a time, interleaved with transaction execution, but the high-level flow for an individual transaction is largely unchanged. the crucial difference is that we now maintain a cache of account states local to svm, in addition to the program cache. this allows us to transparently provide transactions with updated account states that previous transactions produced. as a side effect, it should improve performance for batches where multiple transactions read from the same account, since these accounts only need to be fetched from accounts-db once

this change does not affect consensus rules. it updates svm to handle blocks that have write contention, but no such blocks can be produced, because account locking remains unchanged

there are also several consensus-affecting bugs in the old account loader related to the program cache and to transaction data size (see #3045). these have been preserved: the new account loader should in all cases produce identical execution results. we intend to fix the program cache in a subsequent (feature-gated) pr, and to present a simd to explicitly define transaction data size semantics

all code changes are contained in the first commit, for ease of review. the large majority of this pr (all its other commits) are tests (this is presently not true but could be made true again easily)

Comment on lines 98 to 107
#[derive(Debug, Clone)]
pub(crate) struct AccountLoader<'a, CB: TransactionProcessingCallback> {
    pub program_cache: ProgramCacheForTxBatch,
    account_overrides: Option<&'a AccountOverrides>,
    account_cache: HashMap<Pubkey, AccountCacheItem>,
    callbacks: &'a CB,
}
Member Author

as-written this struct needs to be passed around mutably. account_overrides and callbacks are never mutated, but account_cache must be mutated by AccountLoader functions, and program_cache by external functions. if youd rather we use an interior mutability pattern for these, please let me know what you suggest

Comment on lines +273 to +302
fn update_accounts_for_successful_tx(
    &mut self,
    message: &impl SVMMessage,
    transaction_accounts: &[TransactionAccount],
) {
    for (i, (address, account)) in (0..message.account_keys().len()).zip(transaction_accounts) {
        if !message.is_writable(i) {
            continue;
        }

        // Accounts that are invoked and also not passed as an instruction
        // account to a program don't need to be stored because it's assumed
        // to be impossible for a committable transaction to modify an
        // invoked account if said account isn't passed to some program.
        if message.is_invoked(i) && !message.is_instruction_account(i) {
            continue;
        }

        self.update_account(address, account);
    }
}
@2501babe (Member Author), Oct 17, 2024

this and the two related functions update_accounts_for_failed_tx and update_accounts_for_executed_tx are stripped-down versions of code from account_saver.rs. we talked before about possibly modularizing this code so it isnt duplicated, but i couldnt find a reasonable way to do that because the account saver version expects to move the execution result and to read and write several vectors of whole-transaction state at its deepest levels

the only confusing thing the accounts saver did tho was that .filter(|m| m.is_writable(i) || { !m.is_invoked(i) && m.is_instruction_account(i) }) thing, but i changed it to be more straightforward (also in monorepo account saver), so maybe duplicating the code is fine

@2501babe force-pushed the 20241010_svmconflicts_attempt4 branch from 4394f5a to 18141ec (October 17, 2024 21:45)
Comment on lines 484 to 583
let nonces_are_equal = account_loader
    .load_account(
        advanced_nonce_info.address(),
        AccountUsagePattern::WritableInstruction,
    )
    .and_then(|loaded_nonce| {
        let current_nonce_account = &loaded_nonce.account;
        system_program::check_id(current_nonce_account.owner()).then_some(())?;
        StateMut::<NonceVersions>::state(current_nonce_account).ok()
    })
    .and_then(
        |current_nonce_versions| match current_nonce_versions.state() {
            NonceState::Initialized(ref current_nonce_data) => {
                Some(&current_nonce_data.durable_nonce == durable_nonce)
            }
            _ => None,
        },
    );
Member Author

flagging this for special attention because it is quite tricky. i wrote more integration tests for this one block than any other specific thing

Comment on lines 145 to 170
} else if let Some(program) = is_invisible
    .then_some(())
    .and_then(|_| self.program_cache.find(account_key))
{
    // NOTE if this account could be found in the program cache, we must
    // substitute its size here, to perserve an oddity with transaction size
    // calculations. However, we CANNOT skip the accounts cache check,
    // in case the account was non-executable and modified in-batch.
    // When we properly check if program cache entries are executable,
    // we can go to the program cache first and remove this branch, because
    // executable accounts are immutable.
    Some(LoadedTransactionAccount {
        loaded_size: program.account_size,
        account: account_shared_data_from_program(&program),
        rent_collected: 0,
    })
} else {
@2501babe (Member Author), Oct 18, 2024

this (checking the account cache first, but overriding with the program cache if appropriate) is the magic that makes the new loader behave exactly like the old loader for the program cache and transaction data size edge cases tested by #3045. it also makes transactions that modify and then use programs in the same entry behave the same as if those transactions were in different entries in the same slot, a set of edge cases that are even more subtle

to restate all known issues (except for loader size, which is irrelevant):

  1. accounts from the program cache are assumed to be marked executable even if theyre not
  2. transaction data size is different depending on whether an invoked program is an instruction account
  3. when programs are deployed, they immediately become "executable" in the same entry despite a delayed visibility tombstone, because program cache tombstones are not checked by the account loader. they fail at execution because the bpf loader checks the slot. there is nothing to fix in this pr because this behavior is identical for programs deployed in the same slot in different entries
  4. when programs are closed or resized, subsequent transactions that read programdata must see the changed state
  5. when programs are closed or resized, subsequent transactions that invoke those programs must see the updated data size from the program cache. this is NOT related to simd83. nothing in account locking stops you from making a two-transaction entry that closes a program and then calls it: the first transaction takes a read lock on program and write lock on programdata, the second transaction takes a read lock on program. the transaction fails anyway, but the data size must be correct because whether it fails during loading or execution affects consensus

i verified by hand that 5 behaves the same with this pr, but it isnt feasible to test because it depends entirely on state internal to load_and_execute_sanitized_transactions: the per-batch program cache is (correctly) not merged back into the global cache, and i didnt want to majorly refactor that function in this pr just for tests. tests that cover 1 and 2 are sufficient to demonstrate that the program cache is prioritized over the account cache for non-instruction accounts

as an aside, 1 and 5 cannot be fixed with a feature gate without a simd before fee-only transactions are enabled, because there is no unambiguous "correct" behavior. as it stands, executable programs stay executable forever, even after they are closed. but if we checked program cache tombstones, we would treat closed programs as non-executable. however, once fee-only transactions are enabled, we dont need a simd or a feature gate, because both paths lead to the same result, so fixing it is just cleaning up the implementation

@Huzaifa696

Hi Hana,
I just did an overview of this PR and was curious to know if the motivation behind JIT loading of accounts is primarily performance or uniformity of design (i.e. all execution related stuff done on tx-to-tx basis which includes fee_validation, sanity checks, loading and actual execution).

Also, do you know if this and related PRs are slated for the 2.1.x release?

@2501babe marked this pull request as ready for review (October 18, 2024 13:37)
@2501babe (Member Author)

@Huzaifa696 i originally wrote a batched loader (#2702) which was fully compliant with simd83, but ultimately decided the batched approach was unworkable for two major reasons:

  • there are a number of interesting edge cases with program cache usage and transaction data size accounting (svm: test account loader edge cases #3045) that must be preserved because they affect consensus and we dont want to feature-gate the loader. i wrote the batched loader to comply with them, but the batched loader was basically a complete rewrite. its much safer to do it this way because a lot of the relevant code stays the same, particularly program loader retrieval in the program indices loop
  • the fundamental flaw of batched loading is that you only find out if a transaction has exceeded its size limit after you already loaded all its accounts. transaction data size cannot be evaluated during batched loading because account sizes may change due to dealloc or realloc. we also dont want to say that transactions must be valid at the start of a batch because then transaction execution still depends on what entry its in. but if you dont validate during loading, transactions that pay no fees can pull over a gigabyte of data out of accounts-db. i planned to tackle this by having batched loading track all transaction sizes, abort loading accounts only requested by invalid transactions, and then resume loading if a presumed-invalid transaction had become valid, but that would be halfway to jit loading with three times the complexity

this pr is definitely targeting 2.1 but i dont know about the rest of simd83, ill have to find out from releng when the 2.1 cutoff date is expected to be

@2501babe force-pushed the 20241010_svmconflicts_attempt4 branch from 18141ec to 813b353 (October 18, 2024 15:38)
@apfitzge left a comment

like the direction, left a few comments. need to do another pass on the edge cases for nonce and programs though

pub(crate) enum AccountUsagePattern {
Writable,
Instruction,
WritableInstruction,

this seems functionally the same as Writable. is there somewhere that the distinction matters that I am missing?

Member Author

the two cases we care about are "writable vs read-only" and "writable or instruction vs neither," so writable vs writable instruction doesnt matter but writable instruction vs instruction does. i can cut this down to three variants tho, i was thinking too much about the two bools i was replacing (the reason i made this enum at all was to not have to worry about the order of two bools passed into functions)

@apfitzge, Oct 22, 2024

I think let's cut it down to 3, because otherwise it's less obvious from reading the code (see my comment below on fee-payer) that it is okay to mark it as Writable even though it may be used in an instruction.
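For reference, a minimal sketch of the three-variant form being discussed (the ReadOnlyInvisible variant and the is_invisible_read naming are taken from later snippets in this thread; the helper methods are illustrative, not the code in this PR):

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub(crate) enum AccountUsagePattern {
    // writable, whether or not it is also an instruction account (covers the fee-payer)
    Writable,
    // read-only instruction account
    Instruction,
    // read-only and never passed to a program as an instruction account
    ReadOnlyInvisible,
}

impl AccountUsagePattern {
    pub(crate) fn is_writable(self) -> bool {
        matches!(self, Self::Writable)
    }

    pub(crate) fn is_invisible_read(self) -> bool {
        matches!(self, Self::ReadOnlyInvisible)
    }
}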

// This is the same as if it was used in different batches in the same slot
// If the nonce account was closed in the batch, we error as if the blockhash didn't validate
// We must validate the account in case it was reopened, either as a normal system account, or a fake nonce account
if let Some(ref advanced_nonce_info) = advanced_nonce {

this block seems like it may benefit from being its own function. That may make it much simpler to just test all the branches?

Member Author

youre right, i made it its own function. incidentally, do you think the caller should pass through the blockhash instead of the durable nonce, and the nonce validation function should hash it? the current way, every batch does a sha256sum once, the other way we would do it once per nonce transaction, but presumably most batches dont have any nonce transactions at all

asking first since changing the signature of validate_transaction_fee_payer changes a lot of tests so i would want to squash the history back to one code and n test commits. incidentally i also noticed that function doesnt need &self anymore so ill remove that too

Self {
program_cache,
account_overrides,
account_cache: HashMap::new(),

let's start with some reasonable capacity so we re-allocate as little as possible.
maybe 256. it's not going to be perfect but can help us avoid allocating several times per batch.

Member Author

i was waffling on whether to do this since i wasnt sure the reallocs are actually that expensive, but i can just get a max capacity from the total number of account keys on the transactions

i wasnt sure the reallocs are actually that expensive

Re-allocs are often the worst for performance, because not only do we need to do an allocation and a deallocation (2 possible yields to os scheduler) we then also need to copy all the data from one to another.

This is per-batch, so it's not as bad as doing an allocation per transaction - but we should still avoid it the best we can.

// We must compare the full account, not just check if lamports are 0.
// Otherwise we might erroneously hide rent-delinquent read-only accounts.
if cache_item.account == AccountSharedData::default() {

AccountSharedData::default() will allocate the Arc; we should have some is_default_account fn that makes the comparisons without allocating

Maybe outside the scope of this PR.

Member Author

for now im changing this to use accounts_equal(). the code works fine with ==, but when i was researching Arc i learned that theyre compared using pointer equality, and the reason why this code works is because rustc optimizes empty vecs to point to the same location. i assume this would never change but its better to be sure

i could pr to add is_default() to AccountSharedData separately from this. neither pr would block the other, if this pr goes in first we can just change it after
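for illustration, a field-by-field version of such a check might look like this (is_default_account is not an existing api, just a sketch; it mirrors a comparison against AccountSharedData::default() without constructing the default account):

use solana_sdk::{
    account::{AccountSharedData, ReadableAccount},
    pubkey::Pubkey,
};

// True exactly when the account matches AccountSharedData::default(), checked
// field by field so no default account needs to be allocated for the comparison.
fn is_default_account(account: &AccountSharedData) -> bool {
    account.lamports() == 0
        && account.data().is_empty()
        && account.owner() == &Pubkey::default()
        && !account.executable()
        && account.rent_epoch() == u64::default()
}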

@2501babe force-pushed the 20241010_svmconflicts_attempt4 branch 3 times, most recently from 8287561 to 2364af1 (October 22, 2024 19:46)
@2501babe (Member Author)

@Huzaifa696 we will be targeting 2.2, after this lands ill review your work on account locks and follow up with you

@jstarry commented Oct 25, 2024

Adding @brooksprumo to review the account inspection code changes

Comment on lines +1176 to +1186
// batch 2:
// * successful transfer to a previously unfunded account
// * successful transfer using the new account as a fee-payer in the same batch

great to test here, commenting a quick thought.
Likely we will rarely see this case in practice. Before scheduling, the scheduler will do a quick fee-payer check and will drop the transaction there before sending it to workers.

@2501babe force-pushed the 20241010_svmconflicts_attempt4 branch 2 times, most recently from 4628fc0 to 3067871 (October 30, 2024 09:40)
@2501babe (Member Author)

ok, ive addressed everything in code review so far. everything except the program cache size calculations should be pretty straightforward. the sizes needed quite a few more tests tho because there are a lot of peculiar cases

@2501babe force-pushed the 20241010_svmconflicts_attempt4 branch from 3067871 to 4a0e5ba (October 30, 2024 09:46)
Comment on lines +150 to +153
} else if let Some(program) = (is_invisible && is_owned_by_loader)
.then_some(())
.and_then(|_| self.program_cache.find(account_key))
{
Member Author

is_owned_by_loader is the fix to the issue we have been discussing

I think there's another bug here. We only add new entries to the program cache during batch processing if a loader account is closed, upgraded, deployed, or extended. In the case where one transaction creates a new loader-owned account which is not deployed, it should still be added to the program cache as a tombstone entry to preserve the program cache loaded account data size behavior. Otherwise, a later transaction with an invisible read to that recently created loader account would load the actual account data size rather than 0 bytes because there wasn't an entry in the program cache.

Comment on lines +544 to +545
let next_durable_nonce = DurableNonce::from_blockhash(environment_blockhash);

Member Author

i moved the hashing in here, previously we did it once per batch. it means we calculate the hash multiple times for batches with multiple nonce transactions, but this is probably much more efficient since almost all batches will be exclusively blockhash transactions

Fine to do this as a follow-up, but is there any reason we cannot or should not store this value on the TransactionProcessingEnvironment struct itself?

Comment on lines +2254 to +2261
fn bpf_loader_buffer(enable_fee_only_transactions: bool) -> Vec<SvmTestEntry> {
    let mut test_entries = vec![];
    let mut common_test_entry = SvmTestEntry::default();
    if enable_fee_only_transactions {
        common_test_entry
            .enabled_features
            .push(feature_set::enable_transaction_loading_failure_fees::id());
    }
Member Author

most new tests for the discussed case are here

Comment on lines +1583 to +1597
// currently, the account loader retrieves read-only non-instruction accounts from the program cache
// it creates a mock AccountSharedData with the executable flag set to true
// however, it does not check whether these accounts are actually executable before doing so
// this affects consensus: a transaction that uses a cached non-executable program executes and fails
// but if the transaction gets the program from accounts-db, it will be dropped during account loading
// this test enforces the current behavior, so that future account loader changes do not break consensus
//
// account cache has its own code path that accesses program cache, so we test hit and miss
// we also test bpf_loader and bpf_loader_upgradeable, because accounts owned by the latter can behave strangely
// all cases should produce the same results
#[test_matrix([bpf_loader::id(), bpf_loader_upgradeable::id()], [false, true])]
fn test_load_transaction_accounts_program_account_executable_bypass(
program_owner: Pubkey,
clear_account_cache: bool,
) {
Member Author

also more coverage for that case here, the code is the same but the test matrix ensures we hit all codepaths with both a normal loader and the upgradeable loader. there isnt actually any difference in behavior between loaderv3 buffers and anything else tho. the program cache pulls in empty accounts owned by the loaders too

Comment on lines +302 to +305
.flat_map(|tx| tx.account_keys().iter())
.sorted()
.dedup()
.count();

we could just over-allocate here using account keys lengths, instead of doing an intermediate allocation for the sorting and deduping.

that will definitely give us enough room, and even in our worst expected case (for validator) of 64 txs using 64 accts it's only 4096 pubkeys.
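A sketch of that over-allocation, assuming the batch is a slice of sanitized transactions (the helper name is illustrative): summing the per-transaction key counts gives an upper bound on distinct accounts that can be passed to HashMap::with_capacity without the intermediate sort and dedup.

use solana_sdk::transaction::SanitizedTransaction;

// Upper bound on distinct accounts in the batch: sum the raw key counts and
// accept some over-allocation when transactions share accounts.
fn account_capacity_upper_bound(txs: &[SanitizedTransaction]) -> usize {
    txs.iter()
        .map(|tx| tx.message().account_keys().len())
        .sum()
}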

Ok(tx_details)
})
.and_then(|tx_details| {
// Now validate the fee-payer for all transactions.

comment seems misleading here, this is validating a single transaction.

@2501babe force-pushed the 20241010_svmconflicts_attempt4 branch from dd1548a to 9ec457b (October 30, 2024 15:24)
Comment on lines +154 to +157
// NOTE if this account could be found in the program cache, we must
// substitute its size here, to perserve an oddity with transaction size
// calculations. However, we CANNOT skip the accounts cache check,
// in case the account was non-executable and modified in-batch.

nit:

Suggested change
- // NOTE if this account could be found in the program cache, we must
- // substitute its size here, to perserve an oddity with transaction size
- // calculations. However, we CANNOT skip the accounts cache check,
- // in case the account was non-executable and modified in-batch.
+ // NOTE if this account could be found in the program cache, we must
+ // substitute its size here to preserve an oddity with transaction size
+ // calculations. However, we CANNOT skip the accounts cache check,
+ // in case the account was non-executable and modified in-batch.

@@ -95,6 +97,260 @@ pub struct FeesOnlyTransaction {
pub fee_details: FeeDetails,
}

#[cfg_attr(feature = "dev-context-only-utils", derive(Clone))]
pub(crate) struct AccountLoader<'a, CB: TransactionProcessingCallback> {
pub program_cache: ProgramCacheForTxBatch,

nit: vis can be reduced to pub(crate)

usage_pattern: AccountUsagePattern,
) -> Option<LoadedTransactionAccount> {
let is_writable = usage_pattern == AccountUsagePattern::Writable;
let is_invisible = usage_pattern == AccountUsagePattern::ReadOnlyInvisible;

nit: how about is_invisible_read?

@jstarry left a comment

After reviewing this for a while I am pretty nervous about cache bugs popping up in this new approach with its new layer of caching. Even if we find all of the bugs now, I feel like there's a pretty high risk of new bugs being introduced. I feel like it might be better if the SVM entrypoint only handles one transaction at a time

Comment on lines +584 to +589
// NOTE this will be changed to a `load_account()` call in a PR immediately following this one
// we want to stop pushing loaders on the accounts vec, but tests need to change if we do that
// and this PR is complex enough that we want to code review any new behaviors separately
if let Some(owner_account) =
account_loader.callbacks.get_account_shared_data(owner_id)
{

Since we're not using load_account here we have to be sure that we aren't loading an account that was modified earlier in the batch. Since we are only ever fetching owner accounts for executable programs, these owner accounts are necessarily either the native loader or one of the bpf loaders and none of these are able to be modified so this is safe from my understanding

@apfitzge

I feel like it might be better if the SVM entrypoint only handles one transaction at a time

If we do this we need all batches of transactions executed to be of size 1 as well; this really hurts overall performance because of the way much of our current execution pipeline is designed.

We've talked about this separately before and I think you were there(?). Long term it may make sense for us to move to a streaming model (not batched), but right now there's a bunch of overhead because of the way our processing pipeline has been implemented/designed. Just using simple transfers as an example (since we have a bench for it)

test bench_execute_batch_full_batch                        ... bench:     378,039.07 ns/iter (+/- 28,068.65)
test bench_execute_batch_full_batch_disable_tx_cost_update ... bench:     365,141.14 ns/iter (+/- 20,371.96)
test bench_execute_batch_half_batch                        ... bench:     389,184.90 ns/iter (+/- 16,216.87)
test bench_execute_batch_half_batch_disable_tx_cost_update ... bench:     376,606.51 ns/iter (+/- 36,538.17)
test bench_execute_batch_unbatched                         ... bench:     819,354.70 ns/iter (+/- 41,285.52)
test bench_execute_batch_unbatched_disable_tx_cost_update  ... bench:     817,550.00 ns/iter (+/- 115,051.66)

running the same number of txs individually instead of in batches of 32 or 64 is more than 2x slower. This obviously doesn't translate directly to mnb load, which we can currently handle in an unbatched approach, but at this point it seems premature (to me) to get rid of the batching, given the overhead we observe.

@2501babe (Member Author) commented Oct 31, 2024

i agree the cache interactions are worrying, which is why i wanted to feature-gate this

i think the account cache approach is, in and of itself, conceptually sound. the difficulty all comes down to preserving existing bugs with how program cache misuse interacts with reader/writer and writer/writer batches. these fall under two broad, and partially overlapping, categories:

  • account loader does not check program cache tombstones
  • program cache reports incorrect sizes for a variety of accounts

the first class includes the case where we need to execute non-executable accounts, and is fixed by fee-only transactions because loading failures (other than the fee-payer) and execution failures become identical from the point of view of consensus. the second class is intended to be fixed by simd186

once those features are live then the caches are effectively independent. we always get instruction accounts from the accounts cache, we always get non-instruction programs from the program cache, and the sizes always agree so there is no need for escape hatches where we go to one rather than the other

in other words this is only risky if we were to relax account locks before activating those two features. this isnt to say "that means we can merge this even if we arent 100% sure it will be correct since none of the difficult code paths will be hit anyway." the goal is to merge something correct. im just defending the account cache approach over single-batching

i think if we decide we cant merge this pr, we should either:

  • consider simd83 blocked by the fee-only transaction and transaction data size features
  • put the account cache loader behind a feature gate and handle tombstones correctly in it. with a feature gate, we dont need to worry that some of the transaction data size totals change, or that certain execution failures turn into loading failures. we can implement simd186 at our leisure without blocking simd83 from moving forward

if we go that route i believe we can split this into two prs (which i didnt realize before, since it was actually impossible in the previous batched account loader pr and in early states of this pr):

  • all the changes to the transaction processing loop (validate fees and nonces before each transaction load, load before each transaction execution) plus the AccountLoader struct which encapsulates account loading, with no account cache. no feature gate
  • add the account cache. this is feature-gated. perhaps we could do this in the same pr as we relax account locks to have only one feature gate for simd83

@jstarry commented Oct 31, 2024

I think my initial reaction was too strong.. I think the accounts cache makes a lot of conceptual sense too. I'd like to change/clarify my suggestion not as getting rid of batching entirely, but just inside the SVM (starting at the entrypoint function load_and_execute_sanitized_transactions). Inside load_and_execute_sanitized_transactions it doesn't seem like there's a lot of benefit of batching anyways, could be wrong. The block_processor bench you pointed out does transaction checks and commits which would still probably benefit from batching and would require an accounts cache still.

So what I'm suggesting is basically what @2501babe is saying with

all the changes to the transaction processing loop (validate fees and nonces before each transaction load, load before each transaction execution) ..

but also setup a new program cache before each transaction as well.

Making the SVM entrypoint operate on a single transaction at a time will more clearly delineate responsibilities. It's weird to me that the SVM has to internally update the passed program cache to prepare for executing the next transaction in the passed batch. This should all happen before the SVM is invoked which is basically what is happening in master before this PR. This PR is forcing the SVM to be smarter than it should be by trying to reuse the program cache for the entire batch.

So the tx pipeline can still do some batched tx checks before execution (and batched commit afterwards) but anything requiring account state should be done without batching because otherwise previous transactions with conflicting write locks could interfere with later transactions. Does that feel like a reasonable way of looking at the problem?

@jstarry commented Oct 31, 2024

Oh and there's another looming issue with SIMD 83 and the status cache.. Right now we check if transactions are in the status cache before processing a locked batch and then after processing that batch, we update the status cache before the next batch. This only works because we know that if two transactions have the same message hash, they both must have at least one writable account and therefore could not be in the same batch together. If we allow conflicting write locks in batches, this mechanism breaks and we could allow duplicate transactions to be included in a single batch. So we probably need a batch-level status cache too.

EDIT: this is actually probably out of scope for this PR since it's outside of the SVM

@2501babe (Member Author) commented Oct 31, 2024

i made a draft pr that has all the transaction processing and account loader changes in this one, except the account cache: #3404. consider it illustrative until we decide how we want to handle the simd83-relevant changes, but its nice we can actually break this in half. depending on the approach we take we could merge that one and then do the trickier work separately

but also setup a new program cache before each transaction as well

starry and i discussed on slack and we think this could work on a technical level but has unknown performance implications. this pr was doable without benches because it performs strictly fewer account loads in all cases, but if we set up a per-tx instead of a per-batch program cache it will load a lot more. if what brooks says is right, that only the first accounts-db access of an account in a block is expensive, this could be fine. but we need to write good svm benches before we can consider this

It's weird to me that the SVM has to internally update the passed program cache to prepare for executing the next transaction in the passed batch. This should all happen before the SVM is invoked which is basically what is happening in master before this PR.

mentioned on slack that it already does this in master but i wasnt sure why, because it isnt passed to anything or returned. digging more into it, it looks like this:

pub type ProgramRuntimeEnvironment = Arc<BuiltinProgram<InvokeContext<'static>>>;

means they share the same backing storage, hence why the call to evict_using_2s_random_selection() doesnt need to accept the batch-local cache as a parameter to use it for updates

EDIT: this is actually probably out of scope for this PR since it's outside of the SVM

right, svm has no concept of duplicate transactions at all. i havent looked at this code yet but i believe we would just need to check a batch has no duplicate message hashes at the same time we check the status cache. since the message contains the blockhash these can still be considered duplicates
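
a batch-level duplicate check along those lines could be as simple as collecting message hashes into a set (sketch only; the function name is hypothetical, and in practice this would sit alongside the status cache check, outside svm):

use std::collections::HashSet;
use solana_sdk::hash::Hash;

// Returns true if any two transactions in the batch share a message hash,
// i.e. the batch contains a duplicate transaction.
fn batch_contains_duplicates<'a>(message_hashes: impl IntoIterator<Item = &'a Hash>) -> bool {
    let mut seen = HashSet::new();
    message_hashes.into_iter().any(|hash| !seen.insert(hash))
}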

@apfitzge

Oh and there's another looming issue with SIMD 83 and the status cache

Yeah for sure. We need to dedup transactions but it should be done much before SVM and even before we do the status_cache checks.
