Request to significantly lower cost of calling WebAssembly blueprints and components #2014

Open
bbarwik opened this issue Nov 23, 2024 · 6 comments

@bbarwik
Contributor

bbarwik commented Nov 23, 2024

Interactions between WASM packages are far too expensive in the current implementation.

Let's assume we have a component LendingMarket which has multiple LendingPool components, one LendingPool for every token. Now let's assume we have 20 LendingPool components, with the blueprint being the same for each of them.

When LendingMarket needs to read a balance from all LendingPool components, it has to pay fees 20 times for:

  • Opening substate (OpenSubstate::GlobalPackage)
  • Reading substate (ReadSubstate)
  • Preparing WASM code (PrepareWasmCode)

The cost of these operations is very high because all packages have a lot of bloat and they have more than 200 KB of data even with the most basic blueprints. In practice, the cost of such a transaction may easily exceed the max fee limit for transactions. Compared to native components, calling WebAssembly components is several times more expensive due to this.
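To make the multiplication concrete, here is a minimal Rust sketch of the fee arithmetic. The struct, its methods, and any values passed to it are hypothetical placeholders, not the real costing code or real network parameters; only the three fee names come from the list above.

```rust
/// Hypothetical per-load cost components, in cost units.
/// Field names mirror the fee entries listed above; the numbers used with
/// this struct are placeholders, not real network parameters.
#[derive(Clone, Copy)]
pub struct PackageLoadCost {
    pub open_substate: u64,
    pub read_substate: u64,
    pub prepare_wasm_code: u64,
}

impl PackageLoadCost {
    /// Cost of loading the package a single time.
    pub fn per_load(&self) -> u64 {
        self.open_substate + self.read_substate + self.prepare_wasm_code
    }

    /// Behavior described in this issue: every call pays the full load cost,
    /// even when all calls hit the same package.
    pub fn charged_per_call(&self, calls: u64) -> u64 {
        self.per_load() * calls
    }

    /// For comparison only: the same calls if the shared package were
    /// charged a single time.
    pub fn charged_once(&self, calls: u64) -> u64 {
        if calls == 0 { 0 } else { self.per_load() }
    }
}
```

With 20 LendingPool calls, `charged_per_call(20)` is 20 times `per_load()`, although the package code is identical for every call.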

Here's one very important thing - in most cases, these operations are not even needed because WebAssembly packages are cached! At this moment, the default cache for WebAssembly packages is around 2 GB, which means every WebAssembly package is cached forever after the first load.

Due to this, I propose the following: We should assume that all WebAssembly packages are cached and all fees mentioned above regarding loading WebAssembly packages should be removed (they are in radix-engine/src/vm/vm.rs). This change will significantly lower fees for interaction with WebAssembly components without sacrificing network performance.

The only case in which a validator will have higher CPU usage is when a package is not loaded and must be cached for the first time. Because there is no way to determine when a validator loads it for the first time (it will need to do that every time after restart), there's no way to apply the correct fee for that. In future releases, I recommend that validators load all packages into cache on start, which should be possible to achieve within a few seconds by using multiple threads. At this moment, it seems impossible to fill the whole 2 GB cache; that would require at least 2000-5000 unique packages.
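A rough sketch of what loading all packages on startup with multiple threads could look like, using only the Rust standard library. `prepare` and `preload_all` are hypothetical names, and `prepare` is a stand-in for real WASM compilation; nothing here is taken from the node code.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for compiling a package; returns the code size.
fn prepare(code: &[u8]) -> usize {
    code.len()
}

/// Prepares every package on up to `workers` threads and returns a map
/// from code hash to the prepared result.
pub fn preload_all(
    packages: &[([u8; 32], Vec<u8>)],
    workers: usize,
) -> HashMap<[u8; 32], usize> {
    let cache = Mutex::new(HashMap::new());
    let workers = workers.max(1);
    // Ceiling division; at least 1 because `chunks(0)` would panic.
    let chunk_size = ((packages.len() + workers - 1) / workers).max(1);

    // Scoped threads may borrow `packages` and `cache` from this stack frame.
    thread::scope(|scope| {
        for chunk in packages.chunks(chunk_size) {
            let cache = &cache;
            scope.spawn(move || {
                for (hash, code) in chunk {
                    let prepared = prepare(code);
                    cache.lock().unwrap().insert(*hash, prepared);
                }
            });
        }
    });

    cache.into_inner().unwrap()
}
```

Each worker takes one contiguous chunk of packages, so the compile work is spread across threads and only the short insert step is serialized by the mutex.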

The implementation of loading packages in vm.rs should also be optimized. Right now, it is always loading the full code of a package from memory, whether it's cached or not. The reason behind this is that every node can have different cached packages, so it is assumed that packages are never cached when it comes to fees. If we assume that all packages are cached, then this step can be skipped and packages can be loaded directly from cache using their hash. If a package is not cached, then it should be loaded and cached but without any fees.
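A minimal sketch of the lookup I have in mind, assuming a cache keyed by the package code hash. `PreparedModule`, `ModuleCache`, and `get_or_load` are hypothetical names and do not reflect the current vm.rs API; the point is only that the package code is read on a miss and never on a hit.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a compiled, ready-to-instantiate WASM module.
pub struct PreparedModule {
    pub code_len: usize,
}

/// Hypothetical cache keyed by the package code hash.
pub struct ModuleCache {
    modules: HashMap<[u8; 32], Arc<PreparedModule>>,
    /// Number of times the package code actually had to be loaded.
    pub misses: u64,
}

impl ModuleCache {
    pub fn new() -> Self {
        Self { modules: HashMap::new(), misses: 0 }
    }

    /// Returns the module for `code_hash`. `load_code` is invoked only on a
    /// miss, so a cache hit never reads the stored package code.
    pub fn get_or_load<F>(&mut self, code_hash: [u8; 32], load_code: F) -> Arc<PreparedModule>
    where
        F: FnOnce() -> Vec<u8>,
    {
        if let Some(module) = self.modules.get(&code_hash) {
            return Arc::clone(module);
        }
        // Miss: load and prepare the code, then cache it (no fee applied here).
        self.misses += 1;
        let code = load_code();
        let module = Arc::new(PreparedModule { code_len: code.len() });
        self.modules.insert(code_hash, Arc::clone(&module));
        module
    }
}
```

Passing the loader as a closure keeps the expensive read lazy: a second call with the same hash returns the same `Arc` without invoking it.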

In the future, this process can be highly improved by, as I said, loading all packages during node startup and compressing packages because 90% of their bytecode is the same. For now, I just recommend removing these fees and always assuming the package is cached (and if it isn't, just cache it without any fees).

I believe this change is necessary to compete with protocols like SUI, which have an advantage when it comes to package size by using MoveVM instead of WebAssembly. Transactions like this one, where someone is interacting with 4 components with exactly the same WebAssembly code, should not cost 4.2 XRD.

@beemdvp
Contributor

beemdvp commented Nov 24, 2024

The tx you shared is on the manifest level. My guess is that if you change the method to accept an array of data and loop through the state with updated values, it will cost significantly less. Not sure if you've tried that. The cost of calling methods/functions at the manifest level is significantly higher than at the internal Scrypto implementation level.

@bbarwik
Contributor Author

bbarwik commented Nov 24, 2024

> The tx you shared is on the manifest level. My guess is that if you change the method to accept an array of data and loop through the state with updated values, it will cost significantly less. Not sure if you've tried that. The cost of calling methods/functions at the manifest level is significantly higher than at the internal Scrypto implementation level.

Hey @beemdvp. The transaction I posted isn't mine; it's just an example. If the same operation were done by a third-party blueprint instead of the manifest, it would cost exactly the same. I talked with many Scrypto developers about this, and they agree that it prevents the creation of more complex logic: blueprints are limited to maybe 10-20 calls to other blueprints during a single transaction because they quickly reach the max gas limit for a transaction. Other protocols that don't use WebAssembly don't have this problem, so we shouldn't punish developers and users for choosing WebAssembly as our virtual machine.

@fpieper
Contributor

fpieper commented Nov 24, 2024

Thanks for writing it up, @bbarwik. I fully support that. That’s the reason why we needed to build the price oracle directly into our Ociswap pool blueprints instead of attaching it as a separate hook component.

Cross-component calls are basically not usable right now, except in methods that are not called very often. They are definitely not viable for anything used on a regular basis. For example, it is effectively impossible to call other components in a swap method, which needs to be fairly cheap to be competitive.

In general I also agree that transactions are too expensive and you are running into limits sooner than you would like to.

@dhedey
Contributor

dhedey commented Nov 29, 2024

Hi @bbarwik, thanks for raising this. This was something we were considering before the Babylon launch, and coincidentally we've been looking at things like this internally recently, doing detailed perf profiles. We'll talk about this more internally over the next week or two - but I can share some personal reflections in the meantime.

As we try to work out costing and optimizations, we need to find a balance between expected case, and worst case; and what can be assumed at Babylon (where we'd like to maximize throughput of a single chain), and what can be assumed at Xi'an (when at some point, storing every package in a memory cache may not be possible). Of course, by then we may also explore an alternative VM, which might have different trade-offs.

One thing we can reasonably do is only load / parse each blueprint once per transaction. I don't know to what extent this would improve the average transaction, but it would probably give builders quite a bit more flexibility.
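As a rough illustration of the once-per-transaction idea, here is a minimal Rust sketch. `TransactionLoadTracker` and `charge_for_call` are hypothetical names, and keying by package code hash is an assumption made for the sketch; this is not how the engine's costing is currently structured.

```rust
use std::collections::HashSet;

/// Hypothetical per-transaction tracker of which packages were already
/// loaded (and therefore already paid for) in this transaction.
pub struct TransactionLoadTracker {
    loaded: HashSet<[u8; 32]>,
    /// Total load cost charged so far in this transaction.
    pub charged: u64,
}

impl TransactionLoadTracker {
    pub fn new() -> Self {
        Self { loaded: HashSet::new(), charged: 0 }
    }

    /// Records a call into the package identified by `code_hash` and returns
    /// the load cost charged for it: `load_cost` on the first call within
    /// the transaction, zero on every later call.
    pub fn charge_for_call(&mut self, code_hash: [u8; 32], load_cost: u64) -> u64 {
        // `insert` returns true only if the hash was not present before.
        let cost = if self.loaded.insert(code_hash) { load_cost } else { 0 };
        self.charged += cost;
        cost
    }
}
```

Under this model, repeated calls into the same package within one transaction pay the load cost once, while a call into a different package still pays its own load.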

It might be possible to take this further, and assume all packages are cached, but this may have knock-on effects for node runner requirements; and if we can't fit them all in memory, we might need to mitigate malicious cache-rolling attacks which could slow down the network.
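To illustrate the cache-rolling concern, here is a small Rust sketch. It assumes an LRU eviction policy, which is an assumption for illustration and not a statement about how the node's cache works; `LruPackages` and `hits_when_cycling` are hypothetical names.

```rust
use std::collections::VecDeque;

/// Minimal LRU set of package ids with a fixed capacity. Hypothetical, and
/// O(n) per access, which is fine for an illustration only.
pub struct LruPackages {
    capacity: usize,
    order: VecDeque<u32>, // front = least recently used
}

impl LruPackages {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, order: VecDeque::new() }
    }

    /// Touches `id`; returns true on a cache hit and false on a miss.
    pub fn access(&mut self, id: u32) -> bool {
        if self.capacity == 0 {
            return false;
        }
        if let Some(pos) = self.order.iter().position(|&k| k == id) {
            // Hit: move the id to the most-recently-used end.
            self.order.remove(pos);
            self.order.push_back(id);
            return true;
        }
        // Miss: evict the least recently used id if the cache is full.
        if self.order.len() == self.capacity {
            self.order.pop_front();
        }
        self.order.push_back(id);
        false
    }
}

/// Calls `distinct` packages in a fixed order for `rounds` rounds and
/// returns the number of cache hits.
pub fn hits_when_cycling(capacity: usize, distinct: u32, rounds: u32) -> u32 {
    let mut cache = LruPackages::new(capacity);
    let mut hits = 0;
    for _ in 0..rounds {
        for id in 0..distinct {
            if cache.access(id) {
                hits += 1;
            }
        }
    }
    hits
}
```

Cycling through just one more package than the cache can hold makes every access a miss under LRU, which is the kind of worst case that "assume everything is cached" pricing would not pay for.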

Possible mechanisms off-hand (and I haven't thought very hard about this) might be something like upping the cost to publish packages; or consider some mechanism to decide which subset of packages get the "always cached, cheaper costs" treatment. (e.g. package owners could pay some XRD maintenance fee to the network - this could even be hooked up to the royalty system somehow).

@fpieper
Contributor

fpieper commented Nov 29, 2024

@dhedey thanks for your response 👍

> One thing we can reasonably do is only load / parse each blueprint once per transaction. I don't know to what extent this would improve the average transaction, but it would probably give builders quite a bit more flexibility.

The problem I see here is that, although it makes it easier to have multiple instructions in the same transaction, it does not solve the core problem for smart contract developers. They will still need to move code into one blueprint / package and not use cross-component calls, because for "normal" transactions with one instruction in particular there would be no fee benefit. However, charging the fees only once per transaction of course makes absolute sense and could be one part of the solution.

Overall I would say the core issue is maybe not the caching itself but that splitting up one larger component into two smaller ones with a similar amount of code and logic has significant overhead for transaction fees. This is imo the root cause which needs to be solved. Just throwing in some numbers: if transaction fee overhead is more than 5%, devs will probably start trading a worse blueprint architecture for lower fees. Imo there is no fundamental reason to charge less for one larger component compared to two smaller ones - what needs to be done here is to optimise the minimum cost per package.

To summarise these two features could provide a good solution:

  1. significantly reducing the fee overhead for splitting up large components into smaller ones
  2. charging transaction fees only once per transaction (not as important as (1) but definitely a nice improvement for batching)

This would imo also work nicely with Xi'an. I see your concerns regarding Xi'an and that naive caching could lead to issues. Caching, and especially cache invalidation, could also be tricky: some blueprints might be developed in a way that requires their blueprints to be cached, and once they are no longer cached the transaction fails because it runs into fee limits ^^.

Btw. reducing the fee overhead for splitting up components also strongly affects the royalty system's design goal of having a composable ecosystem of blueprints. It just doesn't make sense right now to use other blueprints as building blocks to speed up development of your own dapp (besides simple cases where I instantiate a single component of someone else's blueprint) - the transaction fee overhead is just too high. It works nicely for the native pools, which really shows the potential strength. So yeah, great idea, but the cross-component overhead is a show stopper for non-native components, unfortunately.

@dhedey
Contributor

dhedey commented Nov 29, 2024

@fpieper - thanks, lots of good food for thought.

> not use cross-component calls

Just to clarify - under this model cross-component calls to the same blueprint would also be cheaper. So you could potentially split up large components, as long as they are in the same package.

But yes, you'd still hit issues in cross-blueprint calls and upgrade scenarios of sharing code between blueprints.

> Imo there is no fundamental reason to charge less for one larger component compared to two smaller ones - what needs to be done here is to optimise the minimum cost per package.

Well, there is potentially more code to load/parse/compile etc., because lots of shared boilerplate gets duplicated between WASM modules (e.g. std library, scrypto lib, indexmap crate ...). As bbarwik says, having a large in-memory cache might mitigate this.

Ultimately the point of the cost unit limits and fees is to align with the amount of network time spent dealing with a transaction, so we need to ensure that any changes to the billing model are justifiable by that metric.

There are lots of possible directions this could go (from lower effort to higher effort), and I think we'd need to explore the pros and cons of each. Some of them have hidden cons related to constraining future changes, which also need to be considered:

  • Making better use of the cache to avoid Code substate reads + initializations & blueprint definition reads (to improve throughput but not affect costing yet)
  • Assuming a large cache; possibly upping the memory requirements of node runners; but allowing us to avoid billing for such substate reads and initializations.
  • Investigating if we can trim the size of shared boilerplate further
  • Looking into some kind of wasm components (in the Wasm Component Model) for shared boilerplate which could be bound to; decreasing package size
  • Investigating alternative VMs - e.g. I know the Polkadot team are doing some cool work here
  • Investigating alternatives to SBOR which might be lower in code size

Note that in Cuttlefish, we already have some execution speed improvements and tweaked execution costs, which should make cost units go a little further. But yes, package load costs are the next big frontier.


4 participants