Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving WASM Smart Contract Debugging and Code Coverage in near-workspaces-rs #324

Open
njelich opened this issue Oct 11, 2023 · 28 comments

Comments

@njelich
Copy link

njelich commented Oct 11, 2023

Issue Description

I am seeking to improve the debugging, profiling, and code coverage capabilities for WebAssembly (WASM) smart contracts using the near-workspaces-rs library. This issue serves as a discussion point for potential approaches to implement these features.

One potential solution involves the use of minicov, a crate that provides code coverage and profile-guided optimization (PGO) for no_std and embedded programs such as WASM smart contracts. Minicov achieves this by utilizing a modified version of the LLVM profiling runtime, from which all dependencies on libc have been removed.

However, minicov relies on outputting profraw files for its functionality. By default, near-workspaces-rs and NEAR do not allow file writes as a feature, creating a barrier for using minicov in this context.

Proposed Solution

To overcome this barrier, I propose that near-workspaces-rs or NEAR should allow a special debugging mode that permits file writes or some other output channel where these profraw files can be piped. This would enable the profraw files to be processed with llvm-cov, thus allowing actual code coverage of WASM smart contracts on the NEAR runtime to be measured.

This is particularly important as the general coverage developers can currently get is not actually on the NEAR runtime, and this caveat might be missed by those developing near smart contracts unfamiliar with Rust. For example, if only running unit tests, the #[payable] attributes might be completely missed on some functions that are thought to be tested based on default coverage reporting, but are missed in the workspace-rs tests.

Discussion Points

I would like to open a broader discussion on this topic and outline other similar cases. Some potential discussion points could include:

  • Pros and cons of allowing file writes or other output channels in near-workspaces-rs or NEAR.
  • Other potential solutions for improving code coverage and debugging capabilities.
  • The impact of these changes on the overall performance and security of the system.

I look forward to hearing the community's thoughts and ideas on this matter.

@michaelbajor
Copy link

I do like the idea! Absolutely, having a code coverage would absolutely be desirable!

I believe that the way workspaces crate works is that it is launching a local node (sandbox), deploys a compiled wasm and then sends transactions via JSON RPC.

Thinking out loud here: a potential solution would be to measure "how much binary" was executed in each call and "what part" of it. Then, calculating a ratio of executed parts to all of the binary size would give at least an estimation on the code coverage.

But it seems like it should be implemented in the sandbox itself, not in the workspaces crate.

@njelich
Copy link
Author

njelich commented Oct 11, 2023

That is more or less how LLVM works - the thing is, WASM is an embedded environment, so normal rust tooling for LLVM doesn't work (it is a no profiler_builtins environment). This is where minicov comes in.

But because the environment is even more restricted than that (runs in VM with size limits and no file write) it's extra tricky.

@frol
Copy link
Collaborator

frol commented Oct 11, 2023

Thanks for bringing it up! This makes total sense to me. minicov seems to be exactly what is needed, I believe it should be added as a feature to near-sdk-rs, so it can add instrumentation as part of the macro expansion, and dump the profiling results via env::log_str. The question, however, would be the size of that profiling data, as there is a limit of how much logs can handle, though we can consider raising the limits for the local sandbox instance of nearcore.

@njelich Would you like to champion the implementation? I am happy to help you to find the relevant resources and integration points.

@njelich
Copy link
Author

njelich commented Oct 11, 2023

I'd be happy to. Right now I think it may be over the limit the logs can handle, thats why I thought adding a file write for debugging would be useful.

EDIT: I had a play around with setting it up - currently getting.

Error: unable to fulfill the query request

Caused by:
    handler error: [Function call returned an error: wasm execution failed with error: HostError(GasLimitExceeded)]

Which is purely due to the size of minicov - so a coverage run of near-workspace-rs would have to use neard with disabled gas measurement or with max_gas_burnt_view set to a high amount - which can be set when initializing a custom sandbox node.

@frol
Copy link
Collaborator

frol commented Oct 11, 2023

Right now I think it may be over the limit the logs can handle, thats why I thought adding a file write for debugging would be useful.

Actually, after a second thought, you can just write the data to the contract storage (i.e. env::storage_write("PROFILING_DATA", data)), and then view the storage after the call. Though, it won't work for view-function calls, but in order to test code coverage, view-functions can be called as function calls.

a coverage run of near-workspace-rs would have to use neard with disabled gas measurement or with max_gas_burnt_view set to a high amount

Exactly right 👍

@njelich
Copy link
Author

njelich commented Oct 12, 2023

Regarding running a custom near-sandbox - how would this be done in near-workspaces 0.7.0? Based on a read through of the code it seems impossible right now.

@ghost
Copy link

ghost commented Oct 12, 2023

@njelich you can use NEAR_SANDBOX_BIN_PATH to set your own. Here's some other flags NEAR_ENABLE_SANDBOX_LOG=1 might also be helpful.

I advise for now to use nearcore branch 1.35.0 for your edits as I ran into an issue with the master branch building a custom sandbox here.

@njelich
Copy link
Author

njelich commented Oct 12, 2023

Alright, I've got the tests running on a custom network using near-workspaces 0.8.0 (waiting for release). I managed to make enough progress to get to a new error.

Error: unable to broadcast the transaction to the network

Caused by:
    handler error: [An error happened during transaction execution: InvalidNonce { tx_nonce: 8, ak_nonce: 8 }]

Maybe the parallel test execution is causing problems...

EDIT: Works - I've got my very first .profraw file.

@njelich
Copy link
Author

njelich commented Oct 13, 2023

Okay, I've made some progress but I'm having issues with data limits - right now I have a view function that returns the coverage data, and it seems to be truncating its output (interesting behavior - silently truncates). @frol @shariffdev

@ghost
Copy link

ghost commented Oct 13, 2023

I'll take time to check the source to see if it's something that can be helped.

@ghost
Copy link

ghost commented Oct 17, 2023

@njelich I did not run into truncated output for large outputs, if it got too large it was HostError(GasLimitExceeded) error. Can you provide sample code, I might be missing something.

@njelich
Copy link
Author

njelich commented Oct 18, 2023

Okay, I've got it figured out - there were some deserialization issues.

Next step on my side would be figuring out how to plug in coverage measurement after every function call - for now I have a view function that I use to get coverage - that doesn't work in practice, because it only has its own environment for measurement, but really it should be injected every time there is a function call or view, within the same VM execution process. What would be the best way to do that? @shariffdev @frol

@ghost
Copy link

ghost commented Oct 18, 2023

Since logs/events are out, nothing comes to mind atm. Maybe a helper method? Thoughts @frol

@njelich
Copy link
Author

njelich commented Oct 18, 2023

It would probably require changes to nearcore in that case, to plug into the VM? Or maybe I can write a derive for coverage that adds the function call the end of each existing function call with some output channel.

@frol
Copy link
Collaborator

frol commented Oct 22, 2023

@njelich I believe that plugging in your code through near-sdk-rs derives is the best way to go.

Just for your reference, you can find what kind of code near-sdk-rs macro generates in the snapshots: https://github.com/near/near-sdk-rs/blob/ee5cf867741d6d0d4db15857609b9e9268cc9b32/near-sdk-macros/src/core_impl/code_generator/snapshots/no_args_no_return_mut.snap, so each struct method gets auxiliary pub extern "C" fn function that does state restore and save before and after the method call.

@njelich
Copy link
Author

njelich commented Oct 27, 2023

@frol
Copy link
Collaborator

frol commented Oct 27, 2023

@njelich Yep, that is the right place to look into.

@njelich
Copy link
Author

njelich commented Oct 28, 2023

I've noticed something interesting in the process of trying to modify the near-sdk. Editing the Cargo.toml near-sdk dependency to the revision corresponding to release 4.1.1 breaks builds.

near-sdk = { git = "https://github.com/near/near-sdk-rs", rev = "55020df" }
While it works fine with

near-sdk = "4.1.1"

Any idea what might be happening there? I get errors around types:

error[E0308]: arguments to this method are incorrect
  --> contract/src/ft_receiver.rs:45:8
   |
45 |     fn ft_on_transfer(&mut self, sender_id: AccountId, amount: U128, msg: String) -> PromiseOrValue<U128> {
   |        ^^^^^^^^^^^^^^            ---------             ------ expected `U128`, found `near_sdk::json_types::U128`
   |                                  |
   |                                  expected `AccountId`, found `near_sdk::AccountId`

@frol
Copy link
Collaborator

frol commented Oct 30, 2023

@njelich This error is usually an indicator of that two versions of the same crate were pulled in. Try using patch section in Cargo toml file instead of editing the dependency path.

Also, keep in mind that near-sdk-rs master branch has some breaking changes and will be released as 5.0 soon

@frol
Copy link
Collaborator

frol commented Dec 7, 2023

@njelich Are you in touch with Hacken team?

Hacken folks recently released: wasmcov. The project provides WASM-based code coverage. They have some docs for use on near.

@njelich-hacken
Copy link

well

@ghost
Copy link

ghost commented Dec 14, 2023

I'll update the README and link the docs. @njelich I found this link broken https://github.com/hknio/wasmcov

@njelich-hacken
Copy link

@shariffdev It was private.

1 similar comment
@njelich-hacken
Copy link

@shariffdev It was private.

@bokobza
Copy link

bokobza commented May 30, 2024

Hi @frol, is the plan still to include code coverage in near-workspaces or is wasmcov the way to go?

@frol
Copy link
Collaborator

frol commented Jun 2, 2024

Hacken team received a grant to integrate code coverage solution, but I have not heard from them lately

@njelich
Copy link
Author

njelich commented Jun 2, 2024

I have left Hacken a while ago, but I believe @bbarwik is likely in charge of this task now.

@bbarwik
Copy link

bbarwik commented Jun 25, 2024

Hey @frol @bokobza and everyone else. I almost finished this tool for near, it's in beta phase. The current beta version is available only on github - https://github.com/hknio/wasmcov/ - I'll push it to creates.io by the end of the week. Here's how you can use it right now:

To install use:
cargo install wasmcov
then you get a new command cargo wasmcov

To include it in your near project, add to dependencies:
wasmcov = "0.2"
and in lib.rs add:

#[cfg(target_family = "wasm")]
wasmcov::near::add_coverage!();

Then in your integration tests, if you need to build WASM binaries first run
cargo wasmcov build -- <standard build options>
it will build them for you and place in wasmcov/target dir. You need to have rust nightly installed for it

To run your tests just use
cargo wasmcov test --near=1.35.0
it should work even if your tests have near_workspaces::compile_project("./")

Then to generate coverage report run
cargo wasmcov report

You need to have rust nightly version installed and llvm 18, you can get it from https://apt.llvm.org/

Please let me know what you think and if it worked correctly in your case.

Here's an example report cross_contract.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants