Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compiled wasm crashes on Rust 1.78.0, but not on 1.77.2 #443

Closed
lucasavila00 opened this issue Jun 1, 2024 · 8 comments
Closed

Compiled wasm crashes on Rust 1.78.0, but not on 1.77.2 #443

lucasavila00 opened this issue Jun 1, 2024 · 8 comments

Comments

@lucasavila00
Copy link

lucasavila00 commented Jun 1, 2024

To reproduce:

Clone this repository https://github.com/lucasavila00/bindgencrash

It contains a wit world that has a function that receives a string and returns i32

The function just returns 1 and never reads the string.

When I call the compiled wasm with some specific string, the wasm code crashes.

$ npm run build
$ npm run test


> [email protected] test
> tsx tests/test-e2e.js

RuntimeError: unreachable
    at rust_html_bindgen.wasm.__rust_start_panic (wasm://wasm/rust_html_bindgen.wasm-00575482:wasm-function[41]:0x2dea)
    at rust_html_bindgen.wasm.rust_panic (wasm://wasm/rust_html_bindgen.wasm-00575482:wasm-function[39]:0x2db8)
    at rust_html_bindgen.wasm._ZN3std9panicking20rust_panic_with_hook17h32c80a64fe4de396E (wasm://wasm/rust_html_bindgen.wasm-00575482:wasm-function[38]:0x2daa)
    at rust_html_bindgen.wasm._ZN3std9panicking19begin_panic_handler28_$u7b$$u7b$closure$u7d$$u7d$17hd496964d114e98b9E (wasm://wasm/rust_html_bindgen.wasm-00575482:wasm-function[28]:0x2523)
    at rust_html_bindgen.wasm._ZN3std10sys_common9backtrace26__rust_end_short_backtrace17h0d4686a7fe3981a4E (wasm://wasm/rust_html_bindgen.wasm-00575482:wasm-function[27]:0x244f)
    at rust_html_bindgen.wasm.rust_begin_unwind (wasm://wasm/rust_html_bindgen.wasm-00575482:wasm-function[33]:0x2a39)
    at rust_html_bindgen.wasm._ZN4core9panicking9panic_fmt17hc7427f902a13f1a9E (wasm://wasm/rust_html_bindgen.wasm-00575482:wasm-function[47]:0x2ed7)
    at rust_html_bindgen.wasm._ZN4core9panicking5panic17hb157b525de3fe68dE (wasm://wasm/rust_html_bindgen.wasm-00575482:wasm-function[48]:0x2f29)
    at rust_html_bindgen.wasm.__rdl_dealloc (wasm://wasm/rust_html_bindgen.wasm-00575482:wasm-function[31]:0x2636)
    at rust_html_bindgen.wasm.__rust_dealloc (wasm://wasm/rust_html_bindgen.wasm-00575482:wasm-function[3]:0x18e)

I wonder if this is related to the dlmalloc change alexcrichton/dlmalloc-rs#32

This change broke wasm-bindgen (see rust-lang/rust#124671, rustwasm/wasm-bindgen#3801, rustwasm/wasm-bindgen#3808)

Is wit-bindgen also affected by this?

It looks similar to the related issues, as this crash happens if a specific input String is used (eg: https://github.com/lucasavila00/bindgencrash/blob/main/tests/test-e2e.js#L14-L17) and the issue would happen when deallocating such strings.

I know if happens with wit-bindgen 0.16.0 and 0.26.0

@lucasavila00 lucasavila00 changed the title Crashes on Rust 1.78.0, but not on 1.77.2 Compiled wasm crashes on Rust 1.78.0, but not on 1.77.2 Jun 1, 2024
@alexcrichton alexcrichton transferred this issue from bytecodealliance/wit-bindgen Jun 1, 2024
@alexcrichton
Copy link
Member

Thanks for the report! I can reproduce this myself but I've moved this now from the wit-bindgen repository to the jco repository. I believe that this is a bug in jco's string encoding implementation currently. The allocation at the end needs to precisely fit the length of the string so the over-allocated size is what's causing the issue here. This is basically the exact same bug as what came up in wasm-bindgen, just in a different context.

The Canonical ABI is pretty strict about the allowable behaviors here in terms of how strings are lowered. @guybedford I think you'll want to transcode the store_utf16_to_utf8 algorithm in that document for lowerings strings from JS into the utf-8 encoding which would involve tripling the .length of the string and allocating that number of bytes, shrinking afterwards if it's too large.

That being said I know in the past in my work on wasm-bindgen it's much more beneficial to assume ascii most of the time and then pessimize later with another allocation. In that sense the best algorithm to implement I don't think is allowed by the component model right now. See store_string_into_range which is more-or-less the host (js in this case) is currently required to do one of those algorithms and technically isn't allowed to deviate. (my guess though is that in practice most hosts will probably deviate from the spec there though since it's nontrivial to implement exactly as-is).

Also cc @lukewagner on this since this is an interesting case coming up spec-wise. The summary is that Rust's debug assertions are catching a case where jco didn't shrink the size of a string allocation during lowering to the exact size of a string. That's easily fixable but the point I'd moreso want to raise here is that it's not clear to me how to implement "the most efficient way of transferring a string to wasm from js" given the current component model spec. For example in wasm-bindgen the current implementation looks like this (originally from rustwasm/wasm-bindgen#1736 I believe) and looks like (a) assume ascii and copy byte-by-byte and failing that (b) use encodeInto with TextEncoder-specific bits.

@lukewagner
Copy link

Thanks for pointing this out. So the optimistic initial allocation does assume that each code point is ASCII and fits in a byte (hence allocating src_code_units bytes). But perhaps the problem you're referring to is that the initial copy into this optimistically-sized buffer doesn't stop as soon as it hits non-ASCII -- it stops once it fills up the entire buffer. Thus maybe what want instead is a manual loop that copies up to the first non-ASCII code point and then immediately does the realloc before going farther; would that match the ideal JS glue?

@alexcrichton
Copy link
Member

Sort of yeah but not precisely. It's mostly that the rigidity of writing down the exact method of how string translation should happen can be limiting in some contexts. JS here is an example where, according to wasm-bindgen, the fastest way to copy a string is to (a) copy manually in JS without TextEncoder until a non-ascii byte is encountered and (b) fall back to using TextEncoder and resizing trickery. Such an algorithm I believe was measured to be optimal for ascii strings, and my guess is that lots of things on the web are ASCII like all DOM element names and properties and such. (user-input strings less so depending on where you are)

This specific algorithm for JS can't be implemented for components as it's not specified in the canonical ABI. That's not a break-the-bank situation per se but I mostly wanted to highlight this as a potential weakness.

@lukewagner
Copy link

If we were to change the Canonical ABI to match what JS wants to do, it seems like it would be no worse, and possibly better, for all the other host embeddings, so we could just do that (and probably this is what I should've started with). If we found a fundamental, unresolvable tension between two observably different string-copying algorithms, we could add more non-determinism to the spec to give hosts more wiggle room, but given the general portability benefits of deterministic semantics, I think we should build up that motivation with some concrete examples first.

@alexcrichton
Copy link
Member

That's true yeah, and I think concretely this would change the utf16 -> utf8 path to assume ascii first and then only start allocating one non-ascii was hit?

@lukewagner
Copy link

Yep! It'd be a pretty small tweak to the existing Python code, I think; I can work up a patch in a week.

@lukewagner
Copy link

Ok, filed component-model/#366

@guybedford
Copy link
Collaborator

As far as I understand the issue, I believe this was resolved in #444 and we can shortly update back to being zero copy per #448 for the next version of Wasmtime.

Please correct me if I've misinterpreted any of the above, and otherwise I'll close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants