Discussion of alternative: byte-granularity deterministic cap #13
Personally I feel that, in trying to think through this proposal further, it ends up on the same level of complexity as custom page sizes. In that sense I don't feel that your bulleted lists capture the full breadth of the impact of having a "cap" on memory instead. The first example that jumped to mind was WebAssembly/wasi-libc#500, a PR adding an emulation of the POSIX `mmap` API.

You also say that memory type matching rules wouldn't need to change, but I would personally find that confusing. To me the type-checking of linear memories would get even more complicated than today, because each memory type could list an optional maximum page size in addition to an optional maximum byte size. Is it an error if the byte size is specified and the page size isn't? What if the maximum page size is excessively larger than the maximum byte size? Additionally, maximum page sizes factor into whether two memory types are considered "matching", so I'm not sure why the maximum byte size wouldn't factor into this calculation. Instead, what I'd expect is that the effective maximum byte size of a linear memory is calculated from the declared maximum byte size and the listed maximum page size, taking the minimum of the two.
I don't think that #3 would really affect this. The only change would be to how a function that works with the page size is written.
One thing I want to be sure to call out here is that this proposal still has a required trapping semantics for memory accesses beyond the cap. This means that engines, just like with custom page sizes, can no longer rely on guard pages. My comment here isn't addressing the OP directly but rather additional concerns that the CG has raised on various occasions. The complexity of an engine not being able to rely on guard pages seems inevitable with any proposal to limit the size of memory to less than 64 KiB. I mostly just want to be sure to call out that this proposal is not solving this concern or enabling engines to always use guards; engines will still need to implement a bounds-check-every-memory-access mode.
I agree with @alexcrichton that having both a max page size and a max byte size seems complicated and error-prone, but I'm also sympathetic to the argument that the units of the existing size and grow instructions shouldn't change.

What if we went with custom page sizes to avoid having multiple kinds of max size, but then also added byte-granularity counterparts to those instructions?

That being said, I think there's not much user-facing difference between the two approaches.
@alexcrichton, I certainly respect your view. And I appreciate that wasmtime has already implemented the current version of this proposal and probably isn't eager to change. In terms of comparative implementation burden, responses below.
One way to handle this would be a
Yes, agreed -- this alternative would add a static byte-granularity limit on the exposed length of the memory data, in addition to the page-granularity limit. It seemed like the minimal thing we could do to satisfy the particular use-case of small embedded environments.
No.
Also no error.
I think it would be workable either way -- it depends on whether the "cap" would be part of the memtype or a freestanding declaration, which I tried to leave open. If you think it's better to change the matching rules (making it a failure to use a memory with a smaller cap to satisfy an import with a higher cap), that's fine with me. On the other hand, maybe it would be nice if only the embedded engines have to care about the "cap" at instantiation time (and for everybody else, it's just a matter of execution semantics).
I'm having a hard time thinking of a common situation today where the body of a function in a .o file (parsed as standard Wasm) is expected to be incorrect until relocations are applied from the custom section. My impression was that relocations are required for speedy linking but, if you're willing to parse the code section, not generally required to understand the execution semantics. Maybe I'm not thinking broadly enough, or I'm unaware of existing dependence on the tool conventions.
Yes, agreed. I don't know how we'd get rid of that if you also want to support a module with a memory of length 1 KiB (or length 17 bytes).
Just to add a brief "Wasm language-level" perspective, I can echo @tlively in that my bias is to look for solutions that add fewer new concepts to the core spec, which has to be eternally forwards-compatible and therefore grows monotonically in size and complexity. I also like that the custom page sizes proposal sets us up neatly for future memory mapping/protection features that might have to operate per-page - e.g. I could imagine a future memory mapping feature that requires a certain minimum page size, but that would be unwieldy to use with 64k pages.
Since we expect that custom page sizes are going to be used in specific resource-constrained environments, it is necessary to understand whether the assumptions we make about this approach hold true in such environments. As an example: how would the aforementioned implementation that currently caps memory handle this? I assume it is a bare-metal device of some sort.

I'd like to second @ajklein's point one more time: the consumers of a feature aimed at supporting less-than-64 KB memories are embedded users, and it would be good to make sure the direction of this proposal is aligned with the stated use cases. As for a forum that is probably better equipped to review this, an Embedded SIG has been approved in the Bytecode Alliance, though personally I think maybe even a CG subgroup is needed, as this proposal shows that embedded features do not necessarily fall under WASI and the Component Model.
Where can I find the implementation?
@keithw oh to clarify, Wasmtime does not yet have an implementation of this proposal; only the surrounding tooling does.

I also think that your answers to the questions I raised are reasonable, but I mostly wanted to point out that the simplification of "just add a byte cap to memory" hides intrinsic complexity in even such a seemingly simple proposal. Even within the various questions there's room for debate; for example, why would we want to allow both a byte cap and a page cap on memory? Why not require one xor the other? I bring these up mostly to highlight complexities rather than saying that this should be decided here-and-now. Overall, I'm mostly addressing:
I realize that this is a bit tongue-in-cheek and not meant to be taken literally, but I personally feel that even very small changes in a spec like wasm have lots of complexities to sort through. If we'd need to add
An example of this is:

```c
extern char foo[10];

char *bar() {
  return foo;
}
```

which when compiled to an object file looks like:

```wat
(module
  (type (;0;) (func (result i32)))
  (import "env" "__linear_memory" (memory (;0;) 0))
  (func (;0;) (type 0) (result i32)
    i32.const 0
  )
  (@custom "linking" (after code) "\02\08\8e\80\80\80\00\02\00\04\00\03bar\01\10\03foo")
  (@custom "reloc.CODE" (after code) "\03\01\04\04\01\00")
  (@producers
    (processed-by "clang" "18.1.2 (https://github.com/llvm/llvm-project 26a1d6601d727a96f4301d0d8647b5a42760ae0c)")
  )
  (@custom "target_features" (after code) "\02+\0fmutable-globals+\08sign-ext")
)
```

The return value of function 0 here isn't actually 0 at runtime; it's a constant that's filled in by lld at link-time.

One possible extension of this proposal to the spec is that we could consider memories as being specified in terms of byte sizes rather than page sizes at the "AST level". The current binary format would continue to define memories as a multiple of the 64 KiB page size, but in the future there could be an option to specify a min/max with a byte size as well. The complexity here, to me, is the above question of what to do with the page-sized units of existing instructions.
I generally agree that custom page sizes feels more "wasmy"--it doesn't introduce a new concept, but generalizes an existing one. I think it fits in the same general category of generalizations as multiple memories. I also see it as a road to more fully utilizing extant hardware virtual memory mechanisms. Even with just two page sizes, we will now have encoding space for experimenting with 4 KiB and 8 KiB page sizes, which may make a big difference in particular use cases.
More generally, all relocations (unless I'm forgetting some edge case) are 5-byte LEB-encoded zeroes in object files until they are patched by the linker with 5-byte LEB encodings of the correct value, so it's not possible to discover their "real" values without actually performing linking.
I guess
For
@keithw thanks for filing a detailed issue, and no worries about applying stop energy; I think we all just had a genuine misunderstanding. (I'm going to split my reply into two parts: first a general response, and second more detailed, focused comments.)

I think this issue's write-up overestimates the amount of work necessary to support the custom-page-sizes proposal and underestimates the amount of work necessary to support this alternative proposal. In particular, I think this alternative doesn't fully consider the impact of language-feature composition and of reusing existing language concepts versus introducing new ones. Many (most?) engines and toolchains aren't only targeting embedded environments but aim to support Wasm across multiple domains. Additionally, extending and generalizing existing language concepts avoids being forced to answer (and spec and implement) many resolvable-but-annoying questions, like those @alexcrichton is raising. When reusing language concepts, those questions don't even arise in the first place, because we already have answers to them via the existing machinery around those concepts. This is important not only in the current moment but for the future as well, since Wasm features are purely additive, and once they are standardized and shipped, they must be supported eternally, as @conrad-watt points out.
What happens when module
This exaggerates the magnitude of the change to these instructions in the custom-page-sizes proposal (replacing a language-wide constant with a value defined in the memory's static type) and downplays the costs of introducing a whole new concept to the language.
Instead of linker changes, this proposal would require whole new additions to the core Wasm language.

I agree with what @tlively and @conrad-watt expressed: in general we should prefer solutions with fewer core language changes over solutions with fewer toolchain changes.
One could imagine using an MPU's capabilities for this kind of thing, when available.
The Bytecode Alliance is not a standardization venue. We, the BA, organize implementation work under our umbrella, and might make recommendations and proposals (such as this custom-page-sizes proposal!) to bring to standards venues like the W3C, but it is not appropriate to move standards discussions out of their standardization venues and into the BA.
The binary decoding, binary encoding, text parsing, text printing, validation, and fuzzing test-case generation support have all been implemented in the surrounding tooling. Support has not landed in Wasmtime itself yet.
I'd like to make sure our colleagues who have been shipping devices with smaller memory footprints are included in the discussion, so here comes a list of GitHub handles/tags: @no1wudi, @dongsheng28849455, @xwang98 and @wenyongh. (I apologise if I've missed anyone.) Perhaps we could get some insight into the use cases for smaller page sizes. From my own perspective, device limitations restricted the code structure we deploy, reducing complexity; we are also likely to remove dynamic memory actions (grow/shrink), etc. Other perspectives on the limitations and usage of smaller page sizes would be useful too.
A very minor comment:
I just wanted to point out that, for this specifically, it's perfectly valid for memory growth to always fail, and that doing this is much preferable to other ways of not supporting growth (e.g. rejecting memory.grow instructions at validation time).
[...] Hmm, this feels pretty different. The relocation in your example expresses the fact that the linker will (re)locate global variables in memory (it could end up at address 0). The generated .o files are still internally consistent; e.g., if I write:

```c
static char foo_[10], bar_[10];

char *foo() __attribute((export_name("foo"))) { return foo_; }
char *bar() __attribute((export_name("bar"))) { return bar_; }
```

... there are relocations for the addresses of `foo_` and `bar_`, and the linked module comes out as:

```wat
(module
  (type (;0;) (func (result i32)))
  (import "env" "__linear_memory" (memory (;0;) 1))
  (import "env" "__indirect_function_table" (table (;0;) 0 funcref))
  (func $foo (type 0) (result i32)
    i32.const 0)
  (func $bar (type 0) (result i32)
    i32.const 10)
  (export "foo" (func $foo))
  (export "bar" (func $bar))
  (data $foo_ (i32.const 0) "\00\00\00\00\00\00\00\00\00\00")
  (data $bar_ (i32.const 10) "\00\00\00\00\00\00\00\00\00\00"))
```

Whereas the current proposal uses a relocation to drop in a constant that's going to have a value of either 1 or 65,536, but unknown when producing the compilation unit, so it's represented as 0 in the .o file (a syntactically valid Wasm module). The function doesn't behave correctly until it passes through wasm-ld, even if it never refers to the address of an extern symbol. That feels like a new step? It doesn't seem great, in terms of safety or comprehensibility, to be producing syntactically-valid-but-internally-incorrect Wasm modules and relying on wasm-ld (which afaik has a single implementation) to fix them up.

It doesn't look like the group is going to go for the "cap" alternative. I think a big part of my perception of complexity in this proposal comes from the added weight placed on the tool conventions. In an effort to find consensus: how about a version of the current proposal that adds a new const instruction (e.g. a `memory.page_size` that pushes the memory's static page size)?
I'm not very familiar with that machinery. EDIT: I see this in the overview.
I'll defer to the people who'd actually be implementing this, but I'm a little surprised at the ambition here! I understand the argument in #3 about precompiling standard libraries.
If we go forward with this proposal, who would be responsible for the changes to wasm-ld? We should try to get that opinion firsthand.
@sbc100 has been the primary maintainer of wasm-ld. Sam, can you say how complex you think it would be to add a new relocation type for the page size?
There is precedent for wasm-ld having somewhat sophisticated compatibility rules around used and enabled features. The target features section already has a way to encode "this object file uses target feature X and linking should fail if any other object does not use target feature X," which is exactly what we need in this case AFAICT. (I designed and implemented this feature five years ago and it has never been used until now, so I'm glad I can stop regretting this particular bit of over-engineering.) |
Not sure I subscribe to that. Pretending some functionality is available when it doesn't actually work may be less useful than rejecting it explicitly and early.
I would hope a relocation type would not be needed here, and instead we could use a linker-synthetic symbol, similar to |
That would work, but the idea was to use a relocation rather than an in-memory value to allow for easier constant propagation. OTOH, accessing this single value seems unlikely to be very performance sensitive. WDYT?
It wouldn't be an in-memory value, it would be a linker-generated constant address, just like most of the other linker-generated symbols.
Better still, we could use a wasm global for this purpose, like we do for `__stack_pointer`.
So if I understand correctly, it sounds like wasm-ld could expose the page size through existing mechanisms (a linker-synthesized symbol or a wasm global) rather than a new relocation type. Is that high-level summary correct, @sbc100?
If standardization cannot be discussed in that SIG (also @woodsmc take note, as you are the chair of that), then we really need to have representation of the embedded space in the W3C; I don't think having a game of telephone between the implementers and the standard is acceptable. I don't know what this representation should look like, whether it is an embedded subgroup or some other form, but we really need it, because we are currently at risk of adding something we are not sure would work for the intended use cases while being an implementation burden on everyone else.
Apologies for maybe not being clear: this isn't about what one can imagine, but rather about what is realistically supported by such implementations today. There is a bit of an inherent tension between mprotect and memories of less than 4 KB; that's why it would be great not to base the standard on the former if the latter environment is not expected to support it.
Yup, correct.
@ppenzin, speaking as a TSC member of the BA, I want to clarify again that it's perfectly fine to discuss these kinds of things, and in Nick's words "make recommendations and proposals [..] to bring to standards venues" as part of BA-hosted activities. What we want to avoid is to create a situation in which it becomes required to participate in the BA to be able to participate in WebAssembly standardization, as moving the review of this proposal to a BA SIG would do. For context for others: the BA is currently in the process of establishing a SIG-Embedded. While I can only speculate, it seems highly likely to me that Nick would've brought up the topic of how best to handle memory sizes that aren't multiples of 64KB there first if the SIG was already operational. After a discussion there, a proposal to the Wasm CG would then have happened, leading to the same process we have here now.
Can you say more about what "game of telephone" you mean? It seems like after a bit of a slow start on the engagement, this proposal is now getting a lot of input from different implementers. And it seems like this proposal repository as an async forum and the regular CG meetings as a sync one work well for discussing the proposal. Making use of these forums requires active engagement by all interested parties, but that would be the case in whichever forum—and this very issue seems like an example of that engagement.
Totally agree, but I also acknowledge @ppenzin for calling me, and the folks in the embedded space, out for not contributing more actively here, cc: @no1wudi, @dongsheng28849455, @xwang98 and @wenyongh.
I can share what we are doing at the moment for smaller page sizes. The use case, at least for us (I'm not sure about @no1wudi or others): we're compiling a simple function, usually some user-supplied transformation. It typically doesn't actually use the heap at all, and simply needs to safely encapsulate, in a platform-portable way, the function(s) we need to invoke. In this case a full 64 KiB is overkill, as we basically only need enough space for the stack; ~4-16 KiB is a good approximation (cc @tacdom?).

What we've been doing is compiling Zig, Rust, and C to Wasm, then converting to .wat and manually reducing the requested page count (initially C wants 2 pages (128 KiB) and Rust and Zig request 16 pages (1 MiB); we reduce it to one page), then converting back to .wasm. Then, as discussed with @keithw, we change the runtime's definition of what a page size is, since we embed it in the host.

Typically the code we're compiling is pretty limited. We're never going to do memory.grow/shrink; in general memory is considered more or less static.
@woodsmc For us, we have several different usage forms.
Generally, we need to specify the size of the linear memory at compile time in our usage. We avoid using the memory.grow instruction because the default page size (64K) is too large for us. If the page size can be configured, for example to 4K or 16K as needed, we can then use a more standardized approach to handle the heap (e.g. wasi-sdk).
Hi all,
in general I would say the smaller the better 😉
So 4 kB or less would be the general target space, imho. Same goes for page count: it looks like the numbers 2 for C and 16 for other languages are more or less constants.
These numbers make sense, but there are cases where they will be overkill. For example, if I use Rust without the standard library, there is no reason to use more pages than C does.
Maybe as a first step, compiler flags would be a nice way to give a developer control over the underlying details. If you do not care about that, you just do not touch it.
Even better would be if page size and page count were optimized for the application during compilation. But this is a rather tricky task.
Dominik
@woodsmc, @no1wudi, and @tacdom: thanks for the feedback and details on your use cases! It sounds like the custom-page-sizes proposal will indeed provide a standards-based solution for your use cases, allowing you to define memories smaller than 64KiB (and down to even just a single byte, if you wanted that for some reason).
FWIW, you don't need to disassemble to the text format and back to do that.
Great! Configuring a memory's page size is exactly what the custom-page-sizes proposal introduces. Although I should note that the only valid page sizes will conservatively be 1 byte and 64 KiB initially, with a single-byte page size you can create a memory of any 32-bit size, including for example 4 KiB and 16 KiB memories.
This issue's original topic of discussion (the per-memory byte-limit alternative to the custom-page-sizes proposal) has been quiet for a couple weeks now. It seems to me like the general consensus is that folks would rather continue pursuing the custom-page-sizes proposal over the per-memory byte-limit alternative. @keithw do you (or anyone else!) have any final comments you wanted to add about the byte-limit alternative? Unless anything new comes up in the next few days, I will schedule a time slot in an upcoming CG meeting to give another update (primarily on the discussion that's happened here and on the implementation that has landed in Wasmtime) and hold a vote to advance this proposal to phase 2. If anyone has any outstanding concerns, please file an issue for discussing them before then, thanks!
Thanks, @fitzgen, I think the "cap" alternative had a fair airing and the consensus was clear. A substantial part of my concern about the complexity of the original proposal (and reliance on relocations and wasm-ld) was addressed by #22, assuming that technique prevails, and we can continue that discussion in the appropriate places.
Heads up, I'm adding an agenda item to the 2024-07-30 CG meeting to give an update on this discussion and a vote for phase 2: WebAssembly/meetings#1619 |
I believe that we can close this issue now, since the phase 2 vote for the custom-page-sizes proposal passed. Thanks everyone!
First off, apologies that I didn't see PR #12 until it was mentioned in today's meeting. From the April 23rd discussion, I had been expecting an issue thread on this and just didn't see it -- I'm sorry to keep applying stop energy to this.
My understanding is that this proposal is aimed at letting Wasm run (in a spec-conforming way) in environments with less than 64 KiB of memory (or, more rarely, less than some other integral number of 64-KiB pages). For this use case, my own opinion is that a deterministic "cap" on loads/stores would be simpler and less invasive, end-to-end, vs. plumbing custom page sizes through the tools and consumers. The "cap" would be a static declaration on a memory that requires the consumer to trap on loads and stores that go over some index. Syntactically, it could be part of `limits` or `memtype`, or it could be a new orthogonal element in a new section.

To me this seems like it would be a lot less invasive for consumers to implement than the current proposal. Unlike the current proposal:

- It wouldn't change the semantics of `memory.size` or `memory.grow`.
- It wouldn't need a `__builtin_wasm_page_size` or new relocations and tool conventions.
- Object files would stay comprehensible: today, you can compile with `clang -c` and the output in WAT, decoded by standard tools, is comprehensible as a Wasm module. (The discussion in "What is the toolchain integration story?" #3 suggests that newly generated .o files will implicate this proposal pervasively, and knowledge of the tool conventions and non-standard sections will become more necessary to understand what the .o file is trying to do.)

In exchange for giving up all this, the alternative would add a static byte-granularity limit on the exposed length of the memory data, in addition to the existing page-granularity limit.
From this point of view, the latter is a much shorter list. :-) I think this alternative would satisfy the "small-memory" use cases that I'm aware of, and I hope it's clear why it feels less invasive and easier for consumers and tools to deal with.
What I expect is that consumers in these small-memory embedded environments would refuse to instantiate a module unless it declares a sufficiently constraining set of "caps" on its memories and memory imports. Other consumers would probably just ignore the cap until an actual load/store is executed against a capped memory.
Downsides:
Response: The view could be that the over-allocation is tolerable everywhere except small-memory environments, which are well-handled by the "cap." Or maybe that the audio effects library should be using 512-byte GC char arrays instead of 512-byte custom pages in a linear memory. But if we want to tackle these other scenarios with full generality, then yeah, custom page sizes are probably the way to go.
Two other downsides are given in the #12 overview:
Response: Memory would still be composed of pages, and they'd still be 64 KiB. The effort required to spec the feature, and the additional burden on implementations, seems vastly smaller given my two bulleted lists above. It would, however, add a new execution-time concept (the byte-granularity "cap") that doesn't exist today.
Response: This is true. However, even today Wasm has no mechanism at execution time to determine the `min` or `max` of a `limits`. Like the "cap", these numbers are static and don't change at runtime. If this is really desired, it seems doable to specify a new `memory.limits` const instruction that takes an index immediate, and probably a field immediate (min/max/cap/shared/etc.), and pushes the corresponding static value.

=====
Bottom line: for the particular goal of supporting small-memory environments, the "cap" feels a lot less invasive and challenging to implement than going to custom page sizes. I'm nervous about putting even more weight on the tools and tool conventions, especially when lld seems to be the only implementation of them, they're not part of the Wasm spec, and the comprehensibility/accessibility of .o files is a nice thing to have. However, if there is a desire for the full flexibility and generality of custom page sizes for their own purposes (independent of the particular use case of small-memory environments), then that's clearly the way to go.