-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cranelift: Allow CLIF frontends to define their own stack maps #8728
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
github-actions
bot
added
the
cranelift
Issues related to the Cranelift code generator
label
Jun 1, 2024
cfallin
approved these changes
Jun 5, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is extremely cool work and I'm very happy we'll be able to remove the complexity later in the pipeline eventually!
A few nits and requests for some comments, minor refactorings, etc below but nothing major.
fitzgen
force-pushed
the
saferpoints
branch
2 times, most recently
from
June 6, 2024 18:18
1f0e08e
to
2b4fdc0
Compare
github-merge-queue
bot
removed this pull request from the merge queue due to failed status checks
Jun 6, 2024
Tracking GC references and producing stack maps is a significant amount of complexity in `regalloc2`. At the same time, GC reference value types are pretty annoying to deal with in Cranelift itself. We know our `r64` is "actually" just an `i64` pointer, and we want to do `i64`-y things with it, such as an `iadd` to compute a derived pointer, but `iadd` only takes integer types and not `r64`s. We investigated loosening that restriction and it was way too painful given the way that CLIF type inference and its controlling type vars work. So to compute those derived pointers, we have to first `bitcast` the `r64` into an `i64`. This is unfortunate in two ways. First, because of arcane interactions between register allocation constraints, stack maps, and ABIs this involves inserting unnecessary register-to-register moves in our generated code which hurts binary size and performance ever so slightly. Second, and much more seriously, this is a serious footgun. If a GC reference isn't an `r64` right now, then it will not appear in stack maps, and failure to record a live GC reference in a stack map means that the collector could reclaim the object while you are still using it, leading to use-after-free bugs! Very bad. And the mid-end needs to know *not* to GVN these bitcasts or else we get similar bugs (see bytecodealliance#8317). Overall GC references are a painful situation for us today. This commit is the introduction of an alternative. (Note, though, that we aren't quite ready to remove the old stack maps infrastructure just yet.) Instead of preserving GC references all the way through the whole pipeline and computing live GC references and inserting spills at safepoints for stack maps all the way at the end of that pipeline in register allocation, the CLIF-producing frontend explicitly generates its own stack slots and spills for safepoints. The only thing the rest of the compiler pipeline needs to know is the metadata required to produce the stack map for the associated safepoint. We can completely remove `r32` and `r64` from Cranelift and just use plain `i32` and `i64` values. Or `f64` if the runtime uses NaN-boxing, which the old stack maps system did not support at all. Or 32-bit GC references on a 64-bit target, which was also not supported by the old system. Furthermore, we *cannot* get miscompiles due to GVN'ing bitcasts that shouldn't be GVN'd because there aren't any bitcasts hiding GC references from stack maps anymore. And in the case of a moving GC, we don't need to worry about the mid-end doing illegal code motion across calls that could have triggered a GC that invalidated the moved GC reference because frontends will reload their GC references from the stack slots after the call, and that loaded value simply isn't a candidate for GVN with the previous version. We don't have to worry about those bugs by construction. So everything gets a lot easier under this new system. But this commit doesn't mean we are 100% done and ready to transition to the new system, so what is actually in here? * CLIF producers can mark values as needing to be present in a stack map if they are live across a safepoint in `cranelift-frontend`. This is the `FunctionBuilder::declare_needs_stack_map` method. * When we finalize the function we are building, we do a simple, single-pass liveness analysis to determine the set of GC references that are live at each safepoint, and then we insert spills to explicit stack slots just before the safepoint. We intentionally trade away the precision of a fixed-point liveness analysis for the speed and simplicity of a single-pass implementation. * We annotate the safepoint with the metadata necessary to construct its associated stack map. This is the new `cranelift_codegen::ir::DataFlowGraph::append_user_stack_map_entry` method and all that stuff. * These stack map entries are part of the CLIF and can be roundtripped through printing and parsing CLIF. * Each stack map entry describes a GC-managed value that is on the stack and how to locate it: its type, the stack slot it is located within, and the offset within that stack slot where it resides. Different stack map entries for the same safepoint may have different types or a different width from the target's pointer. Here is what is *not* handled yet, and left for future follow up commits: * Lowering the stack map entries' locations from symbolic stack slot and offset pairs to physical stack frame offsets after register allocation. * Coalescing and aggregating the safepoints and their raw stack map entries into a compact PC-to-stack-map table during emission. * Supporting moving GCs. Right now we generate spills into stack slots for live GC references just before safepoints, but we don't reload the GC references from the stack upon their next use after the safepoint. This involves rewriting uses of the old, spilled values which could be a little finicky, but we think we have a good approach. * Port Wasmtime over to using this new stack maps system. * Removing the old stack map system, including `r{32,64}` from Cranelift and GC reference handling from `regalloc2`. (For the time being, the new system generally refers to "user stack maps" to disambiguate from the old system where it might otherwise be confusing.) If we wanted to remove the old system now, that would require us to also port Wasmtime to the new system now, and we'd end up with a monolithic PR. Better to do this incrementally and temporarily have the old and in-progress new system overlap for a short period of time. Co-Authored-By: Trevor Elliott <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Tracking GC references and producing stack maps is a significant amount of complexity in
regalloc2
.At the same time, GC reference value types are pretty annoying to deal with in Cranelift itself. We know our
r64
is "actually" just ani64
pointer, and we want to doi64
-y things with it, such as aniadd
to compute a derived pointer, butiadd
only takes integer types and notr64
s. We investigated loosening that restriction and it was way too painful given the way that CLIF type inference and its controlling type vars work. So to compute those derived pointers, we have to firstbitcast
ther64
into ani64
. This is unfortunate in two ways. First, because of arcane interactions between register allocation constraints, stack maps, and ABIs this involves inserting unnecessary register-to-register moves in our generated code which hurts binary size and performance ever so slightly. Second, and much more seriously, this is a serious footgun. If a GC reference isn't anr64
right now, then it will not appear in stack maps, and failure to record a live GC reference in a stack map means that the collector could reclaim the object while you are still using it, leading to use-after-free bugs! Very bad. And the mid-end needs to know not to GVN these bitcasts or else we get similar bugs (see #8317).Overall GC references are a painful situation for us today.
This commit is the introduction of an alternative. (Note, though, that we aren't quite ready to remove the old stack maps infrastructure just yet.)
Instead of preserving GC references all the way through the whole pipeline and computing live GC references and inserting spills at safepoints for stack maps all the way at the end of that pipeline in register allocation, the CLIF-producing frontend explicitly generates its own stack slots and spills for safepoints. The only thing the rest of the compiler pipeline needs to know is the metadata required to produce the stack map for the associated safepoint. We can completely remove
r32
andr64
from Cranelift and just use plaini32
andi64
values. Orf64
if the runtime uses NaN-boxing, which the old stack maps system did not support at all. Or 32-bit GC references on a 64-bit target, which was also not supported by the old system. Furthermore, we cannot get miscompiles due to GVN'ing bitcasts that shouldn't be GVN'd because there aren't any bitcasts hiding GC references from stack maps anymore. And in the case of a moving GC, we don't need to worry about the mid-end doing illegal code motion across calls that could have triggered a GC that invalidated the moved GC reference because frontends will reload their GC references from the stack slots after the call, and that loaded value simply isn't a candidate for GVN with the previous version. We don't have to worry about those bugs by construction.So everything gets a lot easier under this new system.
But this commit doesn't mean we are 100% done and ready to transition to the new system, so what is actually in here?
CLIF producers can mark values as needing to be present in a stack map if they are live across a safepoint in
cranelift-frontend
. This is theFunctionBuilder::declare_needs_stack_map
method.When we finalize the function we are building, we do a simple, single-pass liveness analysis to determine the set of GC references that are live at each safepoint, and then we insert spills to explicit stack slots just before the safepoint. We intentionally trade away the precision of a fixed-point liveness analysis for the speed and simplicity of a single-pass implementation.
We annotate the safepoint with the metadata necessary to construct its associated stack map. This is the new
cranelift_codegen::ir::DataFlowGraph::append_user_stack_map_entry
method and all that stuff.These stack map entries are part of the CLIF and can be roundtripped through printing and parsing CLIF.
Each stack map entry describes a GC-managed value that is on the stack and how to locate it: its type, the stack slot it is located within, and the offset within that stack slot where it resides. Different stack map entries for the same safepoint may have different types or a different width from the target's pointer.
Here is what is not handled yet, and left for future follow up commits:
Lowering the stack map entries' locations from symbolic stack slot and offset pairs to physical stack frame offsets after register allocation.
Coalescing and aggregating the safepoints and their raw stack map entries into a compact PC-to-stack-map table during emission.
Supporting moving GCs. Right now we generate spills into stack slots for live GC references just before safepoints, but we don't reload the GC references from the stack upon their next use after the safepoint. This involves rewriting uses of the old, spilled values which could be a little finicky, but we think we have a good approach.
Port Wasmtime over to using this new stack maps system.
Removing the old stack map system, including
r{32,64}
from Cranelift and GC reference handling fromregalloc2
. (For the time being, the new system generally refers to "user stack maps" to disambiguate from the old system where it might otherwise be confusing.) If we wanted to remove the old system now, that would require us to also port Wasmtime to the new system now, and we'd end up with a monolithic PR. Better to do this incrementally and temporarily have the old and in-progress new system overlap for a short period of time.