Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cranelift: Allow CLIF frontends to define their own stack maps #8728

Merged
merged 1 commit into from
Jun 7, 2024

Commits on Jun 7, 2024

  1. Cranelift: Allow CLIF frontends to define their own stack maps

    Tracking GC references and producing stack maps is a significant amount of
    complexity in `regalloc2`.
    
    At the same time, GC reference value types are pretty annoying to deal with in
    Cranelift itself. We know our `r64` is "actually" just an `i64` pointer, and we
    want to do `i64`-y things with it, such as an `iadd` to compute a derived
    pointer, but `iadd` only takes integer types and not `r64`s. We investigated
    loosening that restriction and it was way too painful given the way that CLIF
    type inference and its controlling type vars work. So to compute those derived
    pointers, we have to first `bitcast` the `r64` into an `i64`. This is
    unfortunate in two ways. First, because of arcane interactions between register
    allocation constraints, stack maps, and ABIs this involves inserting unnecessary
    register-to-register moves in our generated code which hurts binary size and
    performance ever so slightly. Second, and much more seriously, this is a serious
    footgun. If a GC reference isn't an `r64` right now, then it will not appear in
    stack maps, and failure to record a live GC reference in a stack map means that
    the collector could reclaim the object while you are still using it, leading to
    use-after-free bugs! Very bad. And the mid-end needs to know
    *not* to GVN these bitcasts or else we get similar bugs (see
    bytecodealliance#8317).
    
    Overall GC references are a painful situation for us today.
    
    This commit is the introduction of an alternative. (Note, though, that we aren't
    quite ready to remove the old stack maps infrastructure just yet.)
    
    Instead of preserving GC references all the way through the whole pipeline and
    computing live GC references and inserting spills at safepoints for stack maps
    all the way at the end of that pipeline in register allocation, the
    CLIF-producing frontend explicitly generates its own stack slots and spills for
    safepoints. The only thing the rest of the compiler pipeline needs to know is
    the metadata required to produce the stack map for the associated safepoint. We
    can completely remove `r32` and `r64` from Cranelift and just use plain `i32`
    and `i64` values. Or `f64` if the runtime uses NaN-boxing, which the old stack
    maps system did not support at all. Or 32-bit GC references on a 64-bit target,
    which was also not supported by the old system. Furthermore, we *cannot* get
    miscompiles due to GVN'ing bitcasts that shouldn't be GVN'd because there aren't
    any bitcasts hiding GC references from stack maps anymore. And in the case of a
    moving GC, we don't need to worry about the mid-end doing illegal code motion
    across calls that could have triggered a GC that invalidated the moved GC
    reference because frontends will reload their GC references from the stack slots
    after the call, and that loaded value simply isn't a candidate for GVN with the
    previous version. We don't have to worry about those bugs by construction.
    
    So everything gets a lot easier under this new system.
    
    But this commit doesn't mean we are 100% done and ready to transition to the new
    system, so what is actually in here?
    
    * CLIF producers can mark values as needing to be present in a stack map if they
    are live across a safepoint in `cranelift-frontend`. This is the
    `FunctionBuilder::declare_needs_stack_map` method.
    
    * When we finalize the function we are building, we do a simple, single-pass
    liveness analysis to determine the set of GC references that are live at each
    safepoint, and then we insert spills to explicit stack slots just before the
    safepoint. We intentionally trade away the precision of a fixed-point liveness
    analysis for the speed and simplicity of a single-pass implementation.
    
    * We annotate the safepoint with the metadata necessary to construct its
    associated stack map. This is the new
    `cranelift_codegen::ir::DataFlowGraph::append_user_stack_map_entry` method and
    all that stuff.
    
    * These stack map entries are part of the CLIF and can be roundtripped through
    printing and parsing CLIF.
    
    * Each stack map entry describes a GC-managed value that is on the stack and how
    to locate it: its type, the stack slot it is located within, and the offset
    within that stack slot where it resides. Different stack map entries for the
    same safepoint may have different types or a different width from the target's
    pointer.
    
    Here is what is *not* handled yet, and left for future follow up commits:
    
    * Lowering the stack map entries' locations from symbolic stack slot and offset
    pairs to physical stack frame offsets after register allocation.
    
    * Coalescing and aggregating the safepoints and their raw stack map entries into
    a compact PC-to-stack-map table during emission.
    
    * Supporting moving GCs. Right now we generate spills into stack slots for live
    GC references just before safepoints, but we don't reload the GC references from
    the stack upon their next use after the safepoint. This involves rewriting uses
    of the old, spilled values which could be a little finicky, but we think we have
    a good approach.
    
    * Port Wasmtime over to using this new stack maps system.
    
    * Removing the old stack map system, including `r{32,64}` from Cranelift and GC
    reference handling from `regalloc2`. (For the time being, the new system
    generally refers to "user stack maps" to disambiguate from the old system where
    it might otherwise be confusing.) If we wanted to remove the old system now,
    that would require us to also port Wasmtime to the new system now, and we'd end
    up with a monolithic PR. Better to do this incrementally and temporarily have
    the old and in-progress new system overlap for a short period of time.
    
    Co-Authored-By: Trevor Elliott <[email protected]>
    fitzgen and elliottt committed Jun 7, 2024
    Configuration menu
    Copy the full SHA
    3db2d30 View commit details
    Browse the repository at this point in the history