
RFC: Add a scalable representation to allow support for scalable vectors #3268

Open
wants to merge 4 commits into base: master

Conversation

JamieCunliffe

@JamieCunliffe JamieCunliffe commented May 19, 2022

A proposal to add an additional representation, to be used with `simd`, that allows scalable vectors to be used.

Rendered

@JamieCunliffe JamieCunliffe changed the title RFC: Add a scalable representation to allow support for scalable vectors. RFC: Add a scalable representation to allow support for scalable vectors May 19, 2022
@ehuss ehuss added T-lang Relevant to the language team, which will review and decide on the RFC. A-simd SIMD related proposals & ideas labels May 19, 2022
@Amanieu
Member

Amanieu commented May 25, 2022

I think a more general definition of an "opaque" type would be useful. This is a type which can exist in a register but not in memory, specifically:

  • It can be used as a function parameter or return value.
  • It can be used as the type of a local variable.
  • (Possible extension) you can make a struct consisting only of opaque types. The struct itself acts like an opaque type.
  • You can't have a pointer to an opaque type since it doesn't exist in memory.

Other than ARM and RISC-V scalable vectors, this would also be useful to represent reference types in WebAssembly. These are opaque references to objects which can only be used as local variables or function arguments and can't be written to WebAssembly memory.
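
A minimal sketch of those rules, using a unit-struct stand-in so it compiles today; the real svfloat32_t would be the opaque, register-only scalable vector described above:

```rust
// Stand-in type so the sketch compiles; the real svfloat32_t would be opaque.
#[allow(non_camel_case_types)]
struct svfloat32_t;

fn takes_and_returns(v: svfloat32_t) -> svfloat32_t { v } // ok: parameter and return value

fn locals(v: svfloat32_t) {
    let _tmp = v;                     // ok: local variable
    // let _r: &svfloat32_t = &_tmp;  // would be rejected: no pointers or references,
    //                                // because the type never exists in memory
}
```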

@tschuett

ARM SVE uses svfloat64x2_t. Vectors are multiples of 128 bits. I don't know what RISC-V uses.

f64xN is in the Portable packed SIMD vector types RFC.

@boomshroom

I noticed that seeing the vector length pseudoregister at runtime was considered undefined behavior. For RISC-V, rather than masking out elements that aren't used, it seems to primarily focus on setting the VL register, which is an actual register that needs to be modified when switching between different vector types. It also lets you change the actual "register size" by grouping together multiple physical registers, which is used either to save instructions or to facilitate type conversions. (i.e. casting from a u16 vector to a u32 vector puts the result across 2 contiguous vector registers, which can then be used as though they're one register.)

@JamieCunliffe
Author

@boomshroom
I'm not too familiar with RISC-V; the reason I said changing VL at runtime is undefined is that LLVM considers vscale to be a runtime constant and, as far as I'm aware, considers changing vscale to be undefined behaviour.

"That vscale is constant -- that the number of elements in a scalable vector does not change during program execution -- is baked into the accepted scalable vector type proposal from top to bottom and in fact was one of the conditions for its acceptance" - https://lists.llvm.org/pipermail/llvm-dev/2019-October/135560.html

It might just be a case of changing the wording so that it's more clear that causing vscale to change is the undefined behaviour. On RISC-V, I think vscale corresponds to VLMAX rather than VL. If that seems reasonable then I can update the RFC accordingly.

@Amanieu
I think we would have to be careful with the wording here, "This is a type which can exist in a register but not in memory" could be a little confusing as the SVE types can spill to the stack for instance.

Just to be clear though, are you asking me to transform this into a more general RFC for opaque types, or just mention them?

@tschuett

tschuett commented Jun 7, 2022

ARM offers ACLE intrinsics, which can read vscale. Say I have an array of floats and read them with ACLE SVE intrinsics. Do SVE types ever exist in memory, or only in registers?

@Amanieu
Member

Amanieu commented Jun 7, 2022

I don't think this needs to be a general RFC on opaque types, but more details on how scalable vectors differ from normal types would be nice to have.

@tschuett

tschuett commented Jun 7, 2022

There are SVE registers. The calling convention can probably pass scalable vectors on the stack. Then it will be vscale * 1 bytes. It has to be a fixed size.

@tschuett

tschuett commented Jun 7, 2022

If you have too much time, you can actually play with an SVE box:
https://github.com/aws/aws-graviton-getting-started
The other option is a Fujitsu box, but it is harder to get access to one.

@tschuett

tschuett commented Jun 7, 2022

One selling point of SVE is: if you use ARM ACLE SVE intrinsics and you follow the rules, then your program will run on 256-bit and 2048-bit hardware. ARM SVE vectors are plain Cray-style vectors. I believe the RISC-V scalable vectors are more elaborate.

@clarfonthey
Contributor

I'm honestly a bit confused by this RFC. I understand the benefits of SVE and what it is, but I'm not 100% sure what it's asking.

Specifically, it seems like it's suggesting stabilising #[repr(simd)] for scalable vectors, which… I don't think is stabilised or will ever be stabilised for fixed-size vectors? Is it suggesting to add specific ARM-specific intrinsics in core::arch? How would this be added to std::simd when that gets stabilised?

Like, I'm sold on the idea of having scalable vectors in stdlib, but unsure about both what the RFC is proposing, and the potential implementation.

@tschuett

tschuett commented Jun 8, 2022

```
> wc -l arm_sve.h
24043 arm_sve.h
```

@eddyb
Member

eddyb commented Jun 8, 2022

I think a more general definition of an "opaque" type would be useful. This is a type which can exist in a register but not in memory, specifically:

  • It can be used as a function parameter or return value.
  • It can be used as the type of a local variable.
  • (Possible extension) you can make a struct consisting only of opaque types. The struct itself acts like an opaque type.
  • You can't have a pointer to an opaque type since it doesn't exist in memory.

Other than ARM and RISC-V scalable vectors, this would also be useful to represent reference types in WebAssembly. These are opaque references to objects which can only be used as local variables or function arguments and can't be written to WebAssembly memory.

@Amanieu Mostly agree with #3268 (comment), just had a couple notes:

  • "opaque" feels ambiguous with e.g. extern { type } and similar existing FFI concepts
    • ironically, they're opposites, because extern { type } is "always behind a pointer" (i.e. data in memory), while this other concept is "never in memory"/always-by-value
    • free bikeshed material: "value-only types", "exotic types" (too vague?), "memoryless types"
    • however, there is an interesting connection: if we consider a Sized/DynSized/Pointee hierarchy, then the straightforward thing to do is have such types be !Pointee (which also implies they can't be used in ADTs without making the ADTs !Pointee as well, forcing FCA(first-class aggregates)/early SROA(scalar replacement of aggregates))
  • more than just/on top of externref in wasm, upcoming GC proposals would have entire hierarchies of types that it would be nice to have access to
    • unlike miri/CHERI, wasm wants to keep linear memory a plain array of bytes so all the GC allocations are completely separate - great design, but if we don't want LLVM/linker-level errors about how they got misused, we do need robust high-level support
    • long-term, GC-only wasm (w/o linear memory) could serve as a building block for some very interesting things (been thinking about it a lot in the context of GraalVM / Truffle, which today is built on Java bytecode)
  • Rust-GPU/rustc_codegen_spirv exposes several SPIR-V types that are effectively high-level abstract handles to GPU resources (buffers, textures, various aspects of raytracing, etc.), and while SPIR-V is inconsistent about how it deals with them (e.g. whether a pointer is required/allowed/disallowed), it would be great to hide a lot of it from the Rust code
    • OTOH long-term we may end up having good enough capabilities in rewriting memory-heavy code to memory-less code that we may not want to limit the user, and if we'd be comfortable with erroring in our equivalent of LTO (instead of on the original generic Rust code), then a lot of this probably doesn't matter as much

@workingjubilee
Member

@tschuett This is an RFC, not IRC. Please only leave productive comments that advance the state of the conversation instead of non-contributing allusions that have no clear meaning. I can't even tell if your remark is critical or supportive.

@tschuett

tschuett commented Jun 8, 2022

Sorry for my misbehaviour. I am supportive of adding scalable vectors to Rust. Because of type inference you cannot see that the pred variable is a predicate.

@tschuett

tschuett commented Jun 8, 2022

The real question is whether you want to make scalable vectors target-dependent (SVE, RISC-V).
I still like f64xN: scalable vectors of f64. rustc or LLVM can make it target-dependent:
https://github.com/gnzlbg/rfcs/blob/ppv/text/0000-ppv.md#unresolved-questions

@programmerjake
Member

The real question is whether you want to make scalable vectors target-dependent (SVE, RISC-V).

Imho scalable vectors should be target-independent; the compiler backend will simply pick a suitable constant for vscale at compile time if not otherwise supported.

@tschuett

tschuett commented Jun 8, 2022

Note that vscale is an LLVM thing and should not be part of the RFC. LLVM assumes that vscale is an unknown but constant value during the execution of the program. The real value is hardware-dependent.

@programmerjake
Member

Note that vscale is an LLVM thing and should not be part of the RFC.

I think it should not be dismissed just because it's an LLVM thing: every other compiler will have a similar constant, simply because they need to represent scalable vectors as some multiple of an element count, and that multiple is vscale.

Also, there should be variants for vectors like LLVM's <vscale x 4 x f32>, not just <vscale x f32>, especially because fixed-length vector architectures are likely to pick 1 as vscale and vectors should be more than 1 element for efficiency.

https://reviews.llvm.org/D53695

Legalization

To legalize a scalable vector IR type to SelectionDAG types, the same procedure
is used as for fixed-length vectors, with one minor difference:

  • If the target does not support scalable vectors, the runtime multiple is
    assumed to be a constant '1' and the scalable flag is dropped. Legalization
    proceeds as normal after this.

@tschuett

tschuett commented Jun 9, 2022

Do you want to expose this in Rust, or should it be an implementation detail of the compiler?

@programmerjake
Member

Do you want to expose this in Rust, or should it be an implementation detail of the compiler?

imho @rust-lang/project-portable-simd should expose scalable vector types with vscale, an additional multiplier, and an element type -- perhaps by exposing a wrapper struct that also contains the number of valid elements (like ArrayVec::len -- VL for RISC-V V and SimpleV) rather than the underlying compiler type.

@programmerjake
Member

programmerjake commented Jun 9, 2022

One important thing that, imho, this RFC needs in order to be usable by portable-simd is for the element type and the multiplier to be able to be generics:

```rust
#[repr(simd, scalable(MUL))]
struct ScalableVector<T, const MUL: usize>([T; 0]);
```

portable-simd's exposed wrapper type might be:

```rust
pub struct ScalableSimd<T, const MUL: usize>
where
    T: ElementType,
    ScalableMul<MUL>: SupportedScalableMul,
{
    len: u32, // exposed as usize, but realistically u32 is big enough
    value: ScalableVector<T, MUL>,
}
```

@tschuett

tschuett commented Jun 9, 2022

How about this notation (without the 4):

```rust
#[repr(simd, scalable)]
#[derive(Clone, Copy)]
pub struct svfloat32_t {
    _ty: [f32; 0],
}
```

It is a target-independent scalable vector of f32. If you need len(), it will tell you the number of f32 elements in the vector.

@JamieCunliffe
Author

MUL would be known at compile time and it's being constrained to a valid value by the traits, so I don't see a reason we couldn't have something like that. Having said that, I'm not yet fully sure of the implications of allowing a repr to depend on a const generic parameter.

@tschuett
The RFC gives details as to why this takes a parameter, but without this parameter rustc would need to know about the SVE and RISC-V types (and any other future scalable SIMD extensions that might be created) to be able to emit the correct types to the compiler backend. For example, with SVE and LLVM, you can't just use `vscale x i64`; the SVE intrinsics would be expecting a `vscale x 2 x i64`.

My intention was that the feature proposed by this RFC would be target-independent, and the rustc implementation would be target-independent.
The bit that would then make it target-dependent would be stdarch, which would be able to expose a set of types and intrinsics that are architecture (and compiler backend) specific, as currently exists for SIMD.
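
For concreteness, a hedged sketch of the kind of stdarch surface this describes, modelled on the ACLE intrinsic svadd_f32_m; the zero-field structs and stub body are placeholders, not the proposed implementation:

```rust
// Placeholder types standing in for the stdarch-defined scalable types.
#[allow(non_camel_case_types)]
pub struct svfloat32_t(); // opaque scalable vector of f32
#[allow(non_camel_case_types)]
pub struct svbool_t();    // opaque scalable predicate

/// Lane-wise `a + b` where the predicate is true; inactive lanes keep `a`'s value
/// (the `_m` merging form in ACLE terms).
pub unsafe fn svadd_f32_m(_pg: svbool_t, _a: svfloat32_t, _b: svfloat32_t) -> svfloat32_t {
    unimplemented!("stand-in: the real intrinsic would be lowered to an SVE FADD by the backend")
}
```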

@tschuett

Honestly, my RISC-V knowledge is limited. If you say that MUL is 4, then you make it target-dependent; it most likely only works for SVE. What if, in the future, a new scalable ISA comes along that requires 8? How can your representation with integers be target-independent?

I agree with your vscale vector examples.

Maybe you can query LLVM for information about targets.

@tschuett

For reference, IBM is also working on a scalable vector ISA:
https://libre-soc.org/openpower/sv/svp64/
https://libre-soc.org/openpower/sv/overview/

@Amanieu
Member

Amanieu commented Nov 15, 2023

Values are already evaluated, so evaluating a value can never be UB. I am confused. Do they mean extracting the agnostic value from the vector at base type, as well as using an agnostic value as a non-masked input anywhere, is UB?

Yes, that's the intended meaning. Feel free to suggest better wording in that thread.

What exactly does it even mean that things are "undefined" on the ISA level?

According to the RISC-V vector spec:

When a set is marked agnostic, the corresponding set of destination elements in any vector destination operand can either retain the value they previously held, or are overwritten with 1s. Within a single vector instruction, each destination element can be either left undisturbed or overwritten with 1s, in any combination, and the pattern of undisturbed or overwritten with 1s is not required to be deterministic when the instruction is executed with the same inputs.

At the LLVM level, it's just treated as undef because the previous contents of the register are whatever happens to be left in the register when it is picked by regalloc.

@jacobbramley

we can just say that it's UB to use any undefined value in a scalable vector: such values must be masked out when performing any operation

Does that really work? Rust normally considers uninitialised values (outside MaybeUninit) to be UB, even if not used. I'd been trying to work out if that also applies to individual SIMD lanes, but didn't find anything definitive. For SVE, we took a fairly conservative approach, noted in my earlier replies on this RFC.

For SVE either [T] or [MaybeUninit<T>] would work since zeroing masked elements is effectively "free", so there is little incentive to use the _x forms of the intrinsics that leave masked elements undefined.

It's not free; most machine instructions support either zeroing or merging predication, so an extra instruction might be required to implement whichever is not inherently supported. The _x intrinsics abstract that, so the programmer can let the compiler pick the simplest one. Thus, for Rust, we say that the _x output is always initialised, but inactive lanes will be initialised to an unspecified value. (In practice, zero, or a merged value taken from user input.)
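
A per-lane model of those three forms in plain Rust (names and the zero choice for `_x` are illustrative, not the ACLE specification):

```rust
#[derive(Clone, Copy)]
enum Predication {
    Merging,  // _m: inactive lanes keep the first operand's value
    Zeroing,  // _z: inactive lanes are forced to zero
    DontCare, // _x: inactive lanes are unspecified; the compiler picks whatever is cheapest
}

fn add_lane(x: f32, y: f32, active: bool, mode: Predication) -> f32 {
    if active {
        x + y
    } else {
        match mode {
            Predication::Merging => x,
            Predication::Zeroing => 0.0,
            Predication::DontCare => 0.0, // one valid choice among many
        }
    }
}
```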

@RalfJung
Member

At the LLVM level, it's just treated as undef because the previous contents of the register is whatever happens to be left in the register when it is picked by regalloc.

So LLVM has specific intrinsics for RISC-V? Or which LLVM operations are we talking about here?

@tschuett

One random example:
https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/RISCV/rvv/vssub-rv32.ll

They are namespaced with llvm.riscv.

@Amanieu
Member

Amanieu commented Nov 15, 2023

Does that really work? Rust normally considers uninitialised values (outside MaybeUninit) to be UB, even if not used. I'd been trying to work out if that also applies to individual SIMD lanes, but didn't find anything definitive. For SVE, we took a fairly conservative approach, noted in my earlier replies on this RFC.

LLVM tracks undef at the lane level; all we need to do is ensure that we don't emit the noundef LLVM attribute when loading scalable vectors and passing them as parameters/return values. We already do this for MaybeUninit.

@RalfJung
Member

RalfJung commented Nov 15, 2023

Okay, so these are platform-specific intrinsics that have semantics on the LLVM IR level, makes sense.

These signatures are hard to read and the function names look like gibberish to the untrained eye. What can I expect these operations to be like, when we express them in Rust? Something like this?

```rust
/// Returns the following function applied pointwise:
/// fn add_masked(x: MaybeUninit<T>, y: MaybeUninit<T>, mask: bool) -> MaybeUninit<T> {
///   if mask {
///     MaybeUninit::new(x.assume_init() + y.assume_init())
///   } else {
///     MaybeUninit::uninit()
///   }
/// }
fn simd_add_masked<T, N>(x: Simd<T, N>, y: Simd<T, N>, mask: Mask<T, N>) -> Simd<T, N>
```

IOW, if any of the input elements inside the mask are uninit, we make it immediate UB? (I'm aware that in LLVM this will be delayed UB via poison/undef, but that is something we avoided in Rust semantics so far. Also with some of the plans LLVM has for the near future, it would probably be a really bad idea to have poison values in Rust. And undef is going away in LLVM.)

LLVM tracks undef at the lane level, all we need to do is ensure that we don't emit the noundef LLVM attribute when loading scalable vectors and passing them as parameters/return values. We already do this for MaybeUninit.

poison is more relevant than undef here, at least long-term, but the same is true for poison.

* These types can be loaded and stored to/from memory for spilling to the stack,
and to follow any calling conventions.
* Can't be stored in a struct, enum, union or compound type.
* This includes single field structs with `#[repr(transparent)]`.
Member

So what should happen when I do

```rust
#[repr(transparent)]
struct Wrap<T>(T);

type MyTy = Wrap<svfloat32_t>;
```

Are scalable SIMD types not allowed to instantiate generic parameters? Are there new post-monomorphization errors for when a generic instantiation turns out to break rules like this?


This new class of type has the following properties:
* Not `Sized`, but it does exist as a value type.
* These can be returned from functions.
Member

This seems to indicate that we need support for "unsized (r)values" to use this feature. Unfortunately the current state of unsized values is "they are a complete mess, and don't even have a consistent MIR-level semantics".


We don't currently have support for returning unsized values from functions in Rust. I would like this RFC to better detail the impact that this will have on implementing scalable vectors in Rust. I hope I can provide some helpful information below.

If we look at this example, we can see that C/C++ can handle:

  1. Scalable types as function params
  2. Scalable types as local variables
  3. Scalable types as return values

How does C/C++ do it?

These types in C/C++ are treated both as sizeless and as having a scalable size. It seems that clang invokes whichever of these properties is convenient. For example, if you try to take sizeof of a scalable type:

```
<source>:8:5: error: invalid application of 'sizeof' to sizeless type 'vint32m8_t' (aka '__rvv_int32m8_t')
    8 |     sizeof(vint32m8_t);
```

Another example of the scalable type being sizeless is in ASTContext::getTypeInfoImpl:

```cpp
    // Because the length is only known at runtime, we use a dummy value
    // of 0 for the static length.
#define SVE_VECTOR_TYPE(Name, MangledName, Id, SingletonId, NumEls, ElBits,    \
                        IsSigned, IsFP, IsBF)                                  \
  case BuiltinType::Id:                                                        \
    Width = 0;
```

But on the other hand, clang also treats these types as having a scalable size which can be resolved at runtime. There is a function getBuiltinVectorTypeInfo. In this function you can see how a BuiltinVectorTypeInfo object gets created using ElementCount::getScalable:

```cpp
#define SVE_ELTTY(ELTTY, ELTS, NUMVECTORS)                                     \
  {ELTTY, llvm::ElementCount::getScalable(ELTS), NUMVECTORS};

// ... snip

#define RVV_VECTOR_TYPE_INT(Name, Id, SingletonId, NumEls, ElBits, NF,         \
                            IsSigned)                                          \
  case BuiltinType::Id:                                                        \
    return {getIntTypeForBitwidth(ElBits, IsSigned),                           \
            llvm::ElementCount::getScalable(NumEls), NF};
```

Then in SemaChecking.cpp, there are function calls such as areCompatibleSveTypes, checkRVVTypeSupport, and CheckImplicitConversion which type-check treating these types as having a scalable size.

When it comes to code-gen to LLVM IR, Rust unsized types have been tricky because it can be difficult to lower unsized types, especially when it comes to return types. But that isn't the case with scalable types. Rust scalable types can be mapped to LLVM scalable types. I think this may allow us to sidestep a lot of the complications that come with supporting general unsized types in Rust. Using the godbolt example above, we see that the C scalable/sizeless types are lowered as LLVM scalable types:

```llvm
  %7 = load i64, ptr %4, align 8
  %8 = call <vscale x 16 x i32> @foo(__rvv_int32m8_t, unsigned long)(<vscale x 16 x i32> %6, i64 noundef %7)
  store <vscale x 16 x i32> %8, ptr %5, align 4
```

Relying on Builtins

One important point I want to make here is that C/C++ limits scalable/sizeless types to builtins. For example, you can't define your own scalable type. In addition, you can't define data structures using existing builtin scalable types:

```c
// This is an error
struct foo {
    vint32m8_t b;
    vint32m8_t a;
};
```

As a result, the scope of handling these types is greatly reduced. As I pointed out above, functions like areCompatibleSveTypes and checkRVVTypeSupport know how to type-check specifically on these scalable types. There is explicit lowering of intrinsics that operate on these types. I believe that by restricting support to handling only the unsized scalable builtins, we may not have to concern ourselves with what a mess general unsized types are in Rust.

What does this mean for Rust

I hope that this RFC can clarify what it will look like to add support for scalable vectors, in the context of unsized in Rust. Some questions I would like to clarify:

  • Will we support unsized fn params, unsized local variables, and unsized return values in general, or will we limit the scope to scalable types? I am leaning towards the latter, especially because supporting unsized return values might be a massive undertaking, if it is possible at all. I think if we choose the former, then we should have an RFC on adding that feature to the language. I've started inquiring about that topic on this Zulip thread in an attempt to understand if any work had been done yet.
  • Will scalable types be builtin or can people define their own scalable types in their own Rust programs? If we choose the builtin path, I would like this RFC to discuss adding builtins under Prior Art.
  • If we sometimes treat these types as unsized and sometimes treat them as having a scalable size, what features will we need to include? Would we require something like #![feature(unsized_fn_params, unsized_locals, unsized_ret_vals)], #![feature(scalable_types)], or both?

Comment on lines +50 to +55
* Heap allocation of these types is not possible.
* Can be passed by value, reference and pointer.
* The types can't have a `'static` lifetime.
* These types can be loaded and stored to/from memory for spilling to the stack,
and to follow any calling conventions.
* Can't be stored in a struct, enum, union or compound type.
Member

This is a wild list of restrictions, and the RFC does not explain why they are needed. Further down it seems like really these types are just "slices where the length is determined by a run-time constant". Slices don't have most of these restrictions, so why do scalable SIMD types need them?

* These can be returned from functions.
* Heap allocation of these types is not possible.
* Can be passed by value, reference and pointer.
* The types can't have a `'static` lifetime.
Member

Wait, so svfloat32_t: 'static is not true? But there's no lifetime in this type so this statement must be true. What is this about?

Author

That might be poorly phrased on my part; I was referring to the fact that these can't exist as a static variable. I can update the RFC to make that clearer.

@saethlin
Member

saethlin commented Mar 27, 2024

This is a type which can exist in a register but not in memory

@Amanieu This sounds potentially useful except that it is contradicted by the language in this RFC. I don't understand how a type which cannot be in memory can be put in memory for the platform calling convention and can also be passed by pointer. Can you explain what's going on here?

@programmerjake
Member

programmerjake commented Mar 28, 2024

honestly all the arbitrary restrictions (inherited from C) sound like ARM didn't want to bother to implement dynamically-sized types that are usable anywhere a usual type is, so they came up with some restrictions so they didn't have to, except that they arbitrarily chose where they were willing to put in the work and where they decided they didn't want to. I think Rust should be more consistent about where it supports types.

@lcnr lcnr added the T-types Relevant to the types team, which will review and decide on the RFC. label Apr 6, 2024
lnicola pushed a commit to lnicola/rust-analyzer that referenced this pull request Apr 7, 2024
… r=Amanieu

Stabilize Ratified RISC-V Target Features

Stabilization PR for the ratified RISC-V target features. This stabilizes some of the target features tracked by #44839. This is also a part of #114544 and eventually needed for the RISC-V part of rust-lang/rfcs#3268.

There is a similar PR for the stdarch crate which can be found at rust-lang/stdarch#1476.

This was briefly discussed on Zulip
(https://rust-lang.zulipchat.com/#narrow/stream/250483-t-compiler.2Frisc-v/topic/Stabilization.20of.20RISC-V.20Target.20Features/near/394793704).

Specifically, this PR stabilizes the:
* Atomic Instructions (A) on v2.0
* Compressed Instructions (C) on v2.0
* ~Double-Precision Floating-Point (D) on v2.2~
* ~Embedded Base (E) (Given as `RV32E` / `RV64E`) on v2.0~
* ~Single-Precision Floating-Point (F) on v2.2~
* Integer Multiplication and Division (M) on v2.0
* ~Vector Operations (V) on v1.0~
* Bit Manipulations (B) on v1.0 listed as `zba`, `zbc`, `zbs`
* Scalar Cryptography (Zk) v1.0.1 listed as `zk`, `zkn`, `zknd`, `zkne`, `zknh`, `zkr`, `zks`, `zksed`, `zksh`, `zkt`, `zbkb`, `zbkc` `zkbx`
* ~Double-Precision Floating-Point in Integer Register (Zdinx) on v1.0~
* ~Half-Precision Floating-Point (Zfh) on v1.0~
* ~Minimal Half-Precision Floating-Point (Zfhmin) on v1.0~
* ~Single-Precision Floating-Point in Integer Register (Zfinx) on v1.0~
* ~Half-Precision Floating-Point in Integer Register (Zhinx) on v1.0~
* ~Minimal Half-Precision Floating-Point in Integer Register (Zhinxmin) on v1.0~

r? `@Amanieu`
@michaelmaitland

On RISC-V, I think vscale corresponds to VLMAX rather than VL

On RISC-V, vscale corresponds to VLEN/64. VLEN is a hardware defined constant.
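
A small worked example of that relationship (the VLEN values here are hypothetical):

```rust
// vscale on RISC-V, per the statement above: the hardware vector length in
// bits, divided by 64.
fn riscv_vscale(vlen_bits: u64) -> u64 {
    vlen_bits / 64
}

fn main() {
    assert_eq!(riscv_vscale(128), 2); // a VLEN=128 part has vscale = 2
    assert_eq!(riscv_vscale(512), 8); // a VLEN=512 part has vscale = 8
}
```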

Existing SIMD types are tagged with a `repr(simd)` and contain an array or multiple fields to represent the size of the
vector. Scalable vectors have a size known (and constant) at run-time, but unknown at compile time. For this we propose a
new kind of exotic type, denoted by an additional `repr()`, and based on a ZST. This additional representation, `scalable`,
accepts an integer to determine the number of elements per granule. See the definitions in


In LLVM, a scalable type is represented as an (ElementCount NumElts, Type EltTy). An ElementCount is represented by (IsScalable, MinNumElts). Maybe it would be good if we called it the minimum number of elements instead of granule?


```rust
#[repr(simd, scalable(4))]
pub struct svfloat32_t {
```


I'm a bit confused about where scalable(4) comes into play here? I was looking at the svfloat32_t type in C, which is really backed by the builtin type __SVInt64_t, and I couldn't find how that type was tied to a minimum element count of 4.

Am I missing where C SVE intrinsics tie svfloat32_t to a minimum number of elements? Or is this something that you are proposing Rust does that is missing in C?

Member

@RalfJung RalfJung Apr 26, 2024

This seems to be related to the fact that the LLVM representation of the type is <vscale x 4 x f32>, which means that we assume the hardware scales in units of 128bits (that fit 4 f32). On hardware with a different scaling unit, this will be suboptimal -- or maybe even not work, if the scaling unit is smaller than 128 bits. IOW, this type is pretty non-portable.

That's my understanding based on reading the LLVM LangRef; maybe I got it all wrong. Unfortunately the RFC doesn't explain enough to be able to say -- it assumes a bunch of background on how these scalable vector types work in LLVM / hardware.

This new class of type has the following properties:
* Not `Sized`, but it does exist as a value type.
* These can be returned from functions.
* Heap allocation of these types is not possible.


In C, heap allocation depends on malloc, which takes a size. You can't call sizeof on an unsized type in C. So it is a compiler error to write malloc(sizeof(vint8mf8_t)). In this sense, unsized types may seem non-heap-allocatable.

However, I took a look at the RISC-V "V" C intrinsics trying to understand whether this had to be the case. On RISC-V a vector register has a size, even if it is unknown at compile time (due to the vscale). However, the __riscv_vlenb C intrinsic could be used to write programs that determine the size of the vector register associated with a type at runtime. As a result, it should be possible to do something like this. Using pseudo-code:

```c
vscale = __riscv_vlenb() / 64;
// helper func that returns the minimum vector size (i.e. size without vscale or multiplied by a vscale of 1)
min_vec_size = get_min_size(vint8mf8_t);
vint8mf8_t *heap_allocated_scalable = malloc(to_bytes_from_bits(vscale * min_vec_size));
```

So while it may be a little convoluted (and target dependent) to allocate these types on the heap, I think it is possible. Maybe it would be better to drop this as a requirement but note that initially there will not be support for allocating these types on the heap.

@RalfJung
Member

Would it make sense to consider the alternative of not exposing these scalable vector types in Rust at all, and instead have them entirely handled by codegen? In other words, when I have a large but statically sized vector Simd<i32, 128>, there could be a language primitive to iterate over that vector, and then the compiler would under the hood generate the code that queries the vector size and processes my vector in appropriate chunks. What is the reason why this chunk size needs to be visible to the programmer?
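
A rough illustration in plain Rust of that alternative: the programmer writes against a statically sized vector and the compiler strip-mines it into hardware-sized chunks. `hw_lanes` is a purely hypothetical stand-in for a run-time vector-length query:

```rust
fn hw_lanes() -> usize {
    8 // e.g. a 256-bit register holding 8 x f32 on some hypothetical chip
}

fn add_in_chunks(a: &mut [f32; 128], b: &[f32; 128]) {
    let lanes = hw_lanes();
    // Each chunk would become one scalable-vector operation under the hood.
    for (ca, cb) in a.chunks_mut(lanes).zip(b.chunks(lanes)) {
        for (x, y) in ca.iter_mut().zip(cb) {
            *x += y;
        }
    }
}
```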

`vscale` could be 1, 2, 4, 8, 16 which would give register sizes of 128, 256,
512, 1024 and 2048. While SVE now has the power of 2 restriction, `vscale` could
be any value providing it gives a legal vector register size for the
architecture.
Member

@RalfJung RalfJung Apr 26, 2024

This sounds like a pretty bad API for portable efficient vector programming. I thought the point was to not have to know the vector size supported by the hardware, so I could e.g. use <vscale x i32> to get a vector of i32 that's the ideal size for this hardware. But now it seems like I still have to know the hardware I am writing for so that I can use <vscale x 4 x i32> on ARM while using e.g. <vscale x 8 x i32> on some target where vscale measures multiples of 256 bits.

Ideally for Rust we should have a version of this that does not require me to know the hardware's "vector scaling unit" (i.e. the size that corresponds to an LLVM vscale of 1).


so I could e.g. use `<vscale x i32>` to get a vector of i32

A scalable type has a minimum size component. For example <vscale x 4 x i32>

But now it seems like I still have to know the hardware I am writing for

I'm not sure that's true in all instances. In LLVM, vector types go through type legalization in SelectionDAG or GlobalISel, which are components responsible for translating IR into target-specific instructions. In cases where SelectionDAG or GlobalISel see a vector type that is not supported, the legalizer will try to put it into a form that the hardware can support. One example of this is on RISC-V, where all fixed vectors are legalized into scalable vectors.

Ideally for Rust we should have a version of this that does not require me to know the hardware's "vector scaling unit" (i.e. the size that corresponds to an LLVM vscale of 1).

As LLVM scalable types exist today, we don't know what vscale is until runtime. So you are not required to know the hardware's scaling unit at compile time.

(i.e. the size that corresponds to an LLVM vscale of 1).

This sounds like a suggestion to use fixed-size vectors instead of scalable vectors in cases where you really need it.

Member

@RalfJung RalfJung Apr 26, 2024

Everything I am saying is based on the LangRef: "For scalable vectors, the total number of elements is a constant multiple (called vscale) of the specified number of elements; vscale is a positive integer that is unknown at compile time and the same hardware-dependent constant for all scalable vectors at run time. The size of a specific scalable vector type is thus constant within IR, even if the exact size in bytes cannot be determined until run time.".

IOW, this is not a minimum size. <vscale x 4 x i32> means "some constant times 4 x i32". And if you also have a <vscale x 2 x i32> then that's the same constant times 2 x i32". So, <vscale x 4 x i32> will always be exactly twice as large as <vscale x 2 x i32>. If the ARM chip has vectors of size 512bit, then vscale=4 and <vscale x 2 x i32> will be only 256bit in size, so half the vector width was wasted. One therefore has to carefully pick the unit that is being scaled to match the hardware.

As LLVM scalable types exist today, we don't know what vscale is until runtime. So you are not required to know the hardware's scaling unit at compile time.

I was talking about the scalable vector unit, not the scalable vector factor. (I am making up terms here as LangRef doesn't give me good terms to work with.) On ARM, the "unit" is 128bit large. The factor then determines the actual size of the vector registers, in units of 128bit. So a factor of 4 means the registers are 512 bit large. With the interface provided by LLVM, one has to know the unit (not the factor!) at compiletime to generate optimal code.

Or maybe I got it all wrong. But the LangRef description is not compatible with your claim that the 4 in vscale x 4 x i32 is a minimum.
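
A worked example of the arithmetic in that reading of the LangRef, assuming a hypothetical SVE machine with 512-bit registers, i.e. vscale = 512 / 128 = 4:

```rust
fn scalable_bits(vscale: u64, base_elems: u64, elem_bits: u64) -> u64 {
    vscale * base_elems * elem_bits
}

fn main() {
    let vscale = 4; // 512-bit registers on this hypothetical chip
    assert_eq!(scalable_bits(vscale, 4, 32), 512); // <vscale x 4 x i32> fills the register
    assert_eq!(scalable_bits(vscale, 2, 32), 256); // <vscale x 2 x i32> uses only half of it
}
```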

@michaelmaitland michaelmaitland Apr 26, 2024

I was talking about the scalable vector unit

Do you mind giving a definition of what a unit is? Is that the fixed components of the vector type? For <vscale x 4 x i32> the unit is 4 x i32?

... With the interface provided by LLVM, one has to know the unit (not the factor!) at compile time to generate optimal code.

I'm not so sure about ARM, but I know that RISC-V can generate code for all different "units" regardless of the runtime vscale value. You can pick whatever "unit" you'd like to use.

But the LangRef description is not compatible with your claim that the 4 in <vscale x 4 x i32> is a minimum.

It is a minimum because the smallest runtime value of vscale is 1.

Member

@RalfJung RalfJung Apr 27, 2024

It is a minimum because the smallest runtime value of vscale is 1.

If describing it as a minimum is a sufficient description, then <vscale x 2 x i32> and <vscale x 4 x i32> should both be vectors of size 128bit (if the platform has registers of that size), right? I am asking for "at least 2 (or 4) i32, but ideally as many as the hardware provides".

But that's not correct, according to LangRef. Ergo, saying it is a minimum is misleading. The type is not defined as "at least that big", it is defined as "the hardware-specific scaling factor times that base size". If you pick the base size too small (smaller than the scaling unit of the hardware), you will waste register space. If you pick it too big, presumably LLVM complains.

Do you mind giving a definition of what a unit is?

It's how much you get when the factor is 1. I am talking about a hardware property here. ARM defines that if vscale is 1 then the registers are 128bit large, ergo the ARM scalable vector unit is 128bit -- IOW, the size of ARM scalable vectors is measured in multiples of 128bit.

LLVM vscale types also have a unit, as you say it is the part after vscale x. If that unit does not have the same size as the hardware unit then things seem weird.

Member

I think ideally we'd avoid tying this lang feature too closely to any specific implementation of scalable vector types. Even if this feature is primarily meant for internal use, we still need to properly document and specify it as part of the Rust language since it is a language extension. I would also not be surprised if some day people ask for this to be directly exposed, why should only stdarch define such types?

So what I'd hope for is to declare a type like

```rust
#[repr(simd, scalable)]
pub struct svfloat32_t {
    _ty: [f32],
}
```

and then it is the compiler's responsibility to figure out how large the vector should be.

Imagine RISC-V did scalable vectors where the smallest possible vector size is 256 bits. So vscale would say how many multiples of 256 bits the vectors are in size -- in contrast to ARM, where apparently vscale denotes a multiple of 128 bits. I'd want to declare a single scalable vector type for both targets, but the RFC as-is does not support that. The svfloat32_t type shown above would lower to <vscale x 4 x f32> on ARM but to <vscale x 8 x f32> on this hypothetical RISC-V version of scalable vectors.

@coastalwhite

This sounds like a pretty bad API for portable efficient vector programming.

The point of these variable-size vector extensions is that code is written agnostic to the size of the register, but chooses a vscale to specify how many registers are required. Writing register-size-agnostic code in a sense makes it a lot more portable. Stamping down at compile time what the scaling factor or register size is goes partially against why these extensions exist. I think this work requires quite a lot of special casing by Rust, and exposing all of those internals to Rust users will require a lot more effort.

Would it make sense to consider the alternative of not exposing these scalable vector types in Rust at all, and instead have them entirely handled by codegen? In other words, when I have a large but statically sized vector Simd<i32, 128>, there could be a language primitive to iterate over that vector, and then the compiler would under the hood generate the code that queries the vector size and processes my vector in appropriate chunks. What is the reason why this chunk size needs to be visible to the programmer?

I think this is quite a reasonable idea, but it would be a lot of work from Rust's perspective.

@RalfJung
Member

RalfJung commented Apr 26, 2024

The point of these variable-size vector extensions is that code is written agnostic to the size of the register

Yeah that's what I thought. But now I learn that one has to generate LLVM that says <vscale x 4 x i32> on ARM targets to make proper use of them, i.e. I still have to know that a vscale of 1 corresponds to "4 times i32" -- I still have to know that the register has a "base size" of 128bits. To be truly agnostic to the size of the register I should at least be able to just say "give me an i32 vector of the right size, whatever is best for the current CPU". In the RFC that seems to be reflected with the 4 in #[repr(simd, scalable(4))] -- the fact that I have to know to put 4 there is pretty bad and ideally we can avoid exposing that to users. I just want a vector of floats, I don't want to have to know whether the hardware scales in units of 128 bits or 256 bits or whatever.

@Amanieu
Member

Amanieu commented Apr 26, 2024

Just to be clear, I don't believe #[repr(simd, scalable)] is ever intended to be stabilized. It will only be used as an implementation detail for std::arch which will provide scalable types like svfloat32_t and intrinsics to work with them.

As such, users won't need to worry about figuring out the correct value of N for <vscale x N x f32>: svfloat32_t will use a value of N=4, which is what LLVM expects.
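
Under that reading, the stdarch-internal declarations might look roughly like this (attribute syntax taken from the RFC; the LLVM mappings in the comments assume SVE's 128-bit granule and are illustrative, not a committed design):

```rust
#[repr(simd, scalable(4))]
pub struct svfloat32_t { _ty: [f32; 0] } // would lower to <vscale x 4 x f32>

#[repr(simd, scalable(2))]
pub struct svfloat64_t { _ty: [f64; 0] } // would lower to <vscale x 2 x f64>
```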

@RalfJung
Member

Just to be clear, I don't believe #[repr(simd, scalable)] is ever intended to be stabilized. It will only be used as an implementation detail for std::arch which will provide scalable types like svfloat32_t and intrinsics to work with them.

As such, users won't need to worry about figuring out the correct value of N for <vscale x N x f32>: svfloat32_t will use a value of N=4, which is what LLVM expects.

It is impossible to evaluate this RFC without understanding what all of this stuff actually means. And I had to go read other documents to figure this out as the RFC doesn't explain this.

I came in expecting some sort of portable interface where I can just ask for "a vector of i32 of the best size for the hardware". That's what intuitively a scalable vector would mean, if one hasn't already read the ARM manuals.

Currently the RFC is written in a way that it can only be understood by people that already know how scalable vectors work in detail, all the way down to hardware. That excludes the majority of the community from the discussion (and likely the majority of the lang team as well). That needs to be fixed.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

This will focus on LLVM. No investigation has been done into the alternative codegen back ends. At the time of
Member

@RalfJung RalfJung Apr 27, 2024

This should focus on Rust, not LLVM. In other words, it should fully describe the behavior of these types without mentioning anything LLVM-specific. This is a Rust language RFC after all, so its effect needs to be described in terms of what happens at the level of Rust.

It is okay to also explain how this maps to LLVM, but you cannot expect the reader to know anything about LLVM -- so the text needs to make sense to someone who knows nothing about LLVM.

`Sized` (or both). Once returning of unsized is allowed this part of the rule
would be superseded by that mechanism. It's worth noting that, if any other
types are created that are `Copy` but not `Sized` this rule would apply to
those.
Member

Remember that Rust has generics, so I can e.g. write a function fn foo<T: Copy>(x: &T) -> T. The RFC seems to say this is allowed, because the return type is Copy. But for most types T and most ABIs this can't be implemented.

You can't just say in a sentence that you allow unsized return values. That's a major language feature that needs significant design work on its own.

I think what you actually want is some extremely special cases where specifically these scalable vector types are allowed as return values, but in a non-compositional way. There is no precedent for anything like this in Rust so it needs to be fairly carefully described and discussed.
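
Spelling out the compositionality problem (the generic function is valid Rust today precisely because `Copy: Sized`; the commented call shows what a relaxation would have to support):

```rust
fn passthrough<T: Copy>(x: &T) -> T {
    *x // returning T by value needs a known size/ABI for T here
}

// passthrough::<svfloat32_t>(&v) would require an ABI for returning a value
// whose size is only known at run time.
```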

RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this pull request Apr 27, 2024
… r=Amanieu

Stabilize Ratified RISC-V Target Features

@RalfJung
Member

I wonder if the proposal for "claimable" types with automatic claim can be used to overcome the issue of Copy: Sized? We'd still need to introduce a new category of "types that are unsized but can anyway be passed to and from functions", but maybe we don't have to break Copy: Sized...

@Amanieu
Member

Amanieu commented Jun 27, 2024

The current plan in the implementation PR (rust-lang/rust#118917) is for scalable vector types to not implement either Copy or Sized but to instead specifically allow these types to be used as local variables and function arguments/return values.

My understanding is that this RFC is going to be rewritten to match the new implementation plan.

@RalfJung
Member

That sounds potentially quite hacky... but in the end it'll be up to @rust-lang/types to decide whether that is acceptable.

An interesting part of this will be properly working out the MIR semantics, ideally by implementing them in the interpreter.

Labels
A-simd SIMD related proposals & ideas T-lang Relevant to the language team, which will review and decide on the RFC. T-types Relevant to the types team, which will review and decide on the RFC.