Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add a scalable representation to allow support for scalable vectors #3268

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
159 changes: 159 additions & 0 deletions text/3268-repr-scalable.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,159 @@
- Feature Name: repr_scalable
- Start Date: 2022-05-19
- RFC PR: [rust-lang/rfcs#3268](https://github.com/rust-lang/rfcs/pull/3268)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Expanding the SIMD functionality to allow for runtime determined vector lengths.

# Motivation
[motivation]: #motivation

Without some support in the compiler it would be impossible to use the
[ACLE](https://developer.arm.com/architectures/system-architectures/software-standards/acle)
[SVE](https://developer.arm.com/documentation/102476/latest/) intrinsics from Arm.

This RFC will focus on the Arm vector extensions, and will use them for all examples. A large amount of what this
RFC covers is emitting the vscale attribute from LLVM, therefore other scalable vector extensions should work.
In an LLVM developer meeting it was mentioned that RISC-V would use what's accepted for Arm SVE for their vector extensions.
\[[see slide 17](https://llvm.org/devmtg/2019-04/slides/TechTalk-Kruppe-Espasa-RISC-V_Vectors_and_LLVM.pdf)\]

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

This is mostly an extension to [RFC 1199 SIMD Infrastructure](https://rust-lang.github.io/rfcs/1199-simd-infrastructure.html).
An understanding of that is expected from the reader of this. In addition to that, a basic understanding of
[Arm SVE](https://developer.arm.com/documentation/102476/latest/) is assumed.

Existing SIMD types are tagged with a `repr(simd)` and contain an array or multiple fields to represent the size of the
vector. Scalable vectors have a size known (and constant) at run-time, but unknown at compile time. For this we propose a
new kind of exotic type, denoted by an additional `repr()`, and based on a ZST. This additional representation, `scalable`,
accepts an integer to determine the number of elements per granule. See the definitions in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"granule" is mentioned here but not defined anywhere else.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

granule is the name i made-up for the <4 x i32> part of the LLVM IR scalable vector type <vscale x 4 x i32>, idk what it's actually called.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In LLVM, a scalable type is represented as an (ElementCount NumElts, Type EltTy). An ElementCount is represented by (IsScalable, MinNumElts). Maybe it would be good if called it the minimum number of elements instead of granule?

[the reference-level explanation](#reference-level-explanation) for more information.

e.g. for a scalable vector f32 type the following could be its representation:

```rust
#[repr(simd, scalable(4))]
#[derive(Clone, Copy)]
pub struct svfloat32_t {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused on where scalable(4) comes into play here? I was looking at the svfloat32_t type in C, which is really backed by the builtin type __SVInt64_t and I couldn't find how that type was tied to a minimum element count of 4.

Am I missing where C SVE intrinsics tie svfloat32_t to a minimum number of elements? Or is this something that you are proposing Rust does that is missing in C?

Copy link
Member

@RalfJung RalfJung Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be related to the fact that the LLVM representation of the type is <vscale x 4 x f32>, which means that we assume the hardware scales in units of 128bits (that fit 4 f32). On hardware with a different scaling unit, this will be suboptimal -- or maybe even not work, if the scaling unit is smaller than 128 bits. IOW, this type is pretty non-portable.

That's my understanding based on reading the LLVM LangRef; maybe I got it all wrong. Unfortunately the RFC doesn't explain enough to be able to say -- it assumes a bunch of background on how these scalable vector types work in LLVM / hardware.

_ty: [f32; 0],
}
```
`_ty` is purely a type marker, used to get the element type for the LLVM backend.


This new class of type has some restrictions on it that a normal ZST wouldn't have, and some of the restrictions that a ZST
has do not apply to this new type.
As this type does have a run-time size it can be stored to memory, this is required for spilling to the stack for instance.
This new class of type can't be stored in a structure or a compound type, as the layout of that wouldn't be known at compile
time.


A simple example that an end user would be able to write for summing of two arrays using functions from the ACLE
for SVE is shown below:
```rust
unsafe {
let step = svcntw() as usize;
for i in (0..SIZE).step_by(step) {
let a = data_a.as_ptr().add(i);
let b = data_b.as_ptr().add(i);
let c = &mut data_c as *mut f32;
let c = c.add(i);

let pred = svwhilelt_b32(i as _, SIZE as _);
let sva = svld1_f32(pred, a);
let svb = svld1_f32(pred, b);
let svc = svadd_f32_m(pred, sva, svb);

svst1_f32(svc, pred, c);
}
}
```
As can be seen by that example the end user wouldn't necessarily interact directly with the changes that are
proposed by this RFC, but might use types and functions that depend on them.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

This will focus on LLVM. No investigation has been done into the alternative codegen back ends. At the time of
Copy link
Member

@RalfJung RalfJung Apr 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should focus on Rust, not LLVM. In other words, it should fully describe the behavior of these types without mentioning anything LLVM-specific. This is a Rust langauge RFC after all, so its effect needs to be described in terms of what happens on the level of Rust.

It is okay to also explain how this maps to LLVM, but you cannot expect the reader to know anything about LLVM -- so the text needs to make sense to someone who knows nothing about LLVM.

writing I believe cranelift doesn't support scalable vectors ([current proposal](https://github.com/bytecodealliance/rfcs/pull/19)),
and the GCC backend is not mature enough to be thinking about this.

Most of the complexity of SVE will be handled by LLVM and the `vscale` modifier that is applied to vector types. Therefore
changes for this should be fairly minimal for Rust. From the LLVM side this is as simple as calling `LLVMScalableVectorType`
rather than `LLVMVectorType`.
Comment on lines +88 to +94
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that unfortunately, some time has elapsed since this was first proposed, I'd like to see this RFC address a little bit more about how a non-LLVM backend might handle this. It doesn't have to dwell deeply on this, but I would like to see cross-referencing with cranelift's Dynamic Vectors implementation so that we know, before we stabilize anything, if the design will be tractable to implement by codegen that isn't "use LLVM". LLVM has injected limitations that are not contingent on the capabilities of the CPUs in question, so what other arbitrary limitations will we need to account for?

More than just codegen, it is very convenient if Miri understands how things operate, so it can model what is UB or uninit (poison/undef). So I would like it if these intrinsics were defined as something Miri can recognize and execute during interpretation of a Rust program, as opposed to just linking raw LLVM intrinsics, even if it's just "use a Rust intrinsic which expands into raw LLVMIR, which does roughly ${description}".


For a Scalable Vector Type LLVM takes the form `<vscale x elements x type>`.
* `elements` multiplied by sizeof(`type`) gives the smallest allowed register size and the increment size.
* `vscale` is a run time constant that is used to determine the actual vector register size.

For example, with Arm SVE the scalable vector register (Z register) size has to be a multiple of 128 bits, therefore for `f32`, `elements` would
always be four. At run time `vscale` could be 1, 2, 3, through to 16 which would give register sizes of 128, 256, 384 to 2048.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 0 a valid vscale for any architecture?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no afaict. you can set VL to 0, but vscale is instead basically VLMAX. vscale must be positive on SVE[2] and RVV. idk about SX Aurora but I'd expect vscale is always 64.

for SVP64 (SimpleV) instead of the cpu selecting VLMAX, the programmer selects it, so it'd work better to use fixed-size vectors (Simd<MaybeUninit<T>, VLMAX> with the first len/VL elements used, so kinda like ArrayVec) rather than scalable vectors (though we're thinking of also supporting compiler-selected VLMAX for compatibility with RVV/SVE, this would use scalable vector types). VLMAX must be in 1..=64 (or 1..=128 or more for future extensions). the instruction for selecting VLMAX (setvl) doesn't even have a way to encode VLMAX=0, it encodes VLMAX=1 using all zero bits in the appropriate immediate field.


The scalable representation accepts the number of `elements` rather than the compiler calculating it, which serves
two purposes. The first being that it removes the need for the compiler to know about the user defined types and how to calculate
the required `element` count. The second being that some of these scalable types can have different element counts. For instance,
the predicates used in SVE have different element counts in LLVM depending on the types they are a predicate for.

Within Rust some of the requirements on a SIMD type would need to be relaxed when the scalable attribute is applied, for instance,
currently the type can't be a ZST this check would need to be conditioned on the scalable attribute not being present, and a check
to ensure a scalable vector is a ZST should be added.
Aside from that check, all other SIMD checks should be valid to do with what the type can contain.

This should have minimal impact with other language features, to the same extent that the `repr(simd)` has.


As mentioned previously `vscale` is a runtime constant. With SVE the vector length can be changed at runtime (e.g. by a
[prctl()](https://www.kernel.org/doc/Documentation/arm64/sve.txt) call in Linux). However, since this would require a change
to `vscale`, this is considered undefined behaviour in Rust. This is consistent with C and C++ implementations.

# Drawbacks
[drawbacks]: #drawbacks

One difficulty with this type of approach is typically vector types require a target feature to be enabled.
Currently, a trait implementation can't enable a target feature, so `Clone` can't be implemented without
setting `-C target-feature` via rustc.

However, that isn't a reason to not do this, it's a pain point that another RFC can address.

# Prior art
[prior-art]: #prior-art

This is a relatively new concept, with not much prior art. C has gone a very similar way to this by using a ZST to
represent the SVE types. Aligning with C here means that most of the documentation that already exists for
the intrinsics in C should still be applicable to Rust.

# Future possibilities
[future-possibilities]: #future-possibilities

## Portable SIMD
For this to work with portable SIMD in the way that portable SIMD is currently implemented, a const generic parameter
would be needed in the `repr(scalable)`. Creating this dependency would be awkward from an implementation point of view
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many intrinsic functions in std::arch have been moved to using const generics and const fn implicitly, with annotations that allows using a "C-like" syntax. This is especially the case for intrinsics that expect "immediates" (though I haven't memorized the SVE2 instruction set so there may be none applicable in this case). But I expect this sort of trend to continue in the general sense.

So for that reason and more, I would like to know better how to make things easier here for you, more than "it's awkward". Rust programmers are likely to expect reasonably strong interoperability here, even just with std::simd (e.g. conversions between [Simd<f32, 4>] and svfloat32_t).

as it would require support for symbols within the literals.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am... genuinely not sure what this means? Please explain? Which literals and which symbols?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc this is intended to address #3268 (comment) where imho we want all parts of a scalable vector type to support const generics and not only literal integers.


One potential for having portable SIMD working in its current style would be to have a trait as follows:
```rust
pub trait RuntimeScalable {
type Increment;
}
```

Which the compiler can use to get the `elements` and `type` from.

The above representation could then be implemented as:
```rust
#[repr(simd, scalable)]
#[derive(Clone, Copy)]
pub struct svfloat32_t {}
impl RuntimeScalable for svfloat32_t {
type Increment = [f32; 4];
}
```

Given the differences in how scalable SIMD works with current instruction sets it's worth experimenting with
architecture specific implementations first. Therefore portable scalable SIMD should be fully addressed with
another RFC as there should be questions as to how it's going to work with adjusting the active lanes (e.g.
predication).