Figure out which target features are required for which SIMD size #131800
Comments
cc @programmerjake to confirm what features are relevant re: PowerPC |
+zve32x, +zve32f, +zve64x, +zve64f, +zve64d, +v, +zvbb, +zvkb. There are several others that start with +zv*. There is an implies relationship, so they all ultimately imply +zve32x. These change the ABI for fixed length vector arguments/returns in IR in the backend. When compiling with clang with the default C ABI, fixed length vectors are passed coerced to a scalar integer type or passed indirectly through memory. clang will not create fixed length vector return types or arguments in IR. There is a fixed length vector ABI being implemented via an attribute llvm/llvm-project#100346. This changes how fixed length vectors are passed. |
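To illustrate the "implies" relationship described above, here is a minimal Rust sketch that walks a hand-written implication table and checks whether a feature ends up implying zve32x. The table, `implied_closure`, and `changes_vector_abi` are hypothetical names for illustration only. The edges are an assumption based on the dependency chain in the RISC-V vector specification, not copied from LLVM or rustc, and the crypto extensions (zvbb, zvkb, ...) are deliberately left out because their implications are exactly what is in question in this thread.

```rust
use std::collections::BTreeSet;

/// Direct "implies" edges between RISC-V vector features.
/// ASSUMPTION: hand-written from the RISC-V vector spec's dependency chain;
/// this is not the table used by LLVM or rustc, and it omits zvbb/zvkb.
const IMPLIES: &[(&str, &[&str])] = &[
    ("v", &["zve64d"]),
    ("zve64d", &["zve64f"]),
    ("zve64f", &["zve32f", "zve64x"]),
    ("zve64x", &["zve32x"]),
    ("zve32f", &["zve32x"]),
];

/// Returns `feature` together with everything it transitively implies.
fn implied_closure(feature: &str) -> BTreeSet<&str> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![feature];
    while let Some(f) = stack.pop() {
        // Only expand a feature the first time we see it.
        if seen.insert(f) {
            for (from, to) in IMPLIES {
                if *from == f {
                    stack.extend(to.iter().copied());
                }
            }
        }
    }
    seen
}

/// Under the assumptions above, a feature affects the fixed-length vector ABI
/// exactly when its closure contains `zve32x`.
fn changes_vector_abi(feature: &str) -> bool {
    implied_closure(feature).contains("zve32x")
}
```

With this table, `changes_vector_abi("v")` is true, while a feature that has no entry (such as "zvbb" here) is reported as not ABI-relevant, which is why a missing "implies" edge matters for any check built this way.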
vsx just enables an additional 32 registers that overlap with the scalar floating point registers but are otherwise the same as the 32 128-bit registers from vmx (aka. altivec), so no new simd bitwidths. |
@programmerjake and no alterations to the calling convention? |
@topperc hm, it seems LLVM doesn't have the crypto implications for these features specified here? is it somewhere else? https://github.com/llvm/llvm-project/blob/d54953ef472bfd8d4b503aae7682aa76c49f8cc0/llvm/lib/Target/RISCV/RISCVFeatures.td#L734-L746 it seems to rather be the opposite, a requirement relationship, but perhaps I'm misunderstanding: https://github.com/llvm/llvm-project/blob/d54953ef472bfd8d4b503aae7682aa76c49f8cc0/llvm/lib/TargetParser/RISCVISAInfo.cpp#L754-L758 |
My mistake. I'm not sure why we don't have the implies. gcc does. |
enabling vsx doesn't alter the calling convention. |
Are there types which are passed via these MMA registers? |
For us it's nice that there's no "implies" here, that makes it a lot easier to check the ABI consequences. ;) This way we just have to block the actual feature changing something, not other features implying them. Though maybe this is also unnecessary if #131807 takes care of all that. EDIT: Ah, that's just for float ABI, not for vectors, is it? |
Uh wait a second, we are exposing another flag that can change ABI? 😢 😭 EDIT: That's probably a discussion for Zulip. |
LoongArch: According to the LoongArch ABI Specs, vector type parameters and return values are passed in GAR(general-purpose argument registers) or on the stack, and do not rely on vector registers or vector features. |
after a bit more research, there are types for MMA, but you can't pass them by value in function arguments or return, so they're not ABI-breaking: https://clang.godbolt.org/z/e4sTY37Pv |
...concerning. these blockades are in clang's semantic checks, they don't seem to be enforced by LLVM. |
cc @jacobbramley re: aarch64 |
cc @androm3da re: hexagon |
How do we handle that in Rust? We'd need a special pass during collection rejecting them as arguments, likely as part of the simd arg check that this issue is about. I assume we don't support these types yet, but this will need to be considered when someone decides to add them. |
I only just realized this only affects scalable vector types. Which anyway we don't support. So we can ignore this for now. |
According to the CSKY Development Guide: 4.5 vdsp, the vector width is configured as follows: For vdspv2, the width is fixed at 128 bits. Therefore, for both versions, you can safely set the vector width to 128 bits. csky: |
Hexagon HVX uses these target flags: +hvx-length64b, +hvx-length128b, +hvx, +hvxv60 through +hvxv73 (explicitly 60,62,65,66,67,68,69,71,73). Oh, and there are also vector predicate registers: they're 64-bit and 128-bit wide in hvx-length64b and hvx-length128b modes, respectively. These registers can also be quad'd. All of these vector and vector-predicate registers are caller-saved. |
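As a rough sketch of how the HVX length flags translate into register widths, the following Rust helpers derive the vector and predicate widths from a feature list. The function names are hypothetical. It assumes that the "64b"/"128b" suffix is the vector length in bytes and that a predicate register holds one bit per vector byte, which matches the 64-bit and 128-bit predicate widths quoted above; register pairs and quads are not modelled.

```rust
/// HVX vector register width in bits for the given enabled features,
/// or `None` if no HVX length flag is set.
/// ASSUMPTION: `hvx-length64b` / `hvx-length128b` give the length in bytes.
fn hvx_vector_bits(features: &[&str]) -> Option<u32> {
    if features.contains(&"hvx-length128b") {
        Some(128 * 8)
    } else if features.contains(&"hvx-length64b") {
        Some(64 * 8)
    } else {
        None
    }
}

/// Predicate register width in bits: one predicate bit per vector byte.
fn hvx_predicate_bits(features: &[&str]) -> Option<u32> {
    hvx_vector_bits(features).map(|bits| bits / 8)
}
```

For example, a feature list containing "hvx-length64b" gives 512-bit vectors and 64-bit predicates under these assumptions.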
Neon vectors always have at least 128 bits, so I can confirm this (if I understand the notation correctly):
Neon also provides operations on 64-bit vectors, and corresponding types. Those types can be passed as arguments too, in case that matters, but anything with Neon supports both the 128-bit and 64-bit types. SVE vectors have a size that isn't known until runtime. In general, we say that they must be passed in "Z" registers (or "P" registers for the predicate type,
That Passing something like |
For AArch32 ("arm"), Neon has 128-bit vectors (like AArch64), and also supports 64-bit vectors. They are used differently for argument passing, but I doubt that it's important here.
MVE has a register layout similar to Neon, and also has 128-bit vectors, but I can't see any operations on 64-bit vectors (e.g. in ACLE). I'm not deeply familiar with MVE, but from recent experiments inspired by Zulip discussions, I think the calling convention is similar to Neon on A-class AArch32, with a few caveats such as being able to turn off FP support but still use the vector registers.
I experimented with gcc and clang and it seems that these C compilers at least offer us the small mercy of resisting doing anything but stack-passing for fixed-size vectors on AArch64. Even if you have SVE above 128 bits, and even if you do your merit best to trick it into accepting such code: https://gcc.godbolt.org/z/GYGEso3Pe Which, well, good. It still conceivably has some ABI problems, but they at least don't try to sometimes do register passing and sometimes not, which is where we are with x86_64's featureset. |
@RalfJung Sorry, there's a typo in my previous response. In csky, it's vdspv1 and vdspv2, not vdsp1 or vdsp2. Thanks. |
clang says:
so going with that. |
pinging maintainers on the remaining architectures or subarchitectures:
The ESP32-S3 uses custom DSP instructions, not based on any official Xtensa vector instructions. These instructions have their own register file (prefixed with Both sets of instructions will operate on a maximum of 128bits. |
And these q registers are not used for any ABI / any arguments of any type?
Correct, they are strictly used for accumulation or parallelisation of 8/16/32 bit math/float operations in memory. There are no custom types nor any changes to the standard calling convention. As the DSP operates on memory, alignment could be a concern here, but the DSP is able to load/store from unaligned addresses (at a perf cost) meaning it stays ABI compatible, but it would of course be preferable to use 128 bit alignment where possible which I'm guessing |
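For the alignment point, here is a small Rust sketch of a buffer type that is always 128-bit (16-byte) aligned, plus a helper to check an arbitrary pointer. `Aligned128` and `is_128_bit_aligned` are hypothetical names for illustration; nothing here is specific to the ESP32-S3 DSP instructions, it only shows how the preferred alignment could be requested on the Rust side.

```rust
/// A 16-byte buffer that the compiler always places at a 16-byte
/// (128-bit) boundary. Hypothetical type for illustration.
#[repr(C, align(16))]
pub struct Aligned128(pub [u8; 16]);

/// Returns true if `ptr` sits on a 128-bit boundary.
pub fn is_128_bit_aligned<T>(ptr: *const T) -> bool {
    (ptr as usize) % 16 == 0
}
```

Data that does not come from such a type may still be unaligned, which per the comment above remains correct but slower.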
The issue with the q registers on the ESP32-S3 is that 1) the DSP extension was never upstreamed to LLVM, and 2) the standard RISC-V ABI won't use registers from a vendor extension to pass anything. |
The plan is for the DSP extensions to be upstreamed into LLVM.
Good point, the P4 isn't finalized yet so perhaps we'll end up supporting the official vector extensions, if not we'll just expose the instructions via a separate crate instead of |
AMX has a special type in LLVM called |
I'm not sure I understand the topic well enough to give a complete answer but I will give it my best. First of all, PTX only deals with virtual registers, and exactly how values are passed is handled at a lower level (SASS). The errors encountered due to not treating SIMD types correctly might therefore be of a different nature when compiling to PTX. Instead of having features, there are ISA versions that may slightly change how things work. When using LLVM to produce PTX, the ISA version is tied to the CPU arch, while the NVIDIA tooling strives to use new ISA versions even for old arches. At the current newest version (PTX ISA 8.5) the documented size of vector registers is as described here
On the other hand, there are no actual vector instructions operating on the described 16, 64, or 128-bit registers. There are only instructions operating on 32-bit registers (4x8bit or 2x16bit). I still believe this means the "most correct" thing to use is |
The context for this is #116558: passing vector types by-value over extern "C" needs certain registers to be present, so if the target feature enabling these registers is missing, then either the ABI needs to change (which can lead to soundness issues if caller and callee disagree on their target features), or LLVM just errors outright. #127731 moves us towards detecting this situation, but that approach needs data about which target feature is needed to pass which vector size. That will be different for each architecture. So this issue is about gathering all that data.
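The kind of data this issue is collecting can be pictured as a per-architecture table from vector size to required target feature. Below is a minimal Rust sketch of such a table and a check against the enabled features. The names `X86_FEATURES_FOR_VECTOR_SIZE`, `required_feature`, and `check_vector_arg` are hypothetical, and the x86 entries are an illustrative assumption rather than data confirmed in this thread or the actual implementation in #127731.

```rust
/// (largest vector size in bits, target feature needed to pass it in registers).
/// ASSUMPTION: illustrative x86 entries, sorted by ascending size.
const X86_FEATURES_FOR_VECTOR_SIZE: &[(u64, &str)] =
    &[(128, "sse"), (256, "avx"), (512, "avx512f")];

/// Finds the feature required for a vector of `size_bits`, if the table has one.
fn required_feature(table: &[(u64, &'static str)], size_bits: u64) -> Option<&'static str> {
    table
        .iter()
        .find(|(max_bits, _)| size_bits <= *max_bits)
        .map(|(_, feature)| *feature)
}

/// Rejects a by-value vector argument whose required feature is not enabled.
fn check_vector_arg(
    table: &[(u64, &'static str)],
    size_bits: u64,
    enabled: &[&str],
) -> Result<(), String> {
    match required_feature(table, size_bits) {
        None => Err(format!("no known target feature supports {size_bits}-bit vectors")),
        Some(feature) if enabled.contains(&feature) => Ok(()),
        Some(feature) => Err(format!(
            "{size_bits}-bit vector argument requires target feature `{feature}`"
        )),
    }
}
```

Under these assumptions, a 256-bit vector argument in a function compiled with only "sse" enabled would be rejected instead of silently changing the ABI.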
x86_amx_intrinsics
#126622 )