Are there any performance bugs that prevent the usage of shuffle! in the AVX2 backend? #181

gnzlbg · 2018-08-02T14:09:17Z

I just read a blog post about this library and was wondering why the AVX2 backend does not use the shuffle! macro instead of the many other intrinsics for re-ordering vector elements.

If there are any performance issues with it, it would really help if bugs could be filed in packed_simd upstream.


gnzlbg · 2018-08-02T18:14:09Z

Duh... the shuffle! macro was never merged into std::simd because of problems with exporting macros from core/std... so I guess that's the reason.

hdevalence · 2018-08-02T18:20:31Z

Thanks for pointing this out! There's no reason other than that when I wrote the original code last November, the intrinsics were using the packed vector types (which were later moved to packed_simd), and I thought these were on track for stabilization, while the shuffle! macro wasn't.

Later on, the intrinsics were changed to use the bag-of-bits types __m256i and friends, but all the code was already using the u32x8 etc. types and doing operations like + on them. All these would have had to have been replaced by add intrinsics etc., so I used the unstable std::simd stuff instead of staying just with the std::arch intrinsics.

Now, there's no reason not to use the shuffle! macro -- in fact, the existing code is already relying on the fact that the general AVX2 vector permute intrinsic is constant-folded into an LLVM shuffle and then lowered to a faster shuffle-by-immediate instruction.
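To make the permute point concrete, here is a minimal sketch (not the backend's actual code) of a general 8-lane permute whose index vector is known at the call site. The function names `permute`, `permute_avx2`, and `permute_scalar` are hypothetical; only `_mm256_permutevar8x32_epi32` and the load/store intrinsics come from `std::arch`. Whether the compiler really folds such a call into a shuffle-by-immediate depends on the toolchain and optimization level, so the sketch only pins down the semantics.

```rust
// Scalar reference for the permute: out[i] = v[idx[i] mod 8].
fn permute_scalar(v: [u32; 8], idx: [u32; 8]) -> [u32; 8] {
    let mut out = [0u32; 8];
    for i in 0..8 {
        // The AVX2 instruction only looks at the low 3 bits of each index.
        out[i] = v[(idx[i] & 7) as usize];
    }
    out
}

// Same operation through the general AVX2 permute intrinsic.
// Hypothetical wrapper; must only be called when AVX2 is available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn permute_avx2(v: [u32; 8], idx: [u32; 8]) -> [u32; 8] {
    use std::arch::x86_64::*;
    let a = _mm256_loadu_si256(v.as_ptr() as *const __m256i);
    let i = _mm256_loadu_si256(idx.as_ptr() as *const __m256i);
    // First argument is the data, second is the per-lane index vector.
    let r = _mm256_permutevar8x32_epi32(a, i);
    let mut out = [0u32; 8];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, r);
    out
}

// Dispatch: use AVX2 when the CPU reports it, otherwise the scalar model.
fn permute(v: [u32; 8], idx: [u32; 8]) -> [u32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2") {
            // Safe here because the feature check above succeeded.
            return unsafe { permute_avx2(v, idx) };
        }
    }
    permute_scalar(v, idx)
}
```

Calling `permute(v, [0, 0, 2, 2, 4, 4, 6, 6])` with a literal index array is the kind of call the comment above is talking about: the indices are constants, so the optimizer is free to pick a cheaper instruction than the general permute.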

Using the shuffle! macro also seems like it would be helpful on NEON (#147, cc @isislovecruft), and it would be great to be able to help upstream.

gnzlbg · 2018-08-02T18:28:55Z

Looking at the code, the shuffle macro just takes a constant vector of indices.

I think (not sure, haven't tried) that you might be able to keep using the AAAA notation with shuffle! by doing something like:

const AAAA: [i32; 4] = [0, 0, 0, 0];
shuffle!(vec, AAAA);
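For anyone who wants to try the named-pattern idea without a nightly toolchain, here is a small scalar sketch of it. `shuffle_model` is a hypothetical stand-in and not the packed_simd macro; `BADC` is an extra pattern made up for illustration. The index type is `usize` here only because plain arrays are indexed with it; the snippet above uses `i32`.

```rust
// Hypothetical scalar model of a constant-index shuffle:
// lane i of the result is lane idx[i] of the input.
fn shuffle_model<const N: usize>(v: [u32; N], idx: [usize; N]) -> [u32; N] {
    let mut out = [0u32; N];
    for (o, &i) in out.iter_mut().zip(idx.iter()) {
        *o = v[i];
    }
    out
}

// Named index patterns in the style of the AAAA example.
const AAAA: [usize; 4] = [0, 0, 0, 0]; // broadcast lane 0
const BADC: [usize; 4] = [1, 0, 3, 2]; // swap adjacent lanes (illustrative)
```

With this, `shuffle_model(vec, AAAA)` reads much like the lane-pattern notation already used in the backend, while the pattern itself is just a constant array.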

hdevalence · 2019-06-04T20:56:03Z

Closing this for now just because I don't have any plans to refactor the AVX2 backend right now; it could be reopened later.

hdevalence closed this as completed Jun 4, 2019
