Non-portable instructions #3

jvansanten · 2017-04-22T12:22:37Z

Compiling with -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -mavx -march=native causes the compiler to emit instructions that are not supported on older CPUs. Omitting these flags entirely and relying only on x86_64's required SSE support slows down 4D multiple gradient evaluations by a factor of 2. Figure out which of these are critical, and decide how much we need to support older CPUs.


jvansanten · 2017-04-24T23:08:08Z

It may be possible to use function multi-versioning to automatically dispatch to architecture-specific versions of functions that may incorporate AVX instructions. Apparently Clang and ICC support a similar mechanism. VC++ has nothing of the sort, but Windows support is very low on my personal priority list.
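For illustration, a minimal sketch of what GCC function multi-versioning could look like. The function name `weighted_sum` and the macro `SPLINE_MULTIVERSION` are hypothetical and not taken from this code base; the sketch assumes GCC 6 or newer on x86-64 Linux, where `target_clones` is resolved at load time, and falls back to a single plain version everywhere else.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro: only request clones where GCC's load-time dispatch
// is assumed to be available; elsewhere it expands to nothing.
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && defined(__linux__)
#define SPLINE_MULTIVERSION __attribute__((target_clones("avx", "default")))
#else
#define SPLINE_MULTIVERSION
#endif

// Hypothetical stand-in for a hot inner loop. With the attribute active, the
// compiler emits an AVX clone and a baseline clone of this one function and
// picks between them once, based on the CPU the program is running on.
SPLINE_MULTIVERSION
double weighted_sum(const double* w, const double* c, std::size_t n) {
    assert(n == 0 || (w != nullptr && c != nullptr));
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += w[i] * c[i];
    }
    return acc;
}
```

Callers just call `weighted_sum(w, c, n)`; no AVX-specific flags are needed on the command line, so the rest of the library stays at the x86_64 baseline.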

jvansanten · 2018-09-10T13:10:53Z

The bug label is no longer appropriate after 1742b3a.

jvansanten · 2018-09-10T13:11:06Z

AVX does turn out to have a use (#11), but only in templated functions that are going to be emitted into dependent libraries anyway. One "solution" is therefore to punt, and require that users know they should dispatch to either an AVX-enabled ndssplineeval_gradient<double>() or evaluate each element of the gradient individually. This doesn't seem ideal, though.

Also, GCC function multi-versioning seems to have gotten even better in version 6, but that will not be deployed on many platforms.

nwhitehorn · 2018-09-11T19:50:24Z

So it seems like, on x86, we just want two versions: an AVX gradient evaluator and a non-AVX gradient evaluator. How often, in the wild, do you still encounter non-AVX x86 CPUs, though? It first shipped seven years ago.
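A rough sketch of the two-version approach with a manual runtime check, in case it helps the discussion. All function names here (`sum_generic`, `sum_avx`, `sum_dispatch`) are hypothetical placeholders rather than the real gradient evaluator, and the sketch assumes GCC or Clang on x86, where `__builtin_cpu_supports` and the per-function `target` attribute are available; on other platforms only the generic path is compiled.

```cpp
#include <cassert>
#include <cstddef>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

// Portable version: uses only what the build's baseline flags allow.
static double sum_generic(const double* x, std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) total += x[i];
    return total;
}

#ifdef HAVE_X86_DISPATCH
// AVX version: the target attribute enables AVX for this function only,
// so the translation unit can still be built without -mavx.
__attribute__((target("avx")))
static double sum_avx(const double* x, std::size_t n) {
    __m256d acc = _mm256_setzero_pd();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc = _mm256_add_pd(acc, _mm256_loadu_pd(x + i));
    }
    double lanes[4];
    _mm256_storeu_pd(lanes, acc);
    double total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i) total += x[i];  // leftover elements
    return total;
}
#endif

// Entry point: checks the running CPU and picks one of the two versions.
double sum_dispatch(const double* x, std::size_t n) {
    assert(n == 0 || x != nullptr);
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx")) return sum_avx(x, n);
#endif
    return sum_generic(x, n);
}
```

The check runs on every call here for simplicity; a real version would probably cache the result in a function pointer so the dispatch cost is paid once.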

nega0 · 2022-06-02T21:28:19Z

Got bit by this today on a K10 which lacks SSE3, SSE4, FMA, and AVX.

Not sure what the best option is in 2022, function multi-versioning or configure-time detection and multiple compilation units.

I do propose we switch to -march=native and let the compiler do its thing instead of listing out the instruction sets. This is what I'm going to be doing locally. For packaging we'd have to come up w/ a sane default such as -march=nehalem or -march=core2.
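To see what a given -march choice actually enables, one option is to inspect the compiler's predefined feature macros. This is only a sketch: the helper name `compiled_isa_baseline` is made up for illustration, and it assumes the GCC/Clang convention of defining `__SSE3__`, `__AVX__`, and friends when the corresponding extension is enabled at compile time.

```cpp
#include <string>

// Hypothetical helper: reports which x86 extensions the compiler was allowed
// to use for this translation unit, based on predefined macros. This reflects
// the build flags, not the CPU the binary ends up running on.
std::string compiled_isa_baseline() {
    std::string isa = "baseline";
#ifdef __SSE2__
    isa += " sse2";
#endif
#ifdef __SSE3__
    isa += " sse3";
#endif
#ifdef __SSE4_1__
    isa += " sse4.1";
#endif
#ifdef __SSE4_2__
    isa += " sse4.2";
#endif
#ifdef __AVX__
    isa += " avx";
#endif
#ifdef __FMA__
    isa += " fma";
#endif
    return isa;
}
```

Building this with the packaging default and again with -march=native, then comparing the two strings, shows which instruction sets a packaged binary would give up.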

cnweaver · 2022-06-02T21:53:18Z

I want to do function multi-versioning in the long run (as I think our reliance on gcc <6 is becoming fairly small, and clang has had okay support for a few versions now), but it will require some non-trivial refactoring, I think, and I haven't found the time to do it in earnest.

jvansanten added the bug label Apr 22, 2017

jvansanten self-assigned this Apr 22, 2017

jvansanten added enhancement and removed bug labels Sep 10, 2018
