Non-portable instructions #3

Open
jvansanten opened this issue Apr 22, 2017 · 6 comments
@jvansanten
Collaborator

Compiling with -msse2 -msse3 -msse4 -msse4.1 -msse4.2 -mavx -march=native causes the compiler to emit instructions that are not supported on older CPUs. Omitting these flags entirely and relying only on the SSE support that x86_64 requires slows down 4D multiple gradient evaluations by a factor of 2. Figure out which of these are critical, and decide how much we need to support older CPUs.
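As a starting point for working out which flags matter, here is a minimal sketch that reports which SIMD extensions the compiler was allowed to use for a given translation unit. It relies on the macros GCC and Clang predefine when the corresponding -m flag (or a -march value that implies it) is active. The function name enabled_simd_extensions is made up for illustration and is not part of this project.

```cpp
#include <string>

// Hypothetical helper (not part of the library): lists the x86 SIMD
// extensions enabled at compile time for this translation unit, based on
// macros that GCC and Clang predefine (e.g. via -mavx or -march=native).
std::string enabled_simd_extensions() {
    std::string out;
#ifdef __SSE2__
    out += "sse2 ";
#endif
#ifdef __SSE3__
    out += "sse3 ";
#endif
#ifdef __SSE4_1__
    out += "sse4.1 ";
#endif
#ifdef __SSE4_2__
    out += "sse4.2 ";
#endif
#ifdef __AVX__
    out += "avx ";
#endif
#ifdef __FMA__
    out += "fma ";
#endif
    if (out.empty()) {
        out = "none";      // nothing enabled, or not an x86 build
    } else {
        out.pop_back();    // drop the trailing space
    }
    return out;
}
```

Building the same file once with the full flag list and once with no flags, then printing the result, shows exactly which extensions each configuration turns on. Anything that appears only in the first list is a candidate for the non-portable instructions.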

@jvansanten jvansanten added the bug label Apr 22, 2017
@jvansanten jvansanten self-assigned this Apr 22, 2017
@jvansanten
Collaborator Author

It may be possible to use function multi-versioning to automatically dispatch to architecture-specific versions of functions that may incorporate AVX instructions. Apparently Clang and ICC support a similar mechanism. VC++ has nothing of the sort, but Windows support is very low on my personal priority list.
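For reference, a rough sketch of the dispatch that function multi-versioning would automate, written out by hand. It assumes GCC or Clang on x86: the target attribute enables AVX for one function only, and __builtin_cpu_supports picks a version at run time. The kernel (a plain sum) and the names sum_baseline, sum_avx and sum are stand-ins for illustration, not the library's gradient code.

```cpp
#include <cstddef>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#include <immintrin.h>
#endif

// Baseline version: uses only what the default compile flags allow.
double sum_baseline(const double* x, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

#ifdef HAVE_X86_DISPATCH
// AVX version: the target attribute enables AVX for this function only,
// so the rest of the file stays runnable on CPUs without AVX.
__attribute__((target("avx")))
double sum_avx(const double* x, std::size_t n) {
    __m256d acc = _mm256_setzero_pd();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm256_add_pd(acc, _mm256_loadu_pd(x + i));
    double lanes[4];
    _mm256_storeu_pd(lanes, acc);
    double s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i) s += x[i];  // leftover elements
    return s;
}
#endif

// Dispatcher: checks the CPU at run time and falls back to the baseline
// on older x86 CPUs, other architectures, and other compilers.
double sum(const double* x, std::size_t n) {
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx")) return sum_avx(x, n);
#endif
    return sum_baseline(x, n);
}
```

The AVX version adds in a different order than the baseline, so results can differ in the last bits for general floating-point input. With compiler-provided multi-versioning the dispatcher would not need to be written by hand, but the structure is the same.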

@jvansanten
Collaborator Author

The bug label is no longer appropriate after 1742b3a.

@jvansanten
Collaborator Author

jvansanten commented Sep 10, 2018

AVX does turn out to have a use (#11), but only in templated functions that are going to be emitted into dependent libraries anyway. One "solution" is therefore to punt, and require that users know they should dispatch to either an AVX-enabled ndssplineeval_gradient<double>() or evaluate each element of the gradient individually. This doesn't seem ideal, though.

Also, GCC function multi-versioning seems to have gotten even better in version 6, but that will not be deployed on many platforms.

@nwhitehorn
Contributor

So it seems like, on x86, we just want two versions: an AVX gradient evaluator and a non-AVX gradient evaluator. How often, in the wild, do you still encounter non-AVX x86 CPUs, though? It first shipped seven years ago.

@nega0
Contributor

nega0 commented Jun 2, 2022

Got bit by this today on a K10 which lacks SSE3, SSE4, FMA, and AVX.

Not sure what the best option is in 2022, function multi-versioning or configure-time detection and multiple compilation units.

I do propose we switch to -march=native and let the compiler do its thing instead of listing out the instruction sets. This is what I'm going to be doing locally. For packaging we'd have to come up w/ a sane default such as -march=nehalem or -march=core2.
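When picking a packaging default, it may help to check what a given machine actually offers at run time. A small sketch, assuming GCC or Clang on x86, using __builtin_cpu_supports; the function name missing_cpu_features is made up for illustration. On a CPU like the K10 described above, it would be expected to list every extension it probes.

```cpp
#include <string>

// Hypothetical helper: returns a space-separated list of the probed
// extensions that the CPU running this code does NOT support. Returns an
// empty string if all are present, or "unknown" when the probe is not
// available (non-x86 target or a compiler without the builtin).
std::string missing_cpu_features() {
    std::string missing;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (!__builtin_cpu_supports("sse3"))   missing += "sse3 ";
    if (!__builtin_cpu_supports("sse4.1")) missing += "sse4.1 ";
    if (!__builtin_cpu_supports("sse4.2")) missing += "sse4.2 ";
    if (!__builtin_cpu_supports("avx"))    missing += "avx ";
    if (!__builtin_cpu_supports("fma"))    missing += "fma ";
    if (!missing.empty()) missing.pop_back();  // drop the trailing space
#else
    missing = "unknown";
#endif
    return missing;
}
```

Unlike compile-time macros, this reflects the CPU the binary runs on rather than the flags it was built with, so it can be run on the oldest machine a package needs to support.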

@cnweaver
Collaborator

cnweaver commented Jun 2, 2022

I want to do function multi-versioning in the long run (as I think our reliance on gcc <6 is becoming fairly small, and clang has had okay support for a few versions now), but I think it will require some non-trivial refactoring, and I haven't found the time to do it in earnest.
