Non-portable instructions #3
Comments
It may be possible to use function multi-versioning to automatically dispatch to architecture-specific versions of functions that may incorporate AVX instructions. Apparently Clang and ICC support a similar mechanism. VC++ has nothing of the sort, but Windows support is very low on my personal priority list.
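For reference, a minimal sketch of what function multi-versioning can look like with GCC's `target_clones` attribute. The `dot` function is a hypothetical stand-in, not code from this project, and the guard assumes an x86 Linux toolchain with ifunc support; elsewhere the attribute is simply left off and only the default version is built.

```cpp
#include <cstddef>

// Hypothetical kernel standing in for a gradient evaluation; not from this project.
// With target_clones, the compiler emits one clone per listed target plus a
// resolver that selects a clone when the program is loaded.
#if defined(__GNUC__) && defined(__linux__) && (defined(__x86_64__) || defined(__i386__))
__attribute__((target_clones("avx", "default")))
#endif
double dot(const double* a, const double* b, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Callers just call `dot(a, b, n)`; the choice between the AVX clone and the default clone happens without any change at the call site.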
The bug label is no longer appropriate after 1742b3a.
AVX does turn out to have a use (#11), but only in templated functions that are going to be emitted into dependent libraries anyway. One "solution" is therefore to punt, and require that users know they should dispatch to either an AVX-enabled Also, GCC function multi-versioning seems to have gotten even better in version 6, but that will not be deployed on many platforms.
So it seems like, on x86, we just want two versions: an AVX gradient evaluator and a non-AVX gradient evaluator. How often, in the wild, do you still encounter non-AVX x86 CPUs, though? It first shipped seven years ago.
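A rough sketch of the two-version idea with manual run-time dispatch, assuming GCC or Clang on x86. All names here (`sum_squares`, `sum_squares_avx`, and so on) are hypothetical placeholders rather than the project's gradient evaluator; the point is only the shape: one baseline function, one function compiled with AVX enabled, and a selector that asks the CPU once.

```cpp
#include <cstddef>

// Hypothetical names throughout; a sketch, not the project's evaluator.

// Baseline version: built with the translation unit's default flags.
static double sum_squares_generic(const double* x, std::size_t n) {
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) acc += x[i] * x[i];
    return acc;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
// Same loop, but the compiler may use AVX instructions inside this function only.
__attribute__((target("avx")))
static double sum_squares_avx(const double* x, std::size_t n) {
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) acc += x[i] * x[i];
    return acc;
}
#endif

using SumSquaresFn = double (*)(const double*, std::size_t);

// Choose an implementation based on what the running CPU reports.
static SumSquaresFn select_sum_squares() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx")) return sum_squares_avx;
#endif
    return sum_squares_generic;
}

// Public entry point: the selection runs once, on the first call.
double sum_squares(const double* x, std::size_t n) {
    static const SumSquaresFn impl = select_sum_squares();
    return impl(x, n);
}
```

Because only `sum_squares_avx` is compiled with AVX enabled, the rest of the binary stays runnable on CPUs without it.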
Got bitten by this today on a K10, which lacks SSE3, SSE4, FMA, and AVX. Not sure what the best option is in 2022: function multi-versioning, or configure-time detection and multiple compilation units. I do propose we switch to
I want to do function multi-versioning in the long run (our reliance on gcc <6 is becoming fairly small, and clang has had okay support for a few versions now), but I think it will require some non-trivial refactoring, and I haven't found the time to do it in earnest.
-msse2 -msse3 -msse4 -msse4.1 -msse4.2 -mavx -march=native
causes the compiler to emit instructions that are not supported on older CPUs. Omitting these entirely and relying only on the SSE support that x86_64 requires slows down 4D multiple gradient evaluations by a factor of 2. Figure out which of these are critical, and decide how much we need to support older CPUs.