Portability Pointers
Some hints, tips, and things I've discovered in my efforts to write portable code in the 21st century.
While it sounds like hyperbole (I mean, after all, this is the 21st century! ...Right?), the fact of the matter is, virtually nothing is portable. Ok, ok, maybe not nothing, but if you need to use it in your program, or to build your program, or to test your program, chances are it ain't portable. And I'm just talking POSIX-oid platforms here. To wit:
So you want to use something like sed -r -n 's/^.*([0-9]+\.[0-9]+\.[0-9]+).*$/\1/p' to extract the version of some program in your Makefile and put it into a variable? Well I hope you like walled gardens, because that, my friend, is NON-PORTABLE! Works fine on Linux and Cygwin, luring you in with its false sense of security, then BAM, it totally collapses on a *BSD or OSX system. And the worst part about it is, the problem is not what you think: the backref is totally fine.
It's the -r. Meaning "use extended regular expressions instead of the 'basic' regexes that nobody would ever use because they make an already impossible-to-read expression even less possible to read, if that's even possible", at least to GNU sed. For reasons known only to... well, nobody..., the non-GNU sed on *BSDs and OSXes uses -E for this purpose instead. True story.
Yep, in the year 200-ought-16, 21st century, extended regular expressions are not portable:
Regular Expressions in sed
The sed utility shall support the BREs [Basic Regular Expressions] described in XBD Basic Regular Expressions,[...]
Well, at least not in sed anyway. Thanks Open Group Base Specifications Issue 7. If that is your real name.
"So what do we do Gary!?!?!??!?", you ask? Well, you only have three options:
1. Learn to read and write Basic Regular Expressions
   This is what Open Group would have you do. Yep, me neither, I only list it for completeness.
2. Do autoconf's job for it, and write an M4 macro to find an installed sed, then determine the correct command-line param... and I lost you at M4. No prob, this next one is what I do currently:
3. Cheat: Use sed -E
   Fun fact: GNU sed accepts -E as a synonym for -r. It's not in the --help, nor the man page, but it's there and it works - on at least the Linuxes, Cygwins, PC-BSDs (FreeBSDs), and even OSXs I've tried. Number 3 for the win! You. Are. Welcome!
Editor's Note: Of course, #2 is what you should do in a production context, but that doesn't read as well.
Oh boy, Gentle Reader, fair warning: this is going to be a long one, and I don't have all the details worked out yet. Such a mess for such a simple instruction....
Ok, so around the time SSE4.2 was introduced, the x86-64 ISA finally got the simple, very useful instruction we all had been clamoring for for years: POPCNT. Finally, we were able to count the number of bits set in a word without all the bit-twiddling and/or loops and/or general complete nightmare that it is without the dozen-or-so gates it takes to implement this operation in hardware. Need the number of set bits in that word? POPCNT, BAM!, done! 21st century for the win!
...yeah, except not.
Because (as I understand it) AMD beat Intel to the punch in delivering the POPCNT instruction, POPCNT is not considered part of SSE4.2 proper. You can't just compile with -msse4.2 and use POPCNT, you have to compile with -mpopcnt to get POPCNT.
Except you don't. Give gcc (6.1.1) the -msse4.2 flag and it spits out POPCNTs just fine. Why? I can't figure it out; as far as I can tell from extensive study of The Internet(tm) it shouldn't (q.v. the Intel and Wikipedia links above, this StackOverflow answer from some random guy on the internet, etc.). The closest I can come to an explanation is that it appears that there's no -march= supported by gcc which supports SSE4.2 but does not support POPCNT (or as AMD would have you call it, ABM). But you don't have to take my word for it.
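One hedged way to see what a given set of flags actually switched on, without reading disassembly: gcc and clang predefine __SSE4_2__ and __POPCNT__ when the corresponding instruction sets are enabled for a translation unit. The sketch below just reports those two macros. The function names (built_with_sse42, built_with_popcnt, describe_build_isa) are made up for illustration, and other compilers may not define these macros at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* True if this translation unit was compiled with SSE4.2 enabled.
   Assumption: a gcc/clang-style compiler that predefines __SSE4_2__. */
bool built_with_sse42(void)
{
#ifdef __SSE4_2__
    return true;
#else
    return false;
#endif
}

/* True if this translation unit was compiled with POPCNT enabled,
   whether by -mpopcnt directly or by some other flag that implies it. */
bool built_with_popcnt(void)
{
#ifdef __POPCNT__
    return true;
#else
    return false;
#endif
}

/* Writes a one-line summary such as "sse4.2=yes popcnt=no" into buf.
   Returns what snprintf returns (the length it wanted to write). */
int describe_build_isa(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "sse4.2=%s popcnt=%s",
                    built_with_sse42() ? "yes" : "no",
                    built_with_popcnt() ? "yes" : "no");
}
```

Compiling the same file once with -msse4.2 and once with -msse4.2 -mno-popcnt, then printing the summary from each build, should show whether your particular gcc treats the first flag as implying POPCNT.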
Ok, I can see I'm losing you (believe me, you'll thank me later for all the time and frustration this article saved you). "Big whoop," you say, "give gcc a -msse4.2 and be done with it." But as the heading three up there says, things only get worse at runtime. Bizarrely worse.
So portability: Imagine you're like me, toiling away trying to deliver a portable program where performance is important. That means SSE2 is your baseline, and if available at runtime, you call certain critical functions which have been separately-compiled to use SSE4.2 and POPCNT.
Further, assume you're rockin' an Intel Core i7-960 monster as your dev machine (don't laugh), running Windows 7 (don't laugh), running Fedora 24 in a VirtualBox VM to do your Linux development (ok you can laugh now). Two things:
- The Intel Core i7-960 supports SSE4.2 and POPCNT.
- The self-same Intel Core i7-960 as presented to Linux by VirtualBox supports SSE4.2. But not POPCNT.
Don't believe me? Here's what the CPUID instruction has to say from inside the VM (via cpuid):
$ cpuid
CPU 0:
vendor_id = "GenuineIntel"
version information (1/eax):
[...]
(simple synth) = Intel Core i7-900 (Bloomfield D0) [...]
miscellaneous (1/ebx):
[...]
brand id = 0x00 (0): unknown
feature information (1/edx):
[...]
feature information (1/ecx):
PNI/SSE3: Prescott New Instructions = true
[...]
SSSE3 extensions = true
[...]
SSE4.1 extensions = true
SSE4.2 extensions = true
[...]
POPCNT instruction = false
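If you'd rather ask CPUID directly than go through the cpuid tool, the two lines that matter in that dump are bits 20 (SSE4.2) and 23 (POPCNT) of ECX for leaf 1. Here is a minimal sketch, assuming gcc or clang and their <cpuid.h> helper __get_cpuid(); the macro and function names below are mine, not from any library.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>              /* GCC/Clang helper header (assumption: not MSVC) */
#endif

/* CPUID leaf 1, ECX feature bits, per the Intel and AMD manuals. */
#define CPUID1_ECX_SSE42   (1u << 20)
#define CPUID1_ECX_POPCNT  (1u << 23)

/* Returns leaf 1's ECX, or 0 if this isn't x86 or the leaf is unavailable. */
unsigned cpuid_leaf1_ecx(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return ecx;
#endif
    return 0;
}

/* Pure bit test, split out so it can be exercised with made-up ECX values. */
bool ecx_has(unsigned ecx, unsigned feature_bit)
{
    assert(feature_bit != 0);
    return (ecx & feature_bit) != 0;
}

/* What the (possibly virtualized) CPU claims, which is not necessarily
   what it can actually execute. */
bool cpu_claims_sse42(void)  { return ecx_has(cpuid_leaf1_ecx(), CPUID1_ECX_SSE42); }
bool cpu_claims_popcnt(void) { return ecx_has(cpuid_leaf1_ecx(), CPUID1_ECX_POPCNT); }
```

Going by the dump above, inside that VM cpu_claims_sse42() would presumably come back true and cpu_claims_popcnt() false. Note that these functions only report the claim; they make no attempt to probe whether the instruction really works.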
Is your mind blown yet? My guess is no. Well hold on to your hat my friend:
POPCNT actually works fine in the VM, even though it's advertised as not supported.
And I just blew your mind.
Editor's Note: Similar but even worse things happen to these bits under Valgrind, but that's an article for a different day.
Ok, so we have:
1. gcc will generate POPCNTs when given -msse4.2.
2. There is at least one platform in the wild which indicates it supports SSE4.2, but not POPCNT.
   2a. That platform actually does support the POPCNT instruction, meaning that its claim of non-support is erroneous.
3. Ergo, we can't trust a CPUID indication of POPCNT non-support.
4. Except if we want to write portable code, we have no other choice but to trust it, and not use POPCNT if CPUID says it doesn't exist.
5. But see #1. You'll get them even if you don't ask for them.
Like I said at the top, I don't have all this worked out yet, because like you, my mind is also still blown from all this conflicting weirdness. My Provisional POPCNT Portability Pointer(tm) at the moment is the following:
Take the conservative route: Don't use POPCNT if CPUID claims that it doesn't exist, even if it very likely does exist and works correctly.
To do that, if you otherwise want to use SSE4.2 instructions, you'll need to do the following:
- If you're compiling a separate module with -msse4.2, split it into two: One compiled with -msse4.2 -mno-popcnt, one compiled with -msse4.2 -mpopcnt.
- Using your favorite flavor of CPU Dispatching/Function Multiversioning (oh man, yet another future article), choose between the two implementations at load- or run-time based on what CPUID says about POPCNT support.
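The two steps above can be sketched in a single file as follows. This is an illustration under stated assumptions, not a drop-in implementation: every name is hypothetical, and the gcc/clang target("popcnt") function attribute stands in for the separately compiled -msse4.2 -mpopcnt module so the example stays self-contained. The fallback is the classic bit-twiddling count, and the choice is made once, from what CPUID claims.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>              /* GCC/Clang helper header (assumption) */
#define PP_X86 1
#else
#define PP_X86 0
#endif

typedef unsigned (*popcount64_fn)(uint64_t);

/* Baseline: no POPCNT required. In a real build this would live in the
   module compiled with -msse4.2 -mno-popcnt. */
unsigned popcount64_portable(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);                           /* 2-bit sums */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                           /* 8-bit sums */
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);                 /* add the bytes */
}

#if PP_X86
/* Stand-in for the module compiled with -msse4.2 -mpopcnt. */
__attribute__((target("popcnt")))
unsigned popcount64_hw(uint64_t x)
{
    return (unsigned)__builtin_popcountll(x);
}
#endif

/* Conservative choice: use the POPCNT version only if CPUID leaf 1
   sets ECX bit 23, even though the instruction may work regardless. */
popcount64_fn select_popcount64(void)
{
#if PP_X86
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & (1u << 23)))
        return popcount64_hw;
#endif
    return popcount64_portable;
}

/* Public entry point: resolves the implementation on first call.
   Not thread-safe as written; resolve it at startup if that matters. */
unsigned popcount64(uint64_t x)
{
    static popcount64_fn impl = NULL;
    if (impl == NULL)
        impl = select_popcount64();
    assert(impl != NULL);
    return impl(x);
}
```

Callers only ever see popcount64(), so swapping the hand-rolled function pointer for ifunc-style or compiler-provided multiversioning later shouldn't change any call sites.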
Complete and should-be-unnecessary hassle, but that's what my research to date indicates should keep you portable. One last time: I don't have a complete understanding of this issue at this time. I've verified that gcc 6.1.1 with -msse4.2 -mno-popcnt does in fact not generate POPCNTs, even when provoked by using __builtin_popcountll(). The advice above matches what both Intel's and AMD's manuals say you should do, as quoted in the StackOverflow post noted above. But no guarantees on this one. And I have no idea what happens when we get into AVX territory.
Word to the wise. QED.
All content Copyright(C) 2016 Gary R. Van Sickle