[Discussion] : Distribution via Bioconda #25
Hi @rob-p, and sorry for the late reply to this. I'm interested in distributing SSHash via Bioconda (but I first have to learn how to do it). Do you know if there is a better way of checking, or do we have to proceed by trial and error?
Hi @jermp, No worries — things are busy on this end as well ;P. So, the issues that I can see are the following (btw, currently the compilation of
So both of these have the following implications. Bioconda builds will likely work on most client machines, but would fail on machines that lack either instructions included because of If you just want the bioconda build to work on x86-64, it will probably already work on most machines, but we might want to explicitly list out the useful instructions and remove
Alright, let's dig into
Ok, my experiences over the last 2 weeks have been helpful here. I think we can just gate
Following from here: https://wiki.gentoo.org/wiki/Safe_CFLAGS#Find_CPU-specific_options, by doing
I get
on a server, from which we see that
lol, there are a lot there! So it looks like it does explicitly pull in all of the relevant SSEs up to 4.1 (which I've read before is actually necessary; i.e., telling the compiler SSE4.2 doesn't imply it will also use 4.1 and earlier intrinsics). It also has the bmi and bmi2 instructions, popcount, mmx, and avx and avx2 (which we probably don't want to require?). There's also
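One way to see from inside the build which of these extensions the compiler was actually allowed to use is to inspect the predefined macros that GCC and Clang set for each enabled instruction set. The following is a minimal sketch, not code from SSHash or PTHash; the function name `enabled_isa_extensions` is hypothetical, and the macro names are the ones GCC and Clang define on x86.

```cpp
#include <string>
#include <vector>

// Returns the names of the x86 instruction-set extensions that were enabled
// at compile time for this translation unit, according to the macros that
// GCC and Clang predefine (e.g. under -march=native or -mbmi2).
// The function name is illustrative only.
std::vector<std::string> enabled_isa_extensions() {
    std::vector<std::string> out;
#ifdef __SSE4_1__
    out.push_back("sse4.1");
#endif
#ifdef __SSE4_2__
    out.push_back("sse4.2");
#endif
#ifdef __POPCNT__
    out.push_back("popcnt");
#endif
#ifdef __BMI__
    out.push_back("bmi");
#endif
#ifdef __BMI2__
    out.push_back("bmi2");
#endif
#ifdef __AVX__
    out.push_back("avx");
#endif
#ifdef __AVX2__
    out.push_back("avx2");
#endif
    return out;
}
```

Building the same file once with the native flag and once with an explicit list of flags, then comparing the returned lists, gives a quick check of which extensions a Conda build would end up depending on.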
I think all that is required in the end can be understood from here: https://github.com/jermp/pthash/blob/master/include/encoders/util.hpp, from the special instructions PTHash uses: popcount and parallel-bit-deposit (or SSHash by itself does not introduce any further special instructions.
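To make the two instructions concrete, here is a small sketch of popcount and parallel-bit-deposit with portable fallbacks, gated at compile time on the `__BMI2__` macro. This is not the code from PTHash's `util.hpp`; the names `popcount64`, `pdep64`, `pdep64_portable`, and `select_in_word` are hypothetical, and the select routine only illustrates the typical kind of use that makes pdep matter for Elias-Fano style structures.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__BMI2__)
#include <immintrin.h>  // _pdep_u64, available when BMI2 is enabled
#endif

// Number of set bits in x.
inline int popcount64(uint64_t x) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcountll(x);  // becomes POPCNT when that is enabled
#else
    int c = 0;
    while (x) { x &= x - 1; ++c; }   // clear the lowest set bit each round
    return c;
#endif
}

// Portable parallel-bit-deposit: the i-th low bit of src is written to the
// position of the i-th set bit of mask.
inline uint64_t pdep64_portable(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1);  // isolate lowest set bit of mask
        if (src & bit) result |= lowest;
        mask &= mask - 1;                      // drop that bit from mask
    }
    return result;
}

// Uses the hardware instruction only if the compiler was told BMI2 exists.
inline uint64_t pdep64(uint64_t src, uint64_t mask) {
#if defined(__BMI2__)
    return _pdep_u64(src, mask);
#else
    return pdep64_portable(src, mask);
#endif
}

// Position of the k-th (0-based) set bit of x.
inline int select_in_word(uint64_t x, int k) {
    assert(k >= 0 && k < popcount64(x));
    uint64_t isolated = pdep64(uint64_t(1) << k, x);  // only the k-th set bit survives
    int pos = 0;
    while ((isolated & 1) == 0) { isolated >>= 1; ++pos; }
    return pos;
}
```

With this kind of gating, a build without BMI2 still works everywhere but takes the loop-based path, which is the sort of slowdown the benchmarks in this thread are meant to quantify.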
And it would be instructive to compare the performance of both tools, PTHash and SSHash, with and without those compiler flags to see how much impact they have. I did it in the past for other libraries and I can confirm both
So, I think For
Ok, so it looks like we have reduced the problem to
Will do it soon.
Hi @rob-p,
In summary: what is of interest here is the metric So I would expect to see a similar effect for SSHash as well because it is using Elias-Fano in a couple of places (but hopefully, not a ~2X slowdown...).
These are the results for 100M keys:
and are consistent with the others reported before. Elias-Fano went from 45 ns/key to 80 ns/key.
Thanks @jermp! I wonder if there is a
I also made the following experiment:
from which we get the very same performance. This is consistent with the output here
which also includes
Would it be possible to detect it via CMake?
Yes. One can have CMake compile arbitrary code and see if it runs. There may even be a CMake module to check for this. For Bioconda though, the concern is that the host used to compile has this instruction, but the runtime machine doesn't. In that case one can compile both versions and dispatch to the correct one at runtime. I do this with ksw2 in salmon.
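The "compile both and dispatch at runtime" idea can be sketched as follows. This is an illustration under stated assumptions, not the mechanism used for ksw2 in salmon: it assumes GCC or Clang on x86-64, uses the `target("bmi2")` function attribute so that only one function is compiled with BMI2, and checks the CPU with `__builtin_cpu_supports`. All function names and the `HAVE_X86_DISPATCH` macro are hypothetical.

```cpp
#include <cstdint>

#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#define HAVE_X86_DISPATCH 1  // hypothetical macro, for this sketch only
#include <immintrin.h>
#endif

// Baseline version: runs on any CPU.
uint64_t pdep_portable(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1);  // lowest set bit of mask
        if (src & bit) result |= lowest;
        mask &= mask - 1;
    }
    return result;
}

#ifdef HAVE_X86_DISPATCH
// Only this function is compiled with BMI2 enabled, so the rest of the
// binary keeps the baseline x86-64 instruction set.
__attribute__((target("bmi2")))
uint64_t pdep_bmi2(uint64_t src, uint64_t mask) {
    return _pdep_u64(src, mask);
}
#endif

using pdep_fn = uint64_t (*)(uint64_t, uint64_t);

// Picks an implementation based on what the machine running the binary
// supports, not on what the build host supported.
pdep_fn select_pdep() {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("bmi2")) return pdep_bmi2;
#endif
    return pdep_portable;
}

// Public entry point: the choice is made once, on first use.
uint64_t pdep(uint64_t src, uint64_t mask) {
    static const pdep_fn impl = select_pdep();
    return impl(src, mask);
}
```

The indirect call adds a small cost per invocation, so in practice one would likely dispatch at a coarser level (for example, a whole query routine) rather than per instruction; that trade-off would need measuring.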
Continuing from #22
The discussion here is in regard to 2. I have two thoughts here.
I absolutely agree that, until native M1/2 builds are available from bioconda, it would be better for folks to compile themselves, and for that to be made as easy as possible (could we provide pre-compiled binaries?)
Regarding performance, Rosetta 2 is actually pretty amazing in my experience. Even through translation, the M1 (Pro/Max) often outperforms the previous top-end MacBook Pros running i9. My understanding is that Rosetta 2 directly translates many of the x86 intrinsics to native Neon intrinsics (or whatever special instructions the M architecture has). While I agree that compilation isn't difficult, I also have a lot of prior experience telling me that my making that statement is very different from the experience that a biologist who doesn't focus on software/methods will have when trying to build my tool.
Of course, I absolutely understand if you think supporting Bioconda builds that run on M1/2 isn't of sufficient priority to warrant effort at this point; we could ask the Bioconda people what their path forward and intended timeline are. On the other hand, it would be nice to know what the delta is between what march=native offers and what instructions are actually useful / necessary. It may be that we can remove that flag in Conda builds, explicitly specify the instructions we want, and get little-to-no performance degradation and the ability to distribute something via Conda that works on all platforms (which makes it trivial for people to use both locally and on a cluster).