Add SIMD #30
Would this break GopherJS compatibility? I'm starting to rely on mathgl as my vector library when developing with WebGL, so it'd be really nice to be able to continue to do that. Currently:
I'll definitely add either a buildtag or an extra make command that does …
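As a sketch of how a build tag could gate the assembly path while keeping GopherJS working (file names, package layout, and the `Add` method are illustrative here, not mathgl's actual code), the idea is two files with mutually exclusive build constraints, so GopherJS compiles only the pure-Go fallback:

```go
// vec_amd64.go — assembly-backed path, compiled only on amd64 and never for js
// +build amd64,!js

package mgl32

// Add is implemented in vec_amd64.s using SSE instructions.
func (v Vec4) Add(u Vec4) Vec4
```

```go
// vec_generic.go — pure-Go fallback; GopherJS and non-amd64 targets get this
// +build !amd64 js

package mgl32

func (v Vec4) Add(u Vec4) Vec4 {
	return Vec4{v[0] + u[0], v[1] + u[1], v[2] + u[2], v[3] + u[3]}
}
```

The two constraint lines are complements of each other, so exactly one file is compiled for any given target.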
Off topic: Woah @ GopherJS
Other note: apparently SIMD for a Mat3 determinant is also not worth it, though this is because of the necessary passthrough to another function (you can't define assembly on a pointer receiver).
Auto-vectorization in the compiler would be the ultimate ideal, but one could also make a code-generation tool that vectorizes your math-heavy functions into assembly routines. Edit: this would be a tool for the mathgl user, to vectorize entire algorithms — not for vectorizing each individual Mat4 routine, etc.
@Jragonmiris can you post your SIMD code?
Let me see if I can dig it up.
This is a miscellaneous issue for adding SIMD. I've been doing a lot of work, and what's become clear is that adding SIMD requires a lot of profiling, so this may take a while to come to fruition.
Miscellaneous things I've found:
Using explicit assembly (SIMD or not) for anything on a Vec3 or smaller is NOT WORTH IT because of compiler inlining. This includes the cross product, though I haven't checked vector Len yet. Only if you disable all compiler optimizations (-N) is the assembly usually an improvement. I suppose theoretically, if you could convince the compiler to magically inline your SIMD it would work fine, but you can't, so...
(Yes, I know SIMD loads 4 values at a time, you can interleave them and use junk slots for Vec3 and Vec2. I figured it was worth experimenting with)
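A minimal sketch of why inlining wins here (the `Vec3` type and `Cross` method shape below are assumptions mirroring mathgl's `mgl32.Vec3`, not its actual source): the body is so small that the Go compiler inlines it at the call site, while a hand-written assembly routine can never be inlined, so the call overhead alone swamps the arithmetic.

```go
package main

import "fmt"

// Vec3 mirrors mathgl's mgl32.Vec3 layout: three float32 components.
type Vec3 [3]float32

// Cross is small enough for the compiler to inline at call sites.
// An assembly version would pay a full function-call round trip
// for nine multiplies and three subtractions.
func (v Vec3) Cross(u Vec3) Vec3 {
	return Vec3{
		v[1]*u[2] - v[2]*u[1],
		v[2]*u[0] - v[0]*u[2],
		v[0]*u[1] - v[1]*u[0],
	}
}

func main() {
	a := Vec3{1, 0, 0}
	b := Vec3{0, 1, 0}
	fmt.Println(a.Cross(b)) // x × y = z, i.e. [0 0 1]
}
```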
However, the improvements gained on a Vec4 are big enough to be worth it. Combined with pointers it's a massive improvement (from 10 ns/op to just over 1 ns/op for some simple operations like addition).
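For shape, here's what a pointer-based Vec4 add could look like (the `AddOf` name and signature are hypothetical, not mathgl's API). A Vec4 is exactly one 128-bit XMM register, so an assembly body would be a MOVUPS/ADDPS/MOVUPS triple; the pointer arguments avoid copying 16-byte values on every call, which is where the 10 ns/op → ~1 ns/op gap comes from.

```go
package main

import "fmt"

// Vec4 matches the four-float32 layout SIMD operates on:
// one 128-bit XMM register holds the whole vector.
type Vec4 [4]float32

// AddOf writes a + b into dst. This scalar body is what an
// assembly version would replace with a single packed ADDPS.
func AddOf(dst, a, b *Vec4) {
	dst[0] = a[0] + b[0]
	dst[1] = a[1] + b[1]
	dst[2] = a[2] + b[2]
	dst[3] = a[3] + b[3]
}

func main() {
	a := Vec4{1, 2, 3, 4}
	b := Vec4{10, 20, 30, 40}
	var out Vec4
	AddOf(&out, &a, &b)
	fmt.Println(out) // [11 22 33 44]
}
```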
Matrices are still a work in progress, but I'm fairly confident I can do some magic with 4x4 matrix inversion and possibly determinants. We'll see if it matters for 3x3. 4x4 matrix multiplication can probably be improved simply by adding a SIMD dot product and using the dot on Row/Col instead of writing out the operation like we're doing now.
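The Row/Col idea above can be sketched as follows (a self-contained approximation, not mathgl's code — only the column-major layout matches `mgl32.Mat4`): sixteen Row·Col dot products, where the scalar `dot4` is the single slot a SIMD routine (e.g. SSE4.1's DPPS) could replace.

```go
package main

import "fmt"

// Mat4 is column-major like mathgl's mgl32.Mat4:
// element (row r, col c) lives at index c*4 + r.
type Mat4 [16]float32

// dot4 is the replaceable kernel: a SIMD version (DPPS or
// MULPS + horizontal adds) would drop in here unchanged.
func dot4(a, b [4]float32) float32 {
	return a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3]
}

func (m Mat4) Row(r int) [4]float32 {
	return [4]float32{m[r], m[4+r], m[8+r], m[12+r]}
}

func (m Mat4) Col(c int) [4]float32 {
	return [4]float32{m[c*4], m[c*4+1], m[c*4+2], m[c*4+3]}
}

// Mul4 computes m * n as sixteen Row·Col dot products instead
// of sixteen written-out sum-of-products expressions.
func (m Mat4) Mul4(n Mat4) Mat4 {
	var out Mat4
	for c := 0; c < 4; c++ {
		col := n.Col(c)
		for r := 0; r < 4; r++ {
			out[c*4+r] = dot4(m.Row(r), col)
		}
	}
	return out
}

func main() {
	ident := Mat4{1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1}
	m := Mat4{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
	fmt.Println(m.Mul4(ident) == m) // multiplying by identity returns m
}
```

One caveat: extracting `Row(r)` re-gathers strided elements per dot product, so a real SIMD version would likely transpose once or keep rows in registers rather than rebuild them sixteen times.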