This is an implementation of the method described in "Computing the Singular Value Decomposition of 3x3 matrices with minimal branching and elementary floating point operations". I implemented this as part of a group project for a computer graphics course.
Execution time per svd call on the CPU is about 2.0 microseconds. Tested on a AMD Phenom(tm) II X4 965 Processor.
Execution time on the GPU is about 174 microseconds. Tested on a NVIDIA GeForce GTX 460 (profiled using nvvp).
Also included are routines for diagonalization / QR decomposition of 3x3 matrices, which may be useful in their own right.
##Usage
Just include the header file and you are good to go!
#include "svd3.hpp"
float a11, a12, a13, a21, a22, a23, a31, a32, a33;
a11= -0.558253; a12 = -0.0461681; a13 = -0.505735;
a21 = -0.411397; a22 = 0.0365854; a23 = 0.199707;
a31 = 0.285389; a32 =-0.313789; a33 = 0.200189;
float u11, u12, u13,
u21, u22, u23,
u31, u32, u33;
float s11, s12, s13,
s21, s22, s23,
s31, s32, s33;
float v11, v12, v13,
v21, v22, v23,
v31, v32, v33;
svd(a11, a12, a13, a21, a22, a23, a31, a32, a33,
u11, u12, u13, u21, u22, u23, u31, u32, u33,
s11, s12, s13, s21, s22, s23, s31, s32, s33,
v11, v12, v13, v21, v22, v23, v31, v32, v33);
See the included Mathematica notebook for derivations of numerical shortcuts.
MIT License, Eric V. Jang 2014