MADlib v1.1
Release Date: 2013-August-09
New Features:
- Singular Value Decomposition:
- Added Singular Value Decomposition using the Lanczos bidiagonalization
iterative method to decompose the original matrix into PBQ^T, where B is
a bidiagonal matrix. We assume that the original matrix is too big to
load into memory but that B fits in memory. B is then further decomposed
into XSY^T using Eigen's JacobiSVD function. This restricts the number
of features in the data matrix to about 5000.
- This implementation provides SVD (for dense matrices), SVD_BLOCK (also
for dense matrices but faster), SVD_SPARSE (converts a sparse matrix into
a dense one, slower) and SVD_SPARSE_NATIVE (operates directly on the
sparse matrix, much faster for very sparse matrices).
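The two-phase computation above — iterative bidiagonalization of the large matrix, then a dense SVD of the small B — can be sketched in NumPy. This is an illustration of the math only, not MADlib's SQL interface; the function name and the full-reorthogonalization step are our own additions for clarity and numerical stability:

```python
import numpy as np

def golub_kahan_bidiag(A, k):
    """Golub-Kahan (Lanczos) bidiagonalization: A ~= P @ B @ Q.T with
    B (k x k) upper bidiagonal. Only products with A and A.T are needed,
    so A itself could live out of core while B fits in memory."""
    m, n = A.shape
    P, Q = np.zeros((m, k)), np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(k)
    q = np.random.default_rng(0).standard_normal(n)
    Q[:, 0] = q / np.linalg.norm(q)
    p = A @ Q[:, 0]
    alpha[0] = np.linalg.norm(p)
    P[:, 0] = p / alpha[0]
    for j in range(1, k):
        r = A.T @ P[:, j - 1] - alpha[j - 1] * Q[:, j - 1]
        r -= Q[:, :j] @ (Q[:, :j].T @ r)   # full reorthogonalization (our addition)
        beta[j - 1] = np.linalg.norm(r)
        Q[:, j] = r / beta[j - 1]
        p = A @ Q[:, j] - beta[j - 1] * P[:, j - 1]
        p -= P[:, :j] @ (P[:, :j].T @ p)
        alpha[j] = np.linalg.norm(p)
        P[:, j] = p / alpha[j]
    # B carries alpha on the diagonal and beta on the superdiagonal.
    B = np.diag(alpha) + np.diag(beta[:k - 1], 1)
    return P, B, Q

# Phase 2: dense SVD of the small B (MADlib uses Eigen's JacobiSVD here).
A = np.random.default_rng(1).standard_normal((200, 50))
P, B, Q = golub_kahan_bidiag(A, k=50)   # k = full rank, so exact here
X, S, Yt = np.linalg.svd(B)
U, V = P @ X, Q @ Yt.T                  # singular vectors of A itself
err = np.linalg.norm(A - U @ np.diag(S) @ V.T) / np.linalg.norm(A)
```

Because B is only k x k, the dense Jacobi SVD in phase 2 is what caps the feature count (around 5000), regardless of how many rows the original matrix has.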
- Principal Component Analysis:
- Added a PCA training function that generates the top-K principal
components for an input matrix. The function mean-centers the original
data and returns the mean matrix as a separate table.
- The module also includes a projection function that projects a test data
set onto the principal components returned by the training function.
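The train/project pair above amounts to the following NumPy sketch (an illustration of the underlying computation, not MADlib's SQL API; the function names are our own):

```python
import numpy as np

def pca_train(X, k):
    """Top-k principal components via SVD of the mean-centered data."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k], mean

def pca_project(X, components, mean):
    """Project (test) data onto components learned by pca_train, using
    the stored mean so test data is centered consistently."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
comps, mu = pca_train(X, k=2)
Z = pca_project(X, comps, mu)   # 100 x 2 scores
```

Returning the mean alongside the components is what lets the projection function center new data the same way the training data was centered.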
- Linear Systems:
- Added a module to solve linear systems of equations (Ax = b).
- The module utilizes various direct methods from the Eigen library for
dense systems. Given below is a summary of the methods (more details at
http://eigen.tuxfamily.org/dox-devel/group__TutorialLinearAlgebra.html):
- Householder QR
- Partial Pivoting LU
- Full Pivoting LU
- Column Pivoting Householder QR
- Full Pivoting Householder QR
- Standard Cholesky decomposition (LLT)
- Robust Cholesky decomposition (LDLT)
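Several of the Eigen methods listed above have close NumPy/SciPy analogues; the sketch below is our illustration of three of them, not MADlib's interface:

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
spd = A @ A.T + 4 * np.eye(4)   # symmetric positive definite, for Cholesky
b = rng.standard_normal(4)

# Partial-pivoting LU (np.linalg.solve uses LAPACK's getrf/getrs)
x_lu = np.linalg.solve(A, b)

# Householder QR: A = QR, then solve the triangular system R x = Q^T b
Q, R = np.linalg.qr(A)
x_qr = linalg.solve_triangular(R, Q.T @ b)

# Standard Cholesky (LLT); valid only when the matrix is SPD
c, low = linalg.cho_factor(spd)
x_chol = linalg.cho_solve((c, low), b)
```

The trade-off mirrors Eigen's documentation: Cholesky is fastest but requires a symmetric positive definite matrix, partial-pivoting LU is the general workhorse, and the pivoting QR/LU variants trade speed for robustness on ill-conditioned systems.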
- The module also includes direct and iterative methods for sparse linear
systems:
Direct:
- Standard Cholesky decomposition (LLT)
- Robust Cholesky decomposition (LDLT)
Iterative:
- In-memory Conjugate gradient
- In-memory Conjugate gradient with diagonal preconditioners
- In-memory Bi-conjugate gradient
- In-memory Bi-conjugate gradient with incomplete LU preconditioners
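As an illustration of the iterative sparse methods (using SciPy stand-ins for the in-memory solvers; the Jacobi preconditioner below corresponds to the diagonal-preconditioner variant):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import cg, LinearOperator

# Sparse SPD test system: a 1-D Laplacian (tridiagonal)
n = 200
main = 2.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A = sparse.diags([off, main, off], [-1, 0, 1], format='csr')
b = np.ones(n)

# Plain conjugate gradient
x, info = cg(A, b)

# Conjugate gradient with a diagonal (Jacobi) preconditioner:
# M approximates A^-1 by inverting only the diagonal of A.
d = A.diagonal()
M = LinearOperator((n, n), matvec=lambda v: v / d)
x_pc, info_pc = cg(A, b, M=M)
```

CG requires a symmetric positive definite system; the bi-conjugate gradient variants listed above (scipy.sparse.linalg.bicg / bicgstab are the analogous SciPy calls) lift that restriction at the cost of a less predictable convergence path.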
Bug fixes and other changes:
- Robust input validation:
- Validation of input parameters to various functions has been improved to
ensure that it does not fail if double quotes are included as part of the
table name.
- Random Forest:
- The ID field in rf_train has been expanded from INT to BIGINT (MADLIB-764)
- Various documentation updates:
- Documentation updated for various modules including elastic net, linear
and logistic regression.