Release MADlib v1.1 · madlib/archived_madlib

Release Date: 2013-August-09

New Features:

Singular Value Decomposition:
- Added Singular Value Decomposition using the Lanczos bidiagonalization
  iterative method, which decomposes the original matrix into PBQ^T, where B
  is a bidiagonal matrix. We assume that the original matrix is too large to
  load into memory but that B fits in memory. B is then further decomposed
  into XSY^T using Eigen's JacobiSVD function. This restricts the number of
  features in the data matrix to about 5000.
- This implementation provides SVD (for dense matrices), SVD_BLOCK (also for
  dense matrices, but faster), SVD_SPARSE (converts a sparse matrix into a
  dense one; slower) and SVD_SPARSE_NATIVE (operates directly on the sparse
  matrix; much faster for very sparse matrices).
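
The two-stage decomposition described above can be sketched in a few lines of NumPy. This is a minimal in-memory illustration, not MADlib's implementation: the function names `lanczos_bidiag` and `svd_via_bidiag` are hypothetical, `np.linalg.svd` stands in for Eigen's JacobiSVD, and the full reorthogonalization step is an assumption added here for numerical stability.

```python
import numpy as np

def lanczos_bidiag(A, k, seed=0):
    """Golub-Kahan-Lanczos bidiagonalization: A is approximated by P @ B @ Q.T.

    P is m x k, Q is n x k, and B is a k x k upper bidiagonal matrix.
    Only products with A and A.T are needed, so A never has to be factored.
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    P = np.zeros((m, k))
    Q = np.zeros((n, k))
    alpha = np.zeros(k)              # main diagonal of B
    beta = np.zeros(max(k - 1, 0))   # superdiagonal of B

    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    p_prev = np.zeros(m)
    b = 0.0
    for j in range(k):
        Q[:, j] = q
        p = A @ q - b * p_prev
        # Full reorthogonalization against earlier left vectors (assumption).
        p -= P[:, :j] @ (P[:, :j].T @ p)
        alpha[j] = np.linalg.norm(p)
        p /= alpha[j]
        P[:, j] = p
        if j < k - 1:
            r = A.T @ p - alpha[j] * q
            # Full reorthogonalization against earlier right vectors.
            r -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ r)
            b = np.linalg.norm(r)
            beta[j] = b
            q = r / b
            p_prev = p
    B = np.diag(alpha) + np.diag(beta, 1)
    return P, B, Q

def svd_via_bidiag(A, k):
    """Decompose A as P B Q^T, then B as X S Y^T, and combine the factors."""
    P, B, Q = lanczos_bidiag(A, k)
    X, s, Yt = np.linalg.svd(B)      # stand-in for Eigen's JacobiSVD
    return P @ X, s, Yt @ Q.T        # U, singular values, V^T
```

Calling `svd_via_bidiag(A, k)` with `k` equal to the number of columns of `A` recovers the full set of singular values; a smaller `k` gives an approximation to the leading ones. Only the small matrix B is passed to the dense SVD routine, which is why the size of B, rather than the number of rows of the input, is the limiting factor.
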
Principal Component Analysis:
- Added a PCA training function that generates the top-K principal
  components for an input matrix. The function mean-centers the original
  data and returns the mean matrix as a separate table.
- The module also includes a projection function that projects a test data
  set onto the principal components returned by the training function.
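
The train-then-project workflow described above can be illustrated with a short NumPy sketch. The names `pca_train` and `pca_project` are hypothetical and do not correspond to MADlib's SQL functions; the sketch assumes the data fits in memory and uses a dense SVD of the mean-centered matrix to obtain the components.

```python
import numpy as np

def pca_train(X, k):
    """Return the column means and the top-k principal components of X."""
    mean = X.mean(axis=0)                 # returned separately, as in the notes
    X_centered = X - mean                 # mean-center the training data
    _, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return mean, Vt[:k], s[:k]            # rows of Vt are the components

def pca_project(X_new, mean, components):
    """Project new data onto the components, centering with the training mean."""
    return (X_new - mean) @ components.T
```

Test data is centered with the mean learned during training rather than its own mean, which is why the training step needs to return the mean alongside the components.
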
Linear Systems:
- Added a module to solve linear systems of equations (Ax = b).
- The module uses various direct methods from the Eigen library for dense
  systems. The available methods are summarized below (more details at
  http://eigen.tuxfamily.org/dox-devel/group__TutorialLinearAlgebra.html):
  - Householder QR
  - Partial Pivoting LU
  - Full Pivoting LU
  - Column Pivoting Householder QR
  - Full Pivoting Householder QR
  - Standard Cholesky decomposition (LLT)
  - Robust Cholesky decomposition (LDLT)
- The module also includes direct and iterative methods for sparse linear
  systems:
  Direct:
  - Standard Cholesky decomposition (LLT)
  - Robust Cholesky decomposition (LDLT)
  Iterative:
  - In-memory Conjugate gradient
  - In-memory Conjugate gradient with diagonal preconditioners
  - In-memory Bi-conjugate gradient
  - In-memory Bi-conjugate gradient with incomplete LU preconditioners
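
As an illustration of one of the iterative methods listed above, the following is a minimal NumPy sketch of conjugate gradient with a diagonal (Jacobi) preconditioner. The function name `pcg_jacobi` and its parameters are hypothetical, the sketch uses a dense array for brevity, and it assumes A is symmetric positive definite with a nonzero diagonal. It is not the module's implementation, which relies on Eigen.

```python
import numpy as np

def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve Ax = b for symmetric positive definite A using preconditioned CG."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    m_inv = 1.0 / np.diag(A)          # inverse of the diagonal preconditioner
    x = np.zeros_like(b)
    r = b - A @ x                     # initial residual
    z = m_inv * r                     # preconditioned residual
    p = z.copy()                      # initial search direction
    rz = r @ z
    for _ in range(max_iter):
        # Stop once the residual is small relative to the right-hand side.
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p
        step = rz / (p @ Ap)          # step length along p
        x += step * p
        r -= step * Ap
        z = m_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p     # next A-conjugate direction
        rz = rz_new
    return x
```

The method touches A only through matrix-vector products, so the same loop applies to a sparse representation of A. The diagonal preconditioner is cheap to apply and tends to help most when the diagonal entries of A differ widely in magnitude.
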

Bug fixes and other changes:

Robust input validation:
- Validation of input parameters to various functions has been improved so
  that it no longer fails when the table name includes double quotes.
Random Forest:
- The ID field in rf_train has been expanded from INT to BIGINT (MADLIB-764).
Various documentation updates:
- Documentation updated for various modules including elastic net, linear
  and logistic regression.
