MADlib v1.1

@iyerr3 iyerr3 released this 22 Mar 21:02

Release Date: 2013-August-09

New Features:

  • Singular Value Decomposition:
    • Added Singular Value Decomposition using the Lanczos bidiagonalization
      iterative method, which decomposes the original matrix into PBQ^T, where
      B is a bidiagonal matrix. The original matrix is assumed to be too large
      to fit in memory, while B is assumed to fit. B is then further decomposed
      into XSY^T using Eigen's JacobiSVD function. This restricts the number of
      features in the data matrix to about 5000.
    • This implementation provides SVD (for dense matrices), SVD_BLOCK (also
      for dense matrices, but faster), SVD_SPARSE (converts a sparse matrix
      into a dense one; slower) and SVD_SPARSE_NATIVE (operates directly on the
      sparse matrix; much faster for very sparse matrices).
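The bidiagonalization step described above can be illustrated with a minimal in-memory sketch. This is not MADlib's implementation: it is a pure-Python version of Golub-Kahan-Lanczos bidiagonalization with full reorthogonalization (an assumption made here for numerical stability), and the names `lanczos_bidiag`, `matvec`, and `orthogonalize` are hypothetical. It returns vectors satisfying AQ = PB, with B upper bidiagonal.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def orthogonalize(v, basis):
    # Remove components along previously computed vectors (Gram-Schmidt).
    for u in basis:
        c = dot(v, u)
        v = [a - c * b for a, b in zip(v, u)]
    return v

def lanczos_bidiag(A, k, tol=1e-12):
    """Run up to k steps; return (P, alphas, betas, Q) as lists of vectors.

    alphas is the diagonal of B and betas its superdiagonal, so that
    A q_j = alpha_j p_j + beta_{j-1} p_{j-1} for each computed step j.
    """
    At = [list(col) for col in zip(*A)]
    n = len(A[0])
    q = [1.0 / math.sqrt(n)] * n  # arbitrary unit start vector (assumption)
    P, Q, alphas, betas = [], [], [], []
    beta = 0.0
    for _ in range(k):
        p = matvec(A, q)
        if P:
            p = [a - beta * b for a, b in zip(p, P[-1])]
        p = orthogonalize(p, P)
        alpha = math.sqrt(dot(p, p))
        if alpha < tol:  # breakdown: no new direction available
            break
        p = [x / alpha for x in p]
        if P:
            betas.append(beta)
        P.append(p)
        Q.append(q)
        alphas.append(alpha)
        r = [a - alpha * b for a, b in zip(matvec(At, p), q)]
        r = orthogonalize(r, Q)
        beta = math.sqrt(dot(r, r))
        if beta < tol:  # breakdown: Krylov space exhausted
            break
        q = [x / beta for x in r]
    return P, alphas, betas, Q

# Small illustrative matrix (4 rows, 3 features); values are made up.
A = [[2.0, 0.0, 1.0], [1.0, 3.0, 0.0], [0.0, 1.0, 4.0], [1.0, 1.0, 1.0]]
P, alphas, betas, Q = lanczos_bidiag(A, 3)
print("diagonal of B:", alphas)
print("superdiagonal of B:", betas)
```

Only the two short lists `alphas` and `betas` are needed to form B, which is why B can be held in memory and handed to a dense SVD routine even when the original matrix cannot.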
  • Principal Component Analysis:
    • Added a PCA training function that generates the top-K principal
      components for an input matrix. The function mean-centers the original
      data and returns the mean matrix as a separate table.
    • The module also includes a projection function that projects a test data
      set onto the principal components returned by the training function.
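The train-then-project flow described above can be sketched as follows. This is an illustrative pure-Python version, not MADlib's code or API: the function names `pca_train` and `pca_project` are hypothetical, and the components are found by power iteration with deflation on the covariance matrix, a technique chosen here for brevity rather than the SVD-based approach a production implementation would likely use.

```python
import math

def center(X, means):
    return [[x - m for x, m in zip(row, means)] for row in X]

def covariance(Xc):
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(C, iters=500):
    # Power iteration: repeatedly apply C and renormalize.
    d = len(C)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in C]
        nrm = math.sqrt(sum(x * x for x in w))
        if nrm == 0.0:
            break
        v = [x / nrm for x in w]
    Cv = [sum(c * x for c, x in zip(row, v)) for row in C]
    eigval = sum(a * b for a, b in zip(v, Cv))
    return eigval, v

def pca_train(X, k):
    """Return (means, components) for the top-k principal components."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    C = covariance(center(X, means))
    d = len(C)
    components = []
    for _ in range(k):
        lam, v = top_component(C)
        components.append(v)
        # Deflation: remove the component just found from C.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return means, components

def pca_project(X, means, components):
    # Center new data with the training means, then take dot products.
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in center(X, means)]

# Made-up training data with column means (5, 3).
train = [[7.0, 3.0], [5.0, 4.0], [3.0, 3.0], [5.0, 2.0]]
means, components = pca_train(train, 2)
projected = pca_project([[7.0, 3.0]], means, components)
print(means, projected)
```

Returning the means alongside the components mirrors the point made above: the projection step has to center test data with the same means used during training.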
  • Linear Systems:
    • Added a module to solve linear systems of equations (Ax = b).
    • The module utilizes various direct methods from the Eigen library for
      dense systems. Given below is a summary of the methods (more details at
      http://eigen.tuxfamily.org/dox-devel/group__TutorialLinearAlgebra.html):
      • Householder QR
      • Partial Pivoting LU
      • Full Pivoting LU
      • Column Pivoting Householder QR
      • Full Pivoting Householder QR
      • Standard Cholesky decomposition (LLT)
      • Robust Cholesky decomposition (LDLT)
    • The module also includes direct and iterative methods for sparse linear
      systems:
      • Direct:
        • Standard Cholesky decomposition (LLT)
        • Robust Cholesky decomposition (LDLT)
      • Iterative:
        • In-memory Conjugate gradient
        • In-memory Conjugate gradient with diagonal preconditioners
        • In-memory Bi-conjugate gradient
        • In-memory Bi-conjugate gradient with incomplete LU preconditioners
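As an illustration of one of the iterative methods listed above, here is a minimal sketch of conjugate gradient with a diagonal (Jacobi) preconditioner for a sparse, symmetric positive-definite system Ax = b. It is not MADlib's or Eigen's code: the function name `pcg`, the list-of-dicts sparse row format, and the tolerance defaults are all assumptions made for this example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sparse_matvec(A, x):
    # A is a list of rows; each row maps column index -> nonzero value.
    return [sum(val * x[j] for j, val in row.items()) for row in A]

def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve Ax = b for symmetric positive-definite A (hypothetical helper)."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # diagonal preconditioner
    x = [0.0] * n
    r = list(b)                                   # residual b - A*x with x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            break
        Ap = sparse_matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

# Small made-up system: 4x + y = 1, x + 3y = 2.
A = [{0: 4.0, 1: 1.0}, {0: 1.0, 1: 3.0}]
b = [1.0, 2.0]
x = pcg(A, b)
print(x)
```

Storing only the nonzero entries keeps each matrix-vector product proportional to the number of nonzeros, which is the main reason iterative methods suit sparse systems.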

Bug fixes and other changes:

  • Robust input validation:
    • Validation of input parameters to various functions has been improved so
      that validation no longer fails when a table name includes double
      quotes.
  • Random Forest:
    • The ID field in rf_train has been expanded from INT to BIGINT (MADLIB-764).
  • Various documentation updates:
    • Documentation updated for various modules including elastic net, linear
      and logistic regression.