MADlib v1.7.1
Release Date: 2015-March-18
New features:
- Random Forest Performance Improvement
  - Function forest_train() is 1.5X ~ 4X faster without variable importance,
    and up to 100X faster with variable importance
  - Function forest_predict() is up to 10X faster when type='response'
  - Allow user-specified sample ratio to train with a small subsample
- Gaussian Naive Bayes: allow continuous variables
- K-Means: Allow user-specified sample ratio for K-means++ seeding
- Miscellaneous
  - Array functions: array_square() for element-wise square, madlib.sum()
    for array element-wise aggregation
  - Madpack does not require password when not necessary (MADLIB-357)
  - Platform support for PostgreSQL 9.4 and HAWQ 1.3
  - Allow views and materialized views for training functions
  - Support quantile computation in summary functions for HAWQ and PG 9.4
Bug fixes:
- Fixed the support of multiple parameter values and NULL in general
  cross-validation (MADLIB-898, MADLIB-896)
- Fixed infinite loop when detecting recursive view-to-view dependencies for
  upgrading (MADLIB-901)
- Allow user-specified column names in PCA and multinom_predict()
Known issues:
- Performance for decision tree with cross-validation is poor on a HAWQ
  multi-node system.