MADlib v1.6
Release Date: 2014-June-30
New Features:
- Added a new unified 'margins' function that computes marginal effects for
linear, logistic, multilogistic, and Cox proportional hazards regression. The
new function also introduces support for interaction terms in the independent
variable array.
- Updated convergence for 'elastic_net_train' by checking the change in the
log-likelihood instead of the l2-norm of the change in coefficients. This allows
for faster convergence in problems with multiple optimal solutions.
The default threshold for convergence has been reduced from 1e-4 to 1e-6.
- Added a new helper function to convert categorical variables to indicator
variables which can be used directly in regression methods. The function
currently only supports dummy encoding.
- Improved performance for Cox proportional hazards: average improvement of
20-fold on GPDB and 2.5-fold on HAWQ.
- Improved performance on ARIMA by 30%.
- Added new functionality to export linear and logistic regression models as a
PMML object. The new module relies on PyXB to create PMML elements.
- Added a function ('array_scalar_add') to add a scalar to an array.
- Added 'numeric' type support for all functions that take 'anyarray' as an
argument.
- Made usability and aesthetic enhancements to documentation.
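The change to the 'elastic_net_train' convergence test can be illustrated with
a minimal Python sketch. This is not MADlib's implementation: the function
names, the use of an absolute (rather than relative) change, and the toy
Gaussian log-likelihood are all assumptions made for illustration. It shows why
a coefficient-based test can keep iterating when several coefficient vectors
are equally optimal, while a log-likelihood-based test stops.

```python
import math


def converged_by_coef(old_coef, new_coef, tol=1e-4):
    """Coefficient-based test: l2-norm of the change in coefficients."""
    change = math.sqrt(sum((n - o) ** 2 for o, n in zip(old_coef, new_coef)))
    return change < tol


def converged_by_loglik(old_ll, new_ll, tol=1e-6):
    """Log-likelihood-based test (absolute change is an assumption here)."""
    return abs(new_ll - old_ll) < tol


def gaussian_loglik(coef, rows, targets):
    """Toy log-likelihood (up to a constant) for unit-variance Gaussian noise."""
    total = 0.0
    for x, y in zip(rows, targets):
        pred = sum(c * v for c, v in zip(coef, x))
        total -= 0.5 * (y - pred) ** 2
    return total


# Two identical feature columns: many coefficient vectors fit equally well.
rows = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
targets = [1.0, 2.0, 3.0]
coef_a = (1.0, 0.0)
coef_b = (0.5, 0.5)

ll_a = gaussian_loglik(coef_a, rows, targets)
ll_b = gaussian_loglik(coef_b, rows, targets)

# The coefficients differ noticeably, but the fit is the same.
coef_test = converged_by_coef(coef_a, coef_b)
loglik_test = converged_by_loglik(ll_a, ll_b)
```

In this toy case the optimizer has moved between two equally good solutions, so
only the log-likelihood test reports convergence.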
Bug Fixes:
- Prepended the Python module name to sys.path before executing a MADlib
function to avoid conflicts with user-defined modules.
- Added a check in K-Means to ensure that the dimensionality of all data points
is the same and also equal to the dimensionality of any provided initial
centroids (MADLIB-713, MADLIB-789).
- Added a check in multinomial regression to quit early and cleanly if the model
size is greater than the maximum permissible memory (MADLIB-667).
- Fixed a minor bug with incorrect column names in the decision trees module
(MADLIB-763).
- Fixed a bug in K-Means that resulted in an incorrect number of centroids for
particular datasets (MADLIB-857).
- Fixed a bug that occurred when grouping columns have the same name as one of
the output table column names (MADLIB-833).
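The kind of dimensionality check described for K-Means above can be sketched in
a few lines of Python. The function name 'check_dimensions' and the error
messages are hypothetical; the actual check inside MADlib may be implemented
quite differently.

```python
def check_dimensions(points, centroids=None):
    """Return the common dimensionality, or raise ValueError on a mismatch.

    Hypothetical helper: every data point must have the same length, and any
    provided initial centroids must have that length too.
    """
    if not points:
        raise ValueError("no data points provided")
    dim = len(points[0])
    for i, point in enumerate(points):
        if len(point) != dim:
            raise ValueError(
                "data point %d has dimension %d, expected %d"
                % (i, len(point), dim)
            )
    for j, centroid in enumerate(centroids or []):
        if len(centroid) != dim:
            raise ValueError(
                "initial centroid %d has dimension %d, expected %d"
                % (j, len(centroid), dim)
            )
    return dim
```

Running such a validation before clustering starts lets the call fail early
with a clear message instead of producing an error partway through the
iterations.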
Deprecated Functions:
- Modules 'profile' and 'quantile' have been deprecated in favor of the
'summary' function.
- Module 'svd_mf' has been deprecated in favor of the improved 'svd' function.
- Functions 'margins_logregr' and 'margins_mlogregr' have been deprecated in
favor of the 'margins' function.