Documentation: Fix various inconsistencies in documentation
Pivotal Tracker: 58478260

Additional authors:
    - Hai Qian <[email protected]>
    - Shengwen Yang <[email protected]>
    - Xixuan Feng <[email protected]>

Changes:
    - Complete the Release notes
    - Gppkg version number set to 1.8
    - Fix various documentation errors in multiple modules
    - Changed incorrect function declaration in margins_mlogregr
Rahul Iyer committed Nov 25, 2013
1 parent 5480d2f commit 24d9fba
Showing 15 changed files with 606 additions and 347 deletions.
64 changes: 64 additions & 0 deletions ReleaseNotes.txt
@@ -8,6 +8,70 @@ A complete list of changes for each release can be obtained by viewing the git
commit history located at https://github.com/madlib/madlib/commits/master.

Current list of bugs and issues can be found at http://jira.madlib.net.
--------------------------------------------------------------------------------
MADlib v1.4

Release Date: 2013-Nov-25

New Features:
* Improved interface for Multinomial logistic regression:
- Added a new interface that accepts an 'output_table' parameter and
stores the model details in the output table instead of returning them as a struct
data type. The updated function also builds a summary table that includes
all parameters and meta-parameters used during model training.
- The output table has been reformatted to present the model coefficients
and related metrics for each category in a separate row. This replaces the
old output format of model stats for all categories combined in a
single array.
* Variance Estimators
- Added Robust Variance estimator for Cox PH models (Lin and Wei, 1989).
It is useful in calculating variances in a dataset with potentially
noisy outliers. Namely, the standard errors are asymptotically normal even
if the model is wrong due to outliers.
- Added Clustered Variance estimator for Cox PH models. It is used
when the data contains extra clustering information besides covariates,
and its estimates are asymptotically normal.
* NULL Handling:
- Modified behavior of regression modules to 'omit' rows containing NULL
values for any of the dependent and independent variables. The number of
rows skipped is provided as part of the output table.
This release includes NULL handling for the following modules:
- Linear, Logistic, and Multinomial logistic regression, as well as
Cox Proportional Hazards
- Huber-White sandwich estimators for linear, logistic, and multinomial
logistic regression as well as Cox Proportional Hazards
- Clustered variance estimators for linear, logistic, and multinomial
logistic regression as well as Cox Proportional Hazards
- Marginal effects for logistic and multinomial logistic regression
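
The 'omit' behavior described above can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not MADlib's implementation: `omit_null_rows` is a hypothetical helper, rows are modeled as dictionaries, and SQL NULL is modeled as Python `None`.

```python
def omit_null_rows(rows, columns):
    """Drop rows with a NULL (None) in any listed column.

    Hypothetical helper for illustration only; returns the kept rows
    and the number of rows that were skipped.
    """
    kept = [
        row for row in rows
        if all(row.get(col) is not None for col in columns)
    ]
    return kept, len(rows) - len(kept)


# Toy data: 'y' is the dependent variable, 'x1' and 'x2' are independent.
rows = [
    {"y": 1.0, "x1": 2.0, "x2": 3.0},
    {"y": None, "x1": 1.0, "x2": 0.5},   # NULL dependent variable
    {"y": 0.0, "x1": None, "x2": 4.0},   # NULL independent variable
]

kept, num_skipped = omit_null_rows(rows, ["y", "x1", "x2"])
print(len(kept), num_skipped)
```

Returning the skipped-row count alongside the filtered rows mirrors the note that the number of skipped rows is reported with the output.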

Deprecated functions:
- The multinomial logistic regression function has been renamed to
'mlogregr_train'. The old function ('mlogregr') has been deprecated
and will be removed in the next major version update.

- For all multinomial regression estimator functions (list given below),
changes in the argument list were made to collate all optimizer specific
arguments in a single string. An example of the new optimizer parameter is
'max_iter=20, optimizer=irls, precision=0.0001'.
This is in contrast to the original argument list that contained 3 arguments:
'max_iter', 'optimizer', and 'precision'. This change allows adding new
optimizer-specific parameters without changing the argument list.
Affected functions:
- robust_variance_mlogregr
- clustered_variance_mlogregr
- margins_mlogregr
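
A collated optimizer string of this form could be split into typed values roughly as follows. This is a minimal sketch under stated assumptions: `parse_optimizer_params` is a hypothetical helper, not part of MADlib, and the only keys it recognizes are the three named in the example string.

```python
# Assumed value types for the three parameters named in the release note.
CASTS = {"max_iter": int, "optimizer": str, "precision": float}


def parse_optimizer_params(param_str):
    """Parse 'key=value, key=value' into a dict of typed values.

    Hypothetical helper for illustration; unknown keys are rejected.
    """
    params = {}
    for item in param_str.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and trailing commas
        key, sep, value = item.partition("=")
        key = key.strip().lower()
        if not sep or key not in CASTS:
            raise ValueError("unrecognized optimizer parameter: %r" % item)
        params[key] = CASTS[key](value.strip())
    return params


parsed = parse_optimizer_params("max_iter=20, optimizer=irls, precision=0.0001")
print(parsed)
```

Because the parser looks keys up in a single table, supporting a new optimizer-specific parameter means adding one entry there while the function's argument list stays the same, which is the motivation the note gives for the single-string design.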

Bug Fixes:
- Fixed an overflow problem in LDA by using INT64 instead of INT32.
- Fixed an integer-to-boolean cast bug in clustered variance for logistic
regression. After this fix, integer columns are accepted for the binary
dependent variable using the 'integer to bool' cast rules.
- Fixed two bugs in SVD:
- The 'example' option for online help has been fixed
- Column names for sparse input tables in the 'svd_sparse' and
'svd_sparse_native' functions are no longer restricted to 'row_id',
'col_id' and 'value'.

--------------------------------------------------------------------------------
MADlib v1.3

3 changes: 1 addition & 2 deletions deploy/gppkg/CMakeLists.txt
@@ -2,8 +2,7 @@
# Packaging for Greenplum's gppkg
# ------------------------------------------------------------------------------

-# set(MADLIB_GPPKG_VERSION "ossv1.4_pv1.7.2_gpdb4.2")
-set(MADLIB_GPPKG_VERSION "1.7.2")
+set(MADLIB_GPPKG_VERSION "1.8")
set(MADLIB_GPPKG_RELEASE_NUMBER 1)
set(MADLIB_GPPKG_RPM_SOURCE_DIR
"${CMAKE_BINARY_DIR}/_CPack_Packages/Linux/RPM/${CPACK_PACKAGE_FILE_NAME}"
128 changes: 64 additions & 64 deletions src/ports/postgres/modules/regress/clustered_variance.py_in

Large diffs are not rendered by default.

