Skip to content

Commit

Permalink
updates to readme and minor cran description fix
Browse files Browse the repository at this point in the history
  • Loading branch information
jaredhuling committed Mar 6, 2019
1 parent a2224b4 commit 999cb6d
Show file tree
Hide file tree
Showing 4 changed files with 90 additions and 120 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Package: fastglm
Type: Package
Title: Fitting fast generalized linear models using 'RcppEigen'
Title: Fast and Stable Fitting of Generalized Linear Models using 'RcppEigen'
Version: 0.0.1
Authors@R: c(
person("Jared", "Huling", , "[email protected]", c("aut", "cre")),
Expand All @@ -11,7 +11,7 @@ Authors@R: c(
)
Maintainer: Jared Huling <[email protected]>
Description: Fits generalized linear models efficiently using 'RcppEigen'. The iteratively reweighted least squares
implementation utilizes the step-halving approach of Marschner (2011) <10.32614/RJ-2011-012> to help safeguard
implementation utilizes the step-halving approach of Marschner (2011) <doi:10.32614/RJ-2011-012> to help safeguard
against convergence issues.
BugReports: https://github.com/jaredhuling/fastglm/issues
License: GPL (>= 2)
Expand Down
55 changes: 31 additions & 24 deletions README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ opts_chunk$set(message = FALSE, warning=FALSE)

# Overview of 'fastglm'

The 'fastglm' package is a re-write of `glm()` using `RcppEigen` designed to be computationally efficient.
The 'fastglm' package is a re-write of `glm()` using `RcppEigen` designed to be computationally efficient and algorithmically stable.



Expand Down Expand Up @@ -69,16 +69,17 @@ ct <- microbenchmark(
autoplot(ct, log = FALSE) + stat_summary(fun.y = median, geom = 'point', size = 2)
# comparison of estimates
max(abs(coef(gl1) - gf1$coef))
max(abs(coef(gl1) - gf2$coef))
max(abs(coef(gl1) - gf3$coef))
max(abs(coef(gl1) - gf4$coef))
max(abs(coef(gl1) - gf5$coef))
c(glm_vs_fastglm_qrcpiv = max(abs(coef(gl1) - gf1$coef)),
glm_vs_fastglm_qr = max(abs(coef(gl1) - gf2$coef)),
glm_vs_fastglm_qrfpiv = max(abs(coef(gl1) - gf5$coef)),
glm_vs_fastglm_LLT = max(abs(coef(gl1) - gf3$coef)),
glm_vs_fastglm_LDLT = max(abs(coef(gl1) - gf4$coef)))
# now between glm and speedglm
max(abs(coef(gl1) - sg1$coef))
max(abs(coef(gl1) - sg2$coef))
max(abs(coef(gl1) - sg3$coef))
c(glm_vs_speedglm_eigen = max(abs(coef(gl1) - sg1$coef)),
glm_vs_speedglm_Chol = max(abs(coef(gl1) - sg2$coef)),
glm_vs_speedglm_qr = max(abs(coef(gl1) - sg3$coef)))
```

Expand All @@ -100,32 +101,38 @@ system.time(gfit2 <- glm(y~x, family = Gamma(link = "sqrt")) )
system.time(gfit3 <- glm2::glm2(y~x, family = Gamma(link = "sqrt")) )
system.time(gfit4 <- speedglm(y~x, family = Gamma(link = "sqrt")))
## speedglm appears to diverge
system.time(gfit5 <- speedglm(y~x, family = Gamma(link = "sqrt"), maxit = 500))
## Note that fastglm() returns estimates with the
## largest likelihood
logLik(gfit1)
logLik(gfit2)
logLik(gfit3)
coef(gfit1)[1:5]
coef(gfit2)[1:5]
coef(gfit3)[1:5]
c(fastglm = logLik(gfit1), glm = logLik(gfit2), glm2 = logLik(gfit3),
speedglm = logLik(gfit4), speedglm500 = logLik(gfit5))
rbind(fastglm = coef(gfit1)[1:5],
glm = coef(gfit2)[1:5],
glm2 = coef(gfit3)[1:5],
speedglm = coef(gfit4)[1:5],
speedglm500 = coef(gfit5)[1:5])
## check convergence of fastglm
gfit1$converged
## number of IRLS iterations
gfit1$iter
## check convergence of fastglm and #iterations
# 1 means converged, 0 means not converged
c(gfit1$converged, gfit1$iter)
## now check convergence for glm()
gfit2$converged
gfit2$iter
c(gfit2$converged, gfit2$iter)
## check convergence for glm2()
gfit3$converged
gfit3$iter
c(gfit3$converged, gfit3$iter)
## check convergence for speedglm()
c(gfit4$convergence, gfit4$iter, gfit5$convergence, gfit5$iter)
## increasing number of IRLS iterations for glm() does not help that much
system.time(gfit2 <- glm(y~x, family = Gamma(link = "sqrt"), control = list(maxit = 100)) )
system.time(gfit2 <- glm(y~x, family = Gamma(link = "sqrt"), control = list(maxit = 1000)) )
gfit2$converged
gfit2$iter
Expand Down
151 changes: 57 additions & 94 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ Status](https://travis-ci.org/jaredhuling/fastglm.svg?branch=master)](https://tr
# Overview of ‘fastglm’

The ‘fastglm’ package is a re-write of `glm()` using `RcppEigen`
designed to be computationally efficient.
designed to be computationally efficient and algorithmically stable.

# Installing the ‘fastglm’ package

Expand Down Expand Up @@ -64,53 +64,27 @@ autoplot(ct, log = FALSE) + stat_summary(fun.y = median, geom = 'point', size =

``` r
# comparison of estimates
max(abs(coef(gl1) - gf1$coef))
c(glm_vs_fastglm_qrcpiv = max(abs(coef(gl1) - gf1$coef)),
glm_vs_fastglm_qr = max(abs(coef(gl1) - gf2$coef)),
glm_vs_fastglm_qrfpiv = max(abs(coef(gl1) - gf5$coef)),
glm_vs_fastglm_LLT = max(abs(coef(gl1) - gf3$coef)),
glm_vs_fastglm_LDLT = max(abs(coef(gl1) - gf4$coef)))
```

## [1] 2.590289e-14

``` r
max(abs(coef(gl1) - gf2$coef))
```

## [1] 2.546921e-14

``` r
max(abs(coef(gl1) - gf3$coef))
```

## [1] 1.140078e-13

``` r
max(abs(coef(gl1) - gf4$coef))
```

## [1] 1.094264e-13

``` r
max(abs(coef(gl1) - gf5$coef))
```

## [1] 2.776945e-14
## glm_vs_fastglm_qrcpiv glm_vs_fastglm_qr glm_vs_fastglm_qrfpiv
## 2.590289e-14 2.546921e-14 2.776945e-14
## glm_vs_fastglm_LLT glm_vs_fastglm_LDLT
## 1.140078e-13 1.094264e-13

``` r
# now between glm and speedglm
max(abs(coef(gl1) - sg1$coef))
```

## [1] 1.359413e-12

``` r
max(abs(coef(gl1) - sg2$coef))
c(glm_vs_speedglm_eigen = max(abs(coef(gl1) - sg1$coef)),
glm_vs_speedglm_Chol = max(abs(coef(gl1) - sg2$coef)),
glm_vs_speedglm_qr = max(abs(coef(gl1) - sg3$coef)))
```

## [1] 1.359413e-12

``` r
max(abs(coef(gl1) - sg3$coef))
```

## [1] 1.191977e-12
## glm_vs_speedglm_eigen glm_vs_speedglm_Chol glm_vs_speedglm_qr
## 1.359413e-12 1.359413e-12 1.191977e-12

# Stability

Expand All @@ -137,110 +111,99 @@ system.time(gfit1 <- fastglm(cbind(1, x), y, family = Gamma(link = "sqrt")))
```

## user system elapsed
## 0.823 0.027 0.851
## 0.811 0.021 0.839

``` r
system.time(gfit2 <- glm(y~x, family = Gamma(link = "sqrt")) )
```

## user system elapsed
## 2.976 0.120 3.103
## 2.937 0.134 3.115

``` r
system.time(gfit3 <- glm2::glm2(y~x, family = Gamma(link = "sqrt")) )
```

## user system elapsed
## 2.070 0.088 2.166
## 2.087 0.100 2.220

``` r
## Note that fastglm() returns estimates with the
## largest likelihood
logLik(gfit1)
```

## 'log Lik.' -16030.81 (df=102)

``` r
logLik(gfit2)
system.time(gfit4 <- speedglm(y~x, family = Gamma(link = "sqrt")))
```

## 'log Lik.' -16704.05 (df=102)

``` r
logLik(gfit3)
```

## 'log Lik.' -16046.66 (df=102)
## user system elapsed
## 1.686 0.050 1.740

``` r
coef(gfit1)[1:5]
## speedglm appears to diverge
system.time(gfit5 <- speedglm(y~x, family = Gamma(link = "sqrt"), maxit = 500))
```

## (Intercept) X1 X2 X3 X4
## 1.429256009 0.125873599 0.005321164 -0.129389740 0.238937255
## user system elapsed
## 34.142 1.065 35.385

``` r
coef(gfit2)[1:5]
```

## (Intercept) x1 x2 x3 x4
## 1.431168e+00 1.251936e-01 -6.896739e-05 -1.281857e-01 2.366473e-01
## Note that fastglm() returns estimates with the
## largest likelihood

``` r
coef(gfit3)[1:5]
c(fastglm = logLik(gfit1), glm = logLik(gfit2), glm2 = logLik(gfit3),
speedglm = logLik(gfit4), speedglm500 = logLik(gfit5))
```

## (Intercept) x1 x2 x3 x4
## 1.426864e+00 1.242616e-01 -9.860241e-05 -1.254873e-01 2.361301e-01
## fastglm glm glm2 speedglm speedglm500
## -16030.81 -16704.05 -16046.66 -47722.66 -57785.72

``` r
## check convergence of fastglm
gfit1$converged
rbind(fastglm = coef(gfit1)[1:5],
glm = coef(gfit2)[1:5],
glm2 = coef(gfit3)[1:5],
speedglm = coef(gfit4)[1:5],
speedglm500 = coef(gfit5)[1:5])
```

## [1] TRUE
## (Intercept) X1 X2 X3 X4
## fastglm 1.429256 0.1258736 5.321164e-03 -0.1293897 0.2389373
## glm 1.431168 0.1251936 -6.896739e-05 -0.1281857 0.2366473
## glm2 1.426864 0.1242616 -9.860241e-05 -0.1254873 0.2361301
## speedglm -22.182477 3.1784570 -2.970111e+00 -4.9709797 14.0549438
## speedglm500 -27.891929 -13.9080256 -9.690833e+00 2.7279219 -11.1458325

``` r
## number of IRLS iterations
gfit1$iter
## check convergence of fastglm and #iterations
# 1 means converged, 0 means not converged
c(gfit1$converged, gfit1$iter)
```

## [1] 17
## [1] 1 17

``` r
## now check convergence for glm()
gfit2$converged
```

## [1] FALSE

``` r
gfit2$iter
c(gfit2$converged, gfit2$iter)
```

## [1] 25
## [1] 0 25

``` r
## check convergence for glm2()
gfit3$converged
c(gfit3$converged, gfit3$iter)
```

## [1] TRUE
## [1] 1 19

``` r
gfit3$iter
## check convergence for speedglm()
c(gfit4$convergence, gfit4$iter, gfit5$convergence, gfit5$iter)
```

## [1] 19
## [1] 0 25 0 500

``` r
## increasing number of IRLS iterations for glm() does not help that much
system.time(gfit2 <- glm(y~x, family = Gamma(link = "sqrt"), control = list(maxit = 100)) )
system.time(gfit2 <- glm(y~x, family = Gamma(link = "sqrt"), control = list(maxit = 1000)) )
```

## user system elapsed
## 11.118 0.427 11.567
## 108.178 4.134 113.147

``` r
gfit2$converged
Expand All @@ -252,7 +215,7 @@ gfit2$converged
gfit2$iter
```

## [1] 100
## [1] 1000

``` r
logLik(gfit1)
Expand All @@ -264,4 +227,4 @@ logLik(gfit1)
logLik(gfit2)
```

## 'log Lik.' -16054.15 (df=102)
## 'log Lik.' -16333.99 (df=102)
Binary file modified vignettes/gen_data-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.

0 comments on commit 999cb6d

Please sign in to comment.