---
title: "SLSEDesign: Optimal designs using the second-order Least squares estimator"
author: |
| *Chi-Kuang Yeh, Julie Zhou, Jason Hou-Liu*
|
date: "*`r format(Sys.time(), '%B %d, %Y')`*"
output: github_document
---
\newcommand{\cov}{\mathrm{cov}}
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
<!-- badges: start -->
[![CRAN
status](https://www.r-pkg.org/badges/version/SLSEdesign)](https://CRAN.R-project.org/package=SLSEdesign)
[![R-CMD-check](https://github.com/chikuang/SLSEdesign/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/chikuang/SLSEdesign/actions/workflows/R-CMD-check.yaml)
[![CRAN download](http://cranlogs.r-pkg.org/badges/grand-total/SLSEdesign?color=blue)](https://cran.r-project.org/package=SLSEdesign)
[![](https://img.shields.io/github/languages/code-size/chikuang/SLSEdesign.svg)](https://github.com/chikuang/SLSEdesign)
<!-- badges: end -->
## Description
This package computes optimal regression designs under the second-order least squares estimator (SLSE).
## Installation
SLSEdesign is available on [CRAN](https://cran.r-project.org/), so you can install the released version with
```r
install.packages("SLSEdesign")
```
or install the development version with
```r
devtools::install_github("chikuang/SLSEdesign") # or pak::pkg_install("chikuang/SLSEdesign")
library(SLSEdesign)
```
## Details
Consider a general regression model,
$$y_i=\eta(\mathbf{x}_i, \mathbf{\theta})+ \epsilon_i, \quad i=1, \ldots, n,$$
where $y_i$ is the $i$-th observation of a response variable $y$ at design point
$\mathbf{x}_i \in S \subset \mathbb{R}^p$, $S$ is a design space,
$\mathbf{\theta} \in \mathbb{R}^q$ is the unknown regression parameter vector,
response function $\eta(\mathbf{x}_i, \mathbf{\theta})$ can be a linear or nonlinear
function of $\mathbf{\theta}$, and the errors $\epsilon_i$ are assumed to be uncorrelated with mean zero and finite variance $\sigma^2$.
Let $\hat{\mathbf{\theta}}$ be an estimator of $\mathbf{\theta}$, such as the least squares estimator. Various optimal designs are defined by minimizing $\phi \left( \mathrm{cov}(\hat{\mathbf{\theta}}) \right)$
over the design points
$\mathbf{x}_1, \ldots, \mathbf{x}_n$, where function $\phi(\cdot)$ can be determinant, trace, or other scalar functions. The resulting designs are called optimal exact designs (OEDs), which depend on the response function $\eta(\cdot,\cdot)$, the design space $S$, the estimator $\hat{\mathbf{\theta}}$, the scalar function $\phi(\cdot)$, and the number of points $n$.
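To make the criterion concrete, here is a small illustrative sketch in plain R (not part of SLSEdesign): for the simple linear model $\eta(x, \mathbf{\theta}) = \theta_1 + \theta_2 x$ on $S = [-1, 1]$, the information matrix of an approximate design $\xi = \{(x_i, w_i)\}$ is $M(\xi) = \sum_i w_i f(x_i) f(x_i)^\top$ with $f(x) = (1, x)^\top$, and the D-criterion maximizes $\det M(\xi)$.

```r
# Illustrative sketch (plain R, not a SLSEdesign function):
# D-criterion for the simple linear model with f(x) = (1, x)^T.
f <- function(x) matrix(c(1, x), ncol = 1)

info_matrix <- function(location, weight) {
  M <- matrix(0, 2, 2)
  for (i in seq_along(location)) {
    M <- M + weight[i] * f(location[i]) %*% t(f(location[i]))
  }
  M
}

# Classical D-optimal design for this model on [-1, 1]:
# equal weight 1/2 at the two endpoints.
M <- info_matrix(c(-1, 1), c(0.5, 0.5))
det(M)  # 1
```

The equal-weight two-point design at the endpoints is the classical D-optimal design for this model, attaining $\det M = 1$.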
The second-order least squares estimator (SLSE) is defined as
```math
(\boldsymbol{\hat{\theta}}^\top,\hat{\sigma}^2)^\top:=\underset{\boldsymbol{\theta},\sigma^2}{\mathrm{argmin}}\sum_{i=1}^n \begin{pmatrix}
y_i-\eta(\boldsymbol{x}_i;\boldsymbol{\theta})\\
y_i^2-\eta^2(\boldsymbol{x}_i;\boldsymbol{\theta})-\sigma^2
\end{pmatrix}^\top W(\boldsymbol{x_i}) \begin{pmatrix}
y_i-\eta(\boldsymbol{x_i};\boldsymbol{\theta})\\
y_i^2-\eta^2(\boldsymbol{x_i};\boldsymbol{\theta})-\sigma^2
\end{pmatrix}.
```
### Comparison between ordinary least-squares and second order least-squares estimators
Note that $`W(\boldsymbol{x_i})`$ is a $`2\times 2`$ non-negative definite matrix which may or may not depend on $`\boldsymbol{x_i}`$ (Wang and Leblanc, 2008). SLSE is a natural extension of the OLSE, which is defined from the first-order difference function alone (i.e. $`y_i-\mathbb{E}[y_i]=y_i-\eta(\boldsymbol{x_i};\boldsymbol{\theta})`$). SLSE uses not only the first-order difference function but also the second-order difference function (i.e. $`y_i^2-\mathbb{E}[y_i^2]=y_i^2-(\eta^2(\boldsymbol{x_i};\boldsymbol{\theta})+\sigma^2)`$). SLSE does have its disadvantages: it is not a linear estimator, it has no closed-form solution, and its nonlinearity makes it more computationally demanding than OLSE. However, its numerical solution is easy to compute nowadays, which makes SLSE a powerful alternative estimator for research studies and real-life applications.
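For intuition only, the SLSE objective above can be minimized directly with base R's `optim` when $`W(\boldsymbol{x_i}) = I_2`$; the simple linear model and simulated skewed errors below are hypothetical and not part of the package.

```r
# Hypothetical illustration: SLSE for a simple linear model with W(x) = I_2.
set.seed(1)
n   <- 200
x   <- runif(n, -1, 1)
eps <- rexp(n, rate = 1) - 1     # skewed errors with mean 0 and variance 1
y   <- 1 + 2 * x + eps           # true theta = (1, 2), sigma^2 = 1

eta <- function(x, theta) theta[1] + theta[2] * x

# SLSE objective: sum of squared first- and second-order differences.
slse_obj <- function(par) {
  theta  <- par[1:2]
  sigma2 <- par[3]
  r1 <- y - eta(x, theta)
  r2 <- y^2 - eta(x, theta)^2 - sigma2
  sum(r1^2 + r2^2)
}

fit <- optim(c(0, 0, 0.5), slse_obj)
fit$par  # estimates of (theta1, theta2, sigma^2); close to (1, 2, 1)
```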
In particular, if the skewness parameter $t$ is set to zero, the resulting optimal designs under SLSE and OLSE **are the same**!
## Examples
#### D-optimal design of the 3rd order polynomial regression model
$$
y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 +\varepsilon_i
$$
A partial derivative of the mean function is required:
```r
poly3 <- function(xi, theta) {
  matrix(c(1, xi, xi^2, xi^3), ncol = 1)
}
```
We first calculate the D-optimal design when the skewness parameter `t` is set to be zero. The resulting D-optimal design should be the same as the optimal design under the ordinary least-squares estimator.
```r
my_design <- Dopt(N = 31, u = seq(-1, 1, length.out = 31),
tt = 0, FUN = poly3, theta = rep(1, 4), num_iter = 500)
my_design$design
# location weight
# 1 -1.0 0.2615264
# 10 -0.4 0.2373288
# 22 0.4 0.2373288
# 31 1.0 0.2615264
my_design$val
# 5.133616
```
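As a sanity check (a classical result, independent of the package): the continuous D-optimal design for the cubic model on $[-1, 1]$ puts weight $1/4$ at $\pm 1$ and $\pm 1/\sqrt{5} \approx \pm 0.447$; the interior support above lands on $\pm 0.4$ only because the 31-point grid does not contain $\pm 1/\sqrt{5}$.

```r
# Compare the exact continuous D-optimal support with the grid-rounded one,
# both with equal weights 1/4, for the cubic model f(x) = (1, x, x^2, x^3)^T.
f3 <- function(x) c(1, x, x^2, x^3)

info_matrix <- function(location, weight) {
  Reduce(`+`, Map(function(x, w) w * tcrossprod(f3(x)), location, weight))
}

M_exact <- info_matrix(c(-1, -1/sqrt(5), 1/sqrt(5), 1), rep(0.25, 4))
M_grid  <- info_matrix(c(-1, -0.4, 0.4, 1), rep(0.25, 4))
det(M_exact) > det(M_grid)  # TRUE: the exact support is slightly better
```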
Now we look at the situation when the skewness parameter `t` is in the interval (0, 1], for instance, $0.7$.
```r
my_design <- Dopt(N = 31, u = seq(-1, 1, length.out = 31),
tt = 0.7, FUN = poly3, theta = rep(1, 4), num_iter = 500)
my_design$design
# location weight
# 1 -1.0 0.2714088
# 10 -0.4 0.2287621
# 22 0.4 0.2287621
# 31 1.0 0.2714088
my_design$val
# 6.27293
```
Add equivalence theorem plot for D-optimal design:
```r
design <- data.frame(location = c(-1, -0.447, 0.447, 1),
                     weight = rep(0.25, 4))
u <- seq(-1, 1, length.out = 201)
plot_direction_Dopt(u, design, tt = 0, FUN = poly3,
                    theta = rep(0, 4))
```
<img src="man/fig/README-demo-equivalence.png" width="50%" />
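The plot visualizes the classical equivalence theorem: at $t = 0$, a design $\xi^*$ is D-optimal if and only if $f(x)^\top M^{-1}(\xi^*) f(x) \le q$ for all $x \in S$, with equality at the support points (here $q = 4$ parameters). This can also be checked numerically in plain R:

```r
# Equivalence-theorem check (plain R sketch) for the continuous D-optimal
# design of the cubic model: equal weight 1/4 at +/-1 and +/-1/sqrt(5).
f3 <- function(x) c(1, x, x^2, x^3)
M  <- Reduce(`+`, lapply(c(-1, -1/sqrt(5), 1/sqrt(5), 1),
                         function(x) 0.25 * tcrossprod(f3(x))))
u <- seq(-1, 1, length.out = 201)
d <- sapply(u, function(x) t(f3(x)) %*% solve(M) %*% f3(x))
max(d)  # 4, attained at the support points; d(x) <= 4 elsewhere
```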
#### D-optimal design of the 3rd order polynomial regression model without intercept
$$
y_i = \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 +\varepsilon_i
$$
In the last example, the support points did not change as `t` increased. That is not always the case: the optimal design may depend on `t`.
```r
poly3_no_intercept <- function(xi, theta){
matrix(c(xi, xi^2, xi^3), ncol = 1)
}
my_design <- Dopt(N = 31, u = seq(-1, 1, length.out = 31),
tt = 0, FUN = poly3_no_intercept, theta = rep(1, 3), num_iter = 500)
my_design$design
# location weight
# 1 -1.0 0.3275005
# 7 -0.6 0.1565560
# 25 0.6 0.1565560
# 31 1.0 0.3275005
my_design$val
# 3.651524
my_design <- Dopt(N = 31, u = seq(-1, 1, length.out = 31),
tt = 0.9, FUN = poly3_no_intercept, theta = rep(1, 3), num_iter = 500)
my_design$design
# location weight
# 1 -1.0 0.2888423
# 10 -0.4 0.2096781
# 22 0.4 0.2096781
# 31 1.0 0.2888423
my_design$val
# 4.892601
```
## TODO
+ [x] Version update for the development version
+ [x] Improve the computational speed by vectorizing the code and removing loops
+ [x] Remove unnecessary dependencies: tibble, gridExtra, graphics and stats
+ [ ] Add the c-optimality criterion
+ [ ] Python and Julia versions of the package, which are expected to be faster than the R version
+ [ ] Merge the functions that compute the directional derivatives, perhaps by adding an argument that indicates the design criterion
## Reference
1. Gao, Lucy L. and Zhou, Julie. (2017). [D-optimal designs based on the second-order least squares estimator](https://link.springer.com/article/10.1007/s00362-015-0688-9). *Statistical Papers*, 58, 77–94.
2. Gao, Lucy L. and Zhou, Julie. (2014). [New optimal design criteria for regression models with asymmetric errors](https://www.sciencedirect.com/science/article/pii/S037837581400007X). *Journal of Statistical Planning and Inference*, 149, 140–151.
3. Wang, Liqun and Leblanc, Alexandre. (2008). [Second-order nonlinear least squares estimation](https://link.springer.com/article/10.1007/s10463-007-0139-z). *Annals of the Institute of Statistical Mathematics*, 60, 883–900.
4. Yeh, Chi-Kuang and Zhou, Julie. (2021). [Properties of optimal regression designs under the second-order least squares estimator](https://link.springer.com/article/10.1007/s00362-018-01076-6). *Statistical Papers*, 62, 75–92.
5. Yin, Yue and Zhou, Julie. (2017). [Optimal designs for regression models using the second-order least squares estimator](https://www.jstor.org/stable/26384103). *Statistica Sinica*, 27, 1841–1856.