Skip to content

Latest commit

 

History

History
156 lines (108 loc) · 8.1 KB

README.md

File metadata and controls

156 lines (108 loc) · 8.1 KB

tmleCommunity

Targeted Maximum Likelihood Estimation for Hierarchical Data

Summary

The tmleCommunity package performs targeted minimum loss-based estimation (TMLE) of the average causal effect of community-based intervention(s) at a single time point on an individual-based outcome of interest. It provides three approaches to analyze hierarchical data: community-level TMLE, inidividual-level TMLE and stratified TMLE. Implementations of the inverse-probability-of-treatment-weighting (IPTW) and the G-computation formula (GCOMP) are also available for each approach. The user-supplied arbitrary intervention can be either binary, categorical or continuous, also supporting univariate and multivariate setting.

Details

As a double-robust and asymptotically efficient substitution estimator that respects global constraints of the statistical model, targeted maximum likelihood (or minimum loss-based) estimation (TMLE) provides asymptotically valid statistical inference, with potential reduction in bias and gain in efficiency. The development of the tmleCommunity package for R was motivated by the increasing demand of a user-friendly tool to estimate the impact of community-based arbitrary exposures in community-independent data structures with a semi-parametric efficient estimator. Besides, the esimation results of TMLE, IPTW and GCOMP, the statistical inference (Standard errors, t statistc, p-value and confidence intervals) of both TMLE and IPTW are provided based on the corresponding influence curve, respectively. Optional data-adaptive estimation of exposure and outcome mechanisms using the SuperLearner package (stacked regression model), sl3 package (Super Learner algorithm for ensemble learning and model stacking with pipelines), h2o and h2oEnsemble packages (optimized for doing 'in memory' processing of distributed, parallel machine learning algorithms on clusters) is strongly recommended.

tmleCommunity is under active development so please submit any bug reports or feature requests to the issue queue, or email Chi directly.

Installation

Install Development Version

The following are two ways that you can install the development version of the tmleCommunity package.

  • Install directly from GitHub in R using remotes::install_github():
# install.packages("remotes")
library(remotes)
remotes::install_github("chizhangucb/tmleCommunity")

If the installation fails, you may need to install h2oEnsemble and sl3 packages mannually by

# h2oEnsemble 
install.packages("https://h2o-release.s3.amazonaws.com/h2o-ensemble/R/h2oEnsemble_0.2.1.tar.gz", repos = NULL)
# sl3
remotes::install_github("tlverse/sl3")

Alternatively, you could download the entire packge to the local path (e.g., Desktop) by either clicking the (green) download button on the this page, or cloning the repo through terminal via the following code

git clone https://github.com/chizhangucb/tmleCommunity

Then open RStudio and run the following code

# set the working directory to the directory where tmleCommunity pacakge is stored
setwd("some_path/tmleCommunity")

# 1. If you only want to use the package instead of installing it in R library, use 
devtools::load_all()

# 2. If you want to install it, then uze
devtools::install()
library(tmleCommunity)

CRAN

Forthcoming 2018

Documentation

Once the package is installed, please refer to the help file ?'tmleCommunity-package' and tmleCommunity function documentation for details and examples.

?'tmleCommunity-package'
?tmleCommunity

Example

We will use the sample dataset (E=(E1,E2),W=(W1,W2,W3),A,Y) that come along with the package:

data(comSample.wmT.bA.bY_list)  # load the sample data 
comSample.wmT.bA.bY <- comSample.wmT.bA.bY_list$comSample.wmT.bA.bY
N <- NROW(comSample.wmT.bA.bY)

Estimating the additive treatment effect (ATE) for two deterministic interventions (f_gstar1 = 1 vs f_gstar2 = 0) via community-level / individual-level analysis.

# speed.glm using correctly specified Qform, hform.g0 and hform.gstar;
Qform.corr <- "Y ~ E1 + E2 + W2 + W3 + A" # correct Q form
gform.corr <- "A ~ E1 + E2 + W1"  # correct g

# Setting global options that may be used in tmleCommunity(), e.g., using speed.glm
tmleCom_Options(Qestimator = "speedglm__glm", gestimator = "speedglm__glm", maxNperBin = N)

# Community-level analysis without a pooled individual-level regression on outcome
tmleCom_wmT.bA.bY.1a_sglm <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = 1L, f_gstar2 = 0L,
                community.step = "community_level", communityID = "id", pooled.Q = FALSE, 
                Qform = Qform.corr, hform.g0 = gform.corr, hform.gstar = gform.corr)


# Community-level analysis with a pooled individual-level regression on outcome
tmleCom_wmT.bA.bY.1b_sglm <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = 1L, f_gstar2 = 0L,
                community.step = "community_level", communityID = "id", pooled.Q = TRUE, 
                Qform = Qform.corr, hform.g0 = gform.corr, hform.gstar = gform.corr)
tmleCom_wmT.bA.bY.1b_sglm$ATE$estimates

# Individual-level analysis with both individual-level outcome and treatment mechanisms
tmleCom_wmT.bA.bY.2_sglm <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = 1L, f_gstar2 = 0L,
                community.step = "individual_level", communityID = "id", 
                Qform = Qform.corr, hform.g0 = gform.corr, hform.gstar = gform.corr)
tmleCom_wmT.bA.bY.2_sglm$ATE$estimates

If you are uncertain about the model specification of exposure and outcome mechanisms, data-adaptive estimation methods may be a better choice than parametric models.

# SuperLearner for both outcome and treatment (clever covariate) regressions
# using all parent nodes (of Y and A) as regressors (respectively)
require("SuperLearner")
tmleCom_Options(Qestimator = "SuperLearner", gestimator = "SuperLearner", 
                maxNperBin = N, SL.library = c("SL.glm", "SL.step", "SL.bayesglm"))
tmleCom_wmT.bA.bY.2_SL <- 
  tmleCommunity(data = comSample.wmT.bA.bY, Ynode = "Y", Anodes = "A", 
                WEnodes = c("E1", "E2", "W1", "W2", "W3"), f_gstar1 = 1L, f_gstar2 = 0L,
                community.step = "community_level", communityID = "id", pooled.Q = TRUE, 
                Qform = NULL, hform.g0 = NULL, hform.gstar = NULL)
tmleCom_wmT.bA.bY.2_SL$ATE$estimates

Authors

Chi Zhang, Oleg Sofrygin, Jennifer Ahern, M. J. van der Laan

References

Balzer L. B., Zheng W., van der Laan M. J., Petersen M. L. and the SEARCH Collaboration (2017). A New Approach to Hierarchical Data Analysis: Targeted Maximum Likelihood Estimation of Cluster-Based Effects Under Interference. ArXiv e-prints. 1706.02675.

Muoz, I. D. and van der Laan, M. (2012). Population Intervention Causal Effects Based on Stochastic Interventions. Biometrics, 68(2):541-549.

Sofrygin, O. and van der Laan, M. J. (2015). tmlenet: Targeted Maximum Likelihood Estima- tion for Network Data. R package version 0.1.9. https://github.com/osofr/tmlenet

van der Laan, M. (2014). Causal Inference for a Population of Causally Connected Units. Journal of Causal Inference, 2(1)

van der Laan, Mark J. and Gruber, Susan (2011). "Targeted Minimum Loss Based Estimation of an Intervention Specific Mean Outcome". U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 290. http://biostats.bepress.com/ucbbiostat/paper290

van der Laan, Mark J. and Rose, Sherri, "Targeted Learning: Causal Inference for Observational and Experimental Data" New York: Springer, 2011.

News

tmleCommunity 0.1.0

=====================

2018-08-14

h2oEnsemble has been removed from the list of required libraries since it's not in mainstream repositories.