This is the timetable we followed last year. This year at the start we'll interweave learning to program, so the exact details may change especially at the beginning.
- Intro to class and pretest
- Homework: skills
- Get set up with Git and GitHub.
- Homework: reading
- How algorithms unite data science
- Structured programming
- The inter-relation of hypotheses, models, and algorithms
- 3 algorithm classes: model, training, inference
- Homework: skills
- Git and GitHub: stage, commit, push, backtrack
- Homework: reflection/analysis
- Linear regression workflow
- Homework: mini lecture
- Algorithms in data science: 3 classes
- Analysis workflows
- Model algorithms intro
- Simple linear model in math and code
- Design matrix and parameterization
- Linear model shorthand
- Homework: mini lecture
- Training algorithms intro
- Class: Code a grid search training algorithm
- Optimization algorithms
- Nelder-Mead simplex algorithm using
optim()
- SSQ QR decomposition algorithm used in
lm()
- Nelder-Mead simplex algorithm using
- Inference algorithms intro
- Inference problems/goals
- Looking back or looking forward
- Looking back: considering the ways data could have happened
- Frequentist inference algorithms
- The sampling distribution algorithm - considering the ways a sample could have happened
- Plug in principle
- Inference algorithms for
lm()
simple linear model - Confidence intervals from the sampling distribution
- Coverage algorithm
- P-value algorithm
- Prediction intervals
- Homework:
- Coding optimization algorithms
- Coding confidence and prediction intervals
- Homework: skills
- Git and GitHub: branch, merge, amend, .gitignore, GUI
- Using code styles
- Homework: reading
- Bootstrap algorithm
- Bootstrap algorithm
- Non-parametric, empirical, parametric
- Bootstrapped confidence interval
- Bootstrapped p-value
- Homework:
- Coding bootstrap algorithms
-
Homework: skills
- Reproducible science
- Writing markdown documents
-
Homework: reading
- Pitfalls of p-values
- Conceptual foundations of likelihood
- Likelihood inference for the linear model
-
Homework: mini lecture
- Likelihood
-
Homework: problem set
- Likelihood problems from McElreath Ch 2
-
Likelihood inference
- Conditional probability: P(y|theta)
- Likelihood principle, likelihood function, likelihood ratio
- Training algorithm: maximum likelihood
- Inference algorithm: likelihood profiles
- Pure likelihood intervals
-
Homework:
- Coding likelihood intervals for the linear model
- Homework: skills
- Practice git branch and merge with your own code
- Homework: reading and mini lecture
- Bayes' theorem, prior, posterior
- Homework: problem set
- Bayesian problems from McElreath Ch 2
- Homework: reading & active coding
- Bayesian model notation
- Bayesian inference algorithm
- Coding problems from McElreath Ch 4
- Bayesian inference algorithms
- Posterior sampling
- Histograms as approximations to posterior distribution
- Credible intervals: central posterior interval, highest posterior density interval
- Prediction intervals
- Inference for linear models
-
Homework: skills
- Reproducible workflow
- Render an R script to markdown
-
MCMC algorithms for Bayesian training
- Metropolis-Hastings, Gibbs sampling, Hamiltonian Monte Carlo
- Stan, rstan, rstanarm
- MCMC diagnostics
-
Further Bayesian topics
- Choosing priors
- Posteriors for derived quantities
-
Homework: coding
- McElreath Ch 8 problems
-
Class: ant species richness dataset
-
Homework: skills
- Visualization
- Theory and concepts behind ggplot
- Using ggplot
-
Model diagnostics
- Checking model assumptions
- Assessing training robustness
- Residuals, influence, leverage, outliers, QQ plots, nonlinearity, non-constant variance
- Case deletion diagnostics for influence
-
Homework: coding
- Diagnostics for the naive ants linear model
-
Generalized linear models (GLMs)
- Link functions
- Data distributions in the exponential family
- Homework: skills
- Data manipulation with
dplyr
- Data manipulation with
- GLMs: ant data case study
- Frequentist/likelihood analysis using
glm()
- Bayesian analysis using
stan_glm()
- Frequentist/likelihood analysis using
- Homework: skills
- Tidy data
- Homework: individual project
- Preproposal
- Data visualization
- Exploratory data analysis of hierarchical data
- Using
dplyr
andggplot
- Model formulae in R
- Shorthand notation for linear model design matrix
- Introduction to multilevel models (Normal, linear)
- Grouping variables
- Fixed vs random effects
- Partial pooling and shrinkage
- Using
lmer()
andstan_lmer()
- Workflow algorithm
- Sketch the data design
- Write the equations
- Linear model syntax
Homework: reading
- Radon multilevel case study (Gelman & Hill Ch 12)
- Homework: skills
- R markdown
- Homework: individual project
- Project proposal
- Begin EDA using dplyr and ggplot
- Radon multilevel case study
- Likelihood and Bayesian algorithms
- Predictor variables at different scales
- Alternative parameterizations
- Credible and prediction intervals
- Homework: skills
- Using LaTeX to write equations
- Homework: individual project
- Modeling predictors at different scales
- Scales of grouping & predictor variables
- Sketch data structure, write math, write R model formulae
- Modeling predictors at different scales
- Multiple nested scales: math and R formulae
- How multilevel models account for autocorrelation
- Generalized linear mixed models (GLMM)
- Accounting for overdispersion using multilevel models
- Case study: revisiting the ants data as multilevel
- Math and R formulae
- Using
glmer()
andstan_glmer()
- Homework: individual project
- Fit multilevel model
- Homework: reading & exercises
- Experimental design
- Binomial GLMM
- Study design: experimental design and sampling design
- Replication, randomization, control
- Multiscale experimental and sampling design (randomized blocks, split plots etc)
- Grouping models for multilevel designs
- Homework: individual project
- Draft report
- Homework: reading
- Cross validation
- Model selection
- Model selection from various perspectives (frequentist, Bayesian, IC)
- CV, AIC, BIC, WAIC, LOOIC
- Introduction to machine learning
- Cross validation algorithm
- Either individual meetings about project or maybe these topics:
- Priors in multilevel models
- Zero inflated models
- Fake data simulation of multilevel designs
- Posterior predictive check in Bayesian (and potentially frequentist) models
- Leave-one-out influence check
- Homework: individual project
- Advanced diagnostics
- Fake data simulation
- Posterior predictive check
- LOO influence check
- Advanced diagnostics
I am currently designing a full two-semester sequence of this course and am writing a textbook Data Science for Ecology. These topics will be covered in the book and a selection in future semesters and seminars:
- Bayes factors
- Permutation inference algorithms
- rstan - beyond GLMMs, general hierarchical models
- Machine learning (see my course website from Spring 2022)
- regression trees, random forest, boosting, neural networks & deep learning
- Time series analysis
- Spatio-temporal models
- Structural equation models
- Unsupervised learning (aka "multivariate statistics")
- Ecological forecasting (see my course website from Spring 2020)