rstats-ed

Inspired by the Using Julia in the classroom page, the suggestion in @Peter_Griffin's post on RStudio Community, and the Learn the tidyverse page.

The goal is to create a repository where one can discover courses and learning materials for learning and teaching R.

This is a user-curated list and will never be comprehensive. If you have suggestions for courses to add, please submit a pull request or open an issue.

In person courses

University courses teaching R

These are semester/quarter long courses taught fully, or for the most part, in an in person setting.

2020

  • Data Science in R: A Robust Toolkit For Psychological Research, by Danielle Navarro. An introductory class offered at the University of New South Wales, aimed at students with little to no background in statistics or computing. Practical scientific topics covered include data visualisation, data wrangling, document preparation with R Markdown, sharing code with GitHub, and project management. The programming side of the class covers data structures in R, flow control, and programming with functions. For fun, there are also sections on creating generative art in R.

  • Psych 252: Statistical Methods, by Tobias Gerstenberg. Stanford University. This course offers an introduction to advanced topics in statistics with a focus on understanding data in the behavioral and social sciences. It is a practical course in which learning statistical concepts and building models in R go hand in hand. The course is organized into three parts. In the first part, we will learn how to visualize, wrangle, and simulate data in R. In the second part, we will cover topics in frequentist statistics (such as multiple regression, logistic regression, and mixed effects models) using the general linear model as an organizing framework. We will learn how to compare models using simulation methods such as bootstrapping and cross-validation. In the third part, we will focus on Bayesian data analysis as an alternative framework for answering statistical questions. [Book] [Slides]

  • Bayesian Statistics - Estatística Bayesiana, by Jose Storopoli. A course in Portuguese offered at Universidade Nove de Julho - UNINOVE in São Paulo, Brazil. It is aimed at graduate students (mainly from the social sciences and humanities) who have a basic background in statistics and no background at all in computing. The topics covered are Bayesian vs. Frequentist statistics, Linear Models, General Linear Models, and Hierarchical Models. The toolset is based on the rstanarm and brms packages.

2019

  • Introduction to Data Science, by Mine Çetinkaya-Rundel. University of Edinburgh. Year 1 undergraduate course for students with no background in stats, computing, or data science. Combines techniques from statistics, math, computer science, and social sciences, to learn how to use data to understand natural phenomena, explore patterns, model outcomes, and make predictions. Data wrangling, exploratory data analysis, predictive modeling, data visualization, and effective communication of results. Focuses on tidyverse. Discussions around reproducibility, data sharing, data privacy.

  • Statistical Programming by Colin Rundel. University of Edinburgh. MSc in Stats with Data Science course. Focuses on data wrangling and visualisation with the tidyverse, functional programming, simulation, and optimization.

  • Modeling Criminological Data by Juanjo Medina and Reka Solymosi. University of Manchester. 2nd year undergraduate course introducing students to inferential statistics using R and RStudio. There is also a graduate (Master's level) version of the course along the same lines, called R for Criminologists.

  • Crime Mapping in R by Juanjo Medina and Reka Solymosi. University of Manchester. Elective course available to final year undergraduate students as well as Postgraduate Taught (Master's) students. The course introduces students to spatial crime analysis, including visualising and analysing spatial data.

2018

  • R-DAVIS - R-Data Analysis & Visualization In Science - University of California, Davis; Grad Student Taught, Graduate Group in Ecology. This course content integrates and builds on Data Carpentry Ecology lessons, and is taught as part of the required curriculum for graduate students in the Graduate Group in Ecology (GGE) at UC Davis. This course provides an introduction to tidy data, project management, data manipulation, visualization and analysis, across a broad set of ecological data. We also teach Git and GitHub along with RStudio. The focus of this course is to provide graduate students with training that develops and teaches the tools applicable to the entire process of reproducible data-driven research and encourage the use of open-source tools. By learning how to get the computer to do your work for you, you will be able to do more science faster, and your future self will thank you. All lectures are recorded and posted on YouTube, and the content is online.

  • STA 112 - Better Living with Data Science - Duke University; Mine Çetinkaya-Rundel. Data Science course for first year undergraduates with little to no computing background. Combines techniques from statistics, math, computer science, and social sciences, to learn how to use data to understand natural phenomena, explore patterns, model outcomes, and make predictions. Data wrangling, exploratory data analysis, predictive modeling, data visualization, and effective communication of results. Discussions around reproducibility, data sharing, data privacy.

  • MAT 301 - Introduction to Probability and Statistics - City University of New York; Sebastian Hoyos-Torres. An Introduction to Probability and Statistics for undergraduates which focuses on understanding probability distributions. Since calculus is not a prerequisite for this course, it also shows students how to apply calculus conceptually to probability distributions (when continuous distributions are discussed) and lets R do most of the computations. Tidy principles are emphasized throughout the course, but some base R is inevitably used.

  • MPA 630: Data Science for Public Management - Brigham Young University; Andrew Heiss. Data science and statistics class for Master of Public Administration (MPA) students with little math or computing experience. Uses ModernDive, R for Data Science, and OpenIntro Statistics to cover tidyverse data wrangling, inference, and hypothesis testing. All projects and in-class examples use data related to public affairs, administration, and policy.

  • MPA 635: Data Visualization - Brigham Young University; Andrew Heiss. Data visualization class for Master of Public Administration (MPA) students with some experience with R. Uses Alberto Cairo's The Truthful Art: Data, Charts, and Maps for Communication, Kieran Healy's Data Visualization: A Practical Introduction, Claus Wilke's Fundamentals of Data Visualization, and R for Data Science to cover principles of graphic design and fundamentals of visualizing data with ggplot2.

  • MACS 30500: Computing for the Social Sciences - University of Chicago, Benjamin Soltoff. This is an applied course for social scientists with little-to-no programming experience who wish to harness growing digital and computational resources. The focus of the course is on generating reproducible research through the use of programming languages (R) and version control software (Git). Major emphasis is placed on a pragmatic understanding of core principles of programming and packaged implementations of methods. Students will leave the course with basic computational skills implemented through many computational methods and approaches to social science; while students will not become expert programmers, they will gain the knowledge of how to adapt and expand these skills as they are presented with new questions, methods, and data.

  • ESPM 288: Reproducible & Collaborative Data Science - UC Berkeley, Carl Boettiger. Data Science course for first year graduate students in both the natural and social sciences. A modular, flipped-classroom approach that combines reading, exercises and videos based on R for Data Science and DataCamp with more open-ended assignments to replicate, extend, and sometimes challenge key results from the scientific literature on global change. Note: an upper-division undergraduate version of the course is also being developed under the title Data Science for Global Change Ecology.

  • STA 523: Statistical Computing - Duke University, Colin Rundel. Statistical programming with R and its interfaces, with custom code development for central statistical models. Best practices and software development for reproducible results, selecting topics from: use of markup languages, understanding data structures, design of graphics, object oriented programming, vectorized code, scoping, documenting code, profiling and debugging, building modular code, and version control, all in the context of specific applied statistical analyses.

  • DSCI 521: Computing Platforms for Data Science - University of British Columbia, Tiffany Timbers. How to install, maintain, and use the data scientific software "stack". The Unix operating system, integrated development environments (Jupyter and RStudio), and problem solving strategies.

  • CT5102: Programming for Data Analytics - NUI Galway, Jim Duggan. A module that is part of the M.Sc. Computer Science (Data Analytics). There are twelve topics, and these will be updated as the course progresses. The course has three main elements: (1) Base R (Vectors, Functions, Lists, Matrices and Data Frames), (2) Data Science, with the tidyverse packages in R (ggplot2, dplyr, readr, tidyr, lubridate and stringr), (3) Advanced R, including closures, object systems (S3, S4 and RC), and building packages.

  • Microsoft Research Data Science Summer School - Microsoft Research New York, Jake Hofman. An intensive, eight-week hands-on introduction to data science for college students in the New York City area, focused on increasing diversity in computer science. Students learn Git, Bash, and R, focusing on concepts in statistics, modeling, and machine learning. All coursework is available on GitHub. Students produce an original group research project at the end of the program. Projects from the past several years are available online, along with the corresponding data and code.

  • R Module - Data Visualization Diploma - Pontifical Catholic University of Chile; Pachá and Joshua Kunst. Tidy data principles for the non-expert. This course introduces the Tidyverse and covers how to import, tidy, transform, visualize, model and communicate data. The final goal of this module is to use ggplot2 as a tool to communicate and understand data.

  • SOC 4015/5050: Quantitative Analysis - Saint Louis University, Chris Prener. This course provides an introduction to applied statistical analysis for both undergraduate and graduate students, with an emphasis on the statistical techniques most common in the sociological literature. Students learn Git via GitHub Desktop, R, and RStudio. A heavy emphasis is placed on literate programming.

  • SOC 4650/5650: Introduction to Geographic Information Science - Saint Louis University, Chris Prener. This class introduces both the theoretical and technical skills that constitute the field of Geographic Information Science (GISc). Techniques introduced include data cleaning and management, map production and cartography, and the manipulation of both tabular and spatial data. Students learn Git via GitHub Desktop, R, and RStudio as well as ArcGIS. In the 2018 edition, Lectures 01-06 and 08-11 included R-specific content.

  • STATS101: Data Science 101 - Stanford University, John Duchi and James Johndrow. The course provides a solid introduction to data science, both exposing students to computational tools they can proficiently use to analyze data and exploring the conceptual challenges of inferential reasoning. Each module/week represents a new “data adventure,” analyzing real datasets, exploring different questions and trying out tools. R is used. Topics include: Inference, Prediction, Machine Learning, Visualization and Numerical Summaries of data, Reproducibility.

  • STA130H1: An Introduction to Statistical Reasoning and Data Science - University of Toronto, Alison Gibbs and Nathan Taback. The course gives students a broad introduction to many of the ways statisticians learn from data. In addition to statistical reasoning, learning from data involves computation and communication. Students use the R programming language and environment for statistical computing, and tutorials introduce students to communicating statistical knowledge. Topics include: data visualization, data wrangling and summarizing data, statistical testing and estimation, statistical models for description and prediction, supervised and unsupervised statistical learning, ethical issues in data collection and analysis. Class slides, notes, and other important information can be found on the course website.

  • SDS192: Introduction to Data Science - Smith College, Ben Baumer. An introduction to data science using Python, R, and SQL. Students will learn how to scrape, process, and clean data from the web; manipulate data in a variety of formats; contextualize variation in data; construct point and interval estimates using resampling techniques; visualize multidimensional data; design accurate, clear, and appropriate data graphics; create data maps and perform basic spatial analysis; and query large relational databases. Four group projects. Course website includes lecture notes and labs.

  • BIOL607: Introduction to Computational Data Analysis - University of Massachusetts Boston, Jarrett Byrnes. This course covers the basic statistical knowledge necessary for a graduate student to design, execute, and analyze a basic research project - from t-tests to multiple linear regression. It does so while using frequentist, information theoretic, and Bayesian approaches. The course aims to have students focus on thinking about the biological processes that they are studying in their research and how to translate them into statistical models. All labs are conducted in R with code examples, and slides are available in RMarkdown.

  • Recitation for BIO132: Biostatistics - Tufts University, Eric Scott and Avalon Owens. This is a required recitation for an intro biostatistics course taught by Dr. Sara Lewis. The primary goal of the recitation is to give students the tools to complete weekly problem sets with R Markdown documents and to reinforce lecture concepts through teaching students statistical concepts in R. The recitation takes a tidyverse-centric approach starting with data visualization with ggplot2 and later introducing the concept of tidy data along with dplyr and tidyr. Topics include: probability, confidence intervals, t-tests, power analysis, ANOVA, correlation, regression, and categorical data analysis.

2017

  • Modeling Social Data - Columbia University, Jake Hofman. One semester class for upper division undergraduate and first year graduate students that focuses on data-driven modeling for large-scale, social data. Material draws on statistics, computer science, and the social sciences. R is the primary language taught for the course; students gain experience collecting, cleaning, analyzing, and modeling with the tidyverse and related tools. All slides, code, and student-scribed notes are available on GitHub. Students complete a final project in small groups where they work on an original research problem of their choice.

  • DATA101: An Introduction to Data Science - College of Charleston, Paul Anderson. Introduction to the use of computer-based tools for the analysis of large data sets for the purpose of knowledge discovery. Students will learn to understand the Data Science process and the difference between deductive hypothesis-driven and inductive data-driven modeling. Students will have hands-on experience with various online analytical processing and data mining software and complete a project using real data. Uses R. The students participate in a Data Science competition. The GitHub repository contains lecture notes and data.

  • BIOL355: Introduction to Data Science for Biology - University of Massachusetts Boston, Jarrett Byrnes. This course introduces undergraduates to the basic concepts of how we use data in the biological sciences. It emphasizes data creation, curation, manipulation, visualization, and some basic analyses, all using R with the tidyverse. It also teaches functional programming style and geospatial visualization. All labs are in R and slides are available in RMarkdown.

Workshops, short courses, modules

These are workshops, short courses, or modules taught fully, or for the most part, in an in person setting.

  • Coding togetheR: A series of collaborative workshops to teach foundational R coding and data science skills at the University of Southampton in 2019. This book contains the materials covered over eight weekly two-hour sessions, which provide an introduction to working with data using R in a supported environment. The workshops are for anyone at the University of Southampton who has data to analyse and is struggling with their current tools. Unlike traditional lessons, we all code together, with the emphasis on participants learning by doing and by helping each other.

  • The Carpentries: Software Carpentry workshops are domain-agnostic, and teach the Unix shell, coding in R or Python, and version control using Git. Data Carpentry workshops are domain-specific, and focus on teaching skills for working with data effectively and reproducibly.

  • A Jupyter + R (+ mybinder.org) tutorial for social scientists - Goldsmiths, University of London, Caspar Addyman. A tutorial that can be self-taught or run as a single session; it is taught to our Master's level Advanced Methods class. Using an online notebook shared via MyBinder, it goes through the basics of editing code and markdown in Jupyter notebooks and how to host these on MyBinder with fixed-date snapshots for reproducibility. It provides very simple examples of loading local or remote data files, filtering and graphing with the tidyverse, and running simple statistical tests using ezANOVA.

  • Introduction to R for household survey - INDEC institute; Diego Kozlowski, Guido Weksler and Natsumi Shokida. An introductory course to R and the Household survey carried out by the National Institute of Statistics and Censuses (INDEC) of Argentina. The objective is to introduce base R, the Tidyverse, and R Markdown, so that participants can reproduce the main labor market statistics and poverty measures. It is aimed at people with a social science background who need to work with microdata from the survey. The course is in Spanish.

  • Data Science Seminar - Saint Louis University, led by Chris Prener and Christy Garcia. We offer a series of seminars each semester on using R for a variety of tasks, including conducting reproducible research, cleaning and plotting data, making maps, and fitting linear models. Our content is available on GitHub and a full list of our seminars is available on our website.

  • Reproducible analysis of bigger, naturally-occurring datasets using R, Rmarkdown, and the tidyverse - Michael C. Frank. A workshop originally presented at Data on the Mind 2017, this tutorial is an introduction to analyzing datasets using Tidyverse code (including readr, dplyr, tidyr, and shiny). Oriented towards researchers in psychology, cognitive science, or experimental science who may be interested in learning more about how R is a "powerful tool for statistical data analysis and reproducible research."

  • FAIR-R: A fork of Software Carpentry's "Programming with R" lesson that suggests one possible translation of the EU's FAIR Data Principles into the R programming realm. Presented at @TIBHannover's "FAIR Data and Software" workshop.

  • Getting started with R: A weeklong intensive course introducing students to R and the tidyverse, run as part of the University of Manchester's methods@manchester summer school. Developed by Henry Partridge and Reka Solymosi, with further contributions from Sam Langton.

Online courses

MOOCs

Massive open online courses taught on platforms like Coursera, edX, etc.

  • Statistics with R Coursera Specialization: Mine Çetinkaya-Rundel, Merlise Clyde, Colin Rundel, David Banks. 5 courses: Introduction to Data and Probability, Inferential Statistics, Linear Regression and Modeling, Bayesian Statistics, and Capstone.

  • JHU Data Science Coursera Specialization: This Specialization covers the concepts and tools you'll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results. In the final Capstone Project, you’ll apply the skills learned by building a data product using real-world data.

  • Chromebook Data Science: Chromebook Data Science (CBDS) is a free, massive open online educational program offered through Leanpub to help anyone who can read, write, and use a computer to move into data science.

Tutorials

Built with learnr or a similar technology.

  • RStudio Cloud Primers: RStudio Cloud is a free, cloud-based version of the RStudio IDE. Packaged within RStudio Cloud are primers, which are collections of interactive tutorials made with learnr. These primers teach the basics of R and the Tidyverse.

  • To R from Stata: An Introduction: The purpose of this tutorial is to provide a relatively light introduction to R for Stata users - straight to the point (mostly), and organized around the things that Stata users already know.

  • SwirlStats: swirl is an R package that enables new useRs to learn R programming and data science interactively in their R console.

  • UCI Data Science Initiative Tutorials: The UCI Data Science Initiative has written tutorials on data analysis and visualization with R at both introductory and advanced levels. In addition to their R tutorials, the Initiative also has introductory tutorials on scientific computing and natural language processing with Python and Julia.

Other online courses

  • LinkedIn Learning offerings: LinkedIn Learning (previously lynda.com) offers beginner, intermediate, and advanced courses in R, as well as Python, SQL, Java, C, PHP, JavaScript and other languages. All courses are offered as downloadable video content and are used in many universities.

  • Exercism.io's R track offers programming puzzles to solve against a provided set of test cases. Mimicking the workflow of test-driven development (TDD), Exercism emphasises iteration and refactoring. After solving a puzzle, solutions can be discussed with a mentor and peers' solutions can be reviewed. All this aims to foster the craft aspects of programming.