Author: anna.pryslopska[ AT ]ling.uni-stuttgart.de
These are the original materials from the course "Digital Research Toolkit for Linguists taught by me in the Summer Semester 2024 at the University of Stuttgart. The materials will be updated weekly. Identifying information of the in-class participants will be removed, so some slides, data or exercises may be missing.
You are more than welcome to follow along but I will not be able to grade or evaluate your homework.
If you want to replicate this course, you can do so with proper attribution. To replicate the data, follow these links for Experiment 1 (full Moses illusion experiment) and Experiment 2 (demo of self-paced reading with acceptability judgment).
This seminar provides a gentle, hands-on introduction to the essential tools for quantitative research for students of linguistics and the humanities overall. During the course of the seminar, the students will familiarize themselves with software that is rarely taught but is invaluable in developing an efficient, transparent, reusable, and scalable research workflow (e.g. R basics, LaTeX, git). From text file, through data visualization, to creating beautiful reports: this course will empower students to improve their skill and help them establish good practices.
The seminar is targeted at students with little to no experience with programming. It provides key skills that are useful for research and industry jobs.
We will cover topics such as:
✔ How can I make sense of my data?
✔ How can I communicate what I found?
✔ How do I share my data and collaborate with others?
We will NOT cover topics such as:
❌ Experiment design
❌ Inferential statistics
❌ Cognitive modelling
❌ Corpus research
This is a rough overview of the topics discussed every week. These are subject to change, depending on how the class goes.
Week | Topic | Description | Assignments | Materials |
---|---|---|---|---|
1 | Introduction & overview | Course overview and expectations, classroom management and assignments/grading etc. Data collection. | Complete Experiment 1 and Experiment 2 and recruit one more person. Install R and RStudio, install Texmaker or make an Overleaf account. | Slides |
2 | Data, R and RStudio | Intro recap, directories, R and RStudio, installing and loading packages, working with scripts | Read chapters 2, 6 and 7 of R for Data Science, complete assignment 1 | Slides, code |
3 | Reading data, data inspection and manipulation | Looking at your data, data types, importing, making sense of the data, intro to sorting, filtering, subsetting, removing missing data, data manipulation | Read chapters 3, 4 and 5 of R for Data Science, complete assignment 2. | Slides, code, data |
4 | Data manipulation | Basic operators, data manipulation (filtering, sorting, subsetting, arranging), pipelines, tidy code, practice. | Compete assignment 3 | Slides, code, data |
5 | Data manipulation and error handling | Summary statistics, grouping, merging, if ... else, naming variables, tidy code, error handling and getting help. | Assignment 4, read the slides from the QCBS R Workshop Series Workshop 3: Introduction to data visualisation with ggplot2 |
Slides, code |
6 | Data visualization | Communicating with graphics, choice of visualization, plot types, best practices, visualizing in R (ggplot2 , esquisse ), exporting plots and data |
Complete assignment 5. If you haven't yet, install the package esquisse |
Slides, code |
7 | No class | Holiday | ||
8 | Data visualization | Data visualization recap, best practices, lying with plots, practical exercises, exporting/saving plots and data. | Complete assignment 6. Install Quarto. Watch the introductory video | Slides large and compressed, code |
9 | Creating reports with Quarto and knitr | Pandoc, markdown, Quarto, basic syntax and elements, export, document, and chunk options, documentation | Complete assignment 7. | Slides, compressed Quarto files |
10 | Typesetting documents with LaTeX | What is LaTeX, basic document and file structure, advantages and disadvantages, from R to LaTeX | Complete assignment 8, read chapter 2 of The Not So Short Introduction to LaTeX. | Slides, basic LaTeX file (zip) |
11 | Typesetting documents with LaTeX | Editing text (commands, whitespace, environments, font properties, figures, and tables), glosses, IPA symbols, semantic formulae, syntactic trees | Complete assignment 9, read Bibliography management with biblatex | Slides |
12 | Typesetting documents with LaTeX and bibliography management | Large projects, citations, references, bibliography styles, bib file structure | Complete assignment 10 | Slides, big project files |
13 | Literature and reference management, common command line commands | Reference managers, looking up literature, command line commands (grep, diff, ping, cd, etc.) | Complete assignment 11 | Slides, corpus1.txt, corpus2.txt, corpus3.txt, big project 1, big project 2 |
14 | Text editors, version control and Git | Text editors, Git, GitHub, version control | Complete assignment 12 | Slides, example readme file |
15 | Version control and Git | Git, GitHub, SSH, reverting to older versions | In class assignment | Slides, SSH for GitHub video |
- GitHub Git guide:
https://github.com/git-guides/
- Another git guide:
http://rogerdudler.github.io/git-guide/
- Git tutorial:
http://git-scm.com/docs/gittutorial
- Another git tutorial:
https://www.w3schools.com/git/
- Git cheat sheets:
https://training.github.com/
- Where to ask questions: Stackoverflow
- Overleaf (n.d.) Bibliography management with biblatex. Accessed: 2024-06-24. URL:
https://www.overleaf.com/learn/latex/Bibliography_management_with_biblatex
- Dickinson, Markus and Josh Herring (2008). LaTeX for Linguists. Accessed: 2024-06-07. URL:
https://cl.indiana.edu/~md7/08/latex/slides.pdf
. - LaTeX/Linguistics - Wikibooks (2024). Accessed: 2024-06-07. URL:
https://en.wikibooks.org/wiki/LaTeX/Linguistics
. - Oetiker, Tobias et al. (2023). The Not So Short Introduction to LATEX. Accessed: 2024-06-07. URL:
https://tobi.oetiker.ch/lshort/lshort.pdf
.
- Introductory video:
https://www.youtube.com/watch?v=_f3latmOhew
- Documentation:
https://quarto.org/docs/get-started/
- QCBS R Workshop Series
https://r.qcbs.ca/
- Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund (2023). R for data science: import, tidy, transform, visualize, and model data. 2nd ed. O’Reilly Media, Inc. URL:
https://r4ds.hadley.nz/
.
- Free-response: Erickson, Thomas D and Mark E Mattson (1981). “From words to meaning: A semantic illusion”. In: Journal of Verbal Learning and Verbal Behavior 20.5, pp. 540–551. DOI:
10.1016/s0022-5371(81)90165-1
. - Self-paced reading with acceptability judgments: Gibson, Edward, Leon Bergen, and Steven T Piantadosi (2013). “Rational integration of noisy evidence and prior semantic expectations in sentence interpretation”. In: Proceedings of the National Academy of Sciences 110.20, pp. 8051–8056. DOI:
10.1073/pnas.1216438110
.