---
title: Statistical Analysis of High School and College Entrance Exam Scores in Taiwan with Online Data
author:
- Christine P. Chai
date: \today
output:
  pdf_document:
    extra_dependencies: float
    number_sections: true
    citation_package: natbib
subtitle: A Fully Reproducible Approach with R Markdown
header-includes: \renewcommand{\and}{\\}
bibliography: references.bib
biblio-style: apalike
link-citations: true
---
\renewcommand{\cite}{\citep}
```{r latex-cite-command, include=FALSE}
# %\let\cite\citep
# % from \citep to \cite to cite in author style, e.g. [Mule, 2008]
# % \bibliographystyle{plainnat}
# %\citep: citation in parentheses, e.g. [Mule, 2008]
# %\citet: citation as author, e.g. Mule [2008]
# %\cite: citation as author, \citet by default
```
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# Wrap the default chunk hook so that a chunk's `size` option sets the LaTeX font size.
def.chunk.hook <- knitr::knit_hooks$get("chunk")
knitr::knit_hooks$set(chunk = function(x, options) {
  x <- def.chunk.hook(x, options)
  # Switch to the requested size, then restore \normalsize after the chunk.
  ifelse(options$size != "normalsize", paste0("\n \\", options$size, "\n\n", x, "\n\n \\normalsize"), x)
})
knitr::opts_chunk$set(fig.width = 6, fig.align = 'center', fig.pos = "H", out.extra = "")
```
# Executive Summary {.unnumbered}
In this project, we investigate the relationship between high school and college entrance exam scores in Taiwan, using self-reported data available online. The goal is to demonstrate reproducible statistical analysis with $\mathsf{R}$ Markdown, compiled through LaTeX to PDF. We start by exploring the data to formulate the problem statement and identify an appropriate model, which can be an iterative process. We eventually settle on a binary outcome for the college entrance exam, and hence implement a logistic regression model. We then validate the model with out-of-sample prediction methods, namely separate training and testing sets and cross validation. Finally, we evaluate the model performance with the ROC curve and AUC. The end-to-end data analysis can be compiled with a single click in RStudio, without copying and pasting output from one program to another.^[The code is documented on the author's GitHub: <https://github.com/star1327p/Exam-Scores-PTT>]
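As a rough illustration of the workflow summarized above, the following minimal sketch fits a logistic regression to a binary outcome and checks it out of sample, first with separate training and testing sets and then with K-fold cross validation. It runs on simulated data: the variable names (`hs_score`, `top_college`), the sample size, the coefficients, and the 0.5 cutoff are all assumptions made for demonstration, not the project's actual data or settings.

```r
# Hypothetical sketch on simulated data; none of these names or numbers
# come from the project's dataset.
set.seed(1)

n <- 200
hs_score <- runif(n, min = 0, max = 100)           # simulated predictor
p_true <- plogis(-5 + 0.1 * hs_score)              # assumed true relationship
top_college <- rbinom(n, size = 1, prob = p_true)  # simulated binary outcome
exam <- data.frame(hs_score, top_college)

# Separate training and testing sets (here an 80/20 split).
train_idx <- sample(n, size = round(0.8 * n))
train <- exam[train_idx, ]
test <- exam[-train_idx, ]

# Logistic regression fitted on the training set only.
fit <- glm(top_college ~ hs_score, data = train, family = binomial)

# Out-of-sample prediction with a 0.5 probability cutoff.
test_prob <- predict(fit, newdata = test, type = "response")
test_pred <- as.integer(test_prob > 0.5)
confusion <- table(actual = test$top_college, predicted = test_pred)
test_accuracy <- mean(test_pred == test$top_college)

# K-fold cross validation: each observation is held out exactly once.
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))
cv_accuracy <- sapply(seq_len(k), function(i) {
  fit_i <- glm(top_college ~ hs_score, data = exam[fold != i, ],
               family = binomial)
  prob_i <- predict(fit_i, newdata = exam[fold == i, ], type = "response")
  mean(as.integer(prob_i > 0.5) == exam$top_college[fold == i])
})
```

Printing `confusion`, `test_accuracy`, and `mean(cv_accuracy)` gives the confusion matrix and the two out-of-sample accuracy estimates for the simulated data.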
# Disclaimer {.unnumbered}
The opinions and views expressed in this manuscript are those of the author, and do not necessarily state or reflect those of any institution or government entity. The author has a statistics and engineering background, without any training in middle school or high school teaching.
# Introduction {#intro}
```{r include-intro, child = 'Ch01_to_Ch02_Context/01-Introduction.Rmd'}
```
# Background {#background}
```{r include-background, child = 'Ch01_to_Ch02_Context/02-Background.Rmd'}
```
# Data Description {#data}
```{r include-data-description, child = 'Ch03_to_Ch06_Explore/03-Data-Description.Rmd'}
```
# Exploratory Data Analysis {#eda}
```{r include-exploratory, child = 'Ch03_to_Ch06_Explore/04-Exploratory.Rmd'}
```
# Linear Regression {#linear-reg}
```{r include-linear-reg, child = 'Ch03_to_Ch06_Explore/05-Linear-Regression.Rmd'}
```
# Top Scorers: A Closer Look {#explore-top}
```{r include-top-univariate, child = 'Ch03_to_Ch06_Explore/06-Top-Scorers-01-Univariate.Rmd'}
```
## Bivariate Exploration of Top Scorers {#bivariate-top-scorers}
```{r include-chisq-test, child = 'Ch03_to_Ch06_Explore/06-Top-Scorers-02-Bivariate.Rmd'}
```
# Logistic Regression {#logit-reg}
```{r include-logit-reg, child = 'Ch07_to_Ch08_Model/07-Logistic-Regression.Rmd'}
```
# Model Validation: In-Sample Prediction {#validation}
```{r include-in-sample-00, child = 'Ch07_to_Ch08_Model/08-In-Sample-00-Foreword.Rmd'}
```
## Implementation of In-Sample Prediction {#in-sample}
```{r include-in-sample-01, child = 'Ch07_to_Ch08_Model/08-In-Sample-01-Implementation.Rmd'}
```
## Interpretation of Confusion Matrix {#interpretation}
```{r include-in-sample-02, child = 'Ch07_to_Ch08_Model/08-In-Sample-02-Confusion-Matrix.Rmd'}
```
## Breakdown by High School Entrance Exam Scores {#another-breakdown}
```{r include-in-sample-03, child = 'Ch07_to_Ch08_Model/08-In-Sample-03-Score-Breakdown.Rmd'}
```
# Model Validation: Out-of-Sample Prediction {#out-of-sample}
```{r include-out-sample-00, child = 'Ch09_Validation/09-Out-Sample-00-Foreword.Rmd'}
```
## Separate Training and Testing Datasets {#sep-train-test}
```{r include-out-sample-01, child = 'Ch09_Validation/09-Out-Sample-01-Sep-Explain.Rmd'}
```
### Implementation {#train-test-demo}
```{r include-out-sample-02, child = 'Ch09_Validation/09-Out-Sample-02-Sep-Implement.Rmd'}
```
### Organizing the Code for Reusability {#org-code-reuse}
```{r include-out-sample-03, child = 'Ch09_Validation/09-Out-Sample-03-Sep-Organize.Rmd'}
```
## Cross Validation {#cross-validation}
```{r include-out-sample-04, child = 'Ch09_Validation/09-Out-Sample-04-Cross-Validation.Rmd'}
```
### K-fold Cross Validation {#k-fold}
```{r include-out-sample-05, child = 'Ch09_Validation/09-Out-Sample-05-K-Fold.Rmd'}
```
### Leave-one-out Cross Validation {#leave-one-out}
```{r include-out-sample-06, child = 'Ch09_Validation/09-Out-Sample-06-Leave-One-Out.Rmd'}
```
## Comparison of Results {#cmp-results}
```{r include-out-sample-07, child = 'Ch09_Validation/09-Out-Sample-07-Comparison-Results.Rmd'}
```
# Model Metrics: ROC and AUC {#roc-auc}
```{r include-roc-auc-00, child = 'Ch10_Eval_Metrics/10-Model-Metrics-00-Foreword.Rmd'}
```
## Demonstrative ROC Curve {#roc-demo}
```{r include-roc-auc-01, child = 'Ch10_Eval_Metrics/10-Model-Metrics-01-Demo.Rmd'}
```
## Data Observation and Processing {#roc-prep}
```{r include-roc-auc-02, child = 'Ch10_Eval_Metrics/10-Model-Metrics-02-Observation.Rmd'}
```
## Implementation and Results {#roc-auc-results}
```{r include-roc-auc-03, child = 'Ch10_Eval_Metrics/10-Model-Metrics-03-ROC-AUC.Rmd'}
```
# Recap of the Project {#recap}
```{r include-project-recap, child = 'Ch11_to_Ch14_Discuss/11-Project-Recap.Rmd'}
```
# Recommended Resources for Learning {#resources}
```{r include-additional-resources, child = 'Ch11_to_Ch14_Discuss/12-Additional-Resources.Rmd'}
```
# Personal Scores and Remarks {#personal-remarks}
```{r include-personal, child = 'Ch11_to_Ch14_Discuss/13-Personal-Scores-Remarks.Rmd'}
```
# Final Words {.unnumbered}
```{r include-final-words, child = 'Ch11_to_Ch14_Discuss/14-Final-Words.Rmd'}
```
# Acknowledgments {.unnumbered}
The author would like to thank Dr. Mine \c{C}etinkaya-Rundel and Dr. David Banks at Duke University, who both motivated the author to teach statistics and create reproducible work in $\mathsf{R}$. The author would also like to acknowledge Dr. Cliburn Chan and Dr. Janice McCarthy for introducing her to GitHub in the statistical computation course at Duke University.^[<https://people.duke.edu/~ccc14/sta-663-2015/>] That course gave her the foundation to use GitHub for modern version control.
While developing this project, the author also received constructive feedback from several people. The author is grateful to her former Microsoft colleagues Smit Patel and Dylan Stout for helping her troubleshoot GitHub issues. The author also wants to thank her friend Yi-Ting Chang, a high school teacher in Taiwan as of 2024, for checking the narratives about Taiwan's education.
Finally, the author gives a special mention to her significant other, Hugh Hendrickson, for all his support in the author's professional career development.
\addcontentsline{toc}{section}{References}