##### Exercise 3: Building a Generalized Linear Model (GLM) ######
## Here we will build a GLM to predict death using the LOS_model dataset from the NHSRdatasets package.
# Load the package; install it first if you don't have it.
# install.packages("NHSRdatasets")
library(NHSRdatasets)
# Load the dataset 'LOS_model' using the 'data' function
data(LOS_model)
# Inspect the data. You can use View, summary or other functions.
# Descriptions of the fields are available in the help file: ?LOS_model
View(LOS_model)
summary(LOS_model)
## Now let's build a glm to predict Death. Add Age to your model.
## What other argument does glm take, compared to lm?
## Let's look at the model summary
## Is Age significant? How would we interpret it?
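# One possible answer, as a hedged sketch rather than the official solution.
# Compared with lm, glm takes a `family` argument that sets the error distribution
# and the link function. Death is coded 0/1, so a binomial family with its default
# logit link is assumed here. The object name `glm_age` is our own choice.
data(LOS_model, package = "NHSRdatasets")
glm_age <- glm(Death ~ Age, data = LOS_model, family = binomial(link = "logit"))
summary(glm_age)
# The Age coefficient is on the log-odds scale: it is the change in the log-odds
# of death for each additional year of age. Its p-value is in the `Pr(>|z|)` column.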
## Try this model with a scaled predictor.
## Let's look at the model summary
## Can you interpret the scaled coefficient now?
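# A sketch of one way to scale, assuming base R's scale() is what is meant here:
# it centres Age on its mean and divides by its standard deviation.
# The object name `glm_age_scaled` is our own choice.
data(LOS_model, package = "NHSRdatasets")
glm_age_scaled <- glm(Death ~ scale(Age), data = LOS_model, family = binomial)
summary(glm_age_scaled)
# The slope is now the change in log-odds per one standard deviation of Age,
# and the intercept is the log-odds of death at the mean age.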
# Let's make the coefficients more interpretable by transforming them to the odds scale.
# To do this, we "reverse" the link function.
# We used a logit (log-odds) link, so we exponentiate the coefficients to get odds ratios
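# A minimal sketch of the back-transformation. The Age-only model is refitted so
# this snippet runs on its own (`glm_age` is our own name). For a binomial glm with
# the default logit link, exp() of a coefficient is an odds ratio.
# confint.default() gives Wald intervals, chosen here for simplicity.
data(LOS_model, package = "NHSRdatasets")
glm_age <- glm(Death ~ Age, data = LOS_model, family = binomial)
exp(coef(glm_age))
exp(confint.default(glm_age))
# An Age odds ratio above 1 means the odds of death rise with each extra year of
# age; below 1 means they fall.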
## Now add length of stay (LOS) into the model
## Let's look at the model summary
## How might you compare these two models, with and without LOS?
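# A sketch of one way to add LOS and compare the two nested models. Both models
# are refitted so the snippet runs on its own; `glm_age` and `glm_los` are our
# own names. The likelihood-ratio test and AIC are two common choices, not the
# only ones.
data(LOS_model, package = "NHSRdatasets")
glm_age <- glm(Death ~ Age, data = LOS_model, family = binomial)
glm_los <- glm(Death ~ Age + LOS, data = LOS_model, family = binomial)
summary(glm_los)
# Likelihood-ratio (chi-squared) test: does adding LOS reduce the deviance
# by more than chance would?
anova(glm_age, glm_los, test = "Chisq")
# AIC penalises extra parameters; the lower value is preferred.
AIC(glm_age, glm_los)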
## Are these results what you expect? Why?
## Is this different?
## Is our model any good? Check the AUC, using the ModelMetrics or yardstick package
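# A sketch of an AUC check using ModelMetrics (this assumes the package is
# installed). The model with Age and LOS is refitted so the snippet runs on its
# own; `glm_los`, `pred_prob` and `auc_los` are our own names.
data(LOS_model, package = "NHSRdatasets")
glm_los <- glm(Death ~ Age + LOS, data = LOS_model, family = binomial)
# AUC needs predicted probabilities, so predict on the response scale
pred_prob <- predict(glm_los, type = "response")
auc_los <- ModelMetrics::auc(actual = LOS_model$Death, predicted = pred_prob)
auc_los
# 0.5 is no better than chance and 1 is perfect discrimination. This is an
# in-sample AUC, so it is likely to be optimistic.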
################ Exercise 4: Prediction ######################
## Let's use our models to predict. We can predict back onto the same data, or onto new data, using the
## `newdata` argument. We will predict back onto our own data today, so we do not strictly need to specify it,
## but if we do, rows with missing predictors get NA predictions instead of being dropped, so the output lines up with the data.
## First, for our lm, we'll add the predictions back into the framingham data.frame, as a column called 'preds':
# We can compare them to the original data. Let's do this with a scatter plot:
## Now let's try the same thing for our logistic model. This is different because of the link function.
## We need to specify which scale to predict on:
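# A sketch of predicting on both scales, assuming "our logistic model" is the one
# with Age and LOS (refitted so the snippet runs on its own). The column name
# `preds_link` is our own; `preds` follows the naming used for the lm above.
data(LOS_model, package = "NHSRdatasets")
glm_los <- glm(Death ~ Age + LOS, data = LOS_model, family = binomial)
# type = "link" (the default) returns log-odds
LOS_model$preds_link <- predict(glm_los, newdata = LOS_model, type = "link")
# type = "response" returns predicted probabilities between 0 and 1
LOS_model$preds <- predict(glm_los, newdata = LOS_model, type = "response")
head(LOS_model[, c("Death", "preds_link", "preds")])
# The two scales are related by the inverse logit: plogis(preds_link) equals preds.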
## When visualising it, what happens if we build a scatter plot?
## How else could we visualise it?
## We need to reflect 'Death' as groups: box plot, violin plot, overlaid histograms or densities, etc.
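# A sketch of both plots using base graphics, which is our own choice to avoid
# extra packages; ggplot2 would work equally well. The model and predictions are
# recreated so the snippet runs on its own, assuming the Age + LOS model.
data(LOS_model, package = "NHSRdatasets")
glm_los <- glm(Death ~ Age + LOS, data = LOS_model, family = binomial)
LOS_model$preds <- predict(glm_los, newdata = LOS_model, type = "response")
# Scatter plot: Death only takes the values 0 and 1, so the points collapse
# into two vertical strips and are hard to read.
plot(LOS_model$Death, LOS_model$preds,
     xlab = "Death (0/1)", ylab = "Predicted probability of death")
# Box plot: one box per outcome group makes the two distributions comparable.
boxplot(preds ~ Death, data = LOS_model,
        xlab = "Death (0/1)", ylab = "Predicted probability of death")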
## What do your plot(s) suggest?