# Debugging
```{r,echo=FALSE}
knitr::opts_chunk$set(comment = NA, prompt = TRUE, collapse = TRUE,
error = TRUE, warning = TRUE, message = TRUE)
set.seed(1)
```
## Something’s Wrong!
[Watch a video of this section](https://youtu.be/LHQxbRInyyc) (note that this video differs slightly from the material presented here)
R has a number of ways to indicate to you that something’s not right. There are different levels of indication that can be used, ranging from mere notification to fatal error. Executing any function in R may result in the following *conditions*.
- `message`: A generic notification/diagnostic message produced by the `message()` function; execution of the function continues
- `warning`: An indication that something is wrong but not necessarily fatal; execution of the function continues. Warnings are generated by the `warning()` function
- `error`: An indication that a fatal problem has occurred and execution of the function stops. Errors are produced by the `stop()` function.
- `condition`: A generic concept for indicating that something unexpected has occurred; programmers can create their own custom conditions if they want.
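As a rough illustration of how these conditions differ, the sketch below signals each one from a small function and uses `tryCatch()` to report which kind was raised. The names `check_value()` and `classify()` are made up for this example and are not part of R.

```r
# Hypothetical helper that signals a different condition depending on its input
check_value <- function(x) {
    if (is.na(x)) stop("x is missing")       # error
    if (x < 0) warning("x is negative")      # warning
    message("checked x")                     # message
    invisible(x)
}

# Hypothetical helper: reports the class of the first condition signalled.
# Note that tryCatch() exits the expression once a handler is chosen, so
# check_value() does not resume after the handler runs.
classify <- function(expr) {
    tryCatch({
        expr
        "none"
    },
    message = function(m) "message",
    warning = function(w) "warning",
    error = function(e) "error")
}

classify(check_value(1))
classify(check_value(-1))
classify(check_value(NA))
```

Outside of `tryCatch()`, only the third call would stop execution; the first two would let `check_value()` run to completion.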
Here is an example of a warning that you might receive in the course of using R.
```{r}
log(-1)
```
This warning lets you know that taking the log of a negative number results in a `NaN` value because you can't take the log of negative numbers. Nevertheless, R doesn't give an error, because it has a useful value that it can return, the `NaN` value. The warning is just there to let you know that something unexpected happened. Depending on what you are programming, you may have intentionally taken the log of a negative number in order to move on to another section of code.
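To see that execution really does continue after a warning, here is a small sketch that intercepts the warning with `withCallingHandlers()`, reports it, and then lets `log()` finish and return its value. This is just one way to handle a warning, not the only one.

```r
# Evaluate log(-1), report the warning ourselves, then let evaluation resume
res <- withCallingHandlers(
    log(-1),
    warning = function(w) {
        cat("warning caught:", conditionMessage(w), "\n")
        invokeRestart("muffleWarning")   # suppress the default warning output
    }
)

is.nan(res)   # the value returned by log(-1) is still available
```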
Here is another function that is designed to print a message to the console depending on the nature of its input.
```{r}
printmessage <- function(x) {
    if(x > 0)
        print("x is greater than zero")
    else
        print("x is less than or equal to zero")
    invisible(x)
}
```
This function is simple---it prints a message telling you whether `x` is greater than zero or less than or equal to zero. It also returns its input *invisibly*, which is a common practice with "print" functions. Returning an object invisibly means that the return value does not get auto-printed when the function is called.
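A quick sketch of what invisibility means in practice is shown here. The function `double_it()` is hypothetical and exists only for this illustration.

```r
# double_it() is a hypothetical function that returns its result invisibly
double_it <- function(x) invisible(x * 2)

double_it(2)                       # nothing is auto-printed
y <- double_it(2)                  # the value can still be assigned
(double_it(2))                     # wrapping the call in parentheses prints it
vis <- withVisible(double_it(2))   # records the value and its visibility flag
vis$visible
```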
Take a hard look at the function above and see if you can identify any bugs or problems.
We can execute the function as follows.
```{r}
printmessage(1)
```
The function seems to work fine at this point. No errors, warnings, or messages.
```{r}
printmessage(NA)
```
What happened?
Well, the first thing the function does is test if `x > 0`. But you can't do that test if `x` is an `NA` or `NaN` value: the comparison itself evaluates to `NA`, and `if` needs a `TRUE` or `FALSE`, so R stops with a fatal error.
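The pieces of that failure can be looked at separately. The following is a minimal sketch; the variable names are only for illustration.

```r
cmp_na  <- NA > 0     # comparison with NA gives NA, not TRUE or FALSE
cmp_nan <- NaN > 0    # the same holds for NaN

# Using such a value as an `if` condition is an error; catch it to show that
outcome <- tryCatch(
    if (NA > 0) "positive" else "not positive",
    error = function(e) "error"
)
outcome
```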
We can fix this problem by anticipating the possibility of `NA` values and checking to see if the input is `NA` with the `is.na()` function.
```{r}
printmessage2 <- function(x) {
    if(is.na(x))
        print("x is a missing value!")
    else if(x > 0)
        print("x is greater than zero")
    else
        print("x is less than or equal to zero")
    invisible(x)
}
```
Now we can run the following.
```{r}
printmessage2(NA)
```
And all is fine.
Now what about the following situation?
```{r}
x <- log(c(-1, 2))
printmessage2(x)
```
Now what? Why are we getting this warning? In versions of R before 4.2.0 the warning says "the condition has length > 1 and only the first element will be used"; from R 4.2.0 onward the same call stops with an error instead.
The problem here is that I passed `printmessage2()` a vector `x` that was of length 2 rather than length 1. Inside the body of `printmessage2()` the expression `is.na(x)` returns a vector that is tested in the `if` statement. However, `if` cannot take vector arguments, so you get a warning (or, in newer versions of R, an error). The fundamental problem here is that `printmessage2()` is not *vectorized*.
We can solve this problem two ways. One is by simply not allowing vector arguments. The other way is to vectorize the `printmessage2()` function to allow it to take vector arguments.
For the first way, we simply need to check the length of the input.
```{r}
printmessage3 <- function(x) {
    if(length(x) > 1L)
        stop("'x' has length > 1")
    if(is.na(x))
        print("x is a missing value!")
    else if(x > 0)
        print("x is greater than zero")
    else
        print("x is less than or equal to zero")
    invisible(x)
}
```
Now when we pass `printmessage3()` a vector we should get an error.
```{r}
printmessage3(1:2)
```
Vectorizing the function can be accomplished easily with the `Vectorize()` function.
```{r}
printmessage4 <- Vectorize(printmessage2)
out <- printmessage4(c(-1, 2))
```
You can see now that the correct messages are printed without any warning or error. Note that I stored the return value of `printmessage4()` in a separate R object called `out`. This is because when I use the `Vectorize()` function it no longer preserves the invisibility of the return value.
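If keeping the invisible return value matters, another option is to write the function in vectorized form directly. The sketch below uses `ifelse()` for that; `printmessage5()` is a hypothetical name, and unlike `printmessage4()` it prints all of the messages as a single character vector rather than one `print()` call per element.

```r
# Hypothetical vectorized variant built on ifelse() instead of Vectorize()
printmessage5 <- function(x) {
    msg <- ifelse(is.na(x), "x is a missing value!",
           ifelse(x > 0, "x is greater than zero",
                  "x is less than or equal to zero"))
    print(msg)
    invisible(x)   # the input is still returned invisibly
}

printmessage5(c(-1, 2, NA))
```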
## Figuring Out What's Wrong
The primary task of debugging any R code is correctly diagnosing what the problem is. When diagnosing a problem with your code (or somebody else's), it's important to first understand what you were expecting to occur. Then you need to identify what *did* occur and how it deviated from your expectations. Some basic questions you need to ask are
- What was your input? How did you call the function?
- What were you expecting? Output, messages, other results?
- What did you get?
- How does what you get differ from what you were expecting?
- Were your expectations correct in the first place?
- Can you reproduce the problem (exactly)?
Being able to answer these questions is important not just for your own sake, but in situations where you may need to ask someone else for help with debugging the problem. Seasoned programmers will be asking you these exact questions.
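One way to make these questions concrete is to write the input, the expected result, and the actual result down as code, so that the comparison can be rerun exactly. The following is only a sketch using the `log()` example from earlier; the variable names are illustrative.

```r
input    <- c(-1, 2)                       # what was passed in
expected <- c(NaN, log(2))                 # what we think should come back
actual   <- suppressWarnings(log(input))   # what actually came back

# all.equal() compares with a small numeric tolerance
matches <- isTRUE(all.equal(actual, expected))
matches
```

When asking someone else for help, including a snippet like this along with the output of `sessionInfo()` (which reports the R version and attached packages) gives them most of what they need to reproduce the problem.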
## Debugging Tools in R
[Watch a video of this section](https://youtu.be/h9rs6-Cwwto)
R provides a number of tools to help you with debugging your code. The primary tools for debugging functions in R are
- `traceback()`: prints out the function call stack after an error occurs; does nothing if there’s no error
- `debug()`: flags a function for “debug” mode which allows you to step through execution of a function one line at a time
- `browser()`: suspends the execution of a function wherever it is called and puts the function in debug mode
- `trace()`: allows you to insert debugging code into a function at specific places
- `recover()`: allows you to modify the error behavior so that you can browse the function call stack
These functions are interactive tools specifically designed to allow you to pick through a function. There's also the more blunt technique of inserting `print()` or `cat()` statements in the function.
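As a small sketch of that blunt technique, the hypothetical function below prints its state with `cat()` on every iteration, which makes it easy to see the step at which a missing value enters the running total.

```r
# running_total() is a hypothetical function used only for illustration
running_total <- function(x) {
    total <- 0
    for (i in seq_along(x)) {
        total <- total + x[i]
        # Temporary diagnostic output; remove once the problem is found
        cat("i =", i, " x[i] =", x[i], " total =", total, "\n")
    }
    total
}

running_total(c(1, NA, 3))
```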
## Using `traceback()`
[Watch a video of this section](https://youtu.be/VT9ZxCp6o-I)
The `traceback()` function prints out the *function call stack* after an error has occurred. The function call stack is the sequence of functions that was called before the error occurred.
For example, you may have a function `a()` which subsequently calls function `b()` which calls `c()` and then `d()`. If an error occurs, it may not be immediately clear in which function the error occurred. The `traceback()` function shows you how many levels deep you were when the error occurred.
```r
> mean(x)
Error in mean(x) : object 'x' not found
> traceback()
1: mean(x)
```
Here, it's clear that the error occurred inside the `mean()` function because the object `x` does not exist.
The `traceback()` function must be called immediately after an error occurs. Once another function is called, you lose the traceback.
Here is a slightly more complicated example using the `lm()` function for linear modeling.
```r
> lm(y ~ x)
Error in eval(expr, envir, enclos) : object ’y’ not found
> traceback()
7: eval(expr, envir, enclos)
6: eval(predvars, data, env)
5: model.frame.default(formula = y ~ x, drop.unused.levels = TRUE)
4: model.frame(formula = y ~ x, drop.unused.levels = TRUE)
3: eval(expr, envir, enclos)
2: eval(mf, parent.frame())
1: lm(y ~ x)
```
You can see now that the error did not get thrown until the 7th level of the function call stack, where the `eval()` function tried to evaluate the formula `y ~ x` and realized the object `y` did not exist.
Looking at the traceback is useful for figuring out roughly where an error occurred but it's not useful for more detailed debugging. For that you might turn to the `debug()` function.
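The nested-call scenario described at the start of this section can be sketched with a few toy functions. In an interactive session you would simply run `a(1)` and then `traceback()`. Because `traceback()` relies on the error reaching the top level, the sketch below instead records the call stack with `sys.calls()` from a calling handler, which is an assumption made so that the example also works when the error is caught. The functions `a()`, `b()`, and `d()` are hypothetical.

```r
# Hypothetical chain of calls: a() calls b(), which calls d(), which fails
a <- function(x) b(x)
b <- function(x) d(x)
d <- function(x) stop("problem in d()")

calls <- NULL
msg <- tryCatch(
    withCallingHandlers(
        a(1),
        # The calling handler runs while the call stack is still intact
        error = function(e) calls <<- sys.calls()
    ),
    error = function(e) conditionMessage(e)
)

# Names of the functions that were on the stack when the error was signalled
stack_names <- vapply(as.list(calls),
                      function(cl) deparse(cl[[1]])[1],
                      character(1))
stack_names
```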
## Using `debug()`
The `debug()` function initiates an interactive debugger (also known as the "browser" in R) for a function. With the debugger, you can step through an R function one expression at a time to pinpoint exactly where an error occurs.
The `debug()` function takes a function as its first argument. Here is an example of debugging the `lm()` function.
```r
> debug(lm) ## Flag the 'lm()' function for interactive debugging
> lm(y ~ x)
debugging in: lm(y ~ x)
debug: {
ret.x <- x
ret.y <- y
cl <- match.call()
...
if (!qr)
z$qr <- NULL
z
}
Browse[2]>
```
Now, every time you call the `lm()` function it will launch the interactive debugger. To turn this behavior off you need to call the `undebug()` function.
The debugger calls the browser at the very top level of the function body. From there you can step through each expression in the body. There are a few special commands you can call in the browser:
* `n` executes the current expression and moves to the next expression
* `c` continues execution of the function and does not stop until either an error occurs or the function exits
* `Q` quits the browser
Here's an example of a browser session with the `lm()` function.
```r
Browse[2]> n ## Evaluate this expression and move to the next one
debug: ret.x <- x
Browse[2]> n
debug: ret.y <- y
Browse[2]> n
debug: cl <- match.call()
Browse[2]> n
debug: mf <- match.call(expand.dots = FALSE)
Browse[2]> n
debug: m <- match(c("formula", "data", "subset", "weights", "na.action",
"offset"), names(mf), 0L)
```
While you are in the browser you can execute any other R function that might be available to you in a regular session. In particular, you can use `ls()` to see what is in your current environment (the function environment) and `print()` to print out the values of R objects in the function environment.
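The same browser can also be entered at a point of your choosing by placing a call to `browser()` inside a function. The sketch below does so only when the input looks suspicious and only in an interactive session; both of those guards are choices made for this example, and `safe_sum()` is a hypothetical function.

```r
# safe_sum() is a hypothetical function used only for illustration
safe_sum <- function(x) {
    # Pause here only when there is something suspicious to look at,
    # and only in an interactive session (so scripts are not interrupted)
    if (anyNA(x) && interactive()) browser()
    sum(x)
}

safe_sum(c(1, 2, 3))
```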
You can turn off interactive debugging with the `undebug()` function.
```r
undebug(lm) ## Unflag the 'lm()' function for debugging
```
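If you lose track of whether a function is still flagged, `isdebugged()` will tell you. Here is a short sketch using a hypothetical function rather than `lm()`; the function is never called while flagged, so the browser is not actually launched.

```r
# square_plus_one() is a hypothetical function used only for illustration
square_plus_one <- function(x) {
    y <- x^2
    y + 1
}

debug(square_plus_one)
flag_on <- isdebugged(square_plus_one)     # flagged for debugging
undebug(square_plus_one)
flag_off <- isdebugged(square_plus_one)    # flag removed again

# debugonce(square_plus_one) would instead flag the function for one call only
```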
## Using `recover()`
The `recover()` function can be used to modify the error behavior of R when an error occurs. Normally, when an error occurs in a function, R will print out an error message, exit out of the function, and return you to your workspace to await further commands.
With `recover()` you can tell R that when an error occurs, it should halt execution at the exact point at which the error occurred. That can give you the opportunity to poke around in the environment in which the error occurred. This can be useful to see if there are any R objects or data that have been corrupted or mistakenly modified.
```r
> options(error = recover) ## Change default R error behavior
> read.csv("nosuchfile") ## This code doesn't work
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file ’nosuchfile’: No such file or directory
Enter a frame number, or 0 to exit
1: read.csv("nosuchfile")
2: read.table(file = file, header = header, sep = sep, quote = quote, dec =
3: file(file, "rt")
Selection:
```
The `recover()` function will first print out the function call stack when an error occurs. Then, you can choose to jump around the call stack and investigate the problem. When you choose a frame number, you will be put in the browser (just like the interactive debugger triggered with `debug()`) and will have the ability to poke around.
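Since `options(error = recover)` changes the error behavior for the rest of the session, it can be handy to save the previous setting and restore it when you are done. The following is a minimal sketch of that pattern; the variable names are illustrative.

```r
before <- getOption("error")       # error behavior at the start
old <- options(error = recover)    # options() returns the previous settings
during <- getOption("error")       # now set to the recover() function

# ... run the code under investigation interactively here ...

options(old)                       # put the earlier error behavior back
after <- getOption("error")
```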
## Summary
- There are three main indications of a problem/condition: `message`, `warning`, `error`; only an `error` is fatal
- When analyzing a function with a problem, make sure you can reproduce the problem, and clearly state your expectations and how the output differs from them
- Interactive debugging tools `traceback`, `debug`, `browser`, `trace`, and `recover` can be used to find problematic code in functions
- Debugging tools are not a substitute for thinking!