-
Notifications
You must be signed in to change notification settings - Fork 0
/
docker.Rmd
385 lines (297 loc) · 16.7 KB
/
docker.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
# Docker Container {#pc-docker}
## Introduction
This chapter explains how to develop and run the Docker container for the Poverty
Calculator API on local World Bank laptops. For DEV, QA and Production
deployments please see the chapter on [Deploying to Azure](#azure).
For the Table Maker API please refer to [TBD]{style="color:red"}.
## Prerequisites {#prerequisites}
- Local admin account
- Docker Desktop for Windows
- WSL 2 (recommended)
- Visual Studio Code (recommended)
See below for recommendations on how to
install Docker Desktop and WSL 2.
Visual Code is not strictly needed, but the [VS Docker
plugin](https://marketplace.visualstudio.com/items?itemName=ms-azuretools.vscode-docker)
is one of the best tools to interact with Docker.
## Docker on World Bank laptops
### Install Docker {#wb-docker-install}
1. Install [Docker Dekstop](https://docs.docker.com/docker-for-windows/release-notes/) (v. 3.6.0 or later)
- Remember to check the box to enable WSL integration.
2. Activate WSL
- Check if the WSL feature on Windows is activated (e.g. run `wsl --help` from Powershell). If it is then proceed to step 3.
Note: If you get a message saying "service cannot be started" try restarting WSL like suggested below under [Known problems and solutions](#known-problems-and-solutions).
- Go to Control Panel -> Programs -> Turn Windows features on or off and check the box *Windows Subsystem for Linux*, or follow [these steps](https://docs.microsoft.com/en-us/windows/wsl/install-win10) if you prefer to activate WSL from the command line.
- Reboot system.
3. Install WSL 2
- Download and install the [Linux kernel update package](https://docs.microsoft.com/windows/wsl/wsl2-kernel).
- Set WSL 2 as your default version (run `wsl --set-default-version 2` from Powershell).
4. Configure Docker access privileges
- In order to run Docker wo/ admin privileges you need to add yourself to the "docker-users" group.
- Open Computer Management. Go to Local User and Groups -> Groups -> docker-users and add your regular account (`WB/wb<UPI>`) to the list of users.
- Reboot system.
5. Start Docker Desktop and enjoy your hard work :-)
### Known problems and solutions {#known-problems-and-solutions}
Docker Dekstop sometimes fails with the following error.
System.InvalidOperationException:
Failed to deploy distro docker-desktop to C:\Users\<User>\AppData\Local\Docker\wsl\distro: exit code: -1
stdout: Error: 0xffffffff
The root cause of this problem is likely that the World Bank VPN connection modifies
the same network configuration as the `wsl` VM. Turning off your VPN when working with
Docker locally is thus a good idea.
A temporary solution to the issue is to open a CMD shell in Admin
mode, and then run the following code to restart `LxssManager` .
``` {.cmd}
> sc config LxssManager start=auto
[SC] ChangeServiceConfig SUCCESS
```
After this you will also need to restart Docker Desktop.
### Tips and tricks {#tips-and-tricks}
1. Start your Docker working day by opening a CMD shell in admin-mode and run
`sc config LxssManager start=auto`. Keep the shell open because you might
need to re-run the command if WSL crashes.
2. Turn off your VPN when using Docker. This avoids the issue with WSL crashing
from time to time. It still might happen though if the VPN switces on
automatically or if you for other reasons switch back and forth.
3. It is possible that solutions like [wsl-vpnkit](https://github.com/sakai135/wsl-vpnkit)
and [vpnkit](https://github.com/moby/vpnkit) will help with the WSL issues,
but these have not been tested yet. Feel free to give it a try yourself.
4. If WSL causes too much pain, try switching to the legacy Hyper-V backend.
(Go to Docker Desktop -\> General Setting -\> Use the WSL 2 based engine).
5. Re-build an image from a partial cache by adding a `ARG` or `ENV` variable
right before the layer you want to invalidate (yes, this is hacky but it
works). E.g. do something like:
``` {.dockerfile}
# Install PIP specific R packages
ENV test=test06302021
RUN Rscript -e "remotes::install_github('PIP-Technical-Team/wbpip@master', repos = '${CRAN}')"
RUN Rscript -e "remotes::install_github('PIP-Technical-Team/pipapi@master', repos = '${CRAN}')"
```
## Create volume {#create-volume}
The data that is used in the poverty calculations is stored outside the Docker
container. We therefore need to link a data source on our local machine to the
container when we run it. This could be accomplished with a [bind
mount](https://docs.docker.com/storage/bind-mounts/), but the preferred way is
to use a [named volume](https://docs.docker.com/storage/volumes/).
You can create and populate a volume by following the steps below. First we
create an empty volume, then we link this to a temporay container, before using
`docker cp` to copy the files to the container and volume. After the data has
been copied you can discard the temporary container. The choice of Ubuntu 20.04
for the *temporary* container is arbitrary. You could use another image if you want.
``` {.bash}
# Create volume
docker volume create pip-vol
# Mount volume to tmp container
docker run -d --name pip-data --mount type=volume,source=pip-vol,target=/data ubuntu:20.04
# Copy data to container (and volume)
docker cp <data-folder>/. pip-data:data
# Stop and remove tmp container
docker stop pip-data
docker rm pip-data
```
For some purposes you will only need to create this volume once, but remember to
update the contents of the volume in case the data structure changes or you want
to make sure the container is running against the latest available data. See the
[Azure DevOps Data repo](https://github.com/PIP-Technical-Team/PIP_private/blob/main/README.md#resources)
(DEV branch) for the latest updates.
The volume can be inspected with the `docker inspect` and `docker system`
commands. If you are running Docker Desktop on Windows with WSL 2.0 then then
the physical location of Docker volumes is usually found under
`\\wsl$\docker-desktop-data\version-pack-data\community\docker\volumes`. For
more information on volumes see the [Use
volumes](https://docs.docker.com/storage/volumes/) section in Docker's reference
manual.
## Build image {#build-image}
Docker images are built using the `docker build` command. Here we build the
image and set the image name to `pip-api`.
``` {.bash}
docker build -t pip-api .
```
You can also set a specific tag for the image. For example you can specify the
version number or the base image used as source. How you decide to tag your
images is up to you. Often a tag of `latest` (the default) will suffice, but in
other instances keeping track of different versions of your development image
will be useful.
``` {.bash}
# Set tag when building
docker build -t pip-api:0.0.4
# Set tag after build (version number)
docker image tag pip-api:latest pip-api:0.0.4
# Set tag after build (base image)
docker image tag pip-api:latest pip-api:ubi8
```
Note that each layer in a Dockerfile is cached when built. These layers won't be
re-run unless the code to produce that specific layer or a previous layer in the
Dockerfile changes. This is handy for fast iterations, but can also cause the
image to be outdated.
If needed you can use the `--no-cache` flag to force a re-build. This is useful
when you want to make sure that the latest Linux updates are installed or for
updating the `{wbpip}` and `{pipapi}` R packages. See also the [Tips and
Tricks](#tips-and-tricks) section on how to do partial re-builds.
``` {.bash}
docker build --no-cache -t pip-api .
```
## Run container {#run-container}
Run the container by exposing port 80 and mounting a volume with the survey and
auxiliary data. The data structure in the attached volume must correspond to the
sub-folder specifications in `R/main.R`.
``` {.bash}
docker run --rm -d -p 80:80/tcp --mount src=pip-vol,target=/ipp,type=volume,readonly --name pip-api pip-api
```
You can also run the container with a bind mount if you prefer, but note that reading the files inside mount folder
will be slower then with a volumne setup.
``` {.bash}
docker run --rm -d -p 80:80/tcp --mount src="<data-folder>",target=/data,type=bind --name pip-api pip-api
```
Navigate to [`http://localhost/__docs__/`](localhost/__docs__/) to see the
running API.
For details on the `docker run` command and its options see the [Docker
Docs](https://docs.docker.com/engine/reference/commandline/run/).
## Debugging {#debugging}
**Run container interactively:**
Run the container in interactive mode, using the `-it` flag, to see output
messages.
``` {.bash}
$ docker run --rm -it -p 80:80/tcp --mount src=pip-vol,target=/ipp,type=volume,readonly --name pip-api pip-api
Running plumber API at http://0.0.0.0:80
Running swagger Docs at http://127.0.0.1:80/__docs__/
```
**Inspect container:**
For development purposes it can be useful to inspect the Docker container.
Luckily it is very easy to enter a running container with the `docker exec`
command.
``` {.bash}
# Enter the container as default user
$ docker exec -it pip-api /bin/bash
plumber@bd8ea77299ca:/$
# Enter the container as root user
$ docker exec -it -u 0 pip-api /bin/bash
root@bd8ea77299ca:/$
```
## Security {#security}
Since the PIP Techincal Team is developing both the API and the Docker
container, we also have a larger responsibility for security. When deploying to
Azure the image will need to go through a security scan, provided by
[Aquasec](https://blog.aquasec.com/), that is generally quite strict. Any high
level vulnerability found by the Aquasec scan will result in a failed
deployment. It is not possible to leverage Aquasec on local machines, but we can
get an indication of potential vulnerabilites by taking advantage of the built
in security scan that comes with Docker. Additionally it is also possible to
scan the contents of the R packages used in the image.
Note that neither the Snyk or Oyster scans described below are requirements from
OIS. They are only included here as additional and preliminary tools.
**Run image security scan:**
Before deploying to Azure you can run preliminary security scans with Snyk. See
[Vulnerability scanning for Docker local
images](https://docs.docker.com/engine/scan/) and [Docker Security Scanning
Guide 2021](https://snyk.io/learn/docker-security-scanning/) for details.
``` {.bash}
docker scan --file .\Dockerfile pip-api
```
It is important to note that the Aquasec scan that runs on Azure is different
that the Snyk scan, and can detect different vulnerabilites. Even though few or
zero vulnerabilites are found by Snyk, further issues might still be detected by
Aquasec.
**Run security scan for R packages:**
You can scan the R packages inside the container with the `{oysteR}` package,
which scans R projects against insecure dependencies using the [OSS
Index](https://ossindex.sonatype.org/).
First log into a running container as root, and start R.
``` {.bash}
$ docker exec -it -u 0 pip-api /bin/bash
root@ab54fbf1eb36:/# R
```
Then install the `{oyster}` package, and run an audit.
``` {.r}
install.packages("oysteR")
library("oysteR")
audit <- audit_installed_r_pkgs()
get_vulnerabilities(audit)
```
Note that finding zero vulnerabilities is no guarantee against threats. For
example is scanning of C++ code inside R packages not yet implemented. For more
details and latest developments on the `{oyster}` package see the
[README](https://github.com/sonatype-nexus-community/oysteR) on Github.
## Base image options {#base-image-options}
In the development process of the Dockerfile there have been used different base
images.
The current image in the main branch is based on `ubi8`, but other options
(`rocker`, `centos8`) are available in seperate branches. These seperate
branches will not be actively maintained, but should be kept in case there is a
need to switch to another base image in the future.
The main need to switch base image is likely to stem from the Azure DevOps
security scan. For example do images based on `rocker` or `centos8` work well
locally, but neither will pass deployment. CentOS images are unfortunately not
on the list of approved images by OIS, while
[Rocker](https://github.com/rocker-org/rocker-versioned2/) images (which are
based on Ubuntu 20.04 LTS) currently has a vulnerability
([CVE-2019-18276](https://ubuntu.com/security/CVE-2019-18276)) that is
classified as a high level threat by Aqua Scan.
In fact one of the Linux dependencies for R on UBI 8, `libX11`, also currently
trigger a high level vulnerability
([CVE-2020-14363](https://access.redhat.com/security/cve/CVE-2020-14363)) in
Aqua Scan. Even though this issue has been solved in the offical RHEL 8 release,
the issue still persist for UBI 8. It can however be solved by manually
downloading a newer version of libX11 (1.6.12 or later), and removing the
problematic version (1.6.8).
In case there is any need to develop with other base images in the future it is
**strongly recommended** to start with one of the Linux distributions where
[RStudio provides precompiled binaries for
R](https://docs.rstudio.com/resources/install-r/). This avoids the need to
install R from source.
## Using R on Linux
Docker images are usually based on Linux containers. Some familiarity with Linux is
thus necessary in order to develop and work with Docker.
R users in particular should familiarize themselves with the difference between installing R
and R packages from source vs binary, as well as potential Linux dependencies that might
be nessecary to make a specific package work.
**Installing R:**
[Installing R from source](https://docs.rstudio.com/resources/install-r-source/) is possible
on most Linux distributions, but this not always an easy endavour. It will often require you to
maintain a list of the nessecary Linux dependencies for R. Installing from source is also much
slower than installing from a binary version.
Another way to install R on Linux is to use the available pre-compiled binary in your
Linux distro's repository (e.g `sudo apt install r-base` on Ubuntu). The problem with
this approach is that the latest version of R might not yet be available, and for some
Linux distro's there will also only be a limited number of R versions available.
This impedes the flexilibility needed for a production grade project.
The recommended approach is thus to rely on
[RStudio's precompiled binaries](https://docs.rstudio.com/resources/install-r/).
These binaries are available for the most common Linux distributions,
including Ubuntu, Debian and CentOS / RHEL.
Relying on these binaries will limited the number of base image options avaiable,
e.g. a ligth-weight distro like [Alpine](https://www.alpinelinux.org/) is not supported,
nor is the newest release of Ubuntu (Hirsute Hippo). But it will make much easier
to maintain, upgrade and change the R version used in the project.
**Installing R packages:**
As a contrast to Windows and macOS, CRAN only supports source versions of the R packages
available for Linux.
This will significantly slow down the installation process of each pacakge.
Luckily RStudio provides binary versions of all packages on CRAN for a series of different Linux distro's through their
[Public Pacakge Manager](https://packagemanager.rstudio.com/). It is therefore strongly recommended
that you rely on RSPM as your installation repo.
An additional benefit of RSPM is that you can also select a specific date for the package versions
to use.
On the RSPM website go the [Setup](https://packagemanager.rstudio.com/client/#/repos/1/overview)
page on the main menu and select the appropiate client OS and date. You should then see a URL similar too
`https://packagemanager.rstudio.com/all/__linux__/centos8/4743918`, where `centos8` represents the distro
and `4743918` the date.
For a more detailed introduction to the difference between binary and source packages, see
[Package structure and state](https://r-pkgs.org/package-structure-state.html) in the book
R Packages.
**Installing R package dependencies:**
Linux distro's doesn't come with the dependencies needed to work with all R packages. This includes common
packages like `{data.table}` and `{plumber}`. It is important that such dependencies are installed prior
to the installation of your R packages.
The best way to look for dependencies is to use the RSPM website and search for the specific package in question.
Remeber to also select the correct Linux distribution.
## Resources {#resources}
For a further introduction to Docker take a look at this
[Data Science for Economists lecture on Docker](https://raw.githack.com/uo-ec607/lectures/master/13-docker/13-docker.html#1)
by Grant McDermott.
See [Best practices for writing
Dockerfiles](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/)
for advice on how to build Docker images.
Follow the [Docker for Windows release
notes](https://docs.docker.com/docker-for-windows/release-notes/) for
information on new releases and bug fixes.