---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
[![GitHub stars](https://img.shields.io/github/stars/rOpenSpain/istacbaser.svg?style=social&label=Star&maxAge=2592000)](https://GitHub.com/rOpenSpain/istacbaser/stargazers/) [![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/rOpenSpain/istacbaser/graphs/commit-activity) [![GitHub contributors](https://img.shields.io/github/contributors/rOpenSpain/istacbaser.svg)](https://GitHub.com/rOpenSpain/istacbaser/graphs/contributors/) [![GitHub version](https://badge.fury.io/gh/rOpenSpain%2Fistacbaser.svg)](https://github.com/rOpenSpain/istacbaser) [![Tidyverse lifecycle](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://github.com/rOpenSpain/istacbaser)
# Introduction
The [Canary Islands Statistics Institute (Instituto Canario de Estadística, ISTAC)](http://www.gobiernodecanarias.org/istac/) is the central body of the autonomous statistical system and the official research center of the Government of the Canary Islands. It was created and is regulated by Law 1/1991 of 28 January on Statistics of the Autonomous Community of the Canary Islands (CAC), which assigns it, among others, the following functions:
* Providing statistical information: ISTAC aims to provide, with technical and professional independence, statistical information of interest to the CAC, taking into account the fragmentation of the territory and its singularities, and complying with the principles established in the European Statistics Code of Practice.
* Coordinating public statistical activity: ISTAC is the body responsible for promoting, managing and coordinating the public statistical activity of the CAC, exercising the statutory competence provided for in Article 30, paragraph 23, of the Statute of Autonomy of the Canary Islands.
To help provide access to this rich source of information, ISTAC provides a well-structured [API](https://es.slideshare.net/ISTAC/guia-de-uso-api-de-acceso-a-istac-base). While this API is very useful for integration into web services and other high-level applications, it quickly becomes overwhelming for researchers who have neither the time nor the expertise to develop software to interface with it. This leaves researchers relying on manual bulk downloads of spreadsheets of the data they are interested in. This too can quickly become overwhelming, as the work is manual, time-consuming, and not easily reproducible.
## istacbaser
The goal of the `istacbaser` R-package is to provide a bridge between these alternatives and allow researchers to focus on their research questions rather than on how to access the data. The `istacbaser` R-package allows researchers to quickly search and download the data of particular interest to them in a programmatic and reproducible fashion. This facilitates seamless integration into their workflow and allows analyses to be quickly rerun on different areas of interest with real-time access to the latest available data.
## Installation
You can install the released version of istacbaser from [GitHub](https://github.com/rOpenSpain/istacbaser) with:
``` r
devtools::install_github("rOpenSpain/istacbaser")
```
### Highlighted features of the `istacbaser` R-package:
- Access to all data available in the API
- Support for searching and downloading data
- Ability to return `POSIXct` dates for easy integration into plotting and time-series analysis techniques
- Support for Most Recent Value queries
- Support for `grep` style searching for data descriptions and names
# Getting Started
The first step is to search for the data you are interested in. `istacbase_search()` provides `grep` style searching of all available indicators from the ISTAC API and returns the indicator information that matches your query.
## Finding available data with `cache`
For performance and ease of use, a cached version of useful information is provided with the `istacbaser` R-package. This data is called `cache` and provides a snapshot of available islands, indicators, and other relevant information. By default, `cache` is the source that `istacbase_search()` and `istacbase()` use to find matching information. The structure of `cache` is as follows:
```{r}
library(istacbaser)
str(cache, max.level = 1)
```
## Search available data with `istacbase_search()`
`istacbase_search()` searches through the `cache` data frame to find indicators that match a search pattern. An example of the structure of this data frame is shown below:
```{r, echo=FALSE, results='asis'}
knitr::kable(head(istacbaser::cache[4310:4311, ]))
```
By default, the search is done over the `titulo` field and returns all the columns of the matching rows. The `ID` values are the inputs to `istacbase()`, the function for downloading the data. To return the key columns `ID` and `titulo` for the `cache` data frame, you can set `extra = TRUE`.
```{r}
library(istacbaser)
busqueda <- istacbase_search(pattern = "parado")
head(busqueda)
```
Other fields can be searched by simply changing the `fields` parameter. For example:
```{r}
library(istacbaser)
EPA_busqueda <- istacbase_search(pattern = "Encuesta de Población Activa", fields = "encuesta")
head(EPA_busqueda)
```
Regular expressions are also supported.
```{r}
library(istacbaser)
# 'pobreza' OR 'parados' OR 'trabajador'
popatr_busqueda <- istacbase_search(pattern = "pobreza|parados|trabajador")
head(popatr_busqueda)
```
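To make the idea of a `grep` style search over a single field more concrete, here is a minimal base R sketch on a toy data frame. The `toy_cache` object and the `search_titles()` helper are hypothetical and only mimic the idea; they are not the package's internals, and the case-insensitive matching is an assumption.

```r
# Toy stand-in for the cached metadata (made-up rows, not real ISTAC entries)
toy_cache <- data.frame(
  ID = c("toy.1", "toy.2", "toy.3"),
  titulo = c("Parados por sexo", "Pobreza relativa", "Superficie por isla"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows whose `field` matches a regular expression
search_titles <- function(data, pattern, field = "titulo") {
  hits <- grepl(pattern, data[[field]], ignore.case = TRUE)
  data[hits, , drop = FALSE]
}

# 'pobreza' OR 'parados'
result <- search_titles(toy_cache, "pobreza|parados")
print(result$ID)
```

The alternation `|` in the pattern is ordinary regular expression syntax, so any pattern accepted by `grepl()` could be used in this sketch.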
## Downloading data with `istacbase()`
Once you have found the set of indicators that you would like to explore further, the next step is downloading the data with `istacbase()`. The following examples are meant to highlight the different ways in which `istacbase()` can be used and demonstrate the major optional parameters.
The default value for the `islas` parameter is the special value `all`, which, as you might expect, returns data on the selected `ID` for every island for which it is available.
```{r}
library(istacbaser)
pop_data <- istacbase(istacbase_table = "dem.pob.exp.res.40")
head(pop_data)
```
If you are interested in only a subset of islands, you can pass the specific islands to the `islas` parameter. The accepted values are `all`, `Canarias`, `Lanzarote`, `Fuerteventura`, `Gran Canaria`, `Tenerife`, `La Gomera`, `La Palma` or `El Hierro`.
```{r}
library(istacbaser)
# islas values: all, Canarias, Lanzarote, Fuerteventura, Gran Canaria, Tenerife, La Gomera, La Palma or El Hierro
pop_data <- istacbase(islas = c("Lanzarote","Fuerteventura"), istacbase_table = "dem.pob.exp.res.40")
head(pop_data)
```
### Using `POSIXct`
Setting the parameter `POSIXct = TRUE` gives you the possibility to work with dates. You can then set a `startdate` and an `enddate`.
```{r}
library(istacbaser)
# islas values: all,Canarias,Lanzarote,Fuerteventura,Gran Canaria,Tenerife,La Gomera,La Palma or El Hierro
pop_data <- istacbase(islas = c("Lanzarote","Fuerteventura"), istacbase_table = "dem.pob.exp.res.40", POSIXct = TRUE, startdate = 2010, enddate = 2016)
head(pop_data)
```
### Using `POSIXct = TRUE`
The default format for the `Periodo` or `Años` column is not conducive to sorting or plotting, especially when downloading sub-annual data, such as monthly or quarterly data. To address this, if `POSIXct = TRUE`, the additional columns `fecha` and `periodicidad` are added. `fecha` holds the default date converted to `POSIXct`. `periodicidad` denotes the time resolution that the date represents. This option requires the package `lubridate (>= 1.5.0)`. If `POSIXct = TRUE` and `lubridate (>= 1.5.0)` is not available, a `warning` is produced and the option is ignored.
`startdate` and `enddate` must be in the format `YYYY`.
```{r}
library(istacbaser)
pop_data <- istacbase(istacbase_table = "dem.pob.exp.res.40", POSIXct = TRUE, startdate = 2010, enddate = 2016)
head(pop_data)
```
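As a rough illustration of why a `POSIXct` column helps, the sketch below converts plain year labels to dates with base R, filters them by a year range and sorts them. The `toy` data frame and its values are made up, and mapping each year to 1 January is an assumption; the real `fecha` column may be built differently.

```r
# Made-up data with year labels stored as text, in no particular order
toy <- data.frame(
  Periodo = c("2016", "2009", "2012"),
  valor = c(30, 10, 20),
  stringsAsFactors = FALSE
)

# Treat each year label as 1 January of that year (an assumption for this sketch)
toy$fecha <- as.POSIXct(paste0(toy$Periodo, "-01-01"), tz = "UTC")

# Keep the years between 2010 and 2016 and sort chronologically
in_range <- toy$fecha >= as.POSIXct("2010-01-01", tz = "UTC") &
  toy$fecha <= as.POSIXct("2016-12-31", tz = "UTC")
toy_range <- toy[in_range, ]
toy_range <- toy_range[order(toy_range$fecha), ]
print(toy_range$Periodo)
```

Once the dates are real date-time values, comparisons and ordering behave as expected, which is much harder to guarantee with text labels for monthly or quarterly periods.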
The `POSIXct = TRUE` option makes plotting and sorting dates much easier.
```{r, fig.height = 4, fig.width = 7.5}
library(istacbaser)
library(ggplot2)
pop_data <- istacbase(islas = "Canarias", istacbase_table = "dem.pob.exp.res.40", POSIXct = TRUE, startdate = 2010, enddate = 2016)
ggplot(pop_data, aes(x = fecha, y = valor, colour = Nacionalidades)) +
  geom_line(size = 1) +
  labs(title = "Población según sexos y nacionalidades. Islas de Canarias y años", x = "Fecha", y = "Habitantes") +
  theme(legend.key=element_blank(), legend.key.size=unit(1,"point"),
        legend.text=element_text(size=7), axis.text.x = element_text(angle = 45, hjust = 1)) +
  guides(colour=guide_legend(nrow=20)) +
  facet_wrap(~ Sexos)
```
### Using `mrv`
If you do not know the latest date for which an indicator is available, you can use `mrv` instead of `startdate` and `enddate`. `mrv` stands for most recent value and takes an `integer` corresponding to the number of most recent values you wish to return.
```{r}
library(istacbaser)
pop_data <- istacbase(istacbase_table = 'dem.pob.exp.res.40', POSIXct = TRUE, mrv = 1)
head(pop_data)
```
You can increase this value, and no more than `mrv` values will be returned. However, if `mrv` is greater than the number of available values, all of the data is returned instead.
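The behaviour described above can be sketched in base R on a toy data frame. The `most_recent()` helper is hypothetical and is not how the package implements `mrv`; it only shows the idea of keeping the latest `n` dates and falling back to everything when `n` is too large.

```r
# Hypothetical helper: keep the rows belonging to the `n` most recent dates
most_recent <- function(data, n, date_col = "fecha") {
  dates <- sort(unique(data[[date_col]]), decreasing = TRUE)
  keep <- head(dates, n)  # head() returns all dates when n exceeds their number
  data[data[[date_col]] %in% keep, , drop = FALSE]
}

# Made-up yearly values
toy <- data.frame(
  fecha = as.Date(c("2014-01-01", "2015-01-01", "2016-01-01")),
  valor = c(1, 2, 3)
)

latest <- most_recent(toy, 1)       # only the 2016 row
everything <- most_recent(toy, 10)  # more than available: all rows
print(latest)
```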
### Using `freq`
If the data is available at several granularities, you can select one of them with the `freq` parameter. Possible values are:
`anual`, `semestral`, `trimestral`, `mensual`, `quincenal`, `semanal`. If the selected granularity is not available, all granularities are shown.
```{r}
library(istacbaser)
soc_data <- istacbase(istacbase_table = 'soc.sal.est.ser.1625', POSIXct = TRUE, freq = "semestral")
head(soc_data)
```
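The fallback rule for `freq` can be pictured with the following base R sketch. The `filter_freq()` helper and the toy data are hypothetical, and it is an assumption that the `periodicidad` column uses the same labels as `freq`.

```r
# Hypothetical helper mimicking the described fallback
filter_freq <- function(data, freq, col = "periodicidad") {
  hits <- data[[col]] == freq
  if (!any(hits)) {
    return(data)  # requested granularity absent: return every row
  }
  data[hits, , drop = FALSE]
}

# Made-up data mixing annual and half-yearly rows
toy <- data.frame(
  periodicidad = c("anual", "semestral", "semestral"),
  valor = c(100, 40, 60),
  stringsAsFactors = FALSE
)

semestral <- filter_freq(toy, "semestral")  # two half-yearly rows
fallback <- filter_freq(toy, "mensual")     # no monthly rows: all three rows
print(semestral)
```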
## `startdate`, `enddate`, `mrv`, `freq` with `POSIXct = FALSE`
If you make a query with `istacbase()` and `POSIXct = FALSE`, the `startdate`, `enddate`, `mrv` and `freq` parameters are ignored and the function issues a warning.
```{r}
library(istacbaser)
soc_data <- istacbase(istacbase_table = 'soc.sal.est.ser.1625', POSIXct = FALSE, freq = "semestral")
head(soc_data)
```
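In a script, it can be useful to notice when options were ignored. The sketch below shows one way to collect warnings with base R's `withCallingHandlers()`. The `toy_download()` function is a hypothetical stand-in, not the package's code, and its warning text is made up.

```r
# Hypothetical stand-in for a download function that ignores `freq`
toy_download <- function(POSIXct = FALSE, freq = NULL) {
  if (!POSIXct && !is.null(freq)) {
    warning("`freq` is ignored when POSIXct = FALSE")
    freq <- NULL
  }
  list(freq = freq)
}

# Collect warning messages instead of letting them pass unnoticed
messages <- character(0)
out <- withCallingHandlers(
  toy_download(POSIXct = FALSE, freq = "semestral"),
  warning = function(w) {
    messages <<- c(messages, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(messages)
```

The same wrapper could be placed around a real call to check whether any date-related option was dropped.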
[![ForTheBadge built-with-love](http://ForTheBadge.com/images/badges/built-with-love.svg)](https://github.com/rOpenSpain/istacbaser) by [Christian González-Martel](https://github.com/chrglez) & [José Manuel Cazorla Artiles](https://github.com/jmcartiles)