forked from cboettig/eml2
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
246 lines (174 loc) · 8.28 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
---
output:
md_document:
variant: gfm
---
[![Build Status](https://travis-ci.org/cboettig/eml2.svg?branch=master)](https://travis-ci.org/cboettig/eml2)
[![Coverage Status](https://img.shields.io/codecov/c/github/cboettig/eml2/master.svg)](https://codecov.io/github/cboettig/eml2?branch=master)
[![stability-experimental](https://img.shields.io/badge/stability-experimental-orange.svg)](https://github.com/joethorley/stability-badges#experimental)
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
```{r include=FALSE}
has_pandoc <- rmarkdown::pandoc_available()
```
# eml2
The goal of `eml2` is to provide both a drop-in replacement for the higher-level functions of the existing EML package while also providing additional functionality.
`eml2` uses only simple and familiar list structures (S3 classes) instead of the more cumbersome use of S4 found in the original EML. While the higher-level functions are identical, this makes it easier to for most users and developers to work with `eml` objects and also to write their own functions for creating and manipulating EML objects. Under the hood, `eml2` relies on the [emld](https://github.com/cboettig/emld) package, which uses a Linked Data representation for EML. It is this approach which lets us combine the simplicity of lists with the specificy required by the XML schema.
# Creating EML
```{r message=FALSE, warning=FALSE}
library(eml2)
```
## A minimal valid EML document:
```{r}
me <- list(individualName = list(givenName = "Carl", surName = "Boettiger"))
my_eml <- list(dataset = list(
title = "A Mimimal Valid EML Dataset",
creator = me,
contact = me)
)
write_eml(my_eml, "ex.xml")
eml_validate("ex.xml")
```
## A Richer Example
Here we show a the creation of a relatively complete EML document using `eml2`. This closely parallels the function calls shown in the original EML [R-package vignette](https://ropensci.github.io/EML/articles/creating-EML.html).
## `set_*` methods
The original EML R package defines a set of higher-level `set_*` methods to facilitate the creation of complex metadata structures. `eml2` provides these same methods, taking the same arguments for `set_coverage`, `set_attributes`, `set_physical`, `set_methods` and `set_textType`, as illustrated here:
### Coverage metadata
```{r}
geographicDescription <- "Harvard Forest Greenhouse, Tom Swamp Tract (Harvard Forest)"
coverage <-
set_coverage(begin = '2012-06-01', end = '2013-12-31',
sci_names = "Sarracenia purpurea",
geographicDescription = geographicDescription,
west = -122.44, east = -117.15,
north = 37.38, south = 30.00,
altitudeMin = 160, altitudeMaximum = 330,
altitudeUnits = "meter")
```
### Reading in text from Word and Markdown
We read in detailed methods written in a Word doc. This uses EML's docbook-style markup to preserve formatting of paragraphs, lists, titles, and so forth. (This is a drop-in replacement for EML `set_method()`)
```{r eval=has_pandoc}
methods_file <- system.file("examples/hf205-methods.docx", package = "EML")
methods <- set_methods(methods_file)
```
We can also read in text that uses Markdown for markup elements:
```{r eval=has_pandoc}
abstract_file <- system.file("examples/hf205-abstract.md", package = "EML")
abstract <- set_TextType(abstract_file)
```
### Attribute Metadata from Tables
Attribute metadata can be verbose, and is often defined in separate tables (e.g. separate Excel sheets or `.csv` files).
Here we use attribute metadata and factor definitions as given from `.csv` files.
```{r}
attributes <- read.table(system.file("extdata/hf205_attributes.csv", package = "eml2"))
factors <- read.table(system.file("extdata/hf205_factors.csv", package = "eml2"))
attributeList <-
set_attributes(attributes,
factors,
col_classes = c("character",
"Date",
"Date",
"Date",
"factor",
"factor",
"factor",
"numeric"))
```
### Data file format
Though the `physical` metadata specifying the file format is extremely flexible, the `set_physical` function provides defaults appropriate for `.csv` files.
DEVELOPER NOTE: ideally the `set_physical` method should guess the appropriate metadata structure based on the file extension.
```{r}
physical <- set_physical("hf205-01-TPexp1.csv")
```
## Generic construction
In the `EML` R package, objects for which there is no `set_` method are constructed using the `new()` S4 constructor. This provided an easy way to see the list of available slots. In `eml2`, all objects are just lists, and so there is no need for special methods. We can create any object directly by nesting lists with names corresponding to the EML elements. Here we create a `keywordSet` from scratch:
```{r}
keywordSet <- list(
list(
keywordThesaurus = "LTER controlled vocabulary",
keyword = list("bacteria",
"carnivorous plants",
"genetics",
"thresholds")
),
list(
keywordThesaurus = "LTER core area",
keyword = list("populations", "inorganic nutrients", "disturbance")
),
list(
keywordThesaurus = "HFR default",
keyword = list("Harvard Forest", "HFR", "LTER", "USA")
))
```
Of course, this assumes that we have some knowledge of what the possible terms permitted in an EML keywordSet are! Not so useful for novices. We can get a preview of the elements that any object can take using the `emld::template()` option, but this involves a two-part workflow. Instead, `eml2` provides generic `construct` methods for all objects.
## Constructor methods
For instance, the function `eml$creator()` has function arguments corresponding to each possible slot for a creator. This means we can rely on *tab completion* (and/or autocomplete previews in RStudio) to see what the possible options are. `eml$` functions exist for all complex types. If `eml$` does not exist for an argument (e.g. there is no `eml$givenName`), then the field takes a simple string argument.
### Creating parties (creator, contact, publisher)
```{r}
aaron <- eml$creator(
individualName = eml$individualName(
givenName = "Aaron",
surName = "Ellison"),
electronicMailAddress = "[email protected]")
```
```{r}
HF_address <- eml$address(
deliveryPoint = "324 North Main Street",
city = "Petersham",
administrativeArea = "MA",
postalCode = "01366",
country = "USA")
```
```{r}
publisher <- eml$publisher(
organizationName = "Harvard Forest",
address = HF_address)
```
```{r}
contact <-
list(
individualName = aaron$individualName,
electronicMailAddress = aaron$electronicMailAddress,
address = HF_address,
organizationName = "Harvard Forest",
phone = "000-000-0000")
```
### Putting it all together
```{r}
my_eml <- eml$eml(
packageId = uuid::UUIDgenerate(),
system = "uuid",
dataset = eml$dataset(
title = "Thresholds and Tipping Points in a Sarracenia",
creator = aaron,
pubDate = "2012",
intellectualRights = "http://www.lternet.edu/data/netpolicy.html.",
abstract = abstract,
keywordSet = keywordSet,
coverage = coverage,
contact = contact,
methods = methods,
dataTable = eml$dataTable(
entityName = "hf205-01-TPexp1.csv",
entityDescription = "tipping point experiment 1",
physical = physical,
attributeList = attributeList)
))
```
## Serialize and validate
We can also validate first and then serialize:
```{r}
eml_validate(my_eml)
write_eml(my_eml, "eml.xml")
```
```{r include = FALSE}
unlink("eml.xml")
unlink("ex.xml")
codemetar::write_codemeta()
```