The goal of eml2 is to provide a drop-in replacement for the
higher-level functions of the existing EML package while also providing
additional functionality.
eml2 uses only simple and familiar list structures (S3 classes)
instead of the more cumbersome S4 classes found in the original EML.
While the higher-level functions are identical, this makes it easier
for most users and developers to work with eml objects and to write
their own functions for creating and manipulating EML objects. Under
the hood, eml2 relies on the emld package, which uses a Linked Data
representation for EML. It is this approach that lets us combine the
simplicity of lists with the specificity required by the XML schema.
library(eml2)
me <- list(individualName = list(givenName = "Carl", surName = "Boettiger"))
my_eml <- list(dataset = list(
  title = "A Minimal Valid EML Dataset",
  creator = me,
  contact = me)
)
write_eml(my_eml, "ex.xml")
eml_validate("ex.xml")
#> [1] TRUE
#> attr(,"errors")
#> character(0)
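Because my_eml is an ordinary nested list, standard R subsetting is enough to inspect or change it before writing. The following is a minimal sketch using only base R (no eml2 functions are involved), and the revised title is a made-up value for illustration:

```r
# Same structure as the minimal example, built from plain lists
me <- list(individualName = list(givenName = "Carl", surName = "Boettiger"))
my_eml <- list(dataset = list(
  title = "A Minimal Valid EML Dataset",
  creator = me,
  contact = me))

# Read a nested field with `$`
surname <- my_eml$dataset$creator$individualName$surName

# Modify a field in place; no accessor methods are needed
my_eml$dataset$title <- "A Revised Title"  # illustrative value only

# Inspect the overall shape of the document
str(my_eml, max.level = 2)
```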
Here we show the creation of a relatively complete EML document using
eml2. This closely parallels the function calls shown in the original
EML R-package vignette.
The original EML R package defines a set of higher-level set_* methods
to facilitate the creation of complex metadata structures. eml2
provides these same methods, taking the same arguments for
set_coverage, set_attributes, set_physical, set_methods and
set_TextType, as illustrated here:
geographicDescription <- "Harvard Forest Greenhouse, Tom Swamp Tract (Harvard Forest)"
coverage <-
  set_coverage(begin = '2012-06-01', end = '2013-12-31',
               sci_names = "Sarracenia purpurea",
               geographicDescription = geographicDescription,
               west = -122.44, east = -117.15,
               north = 37.38, south = 30.00,
               altitudeMinimum = 160, altitudeMaximum = 330,
               altitudeUnits = "meter")
We read in detailed methods written in a Word document. This uses EML’s
docbook-style markup to preserve the formatting of paragraphs, lists,
titles, and so forth. (This is a drop-in replacement for EML’s
set_methods().)
methods_file <- system.file("examples/hf205-methods.docx", package = "EML")
methods <- set_methods(methods_file)
We can also read in text that uses Markdown for markup elements:
abstract_file <- system.file("examples/hf205-abstract.md", package = "EML")
abstract <- set_TextType(abstract_file)
Attribute metadata can be verbose, and is often defined in separate
tables (e.g. separate Excel sheets or .csv files). Here we use
attribute metadata and factor definitions as given in .csv files.
attributes <- read.table(system.file("extdata/hf205_attributes.csv", package = "eml2"))
factors <- read.table(system.file("extdata/hf205_factors.csv", package = "eml2"))
attributeList <-
  set_attributes(attributes,
                 factors,
                 col_classes = c("character",
                                 "Date",
                                 "Date",
                                 "Date",
                                 "factor",
                                 "factor",
                                 "factor",
                                 "numeric"))
Though the physical metadata specifying the file format is extremely
flexible, the set_physical function provides defaults appropriate for
.csv files. DEVELOPER NOTE: ideally the set_physical method should
guess the appropriate metadata structure based on the file extension.
physical <- set_physical("hf205-01-TPexp1.csv")
In the EML R package, objects for which there is no set_ method are
constructed using the new() S4 constructor. This provided an easy way
to see the list of available slots. In eml2, all objects are just
lists, and so there is no need for special methods. We can create any
object directly by nesting lists with names corresponding to the EML
elements. Here we create a keywordSet from scratch:
keywordSet <- list(
  list(
    keywordThesaurus = "LTER controlled vocabulary",
    keyword = list("bacteria",
                   "carnivorous plants",
                   "genetics",
                   "thresholds")
  ),
  list(
    keywordThesaurus = "LTER core area",
    keyword = list("populations", "inorganic nutrients", "disturbance")
  ),
  list(
    keywordThesaurus = "HFR default",
    keyword = list("Harvard Forest", "HFR", "LTER", "USA")
  ))
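Since a keywordSet built this way is only a list of lists, the usual base R tools apply to it directly. The sketch below uses a shortened copy of the structure and plain lapply/vapply calls; the appended keyword "nitrogen" is a hypothetical value chosen only to show the mechanics:

```r
# A shortened keywordSet with the same shape as the one above
keywordSet <- list(
  list(keywordThesaurus = "LTER core area",
       keyword = list("populations", "inorganic nutrients", "disturbance")),
  list(keywordThesaurus = "HFR default",
       keyword = list("Harvard Forest", "HFR", "LTER", "USA")))

# Collect the thesaurus name of each set
thesauri <- vapply(keywordSet, function(ks) ks$keywordThesaurus, character(1))

# Flatten every keyword into a single character vector
all_keywords <- unlist(lapply(keywordSet, function(ks) ks$keyword))

# Append a keyword to the first set (hypothetical keyword, for illustration)
keywordSet[[1]]$keyword <- c(keywordSet[[1]]$keyword, "nitrogen")
```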
Of course, this assumes that we know which terms are permitted in an
EML keywordSet, which is not so useful for novices. We can get a
preview of the elements that any object can take using the
emld::template() function, but this involves a two-part workflow.
Instead, eml2 provides generic constructor methods for all objects.
For instance, the function eml$creator() has function arguments
corresponding to each possible slot for a creator. This means we can
rely on tab completion (and/or autocomplete previews in RStudio) to
see what the possible options are. eml$ functions exist for all
complex types. If an eml$ function does not exist for an argument
(e.g. there is no eml$givenName), then the field takes a simple
string argument.
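To make the idea concrete, the following sketch shows how a constructor with one named argument per element can return a plain list while exposing the element names to tab completion. make_individualName is a hypothetical stand-in written for illustration; it is not how eml2 implements eml$individualName, and only the three element names (salutation, givenName, surName) are taken from the EML schema.

```r
# Hypothetical stand-in for a constructor such as eml$individualName();
# this is NOT eml2's actual implementation.
make_individualName <- function(salutation = NULL, givenName = NULL, surName = NULL) {
  fields <- list(salutation = salutation, givenName = givenName, surName = surName)
  # Keep only the elements that were actually supplied
  Filter(Negate(is.null), fields)
}

aaron_name <- make_individualName(givenName = "Aaron", surName = "Ellison")

# The result is an ordinary named list
names(aaron_name)

# The formal arguments double as a listing of the permitted elements
names(formals(make_individualName))
```

Because the permitted elements are spelled out as formal arguments, an editor can offer them as completions, which is the convenience the eml$ functions are described as providing.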
aaron <- eml$creator(
  individualName = eml$individualName(
    givenName = "Aaron",
    surName = "Ellison"),
  electronicMailAddress = "[email protected]")
HF_address <- eml$address(
  deliveryPoint = "324 North Main Street",
  city = "Petersham",
  administrativeArea = "MA",
  postalCode = "01366",
  country = "USA")
publisher <- eml$publisher(
  organizationName = "Harvard Forest",
  address = HF_address)
contact <-
  list(
    individualName = aaron$individualName,
    electronicMailAddress = aaron$electronicMailAddress,
    address = HF_address,
    organizationName = "Harvard Forest",
    phone = "000-000-0000")
my_eml <- eml$eml(
  packageId = uuid::UUIDgenerate(),
  system = "uuid",
  dataset = eml$dataset(
    title = "Thresholds and Tipping Points in a Sarracenia",
    creator = aaron,
    pubDate = "2012",
    intellectualRights = "http://www.lternet.edu/data/netpolicy.html.",
    abstract = abstract,
    keywordSet = keywordSet,
    coverage = coverage,
    contact = contact,
    methods = methods,
    dataTable = eml$dataTable(
      entityName = "hf205-01-TPexp1.csv",
      entityDescription = "tipping point experiment 1",
      physical = physical,
      attributeList = attributeList)
  ))
We can also validate first and then serialize:
eml_validate(my_eml)
#> [1] TRUE
#> attr(,"errors")
#> character(0)
write_eml(my_eml, "eml.xml")
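The printed result suggests that eml_validate() returns a logical value carrying an "errors" attribute. Assuming that shape, a script can branch on the result and report any messages. The helper summarise_validation below is hypothetical, and the example inputs are hand-built stand-ins with made-up error messages rather than real validator output:

```r
# Hypothetical helper: turn a validation result into a one-line summary.
# Assumes `result` is a logical with an "errors" character attribute,
# matching the shape printed above.
summarise_validation <- function(result) {
  errs <- attr(result, "errors")
  # as.logical() drops attributes so the test is on the bare TRUE/FALSE
  if (isTRUE(as.logical(result))) {
    "valid"
  } else {
    paste0("invalid: ", paste(errs, collapse = "; "))
  }
}

# Hand-built stand-ins for validation results (not real validator output)
ok  <- structure(TRUE, errors = character(0))
bad <- structure(FALSE, errors = c("first problem", "second problem"))

summarise_validation(ok)
summarise_validation(bad)
```

In a real workflow the stand-ins would be replaced by the value returned from eml_validate(my_eml), with write_eml() called only when the summary reports a valid document.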