Skip to content
forked from cboettig/eml2

A list-based rewrite of higher-level functions from EML

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md
Notifications You must be signed in to change notification settings

scelmendorf/eml2

 
 

Repository files navigation

Build Status Coverage Status stability-experimental

eml2

The goal of eml2 is to provide both a drop-in replacement for the higher-level functions of the existing EML package while also providing additional functionality.

eml2 uses only simple and familiar list structures (S3 classes) instead of the more cumbersome use of S4 found in the original EML. While the higher-level functions are identical, this makes it easier to for most users and developers to work with eml objects and also to write their own functions for creating and manipulating EML objects. Under the hood, eml2 relies on the emld package, which uses a Linked Data representation for EML. It is this approach which lets us combine the simplicity of lists with the specificy required by the XML schema.

Creating EML

library(eml2)

A minimal valid EML document:

me <- list(individualName = list(givenName = "Carl", surName = "Boettiger"))
my_eml <- list(dataset = list(
              title = "A Mimimal Valid EML Dataset",
              creator = me,
              contact = me)
            )


write_eml(my_eml, "ex.xml")
eml_validate("ex.xml")
#> [1] TRUE
#> attr(,"errors")
#> character(0)

A Richer Example

Here we show a the creation of a relatively complete EML document using eml2. This closely parallels the function calls shown in the original EML R-package vignette.

set_* methods

The original EML R package defines a set of higher-level set_* methods to facilitate the creation of complex metadata structures. eml2 provides these same methods, taking the same arguments for set_coverage, set_attributes, set_physical, set_methods and set_textType, as illustrated here:

Coverage metadata

geographicDescription <- "Harvard Forest Greenhouse, Tom Swamp Tract (Harvard Forest)"
coverage <- 
  set_coverage(begin = '2012-06-01', end = '2013-12-31',
               sci_names = "Sarracenia purpurea",
               geographicDescription = geographicDescription,
               west = -122.44, east = -117.15, 
               north = 37.38, south = 30.00,
               altitudeMin = 160, altitudeMaximum = 330,
               altitudeUnits = "meter")

Reading in text from Word and Markdown

We read in detailed methods written in a Word doc. This uses EML’s docbook-style markup to preserve formatting of paragraphs, lists, titles, and so forth. (This is a drop-in replacement for EML set_method())

methods_file <- system.file("examples/hf205-methods.docx", package = "EML")
methods <- set_methods(methods_file)

We can also read in text that uses Markdown for markup elements:

abstract_file <-  system.file("examples/hf205-abstract.md", package = "EML")
abstract <- set_TextType(abstract_file)

Attribute Metadata from Tables

Attribute metadata can be verbose, and is often defined in separate tables (e.g. separate Excel sheets or .csv files). Here we use attribute metadata and factor definitions as given from .csv files.

attributes <- read.table(system.file("extdata/hf205_attributes.csv", package = "eml2"))
factors <- read.table(system.file("extdata/hf205_factors.csv", package = "eml2"))
attributeList <- 
  set_attributes(attributes, 
                 factors, 
                 col_classes = c("character", 
                                 "Date",
                                 "Date",
                                 "Date",
                                 "factor",
                                 "factor",
                                 "factor",
                                 "numeric"))

Data file format

Though the physical metadata specifying the file format is extremely flexible, the set_physical function provides defaults appropriate for .csv files. DEVELOPER NOTE: ideally the set_physical method should guess the appropriate metadata structure based on the file extension.

physical <- set_physical("hf205-01-TPexp1.csv")

Generic construction

In the EML R package, objects for which there is no set_ method are constructed using the new() S4 constructor. This provided an easy way to see the list of available slots. In eml2, all objects are just lists, and so there is no need for special methods. We can create any object directly by nesting lists with names corresponding to the EML elements. Here we create a keywordSet from scratch:

keywordSet <- list(
    list(
        keywordThesaurus = "LTER controlled vocabulary",
        keyword = list("bacteria",
                    "carnivorous plants",
                    "genetics",
                    "thresholds")
        ),
    list(
        keywordThesaurus = "LTER core area",
        keyword =  list("populations", "inorganic nutrients", "disturbance")
        ),
    list(
        keywordThesaurus = "HFR default",
        keyword = list("Harvard Forest", "HFR", "LTER", "USA")
        ))

Of course, this assumes that we have some knowledge of what the possible terms permitted in an EML keywordSet are! Not so useful for novices. We can get a preview of the elements that any object can take using the emld::template() option, but this involves a two-part workflow. Instead, eml2 provides generic construct methods for all objects.

Constructor methods

For instance, the function eml$creator() has function arguments corresponding to each possible slot for a creator. This means we can rely on tab completion (and/or autocomplete previews in RStudio) to see what the possible options are. eml$ functions exist for all complex types. If eml$ does not exist for an argument (e.g. there is no eml$givenName), then the field takes a simple string argument.

Creating parties (creator, contact, publisher)

aaron <- eml$creator(
  individualName = eml$individualName(
    givenName = "Aaron", 
    surName = "Ellison"),
  electronicMailAddress = "[email protected]")
HF_address <- eml$address(
                  deliveryPoint = "324 North Main Street",
                  city = "Petersham",
                  administrativeArea = "MA",
                  postalCode = "01366",
                  country = "USA")
publisher <- eml$publisher(
                 organizationName = "Harvard Forest",
                 address = HF_address)
contact <- 
  list(
    individualName = aaron$individualName,
    electronicMailAddress = aaron$electronicMailAddress,
    address = HF_address,
    organizationName = "Harvard Forest",
    phone = "000-000-0000")

Putting it all together

my_eml <- eml$eml(
           packageId = uuid::UUIDgenerate(),  
           system = "uuid",
           dataset = eml$dataset(
               title = "Thresholds and Tipping Points in a Sarracenia",
               creator = aaron,
               pubDate = "2012",
               intellectualRights = "http://www.lternet.edu/data/netpolicy.html.",
               abstract = abstract,
               keywordSet = keywordSet,
               coverage = coverage,
               contact = contact,
               methods = methods,
               dataTable = eml$dataTable(
                 entityName = "hf205-01-TPexp1.csv",
                 entityDescription = "tipping point experiment 1",
                 physical = physical,
                 attributeList = attributeList)
               ))

Serialize and validate

We can also validate first and then serialize:

eml_validate(my_eml)
#> [1] TRUE
#> attr(,"errors")
#> character(0)
write_eml(my_eml, "eml.xml")

About

A list-based rewrite of higher-level functions from EML

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • R 80.1%
  • TeX 13.6%
  • JavaScript 6.1%
  • CSS 0.2%