Skip to content

Commit

Permalink
Merge pull request #130 from inbo/update_metadata_assignments
Browse files Browse the repository at this point in the history
Update scopes in metadata for assignment functions
  • Loading branch information
peterdesmet authored Oct 16, 2024
2 parents 4332a64 + a3472d5 commit 80e0674
Show file tree
Hide file tree
Showing 26 changed files with 441 additions and 130 deletions.
9 changes: 6 additions & 3 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,12 @@
* New function `write_eml()` transforms Camtrap DP metadata to EML (#61).
* New function `round_coordinates()` allows to fuzzy/generalize location information by rounding deployment `latitude` and `longitude`. It also updates `coordinateUncertainty` in the deployments and `coordinatePrecision` and spatial scope in the metadata (#85).
* New function `shift_time()` allows to shift/correct date-times in data and metadata for specified deploymentIDs and duration (#105).
* `filter_deployments()` now updates the spatial and temporal scope in the metadata (#72).
* `filter_observations()` and `filter_media()` now update the taxonomic scope in the metadata (#73).
* `read_camtrap_dp()` now upgrades datasets to Camtrap DP 1.0.1. The internal function `convert()` has been renamed to `upgrade()` (#113).
* `filter_deployments()` and `deployments()<-` now update the spatial, temporal and taxonomic scope in the metadata based on the returned data (#72, #111, #132).
* `filter_observations()`, `filter_media()`, `media()<-` and `observations()<-` now update the taxonomic scope in the metadata based on the returned data (#73, #111).
* `read_camtrapdp()` now updates the spatial and temporal scope in metadata based on the data (#130).
* `read_camtrapdp()` now upgrades datasets to Camtrap DP 1.0.1. The internal function `convert()` has been renamed to `upgrade()` (#113).
* Internal function `build_taxa()` is renamed to `taxonomic()` (#130).
* `taxa()` now removes duplicates (#130).

# camtrapdp 0.3.1

Expand Down
15 changes: 13 additions & 2 deletions R/deployments.R
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,11 @@
#' `deployments()` gets the deployments from a Camera Trap Data Package object.
#'
#' `deployments<-()` is the assignment equivalent.
#' It should only be used within other functions, where the expected data
#' structure can be guaranteed.
#'
#' - It should only be used within other functions, where the expected data
#' structure can be guaranteed.
#' - Metadata (`x$spatial` and `x$temporal`) are updated to match the assigned
#' deployments.
#'
#' @inheritParams print.camtrapdp
#' @return A [tibble::tibble()] data frame with deployments.
Expand Down Expand Up @@ -33,6 +36,14 @@ deployments <- function(x) {
class = "camtrapdp_error_assignment_wrong_class"
)
}

purrr::pluck(x, "data", "deployments") <- dplyr::as_tibble(value)

# Update spatial and temporal scope in metadata
x <-
x %>%
update_spatial() %>%
update_temporal()

return(x)
}
12 changes: 3 additions & 9 deletions R/filter_deployments.R
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,8 @@
#'
#' - Media are filtered on associated `deploymentID`.
#' - Observations are filtered on associated `deploymentID`.
#' - Metadata (`x$temporal` and `x$spatial`) are updated to match the filtered
#' deployments.
#' - Metadata (`x$spatial`, `x$temporal` and `x$taxonomic`) are updated to match
#' the filtered deployments.
#'
#' @inheritParams print.camtrapdp
#' @param ... Filtering conditions, see `dplyr::filter()`.
Expand Down Expand Up @@ -58,16 +58,10 @@ filter_deployments <- function(x, ...) {
observations(x) %>%
dplyr::filter(.data$deploymentID %in% deployments$deploymentID)

# Assign filtered data
# Assign filtered data (also updates spatial, temporal and taxonomic scope)
deployments(x) <- deployments
media(x) <- media
observations(x) <- observations

# Update temporal and spatial scope in metadata
x <-
x %>%
update_temporal() %>%
update_spatial()

return(x)
}
5 changes: 1 addition & 4 deletions R/filter_media.R
Original file line number Diff line number Diff line change
Expand Up @@ -59,12 +59,9 @@ filter_media <- function(x, ...) {
(.data$observationLevel == "event" & .data$eventID %in% select_event_ids)
)

# Assign filtered data
# Assign filtered data (also updates taxonomic scope)
media(x) <- media
observations(x) <- observations

# Update taxonomic scope in metadata
x <- update_taxonomic(x)

return(x)
}
5 changes: 1 addition & 4 deletions R/filter_observations.R
Original file line number Diff line number Diff line change
Expand Up @@ -81,12 +81,9 @@ filter_observations <- function(x, ...) {
.data$eventID %in% select_event_ids
)

# Assign filtered data
# Assign filtered data (also update taxonomic scope)
media(x) <- media
observations(x) <- observations

# Update taxonomic scope in metadata
x <- update_taxonomic(x)

return(x)
}
4 changes: 2 additions & 2 deletions R/media.R
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,8 @@
#' `media()` gets the media from a Camera Trap Data Package object.
#'
#' `media<-()` is the assignment equivalent.
#' It should only be used within other functions, where the expected data
#' structure can be guaranteed.
#' - It should only be used within other functions, where the expected data
#' structure can be guaranteed.
#'
#' @inheritParams print.camtrapdp
#' @return A [tibble::tibble()] data frame with media.
Expand Down
10 changes: 8 additions & 2 deletions R/observations.R
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,9 @@
#' object.
#'
#' `observations<-()` is the assignment equivalent.
#' It should only be used within other functions, where the expected data
#' structure can be guaranteed.
#' - It should only be used within other functions, where the expected data
#' structure can be guaranteed.
#' - Metadata (`x$taxonomic`) are updated to match the assigned observations.
#'
#' @inheritParams print.camtrapdp
#' @return A [tibble::tibble()] data frame with observations.
Expand Down Expand Up @@ -34,6 +35,11 @@ observations <- function(x) {
class = "camtrapdp_error_assignment_wrong_class"
)
}

purrr::pluck(x, "data", "observations") <- dplyr::as_tibble(value)

# Update taxonomic scope in metadata
x <- update_taxonomic(x)

return(x)
}
48 changes: 32 additions & 16 deletions R/read_camtrapdp.R
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,14 @@
#' Note that this can result in media being linked to multiple events (and thus
#' being duplicated), for example when events and sub-events were defined.
#'
#' @section Update metadata:
#'
#' Camtrap DP metadata has a `spatial` and `temporal` property that contains the
#' spatial and temporal coverage of the package respectively.
#'
#' This function **will automatically update the spatial and temporal scopes**
#' in metadata based on the data.
#'
#' @param file Path or URL to a `datapackage.json` file.
#' @return A Camera Trap Data Package object.
#' @family read functions
Expand Down Expand Up @@ -62,18 +70,33 @@ read_camtrapdp <- function(file) {
attr(x, "version") <- version

# Read and attach csv data
deployments(x) <-
# Assignment functions should not be used here, to bypass metadata update
# and validation, which comes later
purrr::pluck(x, "data", "deployments") <-
frictionless::read_resource(package, "deployments")
media(x) <-
purrr::pluck(x, "data", "media") <-
frictionless::read_resource(package, "media")
observations(x) <-
purrr::pluck(x, "data", "observations") <-
frictionless::read_resource(package, "observations")

# Upgrade
x <- upgrade(x, upgrade_to = "1.0.1")

# Add eventID to media
media(x) <-
dplyr::left_join(
media(x),
events(x),
by = dplyr::join_by(
"deploymentID",
"timestamp" >= "eventStart",
"timestamp" <= "eventEnd"
)
) %>%
dplyr::select(-"eventStart", -"eventEnd")

# Add taxonomic info to observations
taxonomy <- build_taxa(x)
taxonomy <- taxonomic(x)
if (!is.null(taxonomy)) {
# Add taxon. as column suffix
colnames(taxonomy) <- paste("taxon", colnames(taxonomy), sep = ".")
Expand All @@ -87,18 +110,11 @@ read_camtrapdp <- function(file) {
)
}

# Add eventID to media
media(x) <-
dplyr::left_join(
media(x),
events(x),
by = dplyr::join_by(
"deploymentID",
"timestamp" >= "eventStart",
"timestamp" <= "eventEnd"
)
) %>%
dplyr::select(-"eventStart", -"eventEnd")
# Update temporal and spatial scope in metadata
x <-
x %>%
update_temporal() %>%
update_spatial()

return(x)
}
46 changes: 44 additions & 2 deletions R/taxa.R
Original file line number Diff line number Diff line change
@@ -1,7 +1,11 @@
#' Get taxa
#'
#' @description
#' Gets the (unique) scientific names and associated taxonomic information from
#' the observations of a Camera Trap Data Package object.
#' Duplicate taxa (i.e. with the same `scientificName`) are removed, retaining
#' the taxon with (first) a `taxonID` and (second) the most taxonomic
#' information.
#'
#' @inheritParams print.camtrapdp
#' @return A [tibble::tibble()] data frame with the taxonomic information,
Expand All @@ -13,9 +17,47 @@
#' taxa(x)
taxa <- function(x) {
check_camtrapdp(x)
observations(x) %>%
taxa <-
observations(x) %>%
dplyr::filter(!is.na(.data$scientificName)) %>%
dplyr::select("scientificName", dplyr::starts_with("taxon.")) %>%
dplyr::distinct() %>%
dplyr::rename_with(~ sub("^taxon.", "", .x))
dplyr::rename_with(~ sub("^taxon.", "", .x)) %>%
dplyr::arrange(scientificName)

# Remove duplicates without taxonID
if ("taxonID" %in% names(taxa)) {
duplicates_without_taxonid <-
taxa %>%
dplyr::group_by(.data$scientificName) %>%
dplyr::filter(dplyr::n() > 1) %>%
dplyr::filter(is.na(.data$taxonID))
taxa <- dplyr::anti_join(
taxa,
duplicates_without_taxonid,
by = names(duplicates_without_taxonid)
)
}

# Remove duplicates with the least information
duplicates_with_least_info <-
taxa %>%
dplyr::mutate(columns_with_info = rowSums(!is.na(.))) %>%
dplyr::group_by(.data$scientificName) %>%
dplyr::filter(dplyr::n() > 1) %>%
dplyr::arrange(dplyr::desc(columns_with_info)) %>%
dplyr::slice_tail(n = -1) %>% # Remove first row from group (with most info)
dplyr::ungroup() %>%
dplyr::select(-columns_with_info)
taxa <- dplyr::anti_join(
taxa,
duplicates_with_least_info,
by = names(duplicates_with_least_info)
)

# Drop any columns that are empty (e.g as result of dropping duplicates)
cols_to_keep <- colSums(is.na(taxa)) != nrow(taxa)
taxa <- taxa[, cols_to_keep, drop = FALSE]

return(taxa)
}
22 changes: 11 additions & 11 deletions R/build_taxa.R → R/taxonomic.R
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
#' Build a data frame with taxonomic information
#' Get taxonomic metadata and build a data frame
#'
#' Builds a data frame from the `taxonomy` property in a Camera Trap Data
#' Package object.
#' Gets the `x$taxonomic` property in a Camera Trap Data Package object and
#' builds a data frame with the taxonomic information.
#'
#' @inheritParams print.camtrapdp
#' @return A data frame with the taxonomic information.
#' @noRd
build_taxa <- function(x) {
taxonomic <- function(x) {
# Extract the taxonomic information
taxonomic_list <- purrr::pluck(x, "taxonomic")

Expand All @@ -16,7 +16,7 @@ build_taxa <- function(x) {
}

# Convert list into a data.frame
taxonomy_df <-
taxa <-
purrr::map(
taxonomic_list,
purrr::list_flatten,
Expand All @@ -26,7 +26,7 @@ build_taxa <- function(x) {
purrr::list_rbind()

# Warn if there are duplicate scientificNames
scientific_names <- purrr::pluck(taxonomy_df, "scientificName")
scientific_names <- purrr::pluck(taxa, "scientificName")
duplicate_names <- scientific_names[duplicated(scientific_names)]
if (length(duplicate_names) > 0) {
cli::cli_warn(
Expand All @@ -40,16 +40,16 @@ build_taxa <- function(x) {
}

# Only keep the first row if a scientificName occurs more than once
taxonomy_df <- dplyr::distinct(
taxonomy_df,
taxa <- dplyr::distinct(
taxa,
.data$scientificName,
.keep_all = TRUE
)

# Drop any columns that are empty (e.g as result of dropping duplicates)
cols_to_keep <- colSums(is.na(taxonomy_df)) != nrow(taxonomy_df)
taxonomy_df <- taxonomy_df[, cols_to_keep, drop = FALSE]
cols_to_keep <- colSums(is.na(taxa)) != nrow(taxa)
taxa <- taxa[, cols_to_keep, drop = FALSE]

# Return data.frame
return(taxonomy_df)
return(taxa)
}
Loading

0 comments on commit 80e0674

Please sign in to comment.