New function merge_camtrapdp() #112

Open
wants to merge 125 commits into
base: main
125 commits
ffa9ee6
new function merge_camtrapdp()
sannegovaert Jul 23, 2024
0b677c5
Create test-merge_camtrapdp.R
sannegovaert Jul 24, 2024
10d9227
document()
sannegovaert Jul 24, 2024
04afa53
add test "merge_camtrapdp() returns no duplicated deploymentID's"
sannegovaert Jul 24, 2024
2df5162
fix typo
sannegovaert Jul 25, 2024
d8f6f0a
import digest
sannegovaert Jul 25, 2024
2ebdcc0
update test
sannegovaert Jul 25, 2024
bcea3dc
give unique deploymentIDs to deployments
sannegovaert Jul 25, 2024
82fa057
set unique mediaID's and observationID's
sannegovaert Jul 25, 2024
429f201
update comment
sannegovaert Jul 25, 2024
8ce8ad2
update documentation
sannegovaert Jul 25, 2024
9297cd9
update examples
sannegovaert Jul 25, 2024
fefac69
correct typo
sannegovaert Jul 25, 2024
e80eb49
delete space in messages
sannegovaert Jul 25, 2024
298b459
test merge_camtrapdp() returns message when ID's are replaced
sannegovaert Jul 25, 2024
61bb20d
update test "merge_camtrapdp() returns unique deplpymentID's, mediaID…
sannegovaert Jul 26, 2024
53eaded
character limit
sannegovaert Jul 26, 2024
03051d9
create helper function `replace_duplicated_deploymentID()`
sannegovaert Jul 26, 2024
af034cf
create helper function `vdigest_crc32()`
sannegovaert Jul 26, 2024
279faaa
rename and update function `replace_duplicated_deploymentID()` to `ge…
sannegovaert Jul 26, 2024
2c6c71f
update comments
sannegovaert Jul 26, 2024
585d14e
new helper function `generate_mediaID()`
sannegovaert Jul 26, 2024
5f007fb
add helper function `generate_observationID()`
sannegovaert Jul 26, 2024
94f3c36
update `generate_observationID()` to `replace_observationID()`
sannegovaert Jul 26, 2024
0924691
update `generate_mediaID()` to `replace_mediaID()`
sannegovaert Jul 29, 2024
40e0a14
update `generate_deploymentID()` to `replace_deploymentID()`
sannegovaert Jul 29, 2024
f6caca2
new helper function `replace_duplicatedIDs()`
sannegovaert Jul 29, 2024
b533e4e
update comment
sannegovaert Jul 29, 2024
63bb091
check for length ID
sannegovaert Jul 29, 2024
a740d25
test for valid name and title
sannegovaert Jul 29, 2024
13442c1
capitilize comments
sannegovaert Jul 30, 2024
adf5fa1
new helper function `normalize_list()`
sannegovaert Jul 30, 2024
3454961
new helper function `is_subset()`
sannegovaert Jul 30, 2024
9d07cf7
add and use helper functions `update_unique()` and `remove_duplicates()`
sannegovaert Jul 30, 2024
ce1cba2
uncomment camtrapdp_error_length_ aborts because it cannot be tested …
sannegovaert Jul 30, 2024
a07f3d0
replace non-ASCII characters
sannegovaert Jul 30, 2024
3dad912
replace stats::setNames
sannegovaert Jul 30, 2024
1690c3c
grammar
sannegovaert Jul 31, 2024
ccbaea6
Merge branch 'main' into merge_datasets
peterdesmet Sep 9, 2024
8a17dc0
Merge branch 'main' into merge_datasets
sannegovaert Sep 26, 2024
693465d
leave name and title empty (don't have the user set those in the func…
sannegovaert Sep 26, 2024
16f5f28
typo
sannegovaert Sep 26, 2024
c534494
remove title and name arguments from tests
sannegovaert Sep 26, 2024
a00e6cd
Do not generate an id. That also solves the problem of having meaning…
sannegovaert Sep 26, 2024
9c72c1b
remove params title and name from documentation
sannegovaert Sep 26, 2024
500262d
documen()
sannegovaert Sep 26, 2024
7e7011b
replace project with projects
sannegovaert Sep 26, 2024
4519945
add helper function check_duplicate_ids()
sannegovaert Sep 27, 2024
7c3f9c6
remove `replace_ ()` helper functions
sannegovaert Sep 27, 2024
3f145ea
add documentation
sannegovaert Sep 27, 2024
5e52a7b
new helper function 'add_suffx()`
sannegovaert Sep 27, 2024
4f1146d
remove helper function `replace_duplicatedIDs()`
sannegovaert Sep 27, 2024
af0876e
use `add_suffx()`
sannegovaert Sep 27, 2024
4f2ea1e
add param suffix
sannegovaert Sep 27, 2024
d198a68
keep NAs in mediaID when adding suffix
sannegovaert Sep 27, 2024
6cf7e0e
do not merge objects in helper function
sannegovaert Sep 27, 2024
8328065
also add suffix to eventIDs and individualDs
sannegovaert Sep 27, 2024
7ac0413
add warnings
sannegovaert Sep 27, 2024
42bdbde
avoid warning message of `any()`
sannegovaert Sep 27, 2024
db1266c
update tests
sannegovaert Sep 27, 2024
57c0bb3
individualIDs are allowed to be duplicated between data packages
sannegovaert Sep 27, 2024
08f668d
replace suffix with prefix
sannegovaert Sep 27, 2024
f7ca592
typo
sannegovaert Sep 27, 2024
ee90618
merge_camtrapdp() adds prefixes to all values of identifiers
sannegovaert Sep 27, 2024
c0119f0
devtools::document()
sannegovaert Sep 27, 2024
af85ae6
test on warning invalid prefix
sannegovaert Sep 27, 2024
4509b47
Update merge_camtrapdp.R
sannegovaert Sep 27, 2024
8f759b0
raise error, not warning
sannegovaert Sep 27, 2024
fecb8a4
merge_camtrapdp() returns error on duplicate Data Package id
sannegovaert Sep 27, 2024
99e91ad
correction: not a warning but error
sannegovaert Sep 27, 2024
fe7b09b
give unique ids to example datasets to merge
sannegovaert Sep 27, 2024
da0cb18
reorder tests
sannegovaert Sep 30, 2024
dac7973
camtrapdp id must be character
sannegovaert Sep 30, 2024
abe30ab
change default prefix
sannegovaert Sep 30, 2024
b5c0992
set default in function
sannegovaert Sep 30, 2024
006f217
typo
sannegovaert Sep 30, 2024
cbda9db
add tests for metadata
sannegovaert Sep 30, 2024
d76b491
account for ID == NULL
sannegovaert Sep 30, 2024
472f3d3
correct mistake in keywords
sannegovaert Sep 30, 2024
4386be9
add tests (work in progress)
sannegovaert Sep 30, 2024
2ff6d69
typo
sannegovaert Oct 1, 2024
96b9322
update test on metadata
sannegovaert Oct 1, 2024
6aa0983
small update
sannegovaert Oct 1, 2024
7da1b70
update prefix
sannegovaert Oct 8, 2024
c3e874c
update parameter names
sannegovaert Oct 8, 2024
d2acd21
update parameter names and prefix
sannegovaert Oct 8, 2024
649f0f1
fix name merged DP
sannegovaert Oct 8, 2024
3c25c05
rename merged DP
sannegovaert Oct 8, 2024
13d9b44
add tests on custom prefixes
sannegovaert Oct 8, 2024
8822b7e
Update test-merge_camtrapdp.R
sannegovaert Oct 8, 2024
156d6d8
add test on piping
sannegovaert Oct 8, 2024
a9c07d9
set id to NULL instead of NA
sannegovaert Oct 8, 2024
b6b8c68
add tests for description
sannegovaert Oct 8, 2024
f35e46d
id is set to NULL instead of NA
sannegovaert Oct 8, 2024
3c4bb10
taxonomic scope should also be updated in filter_deployments()
sannegovaert Oct 9, 2024
633171b
set directory
sannegovaert Oct 9, 2024
d9943ce
update documentation
sannegovaert Oct 9, 2024
bb70934
Merge branch 'main' into merge_datasets
peterdesmet Oct 16, 2024
56454a8
Update test-merge_camtrapdp.R
sannegovaert Oct 16, 2024
913eb98
Update NEWS.md
sannegovaert Oct 16, 2024
f8327b6
add documentation
sannegovaert Oct 16, 2024
99ebe0d
update on project(s)
sannegovaert Oct 16, 2024
637ee27
use x and y instead of x1 and x2
sannegovaert Oct 16, 2024
0ed0511
fix example
sannegovaert Oct 16, 2024
e166bb6
add visible binding for global variables
sannegovaert Oct 16, 2024
eea76b6
typo
sannegovaert Oct 16, 2024
ccc479b
document()
sannegovaert Oct 16, 2024
800a49c
Update DESCRIPTION
sannegovaert Oct 16, 2024
e86a7b7
avoid error on lacking visible binding
sannegovaert Oct 16, 2024
dce3167
undo mistake
sannegovaert Oct 17, 2024
74983fd
Merge branch 'main' into merge_datasets
peterdesmet Oct 17, 2024
e659911
Merge branch 'main' into merge_datasets
peterdesmet Oct 25, 2024
b959be5
check for additional resources
sannegovaert Oct 25, 2024
f3c6633
add additional resources
sannegovaert Oct 25, 2024
82d2d26
move to helper functions
sannegovaert Oct 25, 2024
19f98bd
typo's
sannegovaert Oct 25, 2024
4451fee
Update test-merge_camtrapdp.R
sannegovaert Oct 25, 2024
427226f
fix `merge_additional_resources()`
sannegovaert Oct 25, 2024
f3f2a45
update documentation
sannegovaert Oct 25, 2024
28802d3
reorder
sannegovaert Oct 28, 2024
fd068f9
Add new helper function
sannegovaert Oct 28, 2024
572d5af
Replace NULL values (generated because of reading JSON) with NA
sannegovaert Oct 28, 2024
84dc619
Update test-write_camtrapdp.R
sannegovaert Oct 28, 2024
7adf2c3
Use resources()
peterdesmet Oct 28, 2024
5e7360f
avoid tidyselect warning
sannegovaert Oct 29, 2024
1 change: 1 addition & 0 deletions NAMESPACE
@@ -14,6 +14,7 @@ export(filter_media)
export(filter_observations)
export(locations)
export(media)
export(merge_camtrapdp)
export(observations)
export(read_camtrapdp)
export(round_coordinates)
1 change: 1 addition & 0 deletions NEWS.md
@@ -1,6 +1,7 @@
# camtrapdp (development version)

* New function `write_camtrapdp()` writes a Camera Trap Data Package to disk as a `datapackage.json` and CSV files (#137).
* New function `merge_camtrapdp()` merges two Camera Trap Data Packages (#112).
* New function `write_eml()` transforms Camtrap DP metadata to EML (#99).
* New function `round_coordinates()` allows to fuzzy/generalize location information by rounding deployment `latitude` and `longitude`. It also updates `coordinateUncertainty` in the deployments and `coordinatePrecision` and spatial scope in the metadata (#106).
* New function `shift_time()` allows to shift/correct date-times in data and metadata for specified deploymentIDs and duration (#108).
159 changes: 159 additions & 0 deletions R/merge_camtrapdp.R
@@ -0,0 +1,159 @@
#' Merge Camera Trap Data Packages
#'
#' @param x,y Camera Trap Data Package objects (as returned by
#' `read_camtrapdp()`), to be merged into one.
#' @param prefix Character vector of length 2. If `x` and `y` share values for
#' an identifier, the prefixes (followed by `_`) are added to all values of
#' that identifier to disambiguate them. Defaults to the `id`s of the two Data
#' Packages.
#' @return Merged Camera Trap Data Package object.
#' @family transformation functions
#' @export
#' @section Merging details:
#' Deployments, media and observations are combined. If there are duplicate IDs
#' between x and y, prefixes will be added to all the values of each identifier
#' with duplicates, to disambiguate them.
#' Additional resources are added, but not combined. If additional resources
#' have the same name, prefixes will be added to the resource name.
#' The following properties are set:
#' - **name**: Set to NA.
#' - **id**: Set to NULL.
#' - **created**: Set to current timestamp.
#' - **title**: Set to NA.
#' - **contributors**: A combination is made and duplicates are removed.
#' - **description**: A combination is made.
#' - **version**: Set to 1.0.
#' - **keywords**: A combination is made and duplicates are removed.
#' - **image**: Set to NULL.
#' - **homepage**: Set to NULL.
#' - **sources**: A combination is made and duplicates are removed.
#' - **licenses**: A combination is made and duplicates are removed.
#' - **bibliographicCitation**: Set to NULL.
#' - **project**: List of the projects.
#' - **coordinatePrecision**: Set to the least precise `coordinatePrecision`.
#' - **spatial**: Reset based on the new deployments.
#' - **temporal**: Reset based on the new deployments.
#' - **taxonomic**: A combination is made and duplicates are removed.
#' - **relatedIdentifiers**: A combination is made and duplicates are removed.
#' - **references**: A combination is made and duplicates are removed.
#' @section Merging multiple Camera Trap Data Packages:
#' `merge_camtrapdp()` can be used in a pipe to merge multiple Camtrap DPs:
#' - `x %>% merge_camtrapdp(y) %>% merge_camtrapdp(z)`
#' @examples
#' x <- example_dataset() %>%
#' filter_deployments(deploymentID %in% c("00a2c20d", "29b7d356"))
#' y <- example_dataset() %>%
#' filter_deployments(deploymentID %in% c("577b543a", "62c200a9"))
#' x$id <- "1"
#' y$id <- "2"
#' xy_merged <- merge_camtrapdp(x, y)
merge_camtrapdp <- function(x, y, prefix = c(x$id, y$id)) {
check_camtrapdp(x)
check_camtrapdp(y)

if (!is.null(x$id) && !is.null(y$id)) {
if (x$id == y$id) {
cli::cli_abort(
c(
paste(
"{.arg x} and {.arg y} should be different Camera Trap Data",
"Package objects with unique identifiers."
),
x = "{.arg x} and {.arg y} have the same id: {.val {x$id}}"
),
class = "camtrapdp_error_camtrapdpid_duplicated"
)
}
}

# Check if identifiers have duplicates
results_duplicate_ids <- check_duplicate_ids(x, y)

# Add prefix to identifiers with duplicates
if (TRUE %in% results_duplicate_ids) {

if (!is.character(prefix) || length(prefix) != 2) {
cli::cli_abort(
c(
paste(
"{.arg prefix} must be a character vector of length 2, not",
"a {class(prefix)} object of length {length(prefix)}."
)
),
class = "camtrapdp_error_prefix_invalid"
)
}

if (any(is.na(prefix))) {
cli::cli_abort(
"{.arg prefix} can't be 'NA'.",
class = "camtrapdp_error_prefix_NA"
)
}

x <- add_prefix(x, results_duplicate_ids, paste0(prefix[1], "_"))
y <- add_prefix(y, results_duplicate_ids, paste0(prefix[2], "_"))
}

# Merge Camera Trap DP resources
xy_merged <- x
deployments(xy_merged) <- dplyr::bind_rows(deployments(x), deployments(y))
media(xy_merged) <- dplyr::bind_rows(media(x), media(y))
observations(xy_merged) <- dplyr::bind_rows(observations(x), observations(y))

# Merge additional resources
xy_merged <- merge_additional_resources(xy_merged, x, y, prefix)

# Merge/update metadata
xy_merged$name <- NA
xy_merged$id <- NULL
xy_merged$created <- format(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
xy_merged$title <- NA
xy_merged$contributors <- remove_duplicates(c(x$contributors, y$contributors))
xy_merged$description <- paste(x$description, y$description, sep = "\n")
xy_merged$version <- "1.0"
xy_merged$keywords <- unique(c(x$keywords, y$keywords))
xy_merged$image <- NULL
xy_merged$homepage <- NULL
xy_merged$sources <- remove_duplicates(c(x$sources, y$sources))
xy_merged$licenses <- remove_duplicates(c(x$licenses, y$licenses))
xy_merged$project <- list(x$project, y$project)
xy_merged$bibliographicCitation <- NULL
xy_merged$coordinatePrecision <-
max(x$coordinatePrecision, y$coordinatePrecision, na.rm = TRUE)

if (!is.null(x$id)) {
relatedIdentifiers_x <- list(
relationType = "IsDerivedFrom",
relatedIdentifier = as.character(x$id),
resourceTypeGeneral = "Data package",
relatedIdentifierType = "id"
)
} else {
relatedIdentifiers_x <- list()
}
if (!is.null(y$id)) {
relatedIdentifiers_y <- list(
relationType = "IsDerivedFrom",
relatedIdentifier = as.character(y$id),
resourceTypeGeneral = "Data package",
relatedIdentifierType = "id"
)
} else {
relatedIdentifiers_y <- list()
}
new_relatedIdentifiers <- list(relatedIdentifiers_x, relatedIdentifiers_y)
xy_merged$relatedIdentifiers <- remove_duplicates(
c(x$relatedIdentifiers, y$relatedIdentifiers, new_relatedIdentifiers)
)

xy_merged$references <- unique(c(x$references, y$references))
xy_merged$directory <- "."

xy_merged <- xy_merged %>%
update_spatial() %>%
update_temporal() %>%
update_taxonomic()

return(xy_merged)
}
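The prefixing step described in the "Merging details" section can be illustrated on plain data frames. This is only a minimal sketch in base R: `prefix_if_duplicated()` is a hypothetical helper written for this note, not the package's `check_duplicate_ids()` or `add_prefix()` (neither is shown in this diff), and it assumes identifiers are character columns.

```r
# Hypothetical sketch, not the package's add_prefix()/check_duplicate_ids().
# Prefix an identifier column in two data frames only when they share values.
prefix_if_duplicated <- function(x_df, y_df, id_col, prefix) {
  stopifnot(is.character(prefix), length(prefix) == 2, !anyNA(prefix))

  # Shared, non-missing identifier values between the two tables
  shared <- setdiff(intersect(x_df[[id_col]], y_df[[id_col]]), NA)

  if (length(shared) > 0) {
    # Prefix all values of the identifier, keeping NA as NA
    add <- function(ids, p) {
      ifelse(is.na(ids), NA_character_, paste0(p, "_", ids))
    }
    x_df[[id_col]] <- add(x_df[[id_col]], prefix[1])
    y_df[[id_col]] <- add(y_df[[id_col]], prefix[2])
  }

  rbind(x_df, y_df)
}

x_deployments <- data.frame(deploymentID = c("a", "b"), stringsAsFactors = FALSE)
y_deployments <- data.frame(deploymentID = c("b", "c"), stringsAsFactors = FALSE)

merged <- prefix_if_duplicated(
  x_deployments, y_deployments, "deploymentID", prefix = c("1", "2")
)
merged$deploymentID
```

Because `"b"` occurs in both tables, every `deploymentID` receives its package prefix, not only the clashing one. That matches the documented behaviour of prefixing all values of an identifier that has duplicates.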
2 changes: 1 addition & 1 deletion R/taxa.R
@@ -23,7 +23,7 @@ taxa <- function(x) {
dplyr::select("scientificName", dplyr::starts_with("taxon.")) %>%
dplyr::distinct() %>%
dplyr::rename_with(~ sub("^taxon.", "", .x)) %>%
- dplyr::arrange(scientificName)
+ dplyr::arrange(.data$scientificName)

# Remove duplicates without taxonID
if ("taxonID" %in% names(taxa)) {
3 changes: 3 additions & 0 deletions R/taxonomic.R
@@ -15,6 +15,9 @@ taxonomic <- function(x) {
return(NULL)
}

# Replace NULL with NA
taxonomic_list <- replace_null_recursive(taxonomic_list)

# Convert list into a data.frame
taxa <-
purrr::map(
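The `R/taxonomic.R` change calls `replace_null_recursive()`, whose body is not part of the hunks shown here. As a rough illustration of what such a helper might do, the base R sketch below walks a nested list and swaps every `NULL` for `NA`. The name `replace_null_sketch()` and the example record are assumptions made for this note; the PR's actual implementation may differ.

```r
# Hypothetical sketch; the PR's replace_null_recursive() is not shown in this diff.
# Recursively replace NULL elements of a nested list with NA.
replace_null_sketch <- function(x) {
  if (is.null(x)) {
    return(NA)
  }
  if (is.list(x)) {
    # lapply() keeps names and visits NULL elements too
    return(lapply(x, replace_null_sketch))
  }
  x
}

# A taxonomic record as it might look after reading JSON with missing values
taxon <- list(
  scientificName = "Anas platyrhynchos",
  taxonID = NULL,
  vernacularNames = list(eng = "mallard", nld = NULL)
)

cleaned <- replace_null_sketch(taxon)
str(cleaned)
```

Replacing `NULL` with `NA` keeps every field at length one, which presumably makes the later conversion of the list into a data frame more predictable.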