Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow users to retain non-standard tows when pulling data #156

Merged
merged 25 commits into from
Oct 16, 2024

Conversation

chantelwetzel-noaa
Copy link
Contributor

This pull request addresses #115 by:

  1. Add a function argument that allows users to specify standard_filtering = TRUE or to retain tow/bio/haul/specimen information from tows with poor performance, outside the standard depth range, and stations removed from the standard station list. This is done by removing the hard-coded filters applied in get_url() and create a function to do the filtering of pulled data by filter_pulls() when standard_filtering = TRUE.
  2. Use the cli package to provide information to the user on the filtered data when verbose = TRUE.
  3. Refine combine_tows() on how to combine data when pulling data from a single or multiple-species (e.g., rougheye and blackspotted rockfish).

adds statements about what number of positive record are removed and adds the ability for user to keep all tows despite bad performance or not being in the standard survey protocal.
Remove the default filters from the data pulling process, add messages about the data removed for the user, and a switch to turn off filtering.
An error was found in the pull_haul data filters that resulted in the end years not being included.  The code has been fixed and the test updated accordingly.
Function provides summary of the number of samples from tows with bad performance, outside depth bounds, or from stations removed for the standard stations with the option to remove or keep these samples
unction provides summary of the number of samples from tows with bad performance, outside depth bounds, or from stations removed for the standard stations with the option to remove or keep these samples
The table already included rougheye rockfish, but adding the additional naming alternatives helps ensure that the information is found depending upon the common_name.
Copy link
Contributor

@iantaylor-NOAA iantaylor-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
The additional information is helpful and I like the use of the {cli} package.

In addition to the somewhat trivial comments I made, the one other change I would suggest is to add messages about the total number of samples before and after the filtering. For instance, in pull_bio(), instead of the messages currently reported:

i Pulling biological data for Pacific ocean perch. This can take up to ~ 30 seconds (or more).
i There were 84 biological samples with non-satisfactory tow performance (e.g., no area swept estimate, net issues, etc.).
i There were 6 biological samples from stations that have been removed from the standard station list.

It might be nice to have something like

i Pulling biological data for Pacific ocean perch. This can take up to ~ 30 seconds (or more).
i There were 16910 biological samples extracted.
i There were 84 biological samples with non-satisfactory tow performance (e.g., no area swept estimate, net issues, etc.).
i There were 6 biological samples from stations that have been removed from the standard station list.
i There were 16820 biological samples remaining after applying standard filtering.

Where that last message would be dependent on the input choice and could instead read:
No biological samples were removed because standard_filtering = FALSE.

R/filter_pulls.R Outdated
#' Function to create messages on data that are outside the standard survey protocol
#' and to remove these samples if `standard_filtering` = TRUE. The data are checked
#' for tow performance, valid stations, and depth range. This function is called
#' within the pull functions, but can be called on pulled data frames is filtering
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"is filtering" should be "if filtering"

#'
#' @param data Data frame of pulled data created by the [pull_catch()], [pull_bio()],
#' [pull_haul()], or [pull_biological_samples].
#' @param data_type Character string to include within data filtering messages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be helpful to add (such as "biological samples") to the description of this input.

Also, how about providing a default value of data_type = "samples" to simplify things if users apply this new function outside of the pull_* functions?

R/pull_catch.R Outdated
if (length(bad_sample_types) > 0) {
if (verbose) {
cli::cli_alert_info(
"There were {length(bad_sample_types)} positive tows where the sample type was not requested."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this message means. Is it letting the user know that their filter of sample type has excluded positive tows? Maybe the bigger issue is that I'm confused about sample_types are even after reading https://github.com/pfmc-assessments/nwfscSurvey/blob/main/man-roxygen/sample_types.R.

@chantelwetzel-noaa chantelwetzel-noaa merged commit 7307475 into main Oct 16, 2024
8 checks passed
@chantelwetzel-noaa chantelwetzel-noaa deleted the pull_filtering branch October 16, 2024 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants