-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow users to retain non-standard tows when pulling data #156
Conversation
adds statements about what number of positive record are removed and adds the ability for user to keep all tows despite bad performance or not being in the standard survey protocal.
Remove the default filters from the data pulling process, add messages about the data removed for the user, and a switch to turn off filtering.
Function provides summary of the number of samples from tows with bad performance, outside depth bounds, or from stations removed for the standard stations with the option to remove or keep these samples
unction provides summary of the number of samples from tows with bad performance, outside depth bounds, or from stations removed for the standard stations with the option to remove or keep these samples
The table already included rougheye rockfish, but adding the additional naming alternatives helps ensure that the information is found depending upon the common_name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
The additional information is helpful and I like the use of the {cli} package.
In addition to the somewhat trivial comments I made, the one other change I would suggest is to add messages about the total number of samples before and after the filtering. For instance, in pull_bio()
, instead of the messages currently reported:
i Pulling biological data for Pacific ocean perch. This can take up to ~ 30 seconds (or more).
i There were 84 biological samples with non-satisfactory tow performance (e.g., no area swept estimate, net issues, etc.).
i There were 6 biological samples from stations that have been removed from the standard station list.
It might be nice to have something like
i Pulling biological data for Pacific ocean perch. This can take up to ~ 30 seconds (or more).
i There were 16910 biological samples extracted.
i There were 84 biological samples with non-satisfactory tow performance (e.g., no area swept estimate, net issues, etc.).
i There were 6 biological samples from stations that have been removed from the standard station list.
i There were 16820 biological samples remaining after applying standard filtering.
Where that last message would be dependent on the input choice and could instead read:
No biological samples were removed because standard_filtering = FALSE
.
R/filter_pulls.R
Outdated
#' Function to create messages on data that are outside the standard survey protocol | ||
#' and to remove these samples if `standard_filtering` = TRUE. The data are checked | ||
#' for tow performance, valid stations, and depth range. This function is called | ||
#' within the pull functions, but can be called on pulled data frames is filtering |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"is filtering" should be "if filtering"
#' | ||
#' @param data Data frame of pulled data created by the [pull_catch()], [pull_bio()], | ||
#' [pull_haul()], or [pull_biological_samples]. | ||
#' @param data_type Character string to include within data filtering messages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be helpful to add (such as "biological samples")
to the description of this input.
Also, how about providing a default value of data_type = "samples"
to simplify things if users apply this new function outside of the pull_*
functions?
R/pull_catch.R
Outdated
if (length(bad_sample_types) > 0) { | ||
if (verbose) { | ||
cli::cli_alert_info( | ||
"There were {length(bad_sample_types)} positive tows where the sample type was not requested." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this message means. Is it letting the user know that their filter of sample type has excluded positive tows? Maybe the bigger issue is that I'm confused about sample_types
are even after reading https://github.com/pfmc-assessments/nwfscSurvey/blob/main/man-roxygen/sample_types.R.
This pull request addresses #115 by:
standard_filtering = TRUE
or to retain tow/bio/haul/specimen information from tows with poor performance, outside the standard depth range, and stations removed from the standard station list. This is done by removing the hard-coded filters applied inget_url()
and create a function to do the filtering of pulled data byfilter_pulls()
whenstandard_filtering = TRUE
.cli
package to provide information to the user on the filtered data whenverbose = TRUE
.combine_tows()
on how to combine data when pulling data from a single or multiple-species (e.g., rougheye and blackspotted rockfish).