Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix biological data pull for Triennial and AFSC.Slope if no otoliths were collected #147

Merged
merged 11 commits into from
Sep 3, 2024

Conversation

chantelwetzel-noaa
Copy link
Contributor

This pull request:

  1. Switches to the cli package to produce error and informational text for users.
  2. Moves the error message when no data are available from get_json() to pull_bio() since the main function pull_bio() is the function that should halt if these conditions are met.
  3. Add additional conditional statements that allows the function to proceed for the Triennial and AFSC.Slope survey if no otoliths were collected for the species of interest.
  4. Change the names of items in the data list for the Triennial and AFSC.Slope to be lower case and more informative.
  5. Clean up the covert conditional code to only convert column names.

@brianlangseth-NOAA can you review these changes for the issues you were having? Once you have reviewed the changes, I will coordinate merging this pull request into the main branch based on the other outstanding pull request #145.

The stop located within get_json() was not fully stopping the function leading to a confusing error about a missing column when there was actually no data in the data warehouse for a particular species.  Moving this here stops the function correctly.
- only apply the filters if the initial bio_pull exists
- add info print statements about what is being removed
- fix the data list for Triennial and AFSC.Slope to not remove the age data pull if there are no ages if otoliths were collected

The function was failing in the situations where no otoliths were collected from a species across all survey years.  However, if there was otolith data no ages the function worked but did not report the age data frame.  Even if there were no ages, one may want to know that there were otoliths collected so this has been fixed
- move the majority of the special if statements for the Triennial and AFSC.Slope to a single section where possible
- this also removes the info messages which are going to be put in a later pull request
@brianlangseth-NOAA
Copy link
Contributor

I will get to this this afternoon

Copy link
Contributor

@brianlangseth-NOAA brianlangseth-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work Chantel! The no ages vs no otoliths is a tricky distinction so I appreciate you finding out what was causing the issue. Your current changes work for my quillback application.

However, I found that if the user makes a mistake in their species spelling or the range of years, your new cli abort catch isn't called, and the code gets hung up at a later line. This is my first comment, and is one I think should be fixed. My other comment is not critical, though would keep abort calls consistent across the various surveys, and also avoid saving "empty" rdata files.

R/pull_bio.R Outdated
Comment on lines 130 to 135
if (!(is.data.frame(bio_pull)) & !is.list(bio_pull)) {
cli::cli_abort(
"\n No data returned by the warehouse for the filters given.
\n Make sure the year range is correct (cannot include -Inf or Inf) for the project selected and the input name is correct,
\n otherwise there may be no data for this species from this project.\n
URL: {url_text}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not so sure this will ever be called. I tried on various renditions of quillback triennial, and wrong name spellings and bad years for the other surveys. If its not a data.frame then its a list so this condition never comes out true.

It looks like this is an abort you want to catch issues with name spelling or years. Therefore, I suggest using the following for line 130:

(!(is.data.frame(bio_pull)) & !survey %in% c("Triennial", "AFSC.Slope"))

This allows a bad entry for the Triennial and AFSC Slope to continue (to the next json pull) but catches bad entries for the other surveys that are not pulled later. This also resolves the current situation, where a bad name or year entry for example the NWFSC.Combo survey results in an error when the code reaches line 244 colnames(bio) <- firstup(colnames(bio)) in the event convert = TRUE, and an empty rdata file is saved if convert = FALSE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check works for NWFSC.Combo and NWFSC.Slope surveys since they should return a data frame. We don't want it to stop here for the Triennial or AFSC.Slope surveys if there are no otolith data.

Copy link
Contributor

@brianlangseth-NOAA brianlangseth-NOAA Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right that it correctly wont trigger for triennial (or afsc.slope) issues, and it correctly wont trigger when data are pulled for combo and nwfsc.slope surveys, but when data are not pulled for the combo and nwfsc.slope surveys because there are user mistakes, it also wont trigger.

When I try the following I get an error down in line 252 (colnames(bio) <- firstup(colnames(bio))).

test = nwfscSurvey::pull_bio(common_name = 'qback rockfish',
                             survey = "NWFSC.Combo",
                             dir = NULL)

The abort isn't triggering because for this situation, bio_pull is a list and not a data frame.

When I try this but with convert = FALSE, I dont even get an error, but my output and saved rdata file (if I set dir) are blank. I still think this needs to be corrected.

R/pull_bio.R Outdated
}
if (verbose) {
message("Triennial & AFSC Slope data returned as a list: bio_data$length_data and bio_data$age_data\n")
cli::cli_alert_info("Triennial & AFSC Slope data returned as a list: bio$length_data and bio$age_data\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user enters bad years or bad spelling of the common name and no data are produced for the triennial there is no special warning issued like there is for the other surveys. Currently a rdata file is output even though no data are present. Im not so sure we need to account for every situation but since we do this for the other surveys but not for this one, maybe we should add an abort statement here to warn the user and avoid saving an empty rdata file?

IF we did want to do that we could replace if (verbose) (i.e. line 225) with the following

if (verbose & !(is.data.frame(bio$length_data) & is.data.frame(bio$age_data))) {
 cli::cli_abort(
      "\n No data returned by the warehouse for the filters given.
      \n Make sure the year range is correct (cannot include -Inf or Inf) for the project selected and the input name is correct,
      \n otherwise there may be no data for this species from this project.\n
      URL: {url_text}"
    )
} else {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The abort message that you are looking for should have been added after the len_pull to include checks of both the bio_pull and len_pull for AFSC.Slope and Triennial data. I have add that. Please check to see if this works appropriately on your end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works appropriately. Good idea putting it up sooner.

@chantelwetzel-noaa
Copy link
Contributor Author

Once you have looked at the new change to deal with the issue you identified, haveconfirmed that it fixes the problem, and there are no other remaining requests, please submit your review approval to this pull request to allowing merging.

@chantelwetzel-noaa
Copy link
Contributor Author

The most recent commit should deal with the issue that you identified.

Copy link
Contributor

@brianlangseth-NOAA brianlangseth-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New changes address my concerns. Thanks Chantel. I am approving

Copy link
Contributor

@brianlangseth-NOAA brianlangseth-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now Im approving

@chantelwetzel-noaa chantelwetzel-noaa merged commit a769a37 into main Sep 3, 2024
8 checks passed
@chantelwetzel-noaa chantelwetzel-noaa deleted the afsc_bio_pull branch September 30, 2024 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants