Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorize functions that are not vectorized #389

Open
pbchase opened this issue Jun 14, 2024 · 10 comments
Open

Vectorize functions that are not vectorized #389

pbchase opened this issue Jun 14, 2024 · 10 comments

Comments

@pbchase
Copy link

pbchase commented Jun 14, 2024

Please forgive me writing an issue like this, because it is light on detail, but my issue with redcapAPI is the uneven vectorization. I've had this problem multiple times in redcapAPI. Yet, I can't recall where I have seen it save for the last issue I had in February with redcapAPI::renameRecord. I documented that at OuhscBbmc/REDCapR#513.

For any API call, I would like it to accept a vector in a parameter wherever a vector makes sense. OuhscBbmc/REDCapR#513 shows what I think is a reasonable way to do that and a reasonable response from the function. I hope that helps. I'm happy to discuss this in more detail

@pbchase pbchase changed the title Vectorize functions are not vectorized Vectorize functions that are not vectorized Jun 14, 2024
@spgarbet
Copy link
Member

This is a wonderful idea. Our goals the last year was to stabilize the crashing, make it compatible with automated servers and address a lot of security practices. We've hit that goal.

@spgarbet
Copy link
Member

renameRecord(rcon, record_name, new_record_name, arm = NULL, ...) is the interface. It checks at present to have 1 record_name and 1 new_record_name. Should we allow n and n and vectorize? Can rcon be vectorized? Does that make sense? Looks like arm as well should be. The ... is going to the makeApiCall.

@nutterb
Copy link
Collaborator

nutterb commented Jun 17, 2024

I'm going to add some background information to the discussion. In doing so, I want to make sure I don't come across as advocating a hard "no" to this request. I do recognize that there be monsters lurking in these here waters, so figure it is good to lay out why redcapAPI operates the way it does before making decisions about how/whether to change it.

The functions in recapAPI adhere very closely to the structure of the actual API parameters. In the recent updates, we've tried to make a one-to-one mapping of REDCap API argument to redcapAPI function argument. If the REDCap API accepts an array as an argument, redcapAPI accepts a vector. (The only exceptions I can think of to this rule exist in some of the file repository functions).

Using renameRecord as an example, the API only accepts a single value into the record_name, and new_record_name argument. To reframe it in a way, the metric redcapAPI currently uses for "accept a vector in a parameter wherever a vector makes sense" is whether the API itself will accept that vector. For this function, even if we "vectorize" the argument, the internal code still has to manage creating multiple calls to the API.

Again, this isn't to say that vectorizing is a bad idea. Making things just a little bit easier for the end user is a good goal. It just needs some consideration in the design choices.

Some initial questions I would pose are

  1. What should happen if one of the calls to the API fails. Should it terminate at the failure, or catch the error and attempt to proceed with any further elements in the vector?
  2. How should recycling be handled? Should it be permitted?
  3. How should the results received from the API be captured for the user?

Managing vectorization in the meantime

Ultimately, as @pbchase demonstrated in the REDCapR thread, loops and/or apply statements are the way the vectorization needs to be handled. There's a neat little function, Vectorize, that can assist with that and turn single use functions into vectorized functions (it uses mapply internally). I think it would be extremely useful in this case:

library(redcapAPI)
renameRecord_v <- Vectorize(renameRecord, 
                            vectorize.args = c("record_name", 
                                               "new_record_name"))

renameRecord_v(rcon, 
               c("1", "2", "3"), 
               c("001", "002", "003"))

#### OR if using a data frame ####

RecordRenames <- 
  data.frame(old = c("1", "2", "3"), 
             new = c("001", "002", "003"))

renameRecord(rcon, 
             record_name = RecordRenames$old, 
             new_record_name = RecordRenames$new)

and if you like keep your workspace free of local objects and have a vendetta against readable code

library(redcapAPI)
Vectorize(
  renameRecord, 
  vectorize.args = c("record_name", 
                     "new_record_name"))(record_name = c("1, 2, 3"), 
                                         new_record_name = c("001", "002", "003"))

@nutterb
Copy link
Collaborator

nutterb commented Jun 17, 2024

renameRecord(rcon, record_name, new_record_name, arm = NULL, ...) is the interface. It checks at present to have 1 record_name and 1 new_record_name. Should we allow n and n and vectorize? Can rcon be vectorized? Does that make sense? Looks like arm as well should be. The ... is going to the makeApiCall.

My previous comments aside, I think renameRecord is probably one of the best candidates for vectorization. It has no argument validation outside of the length of the arguments, which means it doesn't have to deal with messy issues about validating against anything in the project. Consequently, we aren't going to inflate execution time by making a bunch of validation checks against the API (like we do with exporting records, for example)

  • Vectorize record_name, new_record_name, and arm? Yes
  • Vectorize rcon? My instinct says no. I think I'd be willing to deal with having to make a separate call per project to remove the risk of renaming a record in the wrong project because I miscounted how many rcons I needed (and I would do that a LOT)

My proposal for output would be a three column data frame with record_name, new_record_name, and success, with success being a logical indicating if the rename completed successfully. and perhaps a fourth column with any error message returned from the API?

@nutterb
Copy link
Collaborator

nutterb commented Jun 17, 2024

Other functions that might support vectorization:

  • exportFieldNames
  • exportFiles
  • importFiles
  • deleteFiles
  • exportLogging
  • exportSurveyLink
  • exportSurveyParticipants
  • exportSurveyQueueLink
  • exportSurveyReturnCode

@pbchase
Copy link
Author

pbchase commented Jun 17, 2024

The functions in recapAPI adhere very closely to the structure of the actual API parameters. In the recent updates, we've tried to make a one-to-one mapping of REDCap API argument to redcapAPI function argument. If the REDCap API accepts an array as an argument, redcapAPI accepts a vector.

In my conversations with Will about REDCapR, I have made similar arguments. It makes sense for an R package to reflect the behavior of the REDCap API.

Yet these are R packages. They exist to add value to what is otherwise a curl dialog. They exist in the expectations of R developers as well. R developers expect functions to be vectorized. When I realized I could only rename one record at a time, I was incredulous. My response was a visceral "NO". I have work to do. Thus my purrr-based snippet.

As to how it should behave, I vote it should process every rename it can and report back the successes and failures. That's what the purrr snippet does.

If you want to vectorize but are in the do-nothing-if-anything-might-fail camp, there's some work to do to ensure every rename will work. You might even need to make another API call. You could:

  1. verify each source record_id is unique.
  2. verify each target record_id is unique.
  3. verify the combined set of target and source record_ids is distinct.
  4. verify each source record_id exists in REDCap.

That would give you good odds of success in the rename, but not 100%. I think you'd still need to catch the failure on the Nth rename and report which ones worked and which did not.

If you want to reflect both REDCap-like behavior and a more R-centric behavior, you could add a parameter to allow only unit-length vectors. If you want REDCap-behavior, set that option to TRUE. If you want more vectorized behavior, set it to FALSE.

You could also add a parameter to fail the batch on the first failure.

@spgarbet
Copy link
Member

Really good insight @nutterb. In a recent refactor there is one really gnarlly piece of code dealing with multiple API calls in a loop in a function. It return a list of results of the calls, some could be errors. It was kind of clunky, but given the parameters you've described probably the best.

Maybe a vectorized helper function, 'renameRecords' that returns a list of success / failures. If we go down that road, then we should look for a consistent interface that works with continuations. I don't see the output of renameRecords being that useful in a continuation.

@nutterb
Copy link
Collaborator

nutterb commented Jun 18, 2024

I find myself very uncomfortable with having a renameRecord and also a renameRecords.

My inclination is to continue with renameRecord, adjust the arguments to accept vectors of the same length (for record_name and new_record_name. If arms is not NULL, it must also be of the same length). Then move the API call code into an unexported subroutine. Something like

renameRecord <- function(rcon, record_name, new_record_name, arm, ...){
  ## ARGUMENT VALIDATION ##

  ## PREP OUTPUT OBJECT CONTAINER ##
  output <- data.frame(...) # or whatever
  response <- vector("list", length(record_name))

  for (i in seq_along(record_name)){
    response[[i]] <- .renameRecordApiCall(rcon, record_name[i], new_record_name[i], arm[i], ...)
  }

  ## FINALIZE OUTPUT OBJECT CONTAINER ##
  return(output)
}

.renameRecordApiCall <- function(rcon, record_name, new_record_name, arm, ...) {
  body <- list(content = "record", 
               action = "rename", 
               record = record_name, 
               new_record_name = new_record_name, 
               arm = arm)

  ###################################################################
  # Call the API                                                 ####
  invisible('1' == as.character(makeApiCall(rcon, body, ...))) 
} 

spgarbet added a commit that referenced this issue Oct 21, 2024
@spgarbet
Copy link
Member

Completed and merged.

@spgarbet spgarbet reopened this Oct 21, 2024
@spgarbet
Copy link
Member

(well just renameRecords). It's on the edge version. 2.10.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants