Meta Issue #4

Open
pbchase opened this issue Jan 2, 2025 · 0 comments

Add functions to handle each of the REDCap text validation rules. The REDCapR source code contains data structures that enumerate all REDCap field types and text validation types.

Each handler function must return values that conform to the constraints of its validation type. Some validation types also use the ancillary columns text_validation_min and text_validation_max, which impose additional restrictions.

These validation types need to be implemented first because the project funding this work requires them:

  • date_dmy
  • date_mdy
  • datetime_mdy
  • email
  • integer
  • number
  • phone
  • zipcode
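A minimal sketch of how handlers for the three string-valued types in this list (email, phone, zipcode) might be registered and dispatched on text_validation_type. All function names are hypothetical, and the value patterns (addresses on example.org, North American "(NXX) NXX-XXXX" phone numbers, 5-digit US ZIP codes) are assumptions about what REDCap's validators accept, not something taken from REDCapR. The date and numeric types fit the mean and sd approach discussed below.

```r
# Hypothetical handlers for string-valued text validation types.
# Each takes n and returns a character vector of n values.

handle_email <- function(n) {
  # Assumption: synthetic addresses on the reserved example.org domain
  sprintf("user%04d@example.org", sample(0:9999, n, replace = TRUE))
}

handle_phone <- function(n) {
  # Assumption: North American "(NXX) NXX-XXXX" format, where the area code
  # and exchange never start with 0 or 1
  area     <- sample(200:999, n, replace = TRUE)
  exchange <- sample(200:999, n, replace = TRUE)
  line     <- sample(0:9999, n, replace = TRUE)
  sprintf("(%03d) %03d-%04d", area, exchange, line)
}

handle_zipcode <- function(n) {
  # Assumption: 5-digit US ZIP codes, zero-padded
  sprintf("%05d", sample(0:99999, n, replace = TRUE))
}

# Dispatch on the text_validation_type value from the data dictionary
generate_text_values <- function(text_validation_type, n) {
  handler <- switch(text_validation_type,
    email   = handle_email,
    phone   = handle_phone,
    zipcode = handle_zipcode,
    stop("No handler yet for text_validation_type: ", text_validation_type)
  )
  handler(n)
}

set.seed(1)
generate_text_values("phone", 3)
```

Failing loudly on an unknown type makes it obvious which of the remaining validation types still lack a handler.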

Set default mean and sd

We must create at least a few handlers for text_validation_types to set reasonable default values for the functions that generate the field values. The handlers would be called by get_long_text_fields.R to set things like the mean and sd used to generate a numeric value.
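A sketch of a per-type lookup that such handlers could consult when a field has no range. The list and function names are hypothetical and the numbers are placeholders; this does not describe how get_long_text_fields.R currently works. Note that the unit of sd follows the type of mean: days for Date, seconds for POSIXct.

```r
# Hypothetical per-type defaults, used only when a field has no
# text_validation_min / text_validation_max. The numbers are placeholders.
default_mean_sd <- list(
  integer      = list(mean = 50, sd = 15),
  number       = list(mean = 50, sd = 15),
  date_dmy     = list(mean = as.Date("2024-01-01"), sd = 180),  # sd in days
  date_mdy     = list(mean = as.Date("2024-01-01"), sd = 180),  # sd in days
  datetime_mdy = list(mean = as.POSIXct("2024-01-01 12:00:00", tz = "UTC"),
                      sd = 180 * 24 * 3600)                     # sd in seconds
)

# Return the default mean and sd for one validation type, or fail loudly
get_default_mean_sd <- function(text_validation_type) {
  defaults <- default_mean_sd[[text_validation_type]]
  if (is.null(defaults)) {
    stop("No default mean/sd for text_validation_type: ", text_validation_type)
  }
  defaults
}

get_default_mean_sd("integer")
```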

Using mean and sd for date and date time fields

You could use a similar approach for date values, with code like this:

# sd will be interpreted as days if the mean is a date
> as.Date(rnorm(5, mean=as.Date("2024-01-01"), sd=10))
[1] "2023-12-31" "2024-01-04" "2024-01-05" "2023-12-27" "2023-12-16"
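The same idea wrapped into a date handler, as a sketch. rnorm_date() and format_date_for_type() are hypothetical names, and the dash-separated day-month-year / month-day-year layouts are an assumption about how date_dmy and date_mdy values should be written. Rounding keeps whole days, and the explicit origin lets as.Date() accept a number on R versions before 4.3.0.

```r
# Hypothetical helper: n normally distributed dates around `mean`.
rnorm_date <- function(n, mean, sd_days) {
  # A Date is stored as days since 1970-01-01, so sd is in days.
  # round() keeps whole days; the explicit origin is required before R 4.3.0.
  days <- round(rnorm(n, mean = as.numeric(mean), sd = sd_days))
  as.Date(days, origin = "1970-01-01")
}

# Hypothetical formatter: order the date parts for each validation type
format_date_for_type <- function(x, text_validation_type) {
  fmt <- switch(text_validation_type,
    date_dmy = "%d-%m-%Y",
    date_mdy = "%m-%d-%Y",
    stop("Unsupported date validation type: ", text_validation_type)
  )
  format(x, fmt)
}

set.seed(1)
dates <- rnorm_date(5, mean = as.Date("2024-01-01"), sd_days = 10)
format_date_for_type(dates, "date_dmy")
```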

With date-time values, the code is only slightly different:

# sd will be interpreted as seconds if the mean is a date-time object
> as.POSIXct(rnorm(5, mean=as.POSIXct("2024-01-01 12:00:00"), sd=3600))
[1] "2024-01-01 11:22:37 EST" "2024-01-01 11:27:13 EST" "2024-01-01 10:18:27 EST" "2024-01-01 12:03:41 EST" "2024-01-01 10:46:09 EST"
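A sketch of the date-time version for datetime_mdy. The helper names are hypothetical, and "%m-%d-%Y %H:%M" is an assumed layout for that validation type. Fixing tz = "UTC" makes the output independent of the machine's local time zone, unlike the EST values printed above.

```r
# Hypothetical helper: n normally distributed date-times around `mean`.
rnorm_datetime <- function(n, mean, sd_seconds, tz = "UTC") {
  # POSIXct is stored as seconds since 1970-01-01 UTC, so sd is in seconds.
  secs <- round(rnorm(n, mean = as.numeric(mean), sd = sd_seconds))
  as.POSIXct(secs, origin = "1970-01-01", tz = tz)
}

# Assumed datetime_mdy layout (month-day-year hours:minutes); seconds are dropped
format_datetime_mdy <- function(x) {
  format(x, "%m-%d-%Y %H:%M")
}

set.seed(1)
mean_dt <- as.POSIXct("2024-01-01 12:00:00", tz = "UTC")
format_datetime_mdy(rnorm_datetime(5, mean_dt, sd_seconds = 3600))
```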

If you want a standard deviation on the order of hours, days, or months, you need to convert it to seconds yourself. Each example below subtracts the mean from the generated values, so the printed differences show the spread produced by each sd:

> as.POSIXct(rnorm(5, mean=as.POSIXct("2024-01-01"), sd=10)) - as.POSIXct("2024-01-01")
Time differences in secs
[1]   0.8020122  11.3860621  -3.8754942 -14.2432177   6.3798811

> as.POSIXct(rnorm(5, mean=as.POSIXct("2024-01-01"), sd=3600)) - as.POSIXct("2024-01-01")
Time differences in mins
[1]   10.84744 -136.31071   13.12103   11.29350  -39.30224

> as.POSIXct(rnorm(5, mean=as.POSIXct("2024-01-01"), sd=3600*24)) - as.POSIXct("2024-01-01")
Time differences in hours
[1]   9.163598  44.022576 -30.236844  -8.684229  17.778474

> as.POSIXct(rnorm(5, mean=as.POSIXct("2024-01-01"), sd=3600*24*31)) - as.POSIXct("2024-01-01")
Time differences in days
[1] -18.95833  34.61146  -1.75344  43.80534 -10.16231

> as.POSIXct(rnorm(5, mean=as.POSIXct("2024-01-01"), sd=3600*24*31*12)) - as.POSIXct("2024-01-01")
Time differences in days
[1]  415.7454 -748.6212  411.1701  410.5675 -361.2164
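Rather than multiplying by 3600 by hand, base R's difftime conversion can do the arithmetic. This is a sketch with a hypothetical helper name; difftime has no month or year unit, so a "month" is still approximated in days, as the 31-day examples above do.

```r
# Hypothetical helper: express an sd given in minutes, hours, days or weeks
# as seconds, using base R's difftime unit conversion.
sd_in_seconds <- function(value, units = c("mins", "hours", "days", "weeks")) {
  units <- match.arg(units)
  as.numeric(as.difftime(value, units = units), units = "secs")
}

sd_in_seconds(1, "hours")        # same as 3600
sd_in_seconds(1, "days")         # same as 3600*24
sd_in_seconds(31, "days")        # same as 3600*24*31, the "month" used above
sd_in_seconds(31 * 12, "days")   # same as 3600*24*31*12
```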

Using mean and sd for time fields

The mean and sd approach could also work for time fields. Here's an example using the hms package to create a time difference object:

# install.packages("hms")
hms::as_hms(rnorm(5, mean=hms::as_hms("12:00:00"), sd=3600))
11:30:51.821478
12:26:59.494692
11:44:23.214369
11:59:06.512810
11:51:20.761300
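A base-R alternative that avoids the hms dependency, sketched under two assumptions: time fields expect "HH:MM" strings, and values that fall past midnight should wrap around rather than be redrawn. rnorm_time() is a hypothetical name.

```r
# Hypothetical base-R helper for time fields: n "HH:MM" strings drawn from a
# normal distribution around `mean`, with sd given in minutes.
rnorm_time <- function(n, mean = "12:00", sd_minutes = 60) {
  parts <- as.integer(strsplit(mean, ":", fixed = TRUE)[[1]])
  mean_minutes <- parts[1] * 60 + parts[2]
  # Round to whole minutes, then wrap around midnight so every value is a
  # valid time of day (e.g. -5 minutes becomes 23:55)
  minutes <- round(rnorm(n, mean = mean_minutes, sd = sd_minutes)) %% (24 * 60)
  sprintf("%02d:%02d", minutes %/% 60, minutes %% 60)
}

set.seed(1)
rnorm_time(5, mean = "12:00", sd_minutes = 60)
```

Wrapping keeps every value valid but distorts the distribution for means near midnight; redrawing out-of-range values is the other option.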

Picking values for mean and sd

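When we pick values for mean and sd, the only guide we have for reasonable values is any range applied to the field through text_validation_min and text_validation_max. Where a range exists, we should set the mean to its midpoint, (min + max)/2. In R that is mean(c(min, max)); mean(max, min) would not work, because the second argument of mean() is trim. We could set sd = (max - min)/6, which puts min and max three standard deviations from the mean, so about 99.7% of generated values fall inside the range.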
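A sketch of that rule for numeric ranges. Because roughly 0.3% of normal draws still land outside ±3 sd, and handlers must return values that satisfy the range, this version redraws out-of-range values (rejection sampling, i.e. a truncated normal). The function names are hypothetical; date or date-time ranges could be passed through as.numeric() first.

```r
# Default mean and sd from a field's range: the midpoint, and a sixth of the
# width so that min and max sit 3 sd from the mean.
mean_sd_from_range <- function(min, max) {
  stopifnot(min <= max)
  # mean(c(min, max)), not mean(max, min): the second argument of mean() is trim
  list(mean = mean(c(min, max)), sd = (max - min) / 6)
}

# Hypothetical generator that honours text_validation_min/max by redrawing
# any values that land outside the range.
rnorm_in_range <- function(n, min, max) {
  p <- mean_sd_from_range(min, max)
  x <- rnorm(n, mean = p$mean, sd = p$sd)
  outside <- x < min | x > max
  while (any(outside)) {
    x[outside] <- rnorm(sum(outside), mean = p$mean, sd = p$sd)
    outside <- x < min | x > max
  }
  x
}

set.seed(1)
summary(rnorm_in_range(1000, min = 0, max = 120))
```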
