Meta Issue #4

pbchase · 2025-01-02T15:50:45Z

Add functions to handle each of the REDCap text validation rules. REDCap data structures that enumerate all of the field types and text validation types are available in the REDCapR source code.

Each handler function will need to return values that conform to the constraints of that validation type. Some validation types also use the ancillary columns text_validation_min and text_validation_max, which impose additional restrictions on the allowed values.

These validation types need to be implemented first because the project funding this work requires them:

date_dmy
date_mdy
datetime_mdy
email
integer
number
phone
zipcode
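
As a rough sketch of what one handler and its dispatch could look like, the code below generates integer values that respect text_validation_min and text_validation_max. The names `generate_integer`, `handlers`, and `generate_field_values`, and the fallback window of width 100 used when a bound is missing, are all assumptions made for illustration and are not part of any existing code.

```r
# Hypothetical handler for the "integer" validation type.
# min and max mimic text_validation_min / text_validation_max and may be
# NA, "" or character, as they often are in a data dictionary.
generate_integer <- function(n, min = NA, max = NA) {
  min <- suppressWarnings(as.numeric(min))
  max <- suppressWarnings(as.numeric(max))
  # Assumed fallback: a window of width 100 when a bound is missing
  if (is.na(min) && is.na(max)) {
    min <- 0
    max <- 100
  } else if (is.na(min)) {
    min <- max - 100
  } else if (is.na(max)) {
    max <- min + 100
  }
  lo <- as.integer(ceiling(min))
  hi <- as.integer(floor(max))
  stopifnot(lo <= hi)
  # Uniform draw over the whole numbers lo..hi
  lo + sample.int(hi - lo + 1L, n, replace = TRUE) - 1L
}

# Hypothetical lookup from validation type to handler function
handlers <- list(integer = generate_integer)

generate_field_values <- function(validation_type, n, min = NA, max = NA) {
  handler <- handlers[[validation_type]]
  if (is.null(handler)) {
    stop("No handler for validation type: ", validation_type)
  }
  handler(n, min = min, max = max)
}

set.seed(1)
x <- generate_field_values("integer", 50, min = "1", max = "10")
```

Other validation types could be added to `handlers` one at a time without changing the dispatch function.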

Set default mean and sd

We must create at least a few handlers for text_validation_types to set reasonable default values for the functions that generate the field values. The handlers would be called by get_long_text_fields.R to set things like the mean and sd used to generate a numeric value.

Using mean and sd for date and date time fields

The same mean-and-sd approach works for date values, with code like this:

# sd will be interpreted as days if the mean is a date
> as.Date(rnorm(5, mean=as.Date("2024-01-01"), sd=10))
[1] "2023-12-31" "2024-01-04" "2024-01-05" "2023-12-27" "2023-12-16"

With date-time values, the code is only slightly different:

# sd will be interpreted as seconds if the mean is a date-time object
> as.POSIXct(rnorm(5, mean=as.POSIXct("2024-01-01 12:00:00"), sd=3600))
[1] "2024-01-01 11:22:37 EST" "2024-01-01 11:27:13 EST" "2024-01-01 10:18:27 EST" "2024-01-01 12:03:41 EST" "2024-01-01 10:46:09 EST"

If you want a standard deviation on the order of hours, days, or months, you'll need to do some math to express that larger standard deviation in seconds. Subtracting the mean from each draw shows the spread each setting produces:

> as.POSIXct(rnorm(5, mean=as.POSIXct("2024-01-01"), sd=10)) - as.POSIXct("2024-01-01")
Time differences in secs
[1]   0.8020122  11.3860621  -3.8754942 -14.2432177   6.3798811

> as.POSIXct(rnorm(5, mean=as.POSIXct("2024-01-01"), sd=3600)) - as.POSIXct("2024-01-01")
Time differences in mins
[1]   10.84744 -136.31071   13.12103   11.29350  -39.30224

> as.POSIXct(rnorm(5, mean=as.POSIXct("2024-01-01"), sd=3600*24)) - as.POSIXct("2024-01-01")
Time differences in hours
[1]   9.163598  44.022576 -30.236844  -8.684229  17.778474

> as.POSIXct(rnorm(5, mean=as.POSIXct("2024-01-01"), sd=3600*24*31)) - as.POSIXct("2024-01-01")
Time differences in days
[1] -18.95833  34.61146  -1.75344  43.80534 -10.16231

> as.POSIXct(rnorm(5, mean=as.POSIXct("2024-01-01"), sd=3600*24*31*12)) - as.POSIXct("2024-01-01")
Time differences in days
[1]  415.7454 -748.6212  411.1701  410.5675 -361.2164
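
To avoid hand-multiplying by 3600 and 24, a small helper could convert a standard deviation given in a named unit to seconds. This is only a sketch: `sd_in_seconds` is a hypothetical name, and it relies on base R's `as.difftime()`, whose fixed-length units stop at weeks (there is no "months" unit). The example passes `origin` explicitly so it also runs on R versions before 4.3, where `as.POSIXct()` requires it for numeric input.

```r
# Hypothetical helper: express an sd given in some time unit as seconds
sd_in_seconds <- function(sd, units = c("secs", "mins", "hours", "days", "weeks")) {
  units <- match.arg(units)
  as.numeric(as.difftime(sd, units = units), units = "secs")
}

set.seed(42)
center <- as.POSIXct("2024-01-01 12:00:00", tz = "UTC")

# Date-times scattered around `center` with an sd of two days
draws <- as.POSIXct(
  rnorm(5, mean = as.numeric(center), sd = sd_in_seconds(2, "days")),
  origin = "1970-01-01",
  tz = "UTC"
)
```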

Using mean and sd for time fields

The mean and sd approach could also work for time fields. Here's an example using the hms package to create a time difference object:

# install.packages("hms")
hms::as_hms(rnorm(5, mean=hms::as_hms("12:00:00"), sd=3600))
11:30:51.821478
12:26:59.494692
11:44:23.214369
11:59:06.512810
11:51:20.761300
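
A normal draw can land before 00:00 or after 23:59 when the mean is near midnight or the sd is large, so a time handler probably needs to bound its output. The sketch below uses base R only, to avoid the hms dependency, and simply clamps out-of-range draws to the ends of the day. The function name `generate_times` and the HH:MM output format are assumptions.

```r
# Hypothetical time generator: seconds since midnight, clamped to one day
generate_times <- function(n, mean_secs = 12 * 3600, sd_secs = 3600) {
  secs <- round(rnorm(n, mean = mean_secs, sd = sd_secs))
  # Clamp into [00:00:00, 23:59:59]
  secs <- pmin(pmax(secs, 0), 86399)
  sprintf(
    "%02d:%02d",
    as.integer(secs %/% 3600),
    as.integer((secs %% 3600) %/% 60)
  )
}

set.seed(7)
times <- generate_times(5)
```

Clamping piles extreme draws up at 00:00 and 23:59; redrawing out-of-range values would be an alternative if that matters.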

Picking values for mean and sd

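When we pick values for mean and sd, the only guides we have for reasonable values are any ranges applied to the field. Where we see ranges, we should set mean = (max + min)/2. We could set sd = (max - min)/6, which places min and max three standard deviations from the mean, so about 99.7% of normally distributed draws fall inside the range.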
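
A minimal sketch of that rule, assuming numeric bounds, is shown below. The function names are hypothetical. Note that in R, `mean(max, min)` would not average the two values, because the second argument is matched to `trim`, so the sketch computes the midpoint directly. Because a small share of normal draws still falls outside three standard deviations, the example also clamps values to the range; that clamping is an added assumption, not something the rule itself requires.

```r
# Hypothetical helper: derive mean and sd from a field's min and max
mean_sd_from_range <- function(min, max) {
  stopifnot(is.numeric(min), is.numeric(max), min <= max)
  list(mean = (min + max) / 2, sd = (max - min) / 6)
}

# Hypothetical generator: normal draws clamped to [min, max]
generate_in_range <- function(n, min, max) {
  p <- mean_sd_from_range(min, max)
  draws <- rnorm(n, mean = p$mean, sd = p$sd)
  pmin(pmax(draws, min), max)
}

p <- mean_sd_from_range(0, 120)  # mean is 60, sd is 20

set.seed(123)
values <- generate_in_range(1000, min = 0, max = 120)
```

Date bounds could be handled the same way after converting them with `as.numeric()`.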


pbchase added this to the Handle all text validation types milestone Jan 2, 2025
