Data is often published with shorthand and symbols, and these markers are regularly found in the same container (e.g. a spreadsheet or table cell) as the numeric value. {shrthnd} aims to process character vectors of numerical data that also contain non-numeric shorthand and symbols, so that both pieces of information can be easily retained and worked with.

Installation

{shrthnd} is not yet on CRAN, but binary versions can be installed from R-universe:

install.packages(
  "shrthnd",
  repos = c("https://mattkerlogue.r-universe.dev", "https://cran.r-project.org")
)

You can install the development version of {shrthnd} from GitHub like so:

# install.packages("remotes")
remotes::install_github("mattkerlogue/shrthnd")

Usage

Use shrthnd_num() to convert a character vector to a shrthnd_num vector. In effect, a shrthnd_num is a pair of linked vectors: a numeric vector holding the values and a character vector holding the non-numeric components of the input. By default a shrthnd_num will try to behave as a numeric vector, and it can be explicitly coerced to one with as.numeric(). You can use shrthnd_tags(), amongst other functions, to interact with the non-numeric (“tag”) component. {shrthnd} also provides for the annotation of data frames, specifically of the tibble::tibble() flavour.

Full usage details are available on the {shrthnd} documentation website.

library(shrthnd)

x <- c("12", "34.567", "[c]", "NA", "56.78 [e]", "78.9", "90.123[e]", 
       "321.09*", "987.564 \u2021", ".", "..")

sh_x <- shrthnd_num(x)

sh_x
#> <shrthnd_num[11]>
#>  [1]  12.00      34.57         NA [c]     NA      56.78 [e]  78.90    
#>  [7]  90.12 [e] 321.09 *   987.56 ‡       NA .       NA ..

shrthnd_list(sh_x)
#> <shrthnd_list[6]>
#> [c] (1 location): 3 
#> [e] (2 locations): 5, 7 
#> * (1 location): 8 
#> ‡ (1 location): 9 
#> . (1 location): 10 
#> .. (1 location): 11

tbl <- tibble::tibble(
  x = x,
  sh_x = sh_x,
  as_num = as.numeric(sh_x), 
  as_char = as.character(sh_x),
  tag = shrthnd_tags(sh_x), 
  as_shrthnd = as_shrthnd(sh_x), 
  as_shrthnd2 = as_shrthnd(sh_x, digits = 3)
)

tbl
#> # A tibble: 11 × 7
#>    x               sh_x as_num as_char tag   as_shrthnd as_shrthnd2
#>    <chr>       <sh_dbl>  <dbl> <chr>   <chr> <chr>      <chr>      
#>  1 12         12.00       12   12      <NA>  12.00      12.000     
#>  2 34.567     34.57       34.6 34.567  <NA>  34.57      34.567     
#>  3 [c]           NA [c]   NA   <NA>    [c]   NA [c]     NA [c]     
#>  4 NA            NA       NA   <NA>    <NA>  NA         NA         
#>  5 56.78 [e]  56.78 [e]   56.8 56.78   [e]   56.78 [e]  56.780 [e] 
#>  6 78.9       78.90       78.9 78.9    <NA>  78.90      78.900     
#>  7 90.123[e]  90.12 [e]   90.1 90.123  [e]   90.12 [e]  90.123 [e] 
#>  8 321.09*   321.09 *    321.  321.09  *     321.09 *   321.090 *  
#>  9 987.564 ‡ 987.56 ‡    988.  987.564 ‡     987.56 ‡   987.564 ‡  
#> 10 .             NA .     NA   <NA>    .     NA .       NA .       
#> 11 ..            NA ..    NA   <NA>    ..    NA ..      NA ..

sh_tbl <- shrthnd_tbl(
  tbl,
  title = "Example table",
  notes = c("Note 1", "Note 2"),
  source_note = "Shrthnd documentation, 2023"
)

sh_tbl
#> # Title:    Example table
#> # A tibble: 11 × 7
#>    x               sh_x as_num as_char tag   as_shrthnd as_shrthnd2
#>    <chr>       <sh_dbl>  <dbl> <chr>   <chr> <chr>      <chr>      
#>  1 12         12.00       12   12      <NA>  12.00      12.000     
#>  2 34.567     34.57       34.6 34.567  <NA>  34.57      34.567     
#>  3 [c]           NA [c]   NA   <NA>    [c]   NA [c]     NA [c]     
#>  4 NA            NA       NA   <NA>    <NA>  NA         NA         
#>  5 56.78 [e]  56.78 [e]   56.8 56.78   [e]   56.78 [e]  56.780 [e] 
#>  6 78.9       78.90       78.9 78.9    <NA>  78.90      78.900     
#>  7 90.123[e]  90.12 [e]   90.1 90.123  [e]   90.12 [e]  90.123 [e] 
#>  8 321.09*   321.09 *    321.  321.09  *     321.09 *   321.090 *  
#>  9 987.564 ‡ 987.56 ‡    988.  987.564 ‡     987.56 ‡   987.564 ‡  
#> 10 .             NA .     NA   <NA>    .     NA .       NA .       
#> 11 ..            NA ..    NA   <NA>    ..    NA ..      NA ..      
#> # ☰ Source: Shrthnd documentation, 2023
#> # ☰ There are 2 notes, use `annotations(x)` to view

annotations(sh_tbl)
#> ── Notes for `sh_tbl` ──────────────────────────────────────────────────────────
#> Title: Example table
#> Source: Shrthnd documentation, 2023
#> Notes:
#> • Note 1
#> • Note 2

Philosophy

Datasets, especially statistical data published by governments, international institutions and academia, often come with symbols and markers that provide further details about the values: that a value is estimated, the reason why a value is missing, or that a value has a given statistical significance level.

The most common approach to processing data that contains both numeric and non-numeric components is to scrub the non-numeric content, so that the input can be coerced into a numeric vector. However, this non-numeric content (“tags”) often conveys information that is worth retaining. If you later want to access it, you may need to re-import your dataset or change your processing. This creates opportunities for error and, critically, risks de-linking the numeric and non-numeric components. The shrthnd_num() data type builds on vctrs::new_rcrd() to separate, but keep linked, these numeric and non-numeric components of a vector.
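To illustrate the general idea of a record-style vector, here is a minimal sketch using vctrs::new_rcrd() directly. It is not the {shrthnd} implementation: the class name tagged_num, the constructor new_tagged_num() and the field names num and tag are hypothetical, chosen only for this example.

```r
library(vctrs)

# Hypothetical constructor (not part of {shrthnd}): a record with two
# equal-length fields, one numeric and one character.
new_tagged_num <- function(num = double(), tag = character()) {
  new_rcrd(list(num = num, tag = tag), class = "tagged_num")
}

# Record vectors need a format() method so they can be printed.
format.tagged_num <- function(x, ...) {
  num <- field(x, "num")
  tag <- field(x, "tag")
  ifelse(is.na(tag), format(num), paste(format(num), tag))
}

x <- new_tagged_num(
  num = c(56.78, 90.123, NA),
  tag = c("[e]", NA, "[c]")
)

# Subsetting the record slices both fields together,
# so values and tags cannot drift out of alignment.
y <- x[c(1, 3)]
field(y, "num")
#> [1] 56.78    NA
field(y, "tag")
#> [1] "[e]" "[c]"
```

Because both fields live inside a single vector, any operation that reorders or subsets it applies to the values and the tags at once, which is the linkage that is lost when tags are scrubbed or stored separately.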

Logo

The {shrthnd} package logo is a combination of the word “shorthand” written in Pitman shorthand alongside an asterisk. The image was drawn by hand with plot points then adjusted for plotting in {ggplot2}. The “shorthand” shape is based on the representation in Arthur Reynold’s Pitman’s English and Shorthand Dictionary, retrieved from the Internet Archive on 2023-05-11.