New package concept: parametric conversions #1
Cc @henningte
Thanks for spawning this, @billdenney. Currently,
So if loading several unit systems at the same time is not a requirement, I think the current features of the units package are enough to implement such a set of requirements. Otherwise, a bit more work will be needed. Regardless, a separate package implementing such XML definitions would be a nice starting point. We can host it here, of course.
Thanks for the quick thoughts. I think that loading multiple unit systems at the same time is a requirement (at least it is for my use case of laboratory measurements). My initial thought of the implementation would look something like the following:
Beneath the surface, I was not thinking of trying to load XML. The way that I understand both UDUNITS and My thought was that the unit system table would be a data.frame that looks something like the following:
There are a couple of notable problems that will immediately show up:
Thanks for the offer to host it within
Calling such packages
That's a good point that they are not units like the BIPM SI units. They are more accurately described as unit conversions. I suggested the names of When I took a brief look at the list of R packages, I thought that highlighting "unit conversion" would likely get people to the right place. So, perhaps
I think that is a good idea; the link to "units" will be clear from the package dependency and from its source repository being in this GH organisation.
The basic support for several unit systems must be provided by the units package. Otherwise, such a "systems" package would basically need to reimplement units. So "systems" may or may not be required. To shape what's needed, I would need a short but comprehensive example of the set of conversions that would be defined for a couple of analytes, as well as the set of operations (within and across analytes).
Because, @billdenney, when you say "515 units systems", you really mean 515 conversions between some form of unitless parametric quantity, such as moles, and some unit such as grams, right? Because that's not really a units system. ;)
Also (I'm re-reading the previous discussion in r-quantities/units#134): it would be helpful to know your current workflow and what changed or what prompted you to raise this proposal. Because (I'm just thinking out loud here) if your current workflow works but requires, e.g., complicated parsing, that could just be abstracted into a separate package. But if you hit some fundamental limitation, then it would be helpful to know about it.
@Enchufa2, I was thinking that the "systems" package would do two main things (to accomplish the list of goals in the original proposal), and thereby avoid the rework needed to implement the jumps between analytes. I have been using the word "systems" because each one is a disconnected set of conversions, hence a "system". The word "system" seems to be causing confusion in this discussion, and I'm happy to choose a different word, but I don't know what a better word would be. My specific, typical workflow is that I receive data with three columns (among many more): one column is the analyte (e.g. "LDL cholesterol" or "sodium"), one column is the measurement value as a number, and one column is the units. Different analytes may share the same units (e.g. "150 mg/dL LDL cholesterol" and "130 mg/dL sodium" both have units of "mg/dL"). I then need to convert both to the standard units of "mmol/L", but the conversion for each is different:
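To make the point concrete, here is a minimal base-R sketch of why the same input unit (mg/dL) needs a different factor per analyte. It assumes approximate molar masses of about 386.65 g/mol for cholesterol and 22.99 g/mol for sodium. The `molar_mass` table and the `mg_dl_to_mmol_l()` helper are hypothetical illustrations and are not part of any package.

```r
# Hypothetical per-analyte molar masses in g/mol (approximate, for illustration only)
molar_mass <- c(
  "LDL cholesterol" = 386.65,  # cholesterol, C27H46O
  "sodium"          = 22.99
)

# Convert a mass concentration (mg/dL) to a molar concentration (mmol/L).
# mg/dL * 10 dL/L gives mg/L; dividing mg/L by g/mol gives mmol/L,
# because the milli- prefixes cancel (mg/g = mmol/mol).
mg_dl_to_mmol_l <- function(value, analyte) {
  mm <- molar_mass[analyte]
  if (anyNA(mm)) {
    stop("no molar mass defined for: ", paste(analyte[is.na(mm)], collapse = ", "))
  }
  unname(value * 10 / mm)
}

mg_dl_to_mmol_l(c(150, 130), c("LDL cholesterol", "sodium"))
# about 3.88 and 56.55 mmol/L
```

The two inputs share the unit "mg/dL" but end up with very different factors. A plain mg/dL to mmol/L conversion therefore cannot be defined once for all analytes; it has to be parametrized by the substance.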
My current workflow is always one-off. Nothing has really changed, other than that I have had some more thoughts on how to generalize the solution better than I had previously considered. To standardize the units for a measurement, I look at the dataset and make individual
I then relatively often combine one measurement type with another. For example, I may have the concentration of sodium (mg/mL) in urine and the total urine volume for the day (mL/24 h), multiply them together, and then convert the units to mmol/day. I may also have many analytes in the urine for which I want to do this (sodium, potassium, glucose, albumin [a protein], etc.). The
I was suggesting that these two features would exist in a separate package because they could be shared across many dependent packages (e.g. (I think that I covered everything you just asked for, but if I missed something, please let me know.)
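The urine example combines two measurements before converting. Below is a minimal base-R sketch, assuming a sodium molar mass of about 22.99 g/mol. The `daily_excretion_mmol()` helper and the input values (3 mg/mL, 1500 mL/24 h) are hypothetical and for illustration only.

```r
# Hypothetical helper: daily excretion in mmol/day from a urine concentration
# (mg/mL) and a 24-hour urine volume (mL/24 h).
daily_excretion_mmol <- function(conc_mg_per_ml, volume_ml_per_day, molar_mass_g_per_mol) {
  mass_mg_per_day <- conc_mg_per_ml * volume_ml_per_day  # mg/mL * mL/day = mg/day
  mass_mg_per_day / molar_mass_g_per_mol                 # mg/day / (g/mol) = mmol/day
}

# Illustrative input: 3 mg/mL sodium in 1500 mL of urine collected over 24 h
daily_excretion_mmol(3, 1500, 22.99)
# about 195.7 mmol/day
```

Because the arguments are vectorized, one call can handle several analytes (sodium, potassium, glucose, ...) as long as each gets its own molar mass. Supplying that per-analyte parameter from one central place is what the proposal aims to do.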
I think that the main obstacle to this discussion is that we are mixing low-level concepts and implementation details with requirements, and as a result we are going in circles. Please forget about systems, boundaries, vectors, subclasses, and acyclic graphs for now, and let's talk about the workflow, i.e. the high-level interface. The code above is what you do now, so let's define what a better workflow should look like. Then we can assess what's available, what's missing, and what the best implementation would be. Also: I've changed the title because, if we want to generalize this, I think we should be talking about parametric conversions instead of systems. Correct me if I'm wrong, but every single conversion you are dealing with involves some kind of parametric unit, such as the mole, that requires a different parametrization (e.g. mol/g) for different substances.
Good point about starting with the requirements. Thanks. My high-level workflow is:
Does that clarify the workflow sufficiently? And yes, "parametric units" are what I'm talking about throughout this discussion. Thanks for helping clarify the terminology.
Thanks for the clear specification of the workflow. Let's say that such columns are
library(<new package>)
set_substance_conversions(<data frame of parametric conversions>)
df |>
  mutate(src = set_substance_units(analyte, value, unit)) |>
  mutate(new_unit = <specify the destination unit>) |>
  mutate(dst = set_substance_units(analyte, src, new_unit))
EDIT: A more specific proposal, maybe more in line with the units workflow:
library(substances)
load_substances_df(<data frame of parametric conversions>)
df |>
  mutate(src = set_substances(mixed_units(value, unit), analyte)) |>
  mutate(new_unit = <specify the destination unit>) |>
  mutate(dst = set_units(src, new_unit, analyte)) # analyte here is optional
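For the MVP (conversions within one substance), the "data frame of parametric conversions" could be as small as one molar mass per substance. The base-R sketch below shows one way a table-driven conversion might work. The column names (`substance`, `g_per_mol`), the `unit_info` table, and `convert_substance()` are all hypothetical. They are not the proposed package's API, and only mg/dL, mg/L, and mmol/L are supported to keep the sketch short.

```r
# Hypothetical table of parametric conversions: one molar mass (g/mol) per substance
conversions <- data.frame(
  substance = c("sodium", "potassium", "glucose"),
  g_per_mol = c(22.99, 39.10, 180.16),
  stringsAsFactors = FALSE
)

# Hypothetical unit table: "kind" says whether a unit measures mass or amount;
# "to_ref" converts to the reference unit of that kind (mg/L or mmol/L).
unit_info <- data.frame(
  unit   = c("mg/dL", "mg/L", "mmol/L"),
  kind   = c("mass", "mass", "amount"),
  to_ref = c(10, 1, 1),
  stringsAsFactors = FALSE
)

convert_substance <- function(value, substance, from, to, table = conversions) {
  n  <- length(value)
  mm <- rep_len(table$g_per_mol[match(substance, table$substance)], n)
  if (anyNA(mm)) {
    stop("unknown substance(s): ", paste(setdiff(substance, table$substance), collapse = ", "))
  }
  src_u <- unit_info[match(from, unit_info$unit), ]
  dst_u <- unit_info[match(to, unit_info$unit), ]
  if (anyNA(src_u$kind) || anyNA(dst_u$kind)) stop("unsupported unit")
  from_kind <- rep_len(src_u$kind, n)
  to_kind   <- rep_len(dst_u$kind, n)
  ref <- value * rep_len(src_u$to_ref, n)      # now in mg/L or mmol/L
  m2a <- from_kind == "mass" & to_kind == "amount"
  a2m <- from_kind == "amount" & to_kind == "mass"
  ref[m2a] <- ref[m2a] / mm[m2a]               # mg/L / (g/mol) = mmol/L
  ref[a2m] <- ref[a2m] * mm[a2m]               # mmol/L * (g/mol) = mg/L
  ref / rep_len(dst_u$to_ref, n)               # reference unit -> target unit
}

df <- data.frame(analyte = c("sodium", "glucose"), value = c(310, 90), unit = "mg/dL")
df$mmol_l <- convert_substance(df$value, df$analyte, df$unit, "mmol/L")
df$mmol_l
# about 134.8 and 5.0 mmol/L
```

The function is vectorized over rows, so the same call should also work inside `mutate()` as in the proposals above. Putting the substance-specific parameter in a data frame keeps the conversion logic generic.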
(Thank you for the edit to use the I would hope that the code would be a little simpler than what you suggest:
I dropped the "analyte" from the second call to Another use would be
The methods would also need accessors to the attributes:
The above does not cover the conversion between analytes (e.g. "1 mole carbon dioxide" = "1 mole carbon" and "1 mole carbon dioxide" = "2 moles oxygen"), but that is a much bigger lift to get right, and maybe it should not be included at this time. This suggestion is not simply interface bloat; I do use that type of conversion between analytes. My specific use case is a medicine and its metabolite: I need to calculate the amount of a medicine that comes out in urine. I receive data like "10 mg simvastatin" was dosed, and we measured "10 ng/mL beta-hydroxy simvastatin" in 500 mL of urine. What fraction of the dose came out in urine as "beta-hydroxy simvastatin"? The process is
I think that would need another method like
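The simvastatin question can be worked through by hand to show the cross-substance step the package would need. This base-R sketch assumes approximate molar masses of about 418.57 g/mol for simvastatin (C25H38O5) and about 436.58 g/mol for its beta-hydroxy acid (C25H40O6); both values should be checked before real use. It also assumes a 1:1 molar relationship between the dosed drug and the measured metabolite.

```r
# Approximate molar masses in g/mol (assumptions; verify before real use)
mw_simvastatin   <- 418.57  # C25H38O5
mw_bhydroxy_acid <- 436.58  # beta-hydroxy acid, C25H40O6 (assumed measured metabolite)

dose_mg     <- 10   # 10 mg simvastatin dosed
urine_ng_ml <- 10   # 10 ng/mL beta-hydroxy simvastatin measured
urine_ml    <- 500  # 500 mL of urine

# Step 1: metabolite mass in urine: ng/mL * mL = ng, then ng -> g
metab_g <- urine_ng_ml * urine_ml * 1e-9
# Step 2: convert each mass to moles with its own molar mass
metab_mol <- metab_g / mw_bhydroxy_acid
dose_mol  <- dose_mg * 1e-3 / mw_simvastatin
# Step 3: with a 1:1 molar relationship, the molar ratio is the fraction excreted
fraction_excreted <- metab_mol / dose_mol
fraction_excreted
# about 4.8e-4, i.e. roughly 0.048% of the dose
```

Step 3 is where a cross-substance conversion ("1 mol simvastatin corresponds to 1 mol beta-hydroxy simvastatin") is applied. That is the relationship the thread proposes to defer until after the within-substance MVP.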
We have a nice initial specification, so I have transferred the issue to a separate repo (thoughts on the name?). The MVP would be conversions within the same substance, so I'll try to address that first. We can iterate and address cross-substance conversions later.
I like the name. Let's keep it! :) (I'm also happy to entertain other names; I can't think of a better one right now.) I agree that cross-substance conversions can come later. I wanted to make sure that they were considered throughout the process so that we don't end up making an API that can't work with the concept. The other thing that we should ensure is that we keep the naming as consistent as feasible. In the drafting discussion, we used several terms. I think that the terms we settled on are:
I'm tagging several people who had similar questions to the proposal above in case they have additional ideas that may help:
So now that I'm CC'd, I feel obliged to say something about my workflow: I'm not sure I would really need another package. What would it bring me other than a simpler way of converting (it's not that often that I need to do it)? I also recently had a dataset with m^2/m^2 where the two areas referred to very different things, so the problem is not exclusive to substances.
My aim here is to specifically Fractions of the same units are a different beast... When you define a parametric conversion (the aim here), you are referring to a single thing (e.g. g/mol or mol/g of atoms of oxygen). But when you define a fraction like g/g, you are referring to two different things. Unfortunately, this is much, much harder to handle with a system like UDUNITS (or, for that matter, any unit system I know of, because not even the SI takes these things into account). I'll keep that in mind, but in principle, this is not the goal of this package proposal. BTW, @ilikegitlab, could you please comment further on that example of m^2/m^2?
I think it is a good idea to collect conversions for some often-used parametric units in a dedicated package. I'm not yet sure I fully understand the scope and the sketched implementation of the planned package, mainly because I'm not too familiar with formal definitions of unit systems or parametric units. In any case, I think that defining a naming scheme for the substances and limiting the scope of the substances considered are important tasks, because otherwise things may get too complicated given the diversity of chemical substances alone (e.g. the same compound, with the same chemical formula and bonds, but with different charges). This is another reason why I'm unsure about the scope. One problem that came to mind was how to avoid automatic conversion if one installs conversions like moles of CO2 to moles of O and moles of H2O to moles of O. Wouldn't that create ambiguities, e.g. making it possible to compute 2 mol of CO2 + 8 mol of H2O = 12 mol of O, which may be desired behavior in some cases but not in others? I noticed that @billdenney said this may be too complicated to consider in a first sketch of the package, but similar things could happen in other contexts (e.g. conversion of moles of hydrated compounds to moles of water). I'm not at all sure how likely such cases are, but if they are likely, it may be better to force explicit unit conversion. For example, one could do something like this (i.e., provide a table/list of conversion constants):
library(units)
#> udunits database from C:/Users/henni/AppData/Local/R/win-library/4.3/units/share/udunits/udunits2.xml
# units which need to be installed
install_unit("mol_CO2_")
install_unit("mol_water_") # I got an error that the unit is not defined with install_unit("mol_H2O_")
install_unit("mol_O")
# example for a conversion table holding conversion constants
conversion_constants <-
list(
CO2 =
list(O = units::set_units(2, mol_O/mol_CO2_)),
H2O =
list(O = units::set_units(1, mol_O/mol_water_))
)
# Then nonsense is avoided by default:
units::set_units(1, mol_CO2_) + units::set_units(1, mol_water_)
#> Error: cannot convert mol_water_ into mol_CO2_
# But you can make the conversions explicitly (this can certainly be simplified)
units::set_units(1, mol_CO2_) * conversion_constants$CO2$O + units::set_units(1, mol_water_) * conversion_constants$H2O$O
#> 3 [mol_O]
Created on 2023-10-17 with reprex v2.0.2
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.3.1 (2023-06-16 ucrt)
#> os Windows 11 x64 (build 22621)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate German_Germany.utf8
#> ctype German_Germany.utf8
#> tz Europe/Berlin
#> date 2023-10-17
#> pandoc 3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1)
#> digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1)
#> evaluate 0.21 2023-05-05 [1] CRAN (R 4.3.1)
#> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1)
#> fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.1)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1)
#> htmltools 0.5.6 2023-08-10 [1] CRAN (R 4.3.1)
#> knitr 1.43 2023-05-25 [1] CRAN (R 4.3.1)
#> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.1)
#> Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.3.1)
#> rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1)
#> rmarkdown 2.24 2023-08-14 [1] CRAN (R 4.3.1)
#> rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.1)
#> units * 0.8-3 2023-09-06 [1] Github (billdenney/units@d57f54d)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.3.1)
#> xfun 0.40 2023-08-09 [1] CRAN (R 4.3.1)
#> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0)
#>
#> [1] C:/Users/henni/AppData/Local/R/win-library/4.3
#> [2] C:/Program Files/R/R-4.3.1/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
Just a first thought from my side, in case it is useful.
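The same "refuse implicit conversion, allow explicit conversion" idea can also be sketched without installing new units. Below is a minimal base-R S3 sketch. The `substance_mol` class, the `stoich` table, and `convert_to()` are hypothetical names used only for illustration, and only `+` and `-` are handled.

```r
# Hypothetical stoichiometry table: mol of target per mol of source compound
stoich <- data.frame(
  from   = c("CO2", "H2O", "CO2"),
  to     = c("O",   "O",   "C"),
  factor = c(2,     1,     1),
  stringsAsFactors = FALSE
)

# An amount in mol tagged with the substance it refers to
substance_mol <- function(x, substance) {
  structure(x, substance = substance, class = "substance_mol")
}

print.substance_mol <- function(x, ...) {
  cat(format(as.numeric(x)), "mol", attr(x, "substance"), "\n")
  invisible(x)
}

# Adding amounts of different substances is refused instead of silently converted
Ops.substance_mol <- function(e1, e2) {
  if (!.Generic %in% c("+", "-") || missing(e2)) stop("only binary + and - are sketched here")
  s1 <- attr(e1, "substance")
  s2 <- attr(e2, "substance")
  if (!identical(s1, s2)) stop("cannot combine mol of ", s1, " with mol of ", s2, "; convert explicitly")
  val <- switch(.Generic, "+" = unclass(e1) + unclass(e2), "-" = unclass(e1) - unclass(e2))
  substance_mol(as.numeric(val), s1)
}

# Explicit conversion through the lookup table
convert_to <- function(x, to, table = stoich) {
  from <- attr(x, "substance")
  row <- table$from == from & table$to == to
  if (sum(row) != 1) stop("no single conversion from ", from, " to ", to)
  substance_mol(as.numeric(x) * table$factor[row], to)
}

co2 <- substance_mol(1, "CO2")
h2o <- substance_mol(1, "H2O")
try(co2 + h2o)                              # error: different substances
convert_to(co2, "O") + convert_to(h2o, "O") # prints: 3 mol O
```

This mirrors the `3 [mol_O]` result of the units-based reprex above. The design keeps every cross-substance step visible in user code, which avoids the CO2 + H2O ambiguity at the cost of some verbosity.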
@Enchufa2: The aims sound sensible, but I wonder: in the case of grams of oxygen per m^3 of air, or moles of sugar per kg of water (osmolality), wouldn't I essentially also be referring to two different things? As for m^2/m^2 (or g/kg), this comes up in allometric relationships describing ratios between body parts of plants and animals. You are right that they are two different things. Although it may make sense to simplify such a ratio to [1], in practice this breaks the current math with udunits, because (g leaf)/(g plant) * (m^2 leaf)/(g leaf) = (m^2 leaf)/(g plant). I agree one could go through the trouble of defining separate g_leaf and g_plant units, but care should be taken not to make things too rigid or complex, because then many people may just drop units at the earliest convenience. (I admit I found myself wanting to write a dispense_units(math, reapply = "units") method at some point, which I have so far managed to avoid!)
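As a sketch of the "separate leaf and plant gram units" approach, here is a toy unit algebra in base R in which a unit is a named vector of exponents. This representation is an assumption for illustration and says nothing about how udunits works internally; `unit_mul()` and the unit names are hypothetical.

```r
# Toy unit algebra: a unit is a named numeric vector of exponents.
# Multiplying two quantities adds their exponents.
unit_mul <- function(a, b) {
  keys <- union(names(a), names(b))
  out <- numeric(length(keys))
  names(out) <- keys
  out[names(a)] <- out[names(a)] + a
  out[names(b)] <- out[names(b)] + b
  out[out != 0]  # drop base units whose exponents cancelled
}

leaf_mass_fraction <- c(g_leaf = 1, g_plant = -1)  # (g leaf)/(g plant)
specific_leaf_area <- c(m2_leaf = 1, g_leaf = -1)  # (m^2 leaf)/(g leaf)

unit_mul(leaf_mass_fraction, specific_leaf_area)
# g_plant: -1, m2_leaf: 1, i.e. (m^2 leaf)/(g plant)
```

Because g_leaf and g_plant are distinct base units, only g_leaf cancels, and the result keeps track of which mass the area is expressed per.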
Related to r-quantities/units#134 (and others)
I'm not sure how best to discuss this. In the end, I don't think that it will be part of the units library, but I would like to engage @Enchufa2, @edzer, and others to get this solution right. I hope that you think it's okay to have (or at least start) the discussion here.
The issue of units that are not directly convertible comes up often, as evidenced in r-quantities/units#134 and several other issues linking there. It signals a need for a method of keeping different unit conversions separate. A typical example is mass-to-moles conversions, which happen in many fields. For the data I work with (often laboratory measurements of blood tests), other types of conversions can exist, like activity-to-molar conversions (e.g. a conversion where 1 mole per hour of X means that the concentration of Y is Z moles/L).
To accomplish this, I think that the best method would be the creation of a new package that would enable the following:
Are there other features that should be supported?