Remove dataset wrappers in prototype core.py #39

eric-czech · 2020-06-16T15:43:47Z

We would like to move towards an API consisting only of functions that act on or create Xarray datasets. The wrapper classes in core.py should be removed, and the conversion functions in them moved elsewhere.

A few problems that the wrappers were, at least in part, intended to solve are:

What conventions should I/O readers adhere to when building datasets?
- Should they have default coordinates? This helps facilitate indexing/selecting data but it could be left up to the user.
- Should the strategy for representing missing values in floating point data be the same as the one used by readers of integer types? If we go the scikit-allel route of a sentinel plus a boolean mask then probably not, but if we use masked Dask arrays then it likely makes more sense for the reader to be responsible for creating them.
- How should we represent phased genotypes? As far as I've seen, phasing could be specific to only variants, to variants + samples, or to an entire dataset, so it may make sense for readers to return a 1D array, a 2D array, or global attributes (whichever is most appropriate).
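To make the sentinel + boolean mask option concrete, here is a minimal sketch of what a reader could return under that convention. The variable names (`call_genotype`, `call_genotype_mask`), the dimension names, and the use of `-1` as the sentinel are assumptions for illustration only, not an agreed convention.

```python
import numpy as np
import xarray as xr

# Hypothetical reader output: 3 variants x 2 samples x ploidy 2,
# with -1 as the sentinel for a missing allele (an assumed convention).
calls = np.array(
    [[[0, 1], [-1, -1]],
     [[1, 1], [0, -1]],
     [[0, 0], [0, 1]]],
    dtype="int8",
)

dims = ("variants", "samples", "ploidy")
ds = xr.Dataset(
    {
        "call_genotype": (dims, calls),
        # Boolean mask stored alongside the data: True where the allele is missing.
        "call_genotype_mask": (dims, calls < 0),
    }
)

# Downstream code can skip missing alleles without knowing the sentinel value.
# `where` replaces masked entries with NaN, which promotes the data to float.
non_missing = ds["call_genotype"].where(~ds["call_genotype_mask"])
alt_allele_sum = non_missing.sum(dim="ploidy", skipna=True)
```

The trade-off shown here is that the mask travels with the dataset as an ordinary variable, so the reader does not need to build masked arrays, but every consumer has to remember to apply it.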
How do we assert dimensions and dtypes on datasets? Maybe we shouldn't do this at all, or there could be functions that do this at the beginning of each method function (as scikit-learn does).
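One possible shape for such a check is a small helper that each method function calls before doing any work. This is only a sketch: `check_variable` and `count_variants` are hypothetical names, and the `pos` variable on a `variants` dimension is an assumed layout.

```python
import numpy as np
import xarray as xr


def check_variable(ds, name, dims, kind):
    """Raise if `name` is missing or has unexpected dims or dtype kind.

    `kind` is a string of NumPy dtype kind codes, e.g. "iu" for integers.
    """
    if name not in ds:
        raise ValueError(f"Dataset is missing required variable '{name}'")
    var = ds[name]
    if var.dims != tuple(dims):
        raise ValueError(f"'{name}' has dims {var.dims}, expected {tuple(dims)}")
    if var.dtype.kind not in kind:
        raise TypeError(f"'{name}' has dtype {var.dtype}, expected kind in '{kind}'")
    return var


def count_variants(ds):
    # Validate up front, then compute.
    pos = check_variable(ds, "pos", ("variants",), "iu")
    return pos.sizes["variants"]
```

Passing a dataset whose `pos` is floating point, or one where `pos` is missing, fails immediately with a message naming the offending variable rather than deep inside the computation.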
How do we standardize naming conventions for fields like contig, pos, alleles, GT, etc.?
How do we make it clear which kinds of datasets can be converted to others? It is probably best to have functions for things like computing dosages, hard calls, GWAS encodings, allele counts etc. that take arrays/datasets and leave it up to users not to pass them anything that doesn't make sense.
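As a rough illustration of the function-only style, a dosage computation could be a plain function that takes a dataset and returns a new array, with no wrapper class involved. Everything specific here is an assumption: the function name `alt_allele_dosage`, the `GT` variable with a trailing `ploidy` dimension, `0` meaning the reference allele, and negative values meaning missing.

```python
import numpy as np
import xarray as xr


def alt_allele_dosage(ds, genotype="GT"):
    """Return the number of non-reference alleles per call as a DataArray.

    Assumes ds[genotype] has a "ploidy" dimension, that 0 is the reference
    allele, and that negative values mark missing alleles.
    """
    gt = ds[genotype]
    is_alt = gt > 0
    any_missing = (gt < 0).any(dim="ploidy")
    # Calls with any missing allele become NaN rather than a partial count.
    return is_alt.sum(dim="ploidy").where(~any_missing)


# Usage: 2 variants x 2 samples x ploidy 2.
ds = xr.Dataset(
    {
        "GT": (
            ("variants", "samples", "ploidy"),
            np.array([[[0, 1], [-1, -1]], [[1, 1], [0, 0]]], dtype="int8"),
        )
    }
)
dosage = alt_allele_dosage(ds)
```

Nothing stops a caller from passing a dataset where this makes no sense (for example, one that already holds dosages), which is the "leave it up to users" trade-off described above.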

eric-czech added the pydata prototype label Jun 16, 2020
