We would like to move towards an API of only functions that act on or create Xarray datasets. The wrapper classes in `core.py` should be removed and the conversion functions in them moved elsewhere.
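As a rough illustration of the function-only style, a constructor can be a plain function that returns an `xarray.Dataset`, with no wrapper class around it. The function name `make_genotype_dataset` and the variable and dimension names below are hypothetical placeholders, not a settled convention.

```python
import numpy as np
import xarray as xr


def make_genotype_dataset(variant_contig, variant_position, variant_alleles,
                          sample_id, call_genotype):
    """Create a genotype call dataset from plain array-likes.

    Hypothetical constructor: variable and dimension names are placeholders.
    """
    return xr.Dataset(
        {
            "variant_contig": ("variants", np.asarray(variant_contig)),
            "variant_position": ("variants", np.asarray(variant_position)),
            "variant_alleles": (("variants", "alleles"), np.asarray(variant_alleles)),
            "sample_id": ("samples", np.asarray(sample_id)),
            "call_genotype": (("variants", "samples", "ploidy"),
                              np.asarray(call_genotype, dtype="int8")),
        }
    )


# Two variants, three diploid samples; -1 marks a missing allele.
ds = make_genotype_dataset(
    variant_contig=[0, 0],
    variant_position=[100, 200],
    variant_alleles=[["A", "T"], ["C", "G"]],
    sample_id=["s1", "s2", "s3"],
    call_genotype=[[[0, 1], [1, 1], [0, 0]],
                   [[0, 0], [0, 1], [-1, -1]]],
)
print(ds["call_genotype"].dims)  # ('variants', 'samples', 'ploidy')
```

Every other operation would then be a function that takes such a dataset (or its arrays) and returns arrays or a new dataset.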
A few problems that the wrappers were, at least in part, intended to solve are:
What conventions should I/O readers adhere to when building datasets?
Should they have default coordinates? These would make indexing and selecting data easier, but adding them could also be left up to the user.
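A minimal sketch of what default coordinates would buy, assuming hypothetical variable names `variant_position` and `sample_id`: once they are attached as dimension coordinates, label-based `sel` works directly; without them, users fall back to positional `isel`.

```python
import numpy as np
import xarray as xr

# A dataset as a reader might return it, with no coordinates attached.
ds = xr.Dataset(
    {
        "variant_position": ("variants", np.array([100, 200, 300])),
        "sample_id": ("samples", np.array(["s1", "s2"])),
        "call_genotype": (("variants", "samples", "ploidy"),
                          np.zeros((3, 2, 2), dtype="int8")),
    }
)

# Without coordinates, selection is positional only.
by_position = ds.isel(variants=[0, 2], samples=1)

# Hypothetical default: attach positions and sample IDs as dimension
# coordinates so that label-based selection works out of the box.
indexed = ds.assign_coords(
    variants=ds["variant_position"].values,
    samples=ds["sample_id"].values,
)
by_label = indexed.sel(variants=[100, 300], samples="s2")
```

Leaving this to the user would just mean they run the `assign_coords` step themselves when they want label-based selection.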
Should missing values in floating-point data be represented with the same strategy that readers of integer data use? If we go the scikit-allel route (a sentinel value plus a boolean mask), then probably not, since floating-point data can use NaN directly; but if we use masked Dask arrays, then it likely makes more sense for the reader to be responsible for creating them.
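A sketch of the sentinel + boolean mask route, assuming -1 as the integer sentinel (the value scikit-allel uses for missing genotype calls) and NaN for floating-point fields. The masked Dask array alternative is not shown, and all variable names are hypothetical.

```python
import numpy as np
import xarray as xr

MISSING = -1  # assumed integer sentinel; scikit-allel uses -1 for missing calls

call_genotype = np.array([[[0, 1], [-1, -1]],
                          [[1, 1], [0, 1]]], dtype="int8")
call_dosage = np.array([[0.9, np.nan],
                        [2.0, 0.1]])

ds = xr.Dataset(
    {
        "call_genotype": (("variants", "samples", "ploidy"), call_genotype),
        # Integers have no NaN, so a boolean mask travels with the sentinel.
        "call_genotype_mask": (("variants", "samples", "ploidy"),
                               call_genotype == MISSING),
        # Floats can mark missing values directly with NaN.
        "call_dosage": (("variants", "samples"), call_dosage),
    }
)


def alt_allele_frequency(ds):
    """Fraction of called alleles that are non-reference, per variant."""
    called = ~ds["call_genotype_mask"]
    alt = (ds["call_genotype"] > 0) & called
    dims = ("samples", "ploidy")
    return alt.sum(dim=dims) / called.sum(dim=dims)


freq = alt_allele_frequency(ds)                      # values: [0.5, 0.75]
mean_dosage = ds["call_dosage"].mean(dim="samples")  # NaN skipped by default
```

The integer path needs the mask passed around explicitly, while the float path gets missing-value handling from xarray's default `skipna` behaviour, which is why the two strategies may reasonably differ.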
How should we represent phased genotypes? As far as I've seen, phasing may be specified per variant, per variant and sample, or for an entire dataset, so it may make sense for readers to return a 1D array, a 2D array, or a global attribute, whichever is most appropriate.
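If readers return whichever form fits, downstream functions could normalize it on demand. The sketch below assumes hypothetical names (`call_phased` for a 2D array, `variant_phased` for a 1D array, and a `phased` dataset attribute) and broadcasts whichever is present to a (variants, samples) boolean array.

```python
import numpy as np
import xarray as xr


def call_phased(ds):
    """Return phasing status as a (variants, samples) boolean DataArray.

    Hypothetical helper: phasing may be stored as a 2D "call_phased" array,
    a 1D "variant_phased" array, or a dataset-wide "phased" attribute.
    """
    template = xr.DataArray(
        np.zeros((ds.sizes["variants"], ds.sizes["samples"]), dtype=bool),
        dims=("variants", "samples"),
    )
    if "call_phased" in ds:
        phased = ds["call_phased"]
    elif "variant_phased" in ds:
        phased = ds["variant_phased"]
    else:
        # Fall back to a global attribute; treat its absence as unphased.
        phased = xr.DataArray(bool(ds.attrs.get("phased", False)))
    # OR-ing with an all-False template broadcasts any of the three forms to 2D.
    return (phased | template).transpose("variants", "samples")


base = xr.Dataset(
    {"call_genotype": (("variants", "samples", "ploidy"),
                       np.zeros((2, 3, 2), dtype="int8"))}
)
per_variant = base.assign(variant_phased=("variants", np.array([True, False])))
whole_dataset = base.assign_attrs(phased=True)
```

Storing the most compact form and expanding only when needed keeps readers simple without forcing every function to handle three cases itself.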
How do we assert dimensions and dtypes on datasets? Maybe we shouldn't do this at all, or there could be functions that do this at the beginning of method functions (like scikit-learn).
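If validation happens at the start of each function, it could look roughly like scikit-learn's input checks. `check_variables`, `GENOTYPE_SPEC`, and `count_missing_calls` are hypothetical names for illustration; the spec format (dims tuple plus a NumPy abstract dtype such as `np.integer`) is an assumption.

```python
import numpy as np
import xarray as xr


def check_variables(ds, spec):
    """Raise ValueError unless each variable in `spec` has the expected dims and dtype.

    Hypothetical helper: `spec` maps a variable name to (dims, dtype kind),
    where the kind is a NumPy abstract type such as np.integer or np.floating.
    """
    for name, (dims, kind) in spec.items():
        if name not in ds:
            raise ValueError(f"missing variable {name!r}")
        var = ds[name]
        if var.dims != tuple(dims):
            raise ValueError(f"{name!r} has dims {var.dims}, expected {tuple(dims)}")
        if not np.issubdtype(var.dtype, kind):
            raise ValueError(f"{name!r} has dtype {var.dtype}, expected {kind.__name__}")


GENOTYPE_SPEC = {"call_genotype": (("variants", "samples", "ploidy"), np.integer)}


def count_missing_calls(ds):
    """Number of samples with at least one missing allele, per variant."""
    check_variables(ds, GENOTYPE_SPEC)  # validate once, up front
    return (ds["call_genotype"] < 0).any(dim="ploidy").sum(dim="samples")


ds = xr.Dataset(
    {"call_genotype": (("variants", "samples", "ploidy"),
                       np.array([[[0, 1], [-1, -1]], [[1, 1], [0, 0]]], dtype="int8"))}
)
print(count_missing_calls(ds).values)  # [1 0]

try:
    count_missing_calls(ds.astype("float64"))
except ValueError as err:
    print(err)  # 'call_genotype' has dtype float64, expected integer
```

Keeping the spec next to each function documents what it expects, and skipping the check entirely is just a matter of not calling `check_variables`.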
How do we standardize naming conventions for fields like `contig`, `pos`, `alleles`, `GT`, etc.?
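One option is for readers to keep their native field names and for a single function to map them onto standard ones, so the convention lives in one place. The target names and `standardize_names` below are hypothetical placeholders, not a proposed final convention.

```python
import numpy as np
import xarray as xr

# Hypothetical mapping from VCF-style field names to standardized names.
STANDARD_NAMES = {
    "contig": "variant_contig",
    "pos": "variant_position",
    "alleles": "variant_alleles",
    "GT": "call_genotype",
}


def standardize_names(ds):
    """Rename recognized reader-specific variables to the standard names."""
    # rename_vars renames variables only, leaving dimension names alone.
    return ds.rename_vars({old: new for old, new in STANDARD_NAMES.items() if old in ds})


raw = xr.Dataset(
    {
        "contig": ("variants", np.array([0, 0])),
        "pos": ("variants", np.array([100, 200])),
        "alleles": (("variants", "allele"), np.array([["A", "T"], ["C", "G"]])),
        "GT": (("variants", "samples", "ploidy"), np.zeros((2, 3, 2), dtype="int8")),
    }
)
ds = standardize_names(raw)
print(sorted(ds.data_vars))
# ['call_genotype', 'variant_alleles', 'variant_contig', 'variant_position']
```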
How do we make it clear which kinds of datasets can be converted to others? It is probably best to have functions for things like computing dosages, hard calls, GWAS encodings, and allele counts that take arrays or datasets, and to leave it up to users not to pass them anything that doesn't make sense.
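A sketch of this "functions on arrays" approach for two such conversions, assuming genotypes are stored as integer allele indexes with negative values for missing alleles. Both function names are hypothetical, and rounding a dosage is only one simple way to derive hard calls.

```python
import numpy as np
import xarray as xr


def genotype_to_dosage(call_genotype):
    """Additive (GWAS-style) encoding: count of non-reference alleles per call.

    Hypothetical helper; calls with any missing allele (negative value) become NaN.
    """
    dosage = (call_genotype > 0).sum(dim="ploidy").astype("float64")
    return dosage.where(~(call_genotype < 0).any(dim="ploidy"))


def dosage_to_hard_call(dosage):
    """Round a possibly fractional dosage to an allele count; NaN stays NaN.

    Uses NumPy rounding, so exact halves go to the nearest even count.
    """
    return dosage.round()


gt = xr.DataArray(
    np.array([[[0, 1], [1, 1]],
              [[0, 0], [-1, 0]]], dtype="int8"),
    dims=("variants", "samples", "ploidy"),
)
dosage = genotype_to_dosage(gt)  # values: [[1, 2], [0, nan]]
hard = dosage_to_hard_call(xr.DataArray([0.2, 1.7, np.nan], dims="samples"))  # [0, 2, nan]
```

Neither function checks where its input came from; passing, say, allele depths instead of genotypes would run but give meaningless results, which is the trade-off described above.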