Ancestral vs. derived allele #585
Comments
I'd also suggest
I think you can safely assume that, for a given dataset, there is only ever one ancestral allele per site (or it is unknown). Edit: a more sophisticated version could store a probability of each allele being the ancestral one, but I think that's needless complexity for most people.
I'd vote for a variant of @eric-czech's suggestion above:
I think this is addressed now in sgkit, isn't it, @benjeffery? So I can close this?
There is no explicit support, and nothing has been added to the data model as far as I'm aware. I think this stays open for now.
Motivated by @hyanwong at https://github.com/pystatgen/sgkit/discussions/580
Some popgen methods need to know which allele at each site is the ancestral allele (cf. https://biology.stackexchange.com/questions/19159/ancestral-allele-explanation).
We should augment our data model to optionally store allele state.
A verbose approach would be to add a Boolean data variable named `variant_allele_ancestral_state` along the `(variants, alleles)` dimensions that would be `True` when the allele is ancestral and `False` otherwise.
Given that the `alleles` dimension has a fixed length which is often > 2, I suppose it's a design decision to determine how to distinguish a derived allele from a null allele at a site with fewer alleles than the length of the `alleles` dimension.
This also makes me think we should rename the `variants` dimension to `sites`, but that can be addressed in a separate issue.
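
To make the Boolean-mask proposal concrete, here is a minimal NumPy sketch, not an existing sgkit API. The toy data, the empty-string padding of unused allele slots, the all-`False` row for an unknown ancestral state, and the names `allele_present`, `ancestral_known` and `variant_allele_derived` are all assumptions made for illustration. In a real dataset these arrays would presumably be xarray data variables along `(variants, alleles)`.

```python
import numpy as np

# Hypothetical toy data: 3 variants, with an alleles dimension of fixed length 3.
# Assumption: unused allele slots are padded with the empty string.
variant_allele = np.array(
    [
        ["A", "G", ""],   # biallelic site, one padded (null) slot
        ["C", "T", "G"],  # triallelic site
        ["T", "C", ""],   # biallelic site, ancestral allele unknown
    ]
)

# Proposed Boolean variable along (variants, alleles): True where ancestral.
# Assumption: a row that is all False means the ancestral state is unknown.
variant_allele_ancestral_state = np.array(
    [
        [True, False, False],
        [False, True, False],
        [False, False, False],
    ]
)

# Slots that hold a real allele rather than padding.
allele_present = variant_allele != ""

# Sites where some allele is flagged as ancestral.
ancestral_known = variant_allele_ancestral_state.any(axis=1)

# A derived allele is a real, non-ancestral allele at a site whose
# ancestral allele is known; null slots are excluded by allele_present.
variant_allele_derived = (
    allele_present
    & ~variant_allele_ancestral_state
    & ancestral_known[:, None]
)

print(variant_allele_derived)
```

The point of the sketch is that `False` in the mask alone cannot separate a derived allele from a null slot, so some second piece of information (here the padding value) is needed to tell them apart.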