PyData prototype genetics method implementations #30

Open
7 of 9 tasks
eric-czech opened this issue May 18, 2020 · 0 comments
eric-czech commented May 18, 2020

This issue tracks progress on method implementations with a focus on those mentioned in #16 (or things needed by these methods).

Progress:

  • axis_intervals
  • maximal_independent_set
    • Somewhat similar to hail.maximal_independent_set, but uses a sequential algorithm partitioned by chromosome so that results are compatible with PLINK/scikit-allel
      • My rationale is that users would be less skeptical if results were identical to those of other tools, rather than starting with a more scalable but less credible heuristic
  • ld_matrix
    • Very similar to hail.ld_matrix
    • There are CPU and GPU implementations for this now
  • ld_prune (PyData prototype LD prune implementation #26)
  • GRM/RRM
    • GRM
      • Center variants (in rows) by subtracting the nanmean
      • Scale by the binomial variance expected for the variant under HWE (Patterson 2006), i.e. divide the centered values by the square root of that variance
      • Compute XX^T (for X as n_samples by n_variants)
    • RRM
      • Same as GRM, except that the empirical variance is used rather than the binomial variance under HWE
      • This is Pearson correlation, up to a constant factor
    • This gist has a GRM calculation that uses the same scaling as the default PCA preprocessor in scikit-allel
  • HWE (https://github.com/pystatgen/sgkit/pull/76)
    • For axis reductions in variant/sample QC as well as PCA normalization
  • PC-Relate (PyData PC-Relate Integration #35)
  • PCA (https://github.com/pystatgen/sgkit/pull/262)
  • LMM and LFM
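
For reference, the GRM/RRM steps listed above can be sketched in plain NumPy roughly as follows. This is only an illustration, not the prototype or gist code: the function name `relationship_matrix`, the `use_hwe_variance` flag, the diploid variance `2p(1-p)`, and the zero-fill of missing calls are all assumptions made for this sketch, and constant factors (for example dividing by the number of variants) differ between conventions and are left out.

```python
import numpy as np

def relationship_matrix(calls, use_hwe_variance=True):
    """Hypothetical helper: GRM (HWE variance) or RRM (empirical variance).

    calls: (n_variants, n_samples) alternate allele counts in {0, 1, 2},
    with NaN marking missing calls.
    """
    calls = np.asarray(calls, dtype=float)
    # Center each variant (row) by its mean over non-missing samples
    mean = np.nanmean(calls, axis=1, keepdims=True)
    centered = calls - mean
    if use_hwe_variance:
        # Assumption: diploid calls, so the binomial variance under HWE is 2p(1-p)
        p = mean / 2.0
        var = 2.0 * p * (1.0 - p)
    else:
        # RRM: empirical per-variant variance instead
        var = np.nanvar(calls, axis=1, keepdims=True)
    # Drop variants with zero (or undefined) variance to avoid dividing by zero
    keep = var[:, 0] > 0
    z = centered[keep] / np.sqrt(var[keep])
    # Assumption: missing calls contribute nothing (equivalent to mean imputation)
    z = np.nan_to_num(z)
    x = z.T  # n_samples by n_variants
    return x @ x.T

calls = np.array([
    [0, 1, 2, 1],
    [2, 2, 0, 0],
    [1, 1, 1, 1],  # monomorphic in this sample; dropped for RRM
])
grm = relationship_matrix(calls, use_hwe_variance=True)
rrm = relationship_matrix(calls, use_hwe_variance=False)
```

Both results are n_samples by n_samples and symmetric; the two variants differ only in the per-variant scaling applied before the matrix product.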