Disease-Gene pipeline: Identify phenotypes caused by various genes #173

Open
joeflack4 opened this issue Nov 25, 2024 · 0 comments
Labels: bug, needs discussion, omim

joeflack4 commented Nov 25, 2024

Overview

Some phenotypes are caused by various genes, and we generally (always?) do not want to include these in the pipeline: they should not be marked as having a causal germline mutation with an associated gene in morbidmap.txt. The problem is that neither morbidmap.txt nor the other data files say whether a phenotype is caused by various genes. That information is found in the first paragraph of the "Text" subsection on omim.org/entry pages.

Possible solutions

a. Web scrape, and include that paragraph of text in the spreadsheet
b. Web scrape, and maybe look for specific phrases like “various genes”, and toggle a boolean column when such phrases are found.
c. Do nothing, and import these anyway even though we ideally should/would not.
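
For option (b), the phrase check itself could look something like the sketch below. It only covers the matching step and assumes the first "Text" paragraph has already been fetched by some other means. The function names, the `various_genes` column name, and every phrase other than "various genes" are hypothetical and not taken from the existing pipeline.

```python
import re

# Only "various genes" comes from this issue; the other phrases are
# guesses and would need review against real OMIM entry text.
MULTI_GENE_PHRASES = ("various genes", "several genes", "multiple genes")


def mentions_various_genes(text_paragraph, phrases=MULTI_GENE_PHRASES):
    """Return True if the paragraph contains any of the given phrases.

    Whitespace is collapsed and case is ignored, so a phrase split
    across a line break in the scraped text still matches.
    """
    normalized = re.sub(r"\s+", " ", text_paragraph).lower()
    return any(phrase in normalized for phrase in phrases)


def add_various_genes_flag(rows, text_key="text_paragraph"):
    """Add a boolean 'various_genes' column to each row (a dict).

    `rows` and `text_key` are hypothetical: each row is assumed to
    already carry the scraped first paragraph under `text_key`.
    """
    flagged = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        new_row["various_genes"] = mentions_various_genes(row.get(text_key, ""))
        flagged.append(new_row)
    return flagged
```

A plain substring match like this will produce false positives and negatives, so the flag is probably best treated as a prompt for manual review rather than an automatic exclusion.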

Background

joeflack4 self-assigned this on Nov 25, 2024
joeflack4 added the bug, omim, and needs discussion labels on Nov 25, 2024
joeflack4 assigned twhetzel and unassigned joeflack4 on Dec 16, 2024