how to create or find "genes.refGene" file for hg19 and hg38 #39
Comments
@anopperl Could you post the code you are trying to run that caused the error shown above? It might help in coming up with a solution. Thanks.
There is already a genes.refGene in the pyhgvs/data directory of this repository. It is old but it works. "genes.refGene" is not directly available from UCSC. However, you should be able to create one from the latest tables in the UCSC database. Refer to #26 for an example for hg19. For hg38, you simply need to replace hg19 with hg38 in the example command (or download refGene.txt.gz from ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz).
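A minimal sketch of the download route, assuming the UCSC `refGene.txt.gz` dump is already on disk and that pyhgvs expects the same 16 tab-separated columns as the UCSC refGene table (the `bin` column plus the 15 extended genePred fields). The function name and the output file name are illustrative, not part of pyhgvs:

```python
import gzip

EXPECTED_COLUMNS = 16  # UCSC refGene: bin column + 15 genePred-extended fields


def refgene_gz_to_text(gz_path, out_path):
    """Decompress a refGene.txt.gz dump to plain text and return the row count.

    Raises ValueError on the first row that does not have 16 columns, so a
    truncated or wrongly formatted download is caught before pyhgvs reads it.
    """
    rows = 0
    with gzip.open(gz_path, "rt") as src, open(out_path, "w") as dst:
        for line_no, line in enumerate(src, start=1):
            if line.startswith("#"):
                continue  # skip comment/header lines
            n_cols = len(line.rstrip("\n").split("\t"))
            if n_cols != EXPECTED_COLUMNS:
                raise ValueError(
                    f"line {line_no}: expected {EXPECTED_COLUMNS} columns, got {n_cols}"
                )
            dst.write(line)
            rows += 1
    return rows
```

For example, `refgene_gz_to_text("refGene.txt.gz", "genes.refGene")` would write the file that the README-style loader opens. Note that the UCSC table stores accessions without a version suffix, which matters when looking up names such as `NM_000352.3`.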
I used the following command to get the "genes.refGene" file and ran the example usage, but I still get the above error.
This is a dupe, see the other issue for scripts: #26 (comment)
I've made a Python package that provides ~800k transcripts (both RefSeq and Ensembl) for PyHGVS. You can either download a JSON.gz file or use a REST service. To use it:
Hi @davmlaw, I want to get the HGVS cdot from REF/ALT using pyhgvs (not parse an HGVS string). How could I use the cdot package to help me? Could you show me some example scripts? Thanks.
First, install cdot:
Then get the cdot data:
This is based on the pyhgvs README. Basically, we need to load cdot data files to populate the pyhgvs "transcript" record (the README wrote its own get_transcript() method).
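A minimal sketch of that idea, not the author's original snippet. It assumes pyhgvs, pyfaidx and cdot are installed, that cdot exposes `JSONPyHGVSTranscriptFactory` with a `get_transcript_grch37` method as described in its README, and that the FASTA and cdot JSON paths (placeholders here) point at matching GRCh37 files. The function name `hgvs_c_from_variant` is hypothetical:

```python
def hgvs_c_from_variant(chrom, offset, ref, alt, transcript_id,
                        fasta_path, cdot_json_path):
    """Format an HGVS c. name for one variant on one transcript.

    Third-party imports are local so this sketch can be loaded without the
    packages installed; calling it requires pyhgvs, pyfaidx and cdot.
    """
    import pyhgvs
    from pyfaidx import Fasta
    from cdot.pyhgvs.pyhgvs_transcript import JSONPyHGVSTranscriptFactory

    genome = Fasta(fasta_path)  # reference genome, as in the pyhgvs README
    factory = JSONPyHGVSTranscriptFactory([cdot_json_path])

    # cdot plays the role of the README's hand-written get_transcript()
    transcript = factory.get_transcript_grch37(transcript_id)
    if transcript is None:
        raise KeyError(f"{transcript_id} not found in {cdot_json_path}")

    return pyhgvs.format_hgvs_name(chrom, offset, ref, alt, genome, transcript)


# Example call with placeholder paths (values follow the pyhgvs README example):
# hgvs_c_from_variant("chr11", 17496508, "T", "C", "NM_000352.3",
#                     "hg19.fa", "cdot.refseq.grch37.json.gz")
```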
@davmlaw Thank you very much! Is there any way to get all the relevant transcripts based on chrom and offset?
The easiest way to do that is probably to use bedtools on a GTF, then pull out the transcripts from the cut-down GTF. Another way would be to use pyreference, another Python library I wrote for genomic work. Untested code would be:
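As a standard-library alternative to either approach (this is not the pyreference code), a rough sketch that scans refGene-style rows directly. It assumes the 16-column UCSC layout (name in column 2, chrom in column 3, txStart/txEnd in columns 5 and 6), that txStart is 0-based and txEnd is exclusive, and that `offset` is 1-based as in pyhgvs. The function name is hypothetical:

```python
import csv


def transcripts_at(refgene_lines, chrom, offset):
    """Return names of transcripts whose txStart/txEnd span covers `offset`.

    refgene_lines: iterable of tab-separated refGene rows (e.g. an open file).
    offset: 1-based genomic position.
    """
    hits = []
    for row in csv.reader(refgene_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        name, row_chrom = row[1], row[2]
        tx_start, tx_end = int(row[4]), int(row[5])
        # 0-based half-open [tx_start, tx_end) covers 1-based tx_start+1..tx_end
        if row_chrom == chrom and tx_start < offset <= tx_end:
            hits.append(name)
    return hits
```

This is a linear scan, which is fine for a handful of lookups; for many positions an interval index (or bedtools, as suggested above) would be the better choice.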
@davmlaw Hi, your work is great! I used your package and pyhgvs to generate an HGVS name. Then I used another package, hgvs, to parse the variant, but it raised HGVSParserError:
Hi, HGVS is a very broad spec and each library only parses a subset of it. I think the issue is the 12 at the end (it is redundant, since you could work it out from the difference between the positions). You need to regenerate the names without the numbers or strip them off.
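Stripping the numbers off could look like the sketch below. It assumes the redundant number is a length appended directly after `del` or `dup` at the very end of the name; the accession in the usage example is made up for illustration:

```python
import re

# A run of digits directly after a trailing "del" or "dup" is the redundant
# deleted/duplicated length, e.g. "...del12".
_TRAILING_LENGTH = re.compile(r"(del|dup)\d+$")


def strip_trailing_length(hgvs_name):
    """Remove a redundant length suffix such as the '12' in '...del12'."""
    return _TRAILING_LENGTH.sub(r"\1", hgvs_name)


print(strip_trailing_length("NM_000000.1:c.10_21del12"))
```

Names without such a suffix (substitutions, `delins` with explicit bases, or a bare `del`) pass through unchanged.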
How do I create or find the "genes.refGene" file for hg19 and hg38?
I got a "genes.refGene" file from UCSC, but it is not working in my case.
The error shows:
Traceback (most recent call last):
  File "first_py.py", line 38, in <module>
    hgvs_name, genome, get_transcript=get_transcript)
  File "build/bdist.linux-x86_64/egg/pyhgvs/__init__.py", line 1356, in parse_hgvs_name
ValueError: transcript is required
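One possible cause of this error, assuming the README-style `get_transcript` that returns `transcripts.get(name)`: the lookup returns `None` when the name in the HGVS string is not a key in the loaded dictionary, for example because the HGVS name carries a version (`NM_000352.3`) while the refGene table from UCSC stores unversioned accessions. A hedged sketch of a wrapper that makes the failure explicit (the helper name is hypothetical, and `transcripts` is whatever name-to-transcript dictionary was loaded):

```python
def make_get_transcript(transcripts):
    """Wrap a name -> transcript dict in a lookup with a clearer failure mode."""

    def get_transcript(name):
        transcript = transcripts.get(name)
        if transcript is None:
            # Fall back to the accession without its version suffix.
            # Caution: another version may have different coordinates.
            transcript = transcripts.get(name.split(".", 1)[0])
        if transcript is None:
            raise KeyError(f"transcript {name!r} not found in loaded refGene data")
        return transcript

    return get_transcript
```

Passing `get_transcript=make_get_transcript(transcripts)` to `parse_hgvs_name` would then report which transcript name is missing instead of the generic message.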