Use Snakemake to build transcripts #70

davmlaw · 2024-02-19T21:49:13Z

At the moment we have file existence tests instead of proper dependency management
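
To illustrate the difference, here is a minimal Python sketch. It is not code from this repo, and the function names are made up for illustration. An existence test only asks whether the output is there, while a dependency-style check, in the spirit of Make-like tools, also asks whether any input is newer than the output.

```python
import os


def needs_rebuild_by_existence(output_path):
    """Existence test: rebuild only when the output file is missing."""
    return not os.path.exists(output_path)


def needs_rebuild_by_mtime(output_path, input_paths):
    """Dependency-style check (hypothetical helper): rebuild when the output
    is missing or when any input was modified after the output."""
    if not os.path.exists(output_path):
        return True
    output_mtime = os.path.getmtime(output_path)
    return any(os.path.getmtime(path) > output_mtime for path in input_paths)
```

With the existence test, a stale output built from an older download is silently reused; the second check would flag it for a rebuild.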

davmlaw · 2024-03-07T08:53:12Z

It would be good to automate uploading releases, as this is pretty tedious. We could do:

gh release create <tag> --title "<release title>" --notes "<release notes>"

gh release upload <tag> <path/to/your/files/*>
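
As a rough sketch of how the two commands above could be wrapped, assuming the `gh` CLI is installed and authenticated: the function names and the `dry_run` parameter below are hypothetical, and this is not the script that ended up in the repo.

```python
import shlex
import subprocess


def build_release_commands(tag, title, notes, files):
    """Return the two gh invocations as argument lists (nothing is run here)."""
    create = ["gh", "release", "create", tag, "--title", title, "--notes", notes]
    # No shell is involved, so wildcards are not expanded; pass explicit paths
    # (for example from glob.glob) in `files`.
    upload = ["gh", "release", "upload", tag, *files]
    return [create, upload]


def publish_release(tag, title, notes, files, dry_run=True):
    """Print each command, and run it unless dry_run is set."""
    for cmd in build_release_commands(tag, title, notes, files):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise on the first failure
```

Defaulting to a dry run lets you check the exact commands before anything is published.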

davmlaw · 2024-03-13T01:01:10Z

Made a script, "generate_transcript_data/github_release_upload.sh", which makes creating a release easier

davmlaw · 2024-08-30T00:43:44Z

Looking at the bash scripts, a lot of the complexity is due to looping over URLs and dealing with RefSeq URLs that have identical file names, e.g.:

"https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9606/105.20190906/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.gff.gz"
"https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9606/105.20201022/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.gff.gz"
"https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9606/105.20220307/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.gff.gz"

So it's not so easy to just download each file and carry on, since files with the same name would clash. I think with Snakemake we should just explicitly list everything out in YAML files, and use that config to run a pipeline that is common to everything.

We could make urls a dictionary, with the "nice name" for each URL as the key. That would allow us to move code into config, which would be a lot nicer.
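
A minimal sketch of the dictionary idea, written in Python to stand in for the parsed YAML config. The key names and the `local_filename` helper are assumptions for illustration only; the URLs are the three listed above.

```python
import os
from urllib.parse import urlparse

BASE = "https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/9606"
FILE = "GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.gff.gz"

# Hypothetical "nice names" as keys, one per annotation release.
urls = {
    "RefSeq_GRCh37_105.20190906": f"{BASE}/105.20190906/{FILE}",
    "RefSeq_GRCh37_105.20201022": f"{BASE}/105.20201022/{FILE}",
    "RefSeq_GRCh37_105.20220307": f"{BASE}/105.20220307/{FILE}",
}


def local_filename(name, url):
    """Prefix the remote file name with the nice name so downloads cannot collide."""
    basename = os.path.basename(urlparse(url).path)
    return f"{name}_{basename}"


# All three URLs share one remote file name...
remote_basenames = {os.path.basename(urlparse(url).path) for url in urls.values()}
# ...but the key makes each local name unique.
local_filenames = [local_filename(name, url) for name, url in urls.items()]
```

Because the unique part lives in the key, the download step no longer needs special-case code for RefSeq.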


davmlaw · 2024-08-30T07:51:23Z

OK, I have started on this (in generate_transcript_data).

I wanted to run the code with different config files, but couldn't work out a way to do it. I think Snakemake only wants one config file, so I combined everything in "config/*.yaml" into "cdot_transcripts.yaml".

I'm having an issue at the moment with ambiguous rules for downloading files.
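
Combining several config files into one amounts to merging nested dictionaries. A minimal sketch, with plain dicts standing in for the parsed YAML; the keys, the example.org URLs, and the `deep_merge` helper are all hypothetical and do not reflect the actual layout of "cdot_transcripts.yaml".

```python
def deep_merge(base, override):
    """Return a new dict: nested dicts are merged, other values in override win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical per-source configs, as they might look after loading the YAML.
refseq_config = {"urls": {"refseq_example": "https://example.org/refseq.gff.gz"}}
ensembl_config = {"urls": {"ensembl_example": "https://example.org/ensembl.gtf.gz"}}

combined = deep_merge(refseq_config, ensembl_config)
```

Giving each source its own keys under a shared top-level section keeps the merge free of conflicts.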

davmlaw · 2024-09-03T04:47:36Z

@tedil @holtgrewe - I've finished v1 of the Snakemake pipeline. Could you check it out, as it's the first one I've ever written:

https://github.com/SACGF/cdot/blob/main/generate_transcript_data/Snakefile
https://github.com/SACGF/cdot/blob/main/generate_transcript_data/cdot_transcripts.yaml

Happy to hear feedback, e.g. whether I should have structured it a different way.

tedil · 2024-09-03T11:35:38Z

Great, thank you! I will have a look when I am back from vacation

davmlaw · 2024-09-03T23:44:37Z

Sure, no hurry, enjoy your time off

davmlaw added a commit that referenced this issue Mar 13, 2024

issue #70 - made script to upload release data

40d9423

davmlaw added a commit that referenced this issue Aug 29, 2024

issue #70 - start of SnakeMake pipeline

78a0342

davmlaw added a commit that referenced this issue Aug 30, 2024

issue #70 - snakemake - make urls dictionaries so we can give unique names (handle RefSeq's duplicated filenames)

380b102

davmlaw added a commit that referenced this issue Aug 30, 2024

issue #70 - SnakeMake

1fece6e

davmlaw added a commit that referenced this issue Aug 30, 2024

issue #70 - SnakeMake

39c139d

davmlaw added a commit that referenced this issue Sep 2, 2024

issue #70 - SnakeMake

f8cac4e

davmlaw added a commit that referenced this issue Sep 3, 2024

issue #70 - SnakeMake - remove old unused SnakeMake files

d8a21ef

davmlaw added a commit that referenced this issue Sep 3, 2024

issue #70 - remove debug logging

55ffefa
