Automate detection & export of new ensembl releases #1

Closed
dhimmel opened this issue Oct 11, 2021 · 5 comments

Member

dhimmel commented Oct 11, 2021

@cthoyt tweeted:

Why not automate even further? Have it check on a daily basis if Ensembl has been updated since the last release of your artifacts so even if you don’t personally manage this anymore, it can continue on. I was thinking about this a lot lately and have been accumulating scripts for checking database versions in https://github.com/biopragmatics/bioversions. I just added one for ensembl, feel free to rely on that package or deconstruct the parts that are important and include directly in your source

This is a great idea and would reduce future maintenance. Happy to use bioversions for this.

We will need to detect if an output already exists. Should be able to do this by looking at the git branches.
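One way to check for an existing output, sketched under assumptions: each release is exported to a branch whose name is derived from the release number. The `ensembl-{version}` naming pattern, the `origin` remote, and both helper names below are hypothetical and not taken from this repository. `git ls-remote --heads` lists the remote's branches without needing a full clone.

```python
import subprocess


def parse_remote_heads(ls_remote_output: str) -> set[str]:
    """Extract branch names from `git ls-remote --heads` output.

    Each output line has the form: <sha> TAB refs/heads/<branch>
    """
    prefix = "refs/heads/"
    branches = set()
    for line in ls_remote_output.splitlines():
        parts = line.split("\t", 1)
        if len(parts) == 2 and parts[1].startswith(prefix):
            branches.add(parts[1][len(prefix):])
    return branches


def output_branch_exists(version: str, remote: str = "origin") -> bool:
    """Return True if the (hypothetical) branch for this release is on the remote."""
    result = subprocess.run(
        ["git", "ls-remote", "--heads", remote],
        capture_output=True,
        text=True,
        check=True,  # raise if git itself fails, e.g. no network access
    )
    # Assumed naming scheme: one branch per release, e.g. "ensembl-<version>".
    return f"ensembl-{version}" in parse_remote_heads(result.stdout)
```

Keeping the parsing separate from the `git` call means the branch-matching logic can be tested without network access.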

Sometimes exports will fail, for example if a release changes the schema. Fixing these failures takes a non-trivial amount of effort. For this reason I lean towards weekly scheduled jobs, so that a failing export becomes a weekly rather than a daily annoyance.


cthoyt commented Oct 11, 2021

@dhimmel thanks for making an issue. I would have done so myself but I was on the run when I tweeted at you. Here's a little more context:

The code you'd need after doing pip install bioversions is:

import bioversions

ensembl_version = bioversions.get_version("ensembl")

This code executes a live request to the Ensembl website and does some HTML parsing/traversal to pick out the version number. This actually runs on a nightly build (along with all of the other version getter functions in Bioversions) that writes to a YAML file on the Bioversions GitHub repository, so you can use this alternative code that doesn't actually rely on Bioversions as a Python dependency:

import requests
import yaml

url = "https://raw.githubusercontent.com/biopragmatics/bioversions/main/docs/_data/versions.yml"
res = requests.get(url)
res_yaml = yaml.safe_load(res.text)
versions = {
    entry["prefix"]: entry["releases"][-1]["version"]
    for entry in res_yaml["database"]
    if "prefix" in entry
}
ensembl_version = versions["ensembl"]


cthoyt commented Oct 11, 2021

Note: I forgot that the single source of truth for the daily updated data is natively stored as JSON at https://raw.githubusercontent.com/biopragmatics/bioversions/main/src/bioversions/resources/versions.json. A better way that doesn't rely on a YAML parser would be:

import requests

url = "https://raw.githubusercontent.com/biopragmatics/bioversions/main/src/bioversions/resources/versions.json"
res_json = requests.get(url).json()
versions = {
    entry["prefix"]: entry["releases"][-1]["version"]
    for entry in res_json["database"]
    if "prefix" in entry
}
ensembl_version = versions["ensembl"]
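A slightly more defensive variant of the lookup above, offered as a sketch rather than as anything Bioversions provides: it uses only the standard library, separates parsing from the request so the parsing can be tested offline, and fails with a clear error when a prefix is missing. The helper names `latest_versions` and `fetch_latest_version` are illustrative, and the JSON layout is assumed to match what the snippet above reads (a `database` list whose entries may have a `prefix` and a `releases` list).

```python
import json
import urllib.request

VERSIONS_URL = (
    "https://raw.githubusercontent.com/biopragmatics/bioversions/"
    "main/src/bioversions/resources/versions.json"
)


def latest_versions(data: dict) -> dict:
    """Map each prefix to the version of its last listed release."""
    return {
        entry["prefix"]: entry["releases"][-1]["version"]
        for entry in data["database"]
        # Skip entries without a prefix or with an empty release list.
        if "prefix" in entry and entry.get("releases")
    }


def fetch_latest_version(prefix: str, url: str = VERSIONS_URL) -> str:
    """Download the versions file and return the latest version for a prefix."""
    # urlopen raises an HTTPError for non-success status codes.
    with urllib.request.urlopen(url, timeout=30) as response:
        data = json.load(response)
    versions = latest_versions(data)
    if prefix not in versions:
        raise KeyError(f"no version recorded for prefix {prefix!r}")
    return versions[prefix]
```

With this in place, `fetch_latest_version("ensembl")` would stand in for the dictionary lookup above.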

dhimmel added a commit that referenced this issue Oct 11, 2021
Member Author

dhimmel commented Oct 11, 2021

#3 added the JSON request approach to get the latest version. Still haven't created the scheduled CI builds. Slight dependence on #2.

dhimmel added a commit that referenced this issue Oct 12, 2021
dhimmel added a commit that referenced this issue Oct 12, 2021
dhimmel added a commit that referenced this issue Oct 12, 2021
Member Author

dhimmel commented Oct 12, 2021

Okay, I added scheduled export builds in b75c893, along with an overwrite option that controls whether to re-export when an output branch already exists.

Both scheduled and dispatch jobs now default to overwrite=false. To overwrite, set overwrite=true on a dispatch.
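The overwrite rule can be illustrated with two small pure functions. This is a sketch of the decision logic only, not the code from b75c893: the function names are hypothetical, and it assumes the flag reaches the script as text such as "true" or "false".

```python
from typing import Optional


def parse_overwrite(value: Optional[str]) -> bool:
    """Interpret a textual flag; anything other than "true" counts as False."""
    return (value or "").strip().lower() == "true"


def should_export(output_exists: bool, overwrite: bool = False) -> bool:
    """Decide whether to run an export for a release.

    Export when no output branch exists yet, or when an overwrite
    was explicitly requested.
    """
    return overwrite or not output_exists
```

Defaulting `overwrite` to `False` mirrors the behavior described above: a scheduled run skips releases that already have an output, and only an explicit request re-exports them.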

@dhimmel dhimmel closed this as completed Oct 12, 2021
dhimmel added a commit that referenced this issue Oct 12, 2021