Ensembl release 109 seq_region table needs repair #21

Closed
dhimmel opened this issue Jul 17, 2023 · 3 comments

Comments


dhimmel commented Jul 17, 2023

When running ensembl_genes datasets --release=109, I'm getting the following error:

DatabaseError: (mysql.connector.errors.DatabaseError) 1194 (HY000): Table 'seq_region' is marked as crashed and should be repaired

This error occurred when connecting to mysql+mysqlconnector://[email protected]:3306/homo_sapiens_core_109_38. See the query that caused the error below:

SELECT
  gene.stable_id AS ensembl_gene_id,
  gene.version AS ensembl_gene_version,
  -- gene symbol methods https://github.com/cogent3/ensembldb3/issues/7
  -- Release 104 retired clone-based gene symbols,
  -- leading to ensembl genes without a symbol. Fill with the stable ID,
  -- as per https://www.ensembl.info/2021/03/15/retirement-of-clone-based-gene-names/
  COALESCE(xref.display_label, gene.stable_id) AS gene_symbol,
  external_db.db_name AS gene_symbol_source_db,
  xref.dbprimary_acc AS gene_symbol_source_id,
  gene.biotype AS gene_biotype,
  gene.description AS gene_description,
  gene.source AS ensembl_source,
  gene.created_date AS ensembl_created_date,
  gene.modified_date AS ensembl_modified_date,
  coord_system.version AS coord_system_version,
  coord_system.name AS coord_system,
  -- get chromosome: refs internal Related Sciences issue 606.
  CASE WHEN coord_system.name = "chromosome"
       THEN COALESCE(exc_seq_region.name, seq_region.name)
       END AS chromosome,
  assembly_exception.exc_type AS seq_region_exc_type,
  seq_region.name AS seq_region,
  gene.seq_region_start AS seq_region_start,
  gene.seq_region_end AS seq_region_end,
  gene.seq_region_strand AS seq_region_strand,
  assembly_exception.exc_seq_region_id IS NULL AS primary_assembly
FROM gene
LEFT JOIN xref ON xref.xref_id = gene.display_xref_id
LEFT JOIN external_db ON xref.external_db_id = external_db.external_db_id
LEFT JOIN seq_region ON gene.seq_region_id = seq_region.seq_region_id
LEFT JOIN coord_system ON seq_region.coord_system_id = coord_system.coord_system_id
LEFT JOIN assembly_exception ON seq_region.seq_region_id = assembly_exception.seq_region_id
  -- keep exc_type in (PATCH_FIX, PATCH_NOVEL, HAP)
  -- refs internal Related Sciences issue 606.
  AND NOT assembly_exception.exc_type <=> "PAR"
LEFT JOIN seq_region AS exc_seq_region ON assembly_exception.exc_seq_region_id = exc_seq_region.seq_region_id
WHERE
  -- all genes were current when query was written, ensure this is always the case
  gene.is_current AND
  -- refs internal Related Sciences issue 289.
  gene.biotype != "LRG_gene"
ORDER BY ensembl_gene_id

I believe this is an upstream issue entirely out of our hands, but wanted to document and report it.
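For anyone who wants to tell this failure apart from other database errors, here is a minimal sketch that pulls the table name out of an error message shaped like the one above. The function name `parse_crashed_table` and the regular expression are illustrative assumptions, not part of ensembl_genes or mysql-connector. It relies only on the message text, where 1194 is the MySQL server error code for a table marked as crashed.

```python
import re
from typing import Optional

# MySQL server error 1194 (SQLSTATE HY000):
# "Table '%s' is marked as crashed and should be repaired"
CRASHED_TABLE_ERRNO = 1194

# Hypothetical pattern matching "<errno> (<sqlstate>): Table '<name>' is marked as crashed"
_ERROR_PATTERN = re.compile(
    r"\b(\d{4})\s+\((\w{5})\):\s+Table '([^']+)' is marked as crashed"
)


def parse_crashed_table(message: str) -> Optional[str]:
    """Return the crashed table's name if the message reports error 1194, else None."""
    # Collapse whitespace first, because wrapped tracebacks can split the message
    # across lines (as happened in the report above).
    flat = " ".join(message.split())
    match = _ERROR_PATTERN.search(flat)
    if match and int(match.group(1)) == CRASHED_TABLE_ERRNO:
        return match.group(3)
    return None
```

Matching on the message text is brittle compared with inspecting the driver's exception object, so treat this as a logging or triage aid rather than a robust check.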


jgtate commented Jul 18, 2023

I can confirm that this was an issue with Ensembl, @dhimmel, caused by a background process that was running on the MySQL server. We're not entirely sure whether the process was responsible for actually crashing the table or if it was just giving that appearance, but we're chasing it down with our DBAs. As of right now the seq_region table seems to be fixed and usable. Please let us know via the Ensembl website if you still see problems though.


dhimmel commented Jul 18, 2023

Awesome! Thanks for the info @jgtate. Confirming that we're no longer getting this error so the table is healthy.

Sounds like if we see this error in the future on other tables, it might be worth waiting a bit for it to resolve automatically, in case it's due to an ongoing background process.
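A cheap way to check whether a table is usable again, before rerunning the full query, is a one-row probe. The sketch below is an assumption-laden illustration: `table_is_readable` is a hypothetical helper, it takes any Python DB-API connection, and it assumes the crashed-table error would surface on any read of the table. The demonstration uses an in-memory SQLite database purely as an offline stand-in for the MySQL server.

```python
import re
import sqlite3  # stand-in for the demonstration only; the real server is MySQL

# Only plain identifiers are accepted, because table names cannot be bound as
# query parameters and are interpolated into the SQL text below.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def table_is_readable(connection, table: str) -> bool:
    """Return True if a one-row SELECT against `table` succeeds on this connection."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"not a plain table identifier: {table!r}")
    cursor = connection.cursor()
    try:
        cursor.execute(f"SELECT 1 FROM {table} LIMIT 1")
        cursor.fetchall()
        return True
    except Exception:
        # Broad on purpose: DB-API drivers raise different exception classes.
        return False
    finally:
        cursor.close()


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE seq_region (seq_region_id INTEGER, name TEXT)")
    print(table_is_readable(demo, "seq_region"))
    print(table_is_readable(demo, "not_a_table"))
```

A `False` result does not distinguish a crashed table from a missing one or a dropped connection, so it only signals that the full query is not worth running yet.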

@dhimmel dhimmel closed this as completed Jul 18, 2023

jgtate commented Jul 18, 2023

We'll look at moving this step so it doesn't happen at this point in the release process – we've not seen this behaviour before but it's something we should be able to avoid by not running it too close to the release. As a rule of thumb, however, things can be a bit rocky on release day itself. If you see issues like this it's worth waiting 24 hours if you can, then trying again. If it's still broken at that point by all means let us know!
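The "wait 24 hours, then try again" advice can be sketched as a small retry wrapper. Everything here is hypothetical scaffolding rather than anything from ensembl_genes: `retry_after_wait`, its parameters, and the idea of classifying an error as transient by a caller-supplied predicate are assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

ONE_DAY_SECONDS = 24 * 60 * 60


def retry_after_wait(
    task: Callable[[], T],
    is_transient: Callable[[Exception], bool],
    attempts: int = 2,
    wait_seconds: float = ONE_DAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, waiting and retrying when it fails with a transient error.

    Errors that `is_transient` rejects, and the error from the final attempt,
    are re-raised unchanged so that persistent problems can still be reported.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as error:
            if attempt == attempts or not is_transient(error):
                raise
            sleep(wait_seconds)
    raise AssertionError("unreachable")
```

A caller might pass something like `lambda error: "marked as crashed" in str(error)` as `is_transient`. The `sleep` parameter is injectable so the wrapper can be exercised without actually waiting a day.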
