
Releases: cov-lineages/hedgehog

hedgehog v1.6

16 Oct 14:06

Release notes

  • Update to the lineage set random forest (RF) model for pango-designation version 1.23
  • Note: only lineages with more than 5 unique sequences were added to the spike training data
  • Updated precision calculation code, which improves the precision call for complex recombinant lineage sets

hedgehog v1.5.1

12 Oct 13:31
06e1630

Release notes

Updated the MRCA code used for precision set descriptions so that it handles complex recombinant set composition.

Pango designation release v1.22.

hedgehog v1.5

25 Aug 11:09

Release notes

  • Update to the lineage set random forest (RF) model for pango-designation version 1.22
  • Note: only lineages with more than 5 unique sequences were added to the spike training data, so lineage BA.2.86 was not included in training at this time. The model will be updated again once more sequences have been submitted to GISAID.
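The unique-sequence cutoff can be sketched as follows. This is a minimal illustration, not hedgehog's code: it assumes identical sequences are collapsed by hashing (SHA-256 is a choice made for this sketch) and that the cutoff is strictly greater than 5, as the note states. `sequence_hash` and `lineages_for_training` are hypothetical names.

```python
import hashlib
from collections import defaultdict


def sequence_hash(seq: str) -> str:
    """Hash an (uppercased) spike sequence so identical sequences collapse."""
    return hashlib.sha256(seq.upper().encode("utf-8")).hexdigest()


def lineages_for_training(records, min_unique=5):
    """Return lineages with MORE than `min_unique` unique spike sequences.

    records: iterable of (lineage, spike_sequence) pairs (hypothetical input shape).
    """
    hashes = defaultdict(set)
    for lineage, seq in records:
        hashes[lineage].add(sequence_hash(seq))
    return {lin for lin, hs in hashes.items() if len(hs) > min_unique}
```

Counting hashes rather than raw records means a lineage with many copies of one sequence still counts as a single unique sequence.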

hedgehog v1.4.2

17 Aug 10:42

Release notes

  • Fix for a set designation key error reported via private communication
  • Update to the MRCA code: it now checks whether simple cases need expanding and, if no expansion occurs, checks at the end whether the alias needs expanding
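A minimal sketch of alias-aware MRCA naming, assuming the MRCA of a set is the longest shared prefix of its fully expanded lineage names. The one-entry alias table (BA = B.1.1.529) is illustrative only, recombinant (X*) lineages are not handled, and `expand_alias`, `compress_alias` and `mrca` are hypothetical names rather than hedgehog's functions. Names are expanded first and only re-aliased at the end.

```python
# Illustrative alias table only; the real Pango alias key is much larger.
ALIASES = {"BA": "B.1.1.529"}


def expand_alias(name: str) -> str:
    """Expand a leading alias, e.g. BA.5.1 -> B.1.1.529.5.1."""
    head, sep, rest = name.partition(".")
    if head in ALIASES:
        return ALIASES[head] + sep + rest
    return name


def compress_alias(full: str) -> str:
    """Re-apply an alias at the end, if the full name lies below an aliased node."""
    for alias, expansion in ALIASES.items():
        if full.startswith(expansion + "."):
            return alias + full[len(expansion):]
    return full


def mrca(lineages):
    """Longest common dotted prefix of expanded names, or None if nothing is shared."""
    expanded = [expand_alias(lin).split(".") for lin in lineages]
    common = []
    for parts in zip(*expanded):
        if all(p == parts[0] for p in parts):
            common.append(parts[0])
        else:
            break
    if not common:
        return None
    return compress_alias(".".join(common))
```

For example, `mrca(["BA.5.1", "BA.5.2"])` gives `"BA.5"`, while `mrca(["BA.1", "BA.2"])` gives `"B.1.1.529"` because the shared node is the aliased parent itself.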

hedgehog v1.4.1

15 Aug 13:37
b26d43d

Release notes

  • A number of major modifications to the training pipeline that produces the machine learning model
  • Return to the random forest classifier, with new parameters num_estimators=12, max_features=0.05 and min_samples_split=10
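For reference, these parameters map onto scikit-learn's `RandomForestClassifier`, where the number of trees is called `n_estimators`. The sketch below is an assumption about how they could be passed, not hedgehog's training code; `RF_PARAMS`, `features_per_split` and `make_classifier` are hypothetical names.

```python
# Hypothetical parameter mapping; "num_estimators" in the note is scikit-learn's n_estimators.
RF_PARAMS = {"n_estimators": 12, "max_features": 0.05, "min_samples_split": 10}


def features_per_split(n_features: int, fraction: float = RF_PARAMS["max_features"]) -> int:
    """Candidate features per split for a float max_features (scikit-learn's rule)."""
    return max(1, int(fraction * n_features))


def make_classifier(random_state=None):
    # Imported lazily so the parameter sketch runs even without scikit-learn installed.
    from sklearn.ensemble import RandomForestClassifier
    return RandomForestClassifier(random_state=random_state, **RF_PARAMS)
```

With a float `max_features`, scikit-learn considers `max(1, int(max_features * n_features))` features at each split, so 0.05 means roughly 5% of the input columns.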

Curation description

  • Match designations with GISAID
  • Extract spike sequences from whole genomes
  • Identify spike sequences with no ambiguity
  • Call spike variants (SNPs, insertions and deletions)
  • Get variant counts per lineage
  • Calculate which variants occur at a frequency of at least 0.60 in each lineage
  • Merge lineages into sets based on overlapping mutation thresholds
  • Calculate set names and precision (updated code, now calculates recombinant lineage precision accurately)
  • Translate amino acid mutations to nucleotide positions
  • If any minor mutations occur at a position that conflicts with the consensus spike haplotype (CSH) of another lineage set, mask them out
  • Create a sequence hash to only supply unique sequences to the model for training
  • If any sequences contain fewer than 60% of the CSH mutations for a given lineage set, do not put them forward for training
  • After all the filtering steps and sequence hashing, remove from training any lineage sets that have fewer than 5 representative sequences
  • Run random forest training on final set of lineage sets
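The last two filters (at least 60% CSH coverage per sequence and at least 5 representative sequences per set) can be sketched as below. This assumes each unique training sequence is represented by the set of spike mutations it carries; the mutation labels and the names `csh_coverage` and `filter_training_sets` are hypothetical.

```python
def csh_coverage(seq_variants: set, csh: set) -> float:
    """Fraction of a lineage set's CSH mutations present in one sequence."""
    if not csh:
        return 0.0
    return len(seq_variants & csh) / len(csh)


def filter_training_sets(sets, csh_by_set, min_coverage=0.60, min_sequences=5):
    """Keep sequences covering >= min_coverage of their CSH, then drop small sets.

    sets: {set_name: [variant_set, ...]} with one entry per unique sequence.
    csh_by_set: {set_name: set_of_CSH_mutations}.
    """
    kept = {}
    for name, seqs in sets.items():
        csh = csh_by_set[name]
        passing = [v for v in seqs if csh_coverage(v, csh) >= min_coverage]
        if len(passing) >= min_sequences:
            kept[name] = passing
    return kept
```

Applying the coverage filter before the size check matters: a set can start with many sequences and still be dropped if too few of them resemble its CSH.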

hedgehog v1.4

14 Aug 15:14

Release notes

  • A number of major modifications to the training pipeline that produces the machine learning model
  • Return to a decision tree classifier as a temporary fix for random forest misassignments

Curation description

  • Match designations with GISAID
  • Extract spike sequences from whole genomes
  • Identify spike sequences with no ambiguity
  • Call spike variants (SNPs, insertions and deletions)
  • Get variant counts per lineage
  • Calculate which variants occur at a frequency of at least 0.60 in each lineage
  • Merge lineages into sets based on overlapping mutation thresholds
  • Calculate set names and precision (updated code, now calculates recombinant lineage precision accurately)
  • Translate amino acid mutations to nucleotide positions
  • If any minor mutations occur at a position that conflicts with the consensus spike haplotype (CSH) of another lineage set, mask them out
  • Create a sequence hash to only supply unique sequences to the model for training
  • If any sequences contain fewer than 60% of the CSH mutations for a given lineage set, do not put them forward for training
  • After all the filtering steps and sequence hashing, remove from training any lineage sets that have fewer than 5 representative sequences
  • Run decision tree training
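The amino-acid-to-nucleotide translation and the conflict-masking step might look like this. The codon arithmetic is standard: spike codon p covers spike nucleotides 3p-2 to 3p, and spike starts at position 21563 of the Wuhan-Hu-1 reference (MN908947.3), so N501Y maps to genome positions 23063 to 23065. The masking rule (replace with N any non-CSH mutation at a position used by another set's CSH) is this sketch's reading of the bullet above; `codon_positions` and `mask_conflicts` are hypothetical names.

```python
SPIKE_GENOME_START = 21563  # first nucleotide of S in Wuhan-Hu-1 (MN908947.3)


def codon_positions(aa_pos: int, genome: bool = False) -> tuple:
    """1-based nucleotide positions of spike codon `aa_pos` (spike- or genome-relative)."""
    start = 3 * (aa_pos - 1) + 1
    if genome:
        start += SPIKE_GENOME_START - 1
    return (start, start + 1, start + 2)


def mask_conflicts(seq: str, seq_variants: dict, own_csh_positions: set,
                   other_csh_positions: set) -> str:
    """Replace minor mutations that clash with another set's CSH by 'N'.

    seq_variants maps 1-based spike positions to the alternate base observed.
    """
    chars = list(seq)
    for pos in seq_variants:
        # Minor mutation: not part of this set's own CSH, but at a position another set relies on.
        if pos not in own_csh_positions and pos in other_csh_positions:
            chars[pos - 1] = "N"
    return "".join(chars)
```

Masking with N rather than reverting to the reference base keeps the model from learning a false "reference" signal at positions that distinguish other lineage sets.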

hedgehog v1.3.3

27 Jul 12:11
c1faf70

Release notes

  • Patch to fix a Snakemake update conflict by pinning the Snakemake version.

hedgehog v1.3.2

12 Jul 16:22

Release notes

  • Patch pinning scikit-learn to v1.2.2 to fix broken installs
  • Pango version remains same as hedgehog v1.3 (Pango v1.21)

hedgehog v1.3.1

12 Jul 15:38
03f1fd0

Release notes

  • Patch for updating error message when dependencies are not correctly installed.
  • Pango version remains same as hedgehog v1.3 (Pango v1.21)

hedgehog v1.3

12 Jul 15:26

Release notes

  • Update to hedgehog trained on data from pango-designation release v1.21
  • The hedgehog set mutation threshold (X%) is now 60%, to better capture Omicron lineages
  • The inference model is now a random forest rather than a decision tree
  • Input data to the training model have been conflict-masked to improve the model's internal decisions
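A sketch of what a 60% threshold means in practice: a lineage's consensus haplotype is taken here as the mutations present in at least 60% of its sequences, and lineages with identical haplotypes are grouped together. Grouping by exact haplotype match is a simplification assumed for this sketch, not hedgehog's exact merge rule; `consensus_haplotype` and `group_by_haplotype` are hypothetical names.

```python
from collections import Counter, defaultdict


def consensus_haplotype(variant_lists, threshold=0.60):
    """Mutations found in at least `threshold` of a lineage's sequences."""
    n = len(variant_lists)
    # set() so a mutation is counted at most once per sequence
    counts = Counter(m for variants in variant_lists for m in set(variants))
    return frozenset(m for m, c in counts.items() if c / n >= threshold)


def group_by_haplotype(lineage_variants, threshold=0.60):
    """Group lineages whose consensus haplotypes are identical (simplified merge)."""
    groups = defaultdict(list)
    for lineage, variant_lists in lineage_variants.items():
        groups[consensus_haplotype(variant_lists, threshold)].append(lineage)
    return [sorted(lins) for lins in groups.values()]
```

Lowering the threshold admits mutations carried by only part of a lineage, which can help for diverse lineages such as Omicron sublineages whose sequences do not all share every defining mutation.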