XEC spike diversity issue #2088

Open
2 of 31 tasks
xz-keg opened this issue Sep 25, 2024 · 80 comments

Comments

@xz-keg
Contributor

xz-keg commented Sep 25, 2024

It seems that there is, and will continue to be, a lot of spike diversity in XEC, so it is better to gather it all in one issue.
Only branches with more than 5 seqs from 2 places, or with seqs from 3 or more places, are counted (NEW).

A GPT model trained on August data (seqs from before XEC appeared) predicts 4 spike mutations for XEC at the top of its potential-mutation list: S:T572I, S:R346T, S:N185D, S:A688V.

Tasks

@FedeGueli

FedeGueli commented Sep 25, 2024

Very good idea, I will try to help. Do you prefer direct editing or highlighting in the comments?

I would also require that the lineage has been sampled at least once in September (or in the last 30 days), to avoid dead ends, if you agree.
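
A minimal sketch of the recency requirement suggested above, assuming each sequence comes with a collection date. The function name `sampled_recently`, the 30-day default and the example dates are illustrative only, not part of any tool used in this thread.

```python
from datetime import date, timedelta

def sampled_recently(collection_dates, today, window_days=30):
    """Return True if at least one sample was collected within the window."""
    cutoff = today - timedelta(days=window_days)
    return any(cutoff <= d <= today for d in collection_dates)

# Hypothetical collection dates for two candidate branches.
active = [date(2024, 8, 20), date(2024, 9, 18)]
dead_end = [date(2024, 7, 30), date(2024, 8, 10)]
today = date(2024, 9, 25)

print(sampled_recently(active, today))    # True
print(sampled_recently(dead_end, today))  # False
```

A branch that fails this check would be skipped as a likely dead end, even if it meets the sequence and place counts.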

@xz-keg
Contributor Author

xz-keg commented Sep 25, 2024

Very good idea, i will try to help. do u prefer direct editing or highlighting in the comments?

I would require also that the lineage has to be sampled at least once in September ( or last 30 days) to avoid dead ends If you agree .

Sure, you can do direct editing.

xz-keg pinned this issue Sep 26, 2024
@cvejris

cvejris commented Sep 26, 2024

Branch 7: XEC+Orf1a:A599T+S:S680F (furin), 5seqs (1xFrance, 1xCanada, 3xUS). Arose 2x independently (all US sequences are separated on the Orf1a:T2274I subbranch). All seqs less than 1 month old. Query: G2060A, C23601T

@Mydtlwn

Mydtlwn commented Sep 27, 2024

@corneliusroemer

@cvejris

cvejris commented Sep 28, 2024

Branch 9: XEC+Orf1a:A599T+S:E1202Q, 2 seqs, France + Ireland. Query: G2060A,C11020T,G25166C

@FedeGueli

Br.7 went to 8 with three GBW samples from Peru (2 patients).

@cvejris

cvejris commented Oct 1, 2024

Branch 10/11: XEC+S:W152R (defining for Centaurus) arose convergently: Branch 10 via T22016A (1xNL, 1xFR). Branch 11 via T22016C (1xIR, 1xUSA). All seqs sampled in September
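
To illustrate why the two nucleotide changes give the same amino acid substitution, here is a small sketch. It assumes the spike ORF starts at position 21563 in Wuhan-Hu-1 coordinates (not stated in this thread); the helper names are hypothetical, and the codon table holds only the three codons needed here.

```python
S_START = 21563  # assumed first nucleotide of the spike ORF (Wuhan-Hu-1 coordinates)

# Only the codons needed for this example.
CODON_TO_AA = {"TGG": "W", "AGG": "R", "CGG": "R"}

def spike_codon(nuc_pos):
    """Map a genome position to (spike codon number, 0-based offset in the codon)."""
    offset = nuc_pos - S_START
    return offset // 3 + 1, offset % 3

def apply_substitution(codon, offset_in_codon, alt):
    """Return the codon with one base replaced."""
    bases = list(codon)
    bases[offset_in_codon] = alt
    return "".join(bases)

codon_number, offset = spike_codon(22016)
reference_codon = "TGG"  # tryptophan is encoded by a single codon
for alt in ("A", "C"):
    mutated = apply_substitution(reference_codon, offset, alt)
    ref_aa, new_aa = CODON_TO_AA[reference_codon], CODON_TO_AA[mutated]
    print(f"T22016{alt}: S:{ref_aa}{codon_number}{new_aa} ({reference_codon}->{mutated})")
```

Position 22016 is the first base of codon 152, so both T22016A (AGG) and T22016C (CGG) encode arginine, which is why the two branches share S:W152R without sharing a nucleotide mutation.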

@cvejris

cvejris commented Oct 1, 2024

Branch 12: XEC+S:T678I (C23595T), furin-adjacent. 4xCAN, 2xFR, 2xUSA. Convergent: one on the Orf1a:I1367L, the rest on the Orf1a:A599T polytomy. All seqs sampled in September

@cvejris

cvejris commented Oct 1, 2024

Branch 13: XEC+S:P1263Q(C25350A): 3xSW,1xFR,1xNL. Convergent (Swedish seqs on the C27630T subbranch of Orf1a:A599T, the rest on the Orf1a:A599T polytomy).
Interestingly, most XEC with P1263Q harbor other "promising" mutations
image

@xz-keg
Contributor Author

xz-keg commented Oct 1, 2024

Branch 10: XEC+S:W152R (defining for Centaurus) arose convergently: i. via T22016A (1xNL, 1xFR) ii. via T22016C (1xIR, 1xUSA). All seqs sampled in September

Please separate these branches and ensure every branch is monophyletic.

@cvejris

cvejris commented Oct 1, 2024

Branch 14: XEC+S:P1263L(C25350T), same site as branch 13. 1xSW,1xPL.

@xz-keg
Contributor Author

xz-keg commented Oct 2, 2024

Branch 13: XEC+S:P1263Q(C25350A): 3xSW,1xFR,1xNL. Interestingly, most XEC with P1263Q harbor other "promising" mutations image

Please check with UShER before proposing. They seem to be on different UShER branches.
https://genome-test.gi.ucsc.edu/cgi-bin/hgPhyloPlace

If they are on different UShER branches, they likely emerged separately and shouldn't be treated as one, unless you provide a reason to merge them (artifact, UShER flip-flop, other convergent mutations as in branch 7, etc.).
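
The check being asked for can be sketched on a toy tree as follows. This is not how UShER works internally; it only assumes the tree is available as a child-to-parent mapping, and every node and sequence name below is hypothetical.

```python
def tips_under(node, children):
    """Collect all tip names below a node (a tip is a node without children)."""
    kids = children.get(node, [])
    if not kids:
        return {node}
    tips = set()
    for kid in kids:
        tips |= tips_under(kid, children)
    return tips

def is_monophyletic(candidate_tips, parent):
    """True if the candidates are exactly the tips below their common ancestor."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    def path_to_root(node):
        path = [node]
        while node in parent:
            node = parent[node]
            path.append(node)
        return path

    candidate_tips = set(candidate_tips)
    paths = [path_to_root(t) for t in candidate_tips]
    # Most recent common ancestor: first node on one path that all paths share.
    shared = set(paths[0]).intersection(*paths[1:])
    mrca = next(n for n in paths[0] if n in shared)
    return tips_under(mrca, children) == candidate_tips

# Hypothetical tree: a polytomy with one resolved sub-branch.
parent = {
    "polytomy": "root",
    "subbranch": "polytomy",
    "seq1": "subbranch",
    "seq2": "subbranch",
    "seq3": "polytomy",
    "seq4": "polytomy",
    "seq5": "polytomy",
}
print(is_monophyletic({"seq1", "seq2"}, parent))          # True
print(is_monophyletic({"seq1", "seq3", "seq4"}, parent))  # False
```

In the second call the three sequences share the mutation of interest only if it arose more than once, or if the tree placement is wrong, which is the case where a reason to merge is needed.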

@cvejris

cvejris commented Oct 2, 2024

Branch 13: XEC+S:P1263Q(C25350A): 3xSW,1xFR,1xNL. Interestingly, most XEC with P1263Q harbor other "promising" mutations image

Please check with usher before proposing. They seem to be on different usher branches. https://genome-test.gi.ucsc.edu/cgi-bin/hgPhyloPlace

If they are on different usher branches it is likely they emerge separately and shouldn't be treated as one. Unless you provide reason (artifact, usher-flip flop, convergent other mutations like branch 7, etc. ) to merge them.

IMO, the relatively low number of XEC seqs currently makes it hard to assess the phylogeny correctly; the resolution is still insufficient. The full XEC UShER tree still places most seqs on a polytomy.
My contributions should not be treated as lineage proposals. I look for mutations which may be beneficial for the virus. They might be i. founder mutations of a monophyletic lineage which has spread to different countries, or ii. the same AA substitution arising independently on different backgrounds in different countries. It does not make much difference: both scenarios may in theory indicate a selective advantage conferred by the mutation.
I edited my contributions by adding notes about convergence for the mutations where I think there was some :)

@FedeGueli

@aviczhl2 I think we will soon be forced to raise the parameters to three places and 10 seqs. I say this because, in my previous experience with spike diversity issues, they rapidly become very messy or too long, with the opposite effect: risking hiding something fast-growing instead of highlighting it. It is not yet the moment, but we should think about it.

@cvejris

cvejris commented Oct 2, 2024

Branch 15: XEC+Orf1a:A599T+S:K182N. 3xNL, 1xENG, 1xCanary Islands (with additional S:M153I). Monophyletic, query: C829T, A22108T

@FedeGueli

FedeGueli commented Oct 3, 2024

Not a Branch but worth tracking: XEC + Orf1a:A599T+ S:A475V
Query:T8416C,C22986T,T3565C
Samples: 3
Places: Netherlands, 2 regions (Gelderland, Zuid-Holland)

@FedeGueli

FedeGueli commented Oct 3, 2024

Solved 5 now from Br.8 also from Germany it looks interesting

I deleted the query for Br.8 by mistake; I'm rebuilding it (I fear it is not monophyletic, though).

@cvejris

cvejris commented Oct 4, 2024

Branch 16: XEC+Orf1a:A599T+S:G72R. 5 seqs, 5 countries: Denmark, Netherlands, France (with S:K113R), Canada, GBW from Mexico. Query: G2060A, G21776A,T3565C (edited)

@FedeGueli

Branch 16: XEC+Orf1a:A599T+S:G72R. 5 seqs, 5 countries: Denmark, Netherlands, France (with S:K113R), Canada, GBW from Mexico. Query: G2060A, G21776A,T3565C (edited)

I've added T3565C to exclude old samples.
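
A rough sketch of how such a comma-separated query behaves, assuming it is applied as a logical AND over each sample's nucleotide mutations. The helper names and the two example samples are hypothetical.

```python
def parse_query(query):
    """Split a comma-separated query such as 'G2060A, G21776A,T3565C' into a set."""
    return {m.strip().upper() for m in query.split(",") if m.strip()}

def matches(sample_mutations, query):
    """True if the sample carries every mutation in the query (logical AND)."""
    return parse_query(query) <= set(sample_mutations)

query = "G2060A, G21776A,T3565C"

# Hypothetical samples: the second one lacks T3565C and is filtered out.
recent_sample = {"G2060A", "G21776A", "T3565C", "C23601T"}
old_sample = {"G2060A", "G21776A"}

print(matches(recent_sample, query))  # True
print(matches(old_sample, query))     # False
```

Adding one more mutation to the query can only narrow the result set, so T3565C here removes older sequences that happen to carry the other two mutations.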

@xz-keg
Contributor Author

xz-keg commented Oct 6, 2024

G2060A, G21776A,T3565C

You can directly edit the task list. But be cautious on cvejris proposals that may not be monophyletic.

@FedeGueli

FedeGueli commented Oct 6, 2024

G2060A, G21776A,T3565C

You can directly edit the task list. But be cautious on cvejris proposals that may not be monophyletic.

Yeah, I'm a bit confused about the branch numbering. @cvejris I suggest just adding the lineages you find without a branch number.

@FedeGueli

XEC + Orf1a:A599T + S:P561H (C23244A)
Query: C23244A, C18657T, C25006T
Samples: 2
Countries: 2 (Scotland, Sweden)

@FedeGueli

FedeGueli commented Oct 9, 2024

Important (likely): XEC got S:I68F (read correctly by GISAID; read as S:-70F by UShER, CoV-Spectrum and Nextclade):
XEC > C583T > S:I68F (G21770T)
Query: C583T, G21770T, T3565C
Samples: 4
Countries: 4 (France, Czech Republic, Sweden, Egypt via GBW)
Tree:
Screenshot 2024-10-09 alle 11 50 30
https://nextstrain.org/fetch/genome-test.gi.ucsc.edu/trash/ct/subtreeAuspice1_genome_test_18030_6500e0.json?c=userOrOld&label=id:node_11692317

Now added as branch 13

Ping @corneliusroemer here something to watch

@FedeGueli

Important (likely): XEC got S:I68F (read correctly by GISAID; read as S:-70F by UShER, CoV-Spectrum and Nextclade): XEC > C583T > S:I68F (G21770T) Query: C583T, G21770T, T3565C Samples: 4 Countries: 4 (France, Czech Republic, Sweden, Egypt via GBW)
Now added as branch 13

Jumped to 7 with a batch from France, 2 different provinces

@xz-keg
Contributor Author

xz-keg commented Oct 12, 2024

Important (likely): XEC got S:I68F (read correctly by GISAID; read as S:-70F by UShER, CoV-Spectrum and Nextclade): XEC > C583T > S:I68F (G21770T) Query: C583T, G21770T, T3565C Samples: 4 Countries: 4 (France, Czech Republic, Sweden, Egypt via GBW)
Now added as branch 13

Jumped to 7 with a batch from France, 2 different provinces

+3 GBW from Turkey, you can propose it to main.

@xz-keg
Contributor Author

xz-keg commented Oct 23, 2024

Added branch 18 (S:H49Y) and branch 19 (E1202K).
Applied a new rule: only branches with >=5 seqs and >=2 places, or with >=3 places, are listed.
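
The rule as worded here can be written as a one-line predicate. This is only a sketch; the function name is made up, and the example counts are taken from earlier comments in this thread.

```python
def passes_threshold(n_seqs, n_places):
    """Rule as stated: (>=5 seqs and >=2 places) or >=3 places."""
    return (n_seqs >= 5 and n_places >= 2) or n_places >= 3

# 2 seqs from France + Ireland (Branch 9 as first reported).
print(passes_threshold(2, 2))  # False
# 8 seqs from Canada, France and the USA (Branch 12 as first reported).
print(passes_threshold(8, 3))  # True
# 3 seqs from 3 places passes on place count alone.
print(passes_threshold(3, 3))  # True
```

The predicate only counts sequences and places; whether a branch is monophyletic still has to be checked separately.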

@Over-There-Is

Q52L

@FedeGueli

Q52L

I am tracking one with it from Slovenia, with A21717T, T23542C, 3 seqs. Is it the same?

@cvejris

cvejris commented Oct 23, 2024

branch 7 is 22 now, @cvejris you can propose it to main.

Upon closer look, S680F does not appear monophyletic, but heavily convergent. Many sequences are part of larger lineages (defined by C7086T or C7086T+C28093T), which originally lacked S680F. For sequences on the Orf1a:A599T polytomy, I would guess these are convergent too.

@xz-keg
Contributor Author

xz-keg commented Oct 23, 2024

branch 7 is 22 now, @cvejris you can propose it to main.

Upon closer look, S680F does not appear monophyletic, but heavily convergent. Many sequences are part of larger lineages (defined by C7086T or C7086T+C28093T), which originally lacked S680F. For sequences on the Orf1a:A599T polytomy, I would guess these are convergent too.

C7086T is a very convergent mutation. The lineage could have acquired S680F first and C7086T afterwards.

@FedeGueli

FedeGueli commented Oct 25, 2024

446I in Wales and Australia
Screenshot 2024-10-25 alle 10 38 39
https://nextstrain.org/fetch/genome-test.gi.ucsc.edu/trash/ct/subtreeAuspice2_genome_test_15964_b57db0.json?label=id:node_11721902
still not matching the threshold.
Query:G22899T,C18657T, C19716T,

@cvejris

cvejris commented Oct 25, 2024

You too waiting for RBD mutations? :) I'm tracking V445R (3-nuc) - still only 3 seqs from 3 countries, but convergent, not monophyletic

@FedeGueli

You too waiting for RBD mutations? :) I'm tracking V445R (3-nuc) - still only 3 seqs from 3 countries, but convergent, not monophyletic

Ah, good to know it is spreading further. I saw that too, but it was a singlet.

@xz-keg
Contributor Author

xz-keg commented Oct 26, 2024

add branch 20, G75R on XEC.2

@FedeGueli

Br.7 is up to 29. @cvejris please verify whether it is still split in two and, if so, which one is growing.

@cvejris

cvejris commented Oct 27, 2024

Br.7 up to 29 @cvejris please verify if still splitted in two and if so which one is growing

More than two. See the distribution of mutations:
image

There appear two clusters which imo could be considered branches, the rest is either convergent or "phylogenetically illegible":

  1. XEC + Orf1a: A599T (G2060A) + Orf1a: T2274I (C7086T) + S:S680F (C23601T). 9 seqs: 5xGuadeloupe, 2x US, 1xFrance via GBW, 1xWales
  2. XEC + Orf1a: A599T (G2060A) + M: I52V (A26676G) + S:S680F (C23601T). 5 seqs: 3xSpain, 1xNetherlands, 1xJapan

Which should I propose?

@FedeGueli

Br.7 up to 29 @cvejris please verify if still splitted in two and if so which one is growing

More than two. See the distribution of mutations: image

There appear two clusters which imo could be considered branches, the rest is either convergent or "phylogenetically illegible":

  1. XEC + Orf1a: A599T (G2060A) + Orf1a: T2274I (C7086T) + S:S680F (C23601T). 9 seqs: 5xGuadeloupe, 2x US, 1xFrance via GBW, 1xWales
  2. XEC + Orf1a: A599T (G2060A) + M: I52V (A26676G) + S:S680F (C23601T). 5 seqs: 3xSpain, 1xNetherlands, 1xJapan

Which should I propose?

Neither of them; I will separate it into two branches.

@FedeGueli

Br.17 is up to 8, quickly.

@FedeGueli

  1. XEC + Orf1a: A599T (G2060A) + M: I52V (A26676G) + S:S680F (C23601T). 5 seqs: 3xSpain, 1xNetherlands, 1xJapan

This has been added as Branch 22 now

@cvejris

cvejris commented Oct 27, 2024

S:S31F (C21654T): 5seqs total - 2x XEC.2 (US, Denmark), 3x Orf1a:A599T branch (Germany, France 2x)

@xz-keg
Contributor Author

xz-keg commented Oct 29, 2024

add branch 23(V1228L)
Branch 24(S940F)
Branch 25(L822I)
Branch 26(G181V)
Branch 27(W152R via 22016C)
Branch 28(H1101Y)
Branch 29(P681H)
Branch 30(V1176F)
Branch 31(L1143F)

@xz-keg
Contributor Author

xz-keg commented Oct 29, 2024

Branch 16: XEC+Orf1a:A599T+S:G72R. 5 seqs, 5 countries: Denmark, Netherlands, France (with S:K113R), Canada, GBW from Mexico. Query: G2060A, G21776A,T3565C (edited)

This one (branch 11) is at 21 now; please propose it.

@cvejris

cvejris commented Oct 29, 2024

Can you please add XEC+ Orf1a:A599T+S:T678I to the list above? Currently 16 seqs (6xCanada, 3xNetherlands, 3xFrance, 2xAustralia, 1xNew Zealand, 1xJapan). Query: G2060A, C23595T

@xz-keg
Contributor Author

xz-keg commented Oct 29, 2024

Can you please add XEC+ Orf1a:A599T+S:T678I to the list above? Currently 16 seqs (6xCanada, 3xNetherlands, 3xFrance, 2xAustralia, 1xNew Zealand, 1xJapan). Query: G2060A, C23595T

image

not homoplastic

@Over-There-Is

Over-There-Is commented Oct 29, 2024

K150R,K529E,S704L 4, Czech rep.

@Over-There-Is

Q173R

@Over-There-Is

G20062A,C20946T,G22599C,C29535T R346T

@Over-There-Is

Over-There-Is commented Oct 29, 2024

S256L

16 now C18657T, C22329T, C24382T

@Over-There-Is

V705I

@cvejris

cvejris commented Oct 29, 2024

K150R,K529E,S704L 4, Czech rep.

proposed already #2182

@NkRMnZr

NkRMnZr commented Oct 30, 2024

There are several S:A222V branches; the major one is right under the polytomy (4 seqs: 3 from Canada, 1 from USA).

@FedeGueli

FedeGueli commented Oct 30, 2024

Also note the emergence of S:V308L and (in another branch) S:S704L on the XEC.2 backbone, in the same Canadian cluster as the S:Q173R mentioned by @Over-There-Is.

Thanks @NkRMnZr, I've seen those 222V ones too.

@FedeGueli

There is a singlet in Singapore with S:I68V and S:A475V: C22986T, C29303T, T6451C
Pay attention: all the tools read it as 68-69del V70V.

@FedeGueli

S:T716I also Canadian Cluster


6 participants