assign and use EC codes from model.eccodes
#175
Comments
As I remember, GECKO 1/2 selects the maximum kcat among all EC codes assigned to a reaction. Could GECKO 3 simply follow the same approach?
In my view, adding and curating EC codes is entirely under the modellers' control. I would therefore keep GECKO from interfering with the EC codes, especially for models that already come with their own. It would be very confusing for the modeller to have one definition of the EC codes and for GECKO to completely sidestep it. @Yu-sysbio I thought the decision was to support a single kcat per reaction, so the check for the maximum value would not be needed.
Selecting maximum kcat from multiple EC models is possible, as the
Note that the single-kcat-per-reaction is once the kcat is included in the model (
@mihai-sysbio Your suggestion seems to match having
This is only required for the GECKO 1/2 legacy fuzzy kcat matching. We'll just continue using this approach, refactored for the GECKO 3 model format, as implemented in #188.
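The maximum-kcat rule mentioned in the comments above can be sketched in a few lines. This is only an illustration in Python, not GECKO's implementation (GECKO itself is MATLAB). The function name `max_kcat_for_reaction`, the dictionary layout, and the kcat values are all hypothetical.

```python
from typing import Dict, List, Optional


def max_kcat_for_reaction(
    ec_codes: List[str], kcat_by_ec: Dict[str, float]
) -> Optional[float]:
    """Return the largest kcat found for any EC code of one reaction.

    ec_codes:   EC codes annotated to the reaction (hypothetical input).
    kcat_by_ec: lookup of EC code -> kcat, e.g. parsed from a database
                (hypothetical layout, values are made up).
    Returns None when none of the EC codes has a kcat.
    """
    candidates = [kcat_by_ec[ec] for ec in ec_codes if ec in kcat_by_ec]
    return max(candidates) if candidates else None


# Made-up kcat values, for illustration only.
example_kcats = {"1.1.1.14": 12.0, "1.1.1.21": 30.0}
best = max_kcat_for_reaction(["1.1.1.14", "1.1.1.21"], example_kcats)
```

A reaction whose EC codes have no kcat entry at all yields `None`, so the caller can decide whether to fall back to another matching strategy.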
Generally, EC codes are reaction-specific and can therefore be defined for each reaction. GECKO 1/2 uses EC numbers to parse BRENDA, but these are currently extracted from a UniProt file.

Instead, it would be ideal to use an `eccodes` field that is part of the model, for the following reasons:

- `model.eccodes` also contains EC numbers, but it might be preferable to have a separate `model.ec.eccodes` field, so as not to interfere too much with the original GEM (it might for instance be annotated with multiple EC codes, which is not good for GECKO, see below), and to have `model.ec` contain all information that is essential for the enzyme-constraint extension. However, this also (potentially) duplicates information, so perhaps using `model.eccodes` is the best solution anyway.
- Many models have no EC numbers annotated. There can be multiple ways to gather such information:
  - `getEnzymeCodes`. This could be somewhat modified to parse UniProt and/or KEGG to extract protein-specific EC codes.
- One consideration is that a single EC code should be defined for each reaction. E.g. UniProt can include multiple EC codes per enzyme, which can be explained by:
  - It is an isozyme, able to catalyze the conversion of multiple different substrates, having multiple different cofactors, or even catalyzing completely different reactions. But each reaction should only have one EC code.
  - One of the EC codes is less specific than the others. E.g. 1.1.1.2 is an alcohol dehydrogenase, while 1.1.1.21 is an alditol dehydrogenase (and alditol is an alcohol), while 1.1.1.14 is an iditol dehydrogenase (and iditol is an alditol). If the specific reaction involves iditol, it should only be assigned to 1.1.1.14, but if the enzyme can also catalyze the dehydrogenation of other alditols, those reactions should be annotated with 1.1.1.21 instead.

So, main points:

- Use `model.eccodes` (and curate it to contain single EC numbers per reaction), or have a separate `model.ec.eccodes` field?
- An `addECcodes` function that can parse EC numbers from different inputs (or have separate functions), including UniProt. Possibly repurposing `getEnzymeCodes`.
- Have a check included to make sure that there are only single EC codes per reaction. This is only relevant for finding kcat values from BRENDA; it is not required for e.g. DLKcat, so it should not necessarily always be enforced.

Edit: single EC codes are likely not preferred, as having multiple (with decreasing substrate specificity) can help to e.g. match alternative kcat values in BRENDA.
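The optional check and the fallback idea from the edit note could look roughly like the sketch below. It is written in Python for illustration only and is not part of GECKO. It assumes each reaction's EC codes are stored as one semicolon-separated string, listed from most to least substrate-specific; that storage format, the ordering convention, and the names `split_ec_codes`, `reactions_with_multiple_ec` and `first_kcat_match` are assumptions.

```python
from typing import Dict, List, Optional, Tuple


def split_ec_codes(field: str, sep: str = ";") -> List[str]:
    """Split one reaction's EC field into codes (separator is an assumption)."""
    return [code.strip() for code in field.split(sep) if code.strip()]


def reactions_with_multiple_ec(eccodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Report reactions annotated with more than one EC code.

    Returns a report instead of raising, because the check is only relevant
    for BRENDA-style matching and should not always be enforced.
    """
    flagged = {}
    for rxn_id, field in eccodes.items():
        codes = split_ec_codes(field)
        if len(codes) > 1:
            flagged[rxn_id] = codes
    return flagged


def first_kcat_match(
    field: str, kcat_by_ec: Dict[str, float]
) -> Optional[Tuple[str, float]]:
    """Try EC codes in their listed order and return the first with a kcat.

    Assumes the modeller listed the codes by decreasing substrate
    specificity, so a less specific code is only used as a fallback.
    """
    for ec in split_ec_codes(field):
        if ec in kcat_by_ec:
            return ec, kcat_by_ec[ec]
    return None


# Hypothetical reaction annotations and a made-up kcat value.
example = {"r1": "1.1.1.14;1.1.1.21", "r2": "1.1.1.2", "r3": ""}
flagged = reactions_with_multiple_ec(example)
match = first_kcat_match(example["r1"], {"1.1.1.21": 30.0})
```

Here `r1` is flagged as having two EC codes, and because no kcat is available for 1.1.1.14, the lookup falls back to the less specific 1.1.1.21.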