feat: fuzzy kcat matching #188
I've set this as a draft, as I'm currently refactoring the work, including the UniProt and KEGG database files. I'm not working on updating the
And directly add eccodes to model.ec.eccodes.
To be done in a separate function.
And rename getECCodes to getECfromDatabase.
I'll soon rebase this PR onto a separate branch, to avoid having additional
The original refactoring as done by @johan-gson is in the
Based on the implementation by @johan-gson, I've made the following changes:
Not addressed is the actual fuzzy matching, or updated
2b44678 to ad3f39a
Regarding the updated
Points 1) and 2) can be discussed in #157.
# Conflicts:
#   src/geckomat/change_model/makeEcModel.m
#   src/geckomat/gather_kcats/fuzzyKcatMatching.m
#   src/geckomat/gather_kcats/writeDLKcatInput.m
#   src/geckomat/get_enzyme_data/findECInDB.m
#   src/geckomat/get_enzyme_data/findInDBOld._obsolete.m
#   src/geckomat/get_enzyme_data/findInDB_obsolete.m
#   src/geckomat/get_enzyme_data/getECfromDatabase.m
#   src/geckomat/get_enzyme_data/getECfromGEM.m
#   src/geckomat/get_enzyme_data/getEnzymeCodes.m
#   src/geckomat/get_enzyme_data/loadBRENDAdata.m
#   src/geckomat/get_enzyme_data/loadDatabases.m
#   src/geckomat/get_enzyme_data/updateDatabases.m
#   src/geckomat/utilities/loadDatabases.m
#   userData/ecYeastGEM/data/ProtDatabase.mat
#   userData/ecYeastGEM/data/uniprot.tab
Ready for review! This can now generate an ec-model based on BRENDA fuzzy matching. Some code is available in
Note that protein databases (KEGG, UniProt) are not distributed with GECKO but are instead reconstructed with
To do in next PR:
if nargin<2
    selectDatabase = 'both';
end
geckoPath = findGECKOroot();
I realize this code is just copied from my previous code, but I think it would be better to move the retrieval of this path to the base model adapter, so that it can be overridden. This way, we could create a small test taxon placed in the test folder when we write test cases, and a TestModelAdapter would then point to that folder instead of the standard one. It would also make it possible to provide a custom database of this kind if desired.
%% Uniprot
if any(strcmp(selectDatabase,{'uniprot','both'}))
    filePath = fullfile(geckoPath,'databases','uniprot',[num2str(taxonID) '.tsv']);
I'm thinking, then, that this would be replaced by something like:
filePath = fullfile(modelAdapter.getUniprotPath(),[num2str(taxonID) '.tsv']);
And in the base adapter, for example (not tested):
function path = getUniprotPath(obj)
    geckoPath = findGECKOroot();
    path = fullfile(geckoPath,'databases','uniprot');
end
Someone should think through whether there should be just one function for all paths, like I did now (yes, I was lazy :) ), or one function per path; I don't have a strong opinion here.
This way, that function could be overridden in a TestModelAdapter
%% KEGG
if any(strcmp(selectDatabase,{'kegg','both'}))
    filePath = fullfile(geckoPath,'databases','kegg',[keggID '.tsv']);
Same thing here.
Good changes to this code, Ed! I had a comment about moving some things to the adapter; we can either do that now or later. Since we start with the old GECKO code and gradually improve it, we cannot expect to fix everything at once; as I see it, anything that is an improvement passes! But in the final, perfect product, we should have thought through how to make everything testable.
So, I accidentally left my comments outside of the review, but please consider them optional.
Well done, Ed! Looks good in general. The script tutorials/protocol.m ran smoothly. Besides that, I set the ub of the total protein pool reaction to 0.1 and got an infeasible solution. This is due to extremely low kcat values in the model: when increasing the ub from 0.1 to 10, I got a growth rate of only 1e-4. This means that we indeed need tuning in the next stage.
I did run into some errors and bugs when making ecGEMs for other organisms, e.g., E. coli. These might not be very relevant here; maybe we could discuss them later, during testing/debugging?
@Yu-sysbio Yes, I'd prefer to merge this PR, so that further work can already build on this code, while smaller bug fixes are made separately. @johan-gson Excellent suggestion, I will implement this in a later PR. Just one thing: you seem to have commented on an earlier commit, not on the final code:
GECKO/src/geckomat/get_enzyme_data/loadDatabases.m Lines 30 to 38 in b2f72b1
Your argument still holds, though!
Main improvements in this PR:
Added the classical GECKO 2 behavior to GECKO 3:
I hereby confirm that I have:
devel as a target branch (top left drop-down menu)