-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add find_active_reactions.py to find active reactions quicker #841
base: devel
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## devel #841 +/- ##
==========================================
- Coverage 84.39% 83.93% -0.47%
==========================================
Files 47 48 +1
Lines 4205 4433 +228
Branches 978 1044 +66
==========================================
+ Hits 3549 3721 +172
- Misses 423 458 +35
- Partials 233 254 +21
Continue to review full report at Codecov.
|
It seems that you describe a full loop-removal procedure in the paper. If that does not have the same drawbacks as CycleFreeFlux it may be worthwhile to switch to yours. Could you use it for loopless FVA as well? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good style, some minor initial comments for now 😄
|
||
def find_active_reactions(model, bigM=10000, zero_cutoff=None, | ||
relax_bounds=True, solve="lp", verbose=False): | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than using a verbose flag and print, use the logging module as done in other modules and output the additional messages with INFO or DEBUG. That allows user to set the verbosity globally rather than individually for each function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the docstring is not in the right format. Needs a small description right after """"
.
else: | ||
eps = normalize_cutoff(model, zero_cutoff) | ||
|
||
eps = max(eps, model.solver.configuration.tolerances.feasibility * 100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may still be good to use `model.tolerance' here. Not all tolerances (integrality, feasibility, optimality) are implemented for all solvers (for instance GLPK has no optimality tolerance, they use different measures).
|
||
|
||
def find_reactions_in_cycles(model, bigM=10000, zero_cutoff=1e-1, | ||
relax_bounds=True, solve="lp", verbose=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it is exported it should get a docstring.
return active_rxns | ||
|
||
|
||
def find_reactions_in_cycles(model, bigM=10000, zero_cutoff=1e-1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does that zero_cutoff not default to model.tolerance as before?
large_model.solver = all_solvers | ||
# solve LPs | ||
rxns_in_cycles_lp = find_reactions_in_cycles(large_model) | ||
# solving MILP or Fast-SNP may not be feasible for some solvers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can make tests conditional on the available solvers. We can help you with that or you could look through some of the other tests for some examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you refer me to one of the examples? Have made other changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all_solvers
is a fixture that will test with all available solvers. Should be good as is.
From what I understand, CycleFreeFlux is a post-optimization loop-removal procedure, which does not guarantee optimality. I guess it is particularly true for the reactions in loops. The method in my paper guarantee optimality as the original loop-law constraints and usually with much less binary variables, thus faster. The functions here are only for the first step of the whole localized-loopless-FVA procedure, i.e., to find a minimal feasible null space. The rest of the implementation will need some significant coding, which I hope I can do some time later. |
Any idea why the test is stuck at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't have the time yet to go through the entire code. Out of curiosity, how is your algorithm affected by (how does it treat) reactions that are unbounded in one direction? Are they then also intended to be bounded by big M?
|
||
Notes | ||
----- | ||
The optmization problem solved is as follow: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be really neat to use the Sphinx directives so that this is rendered as proper math. See here.
They are bounded by either big M (if relax_bounds=True) or the original bounds (if relax_bounds=False). The additional constraints for irreversible reactions are actually similar to fastcc while for the reversible reactions are a little different. |
Good news: the Fast-SNP which I also implemented in this pull request, can be readily used to largely reduce the time for loopless FVA already. I reorganized the newly added functions and added a |
This sounds great. If I find the time I will give it a go with the problematic example from #698. If it handles that well I think the way to go would be to drop CycleFreeFlux in favor of your method. |
@cdiener great. I am pretty sure that it will work fine. By the way, if you have time, could you advise me what goes wrong in the test? It just froze after test_fastcc. I couldn't figure why. I don't think I have changed anything related to that. And I could finish the run if I ran pytest locally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to read your paper in detail but made some comments on the code already. Looks promising 😃
switch_constrs.append(prob.Constraint(r.flux_expression + | ||
coeff_neg * z_neg[r], ub=-eps)) | ||
|
||
model.add_cons_vars([z for r, z in z_pos.items()]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
model.add_cons_vars([z for r, z in z_pos.items()]) | |
model.add_cons_vars([z for z in z_pos.values()]) |
coeff_neg * z_neg[r], ub=-eps)) | ||
|
||
model.add_cons_vars([z for r, z in z_pos.items()]) | ||
model.add_cons_vars([z for r, z in z_neg.items()]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
model.add_cons_vars([z for r, z in z_neg.items()]) | |
model.add_cons_vars([z for z in z_neg.values()]) |
model.add_cons_vars([z for r, z in z_neg.items()]) | ||
model.add_cons_vars(switch_constrs) | ||
model.objective = prob.Objective(Zero, sloppy=True, direction="min") | ||
model.objective.set_linear_coefficients({z: 1.0 for r, z in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above you can use the dict.values
method.
model.objective = prob.Objective(Zero, sloppy=True, direction="min") | ||
model.objective.set_linear_coefficients({z: 1.0 for r, z in | ||
z_pos.items()}) | ||
model.objective.set_linear_coefficients({z: 1.0 for r, z in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above.
if solve == "milp": | ||
try: | ||
# ensure bigM*z << eps at integrality tolerance limit | ||
eps = max(eps, model.solver.configuration.tolerances.integrality * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eps = max(eps, model.solver.configuration.tolerances.integrality * | |
eps = max(eps, model.tolerance * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. But here it is important for eps
to be at least 10x larger then the integrality tolerance if solving MILP. Otherwise there could be integrality violation to satisfy the constraint so that some reactions do not necessarily have flux.
And I was not sure if it is always defined for every solver (i.e. model.configurationas.tolerances.integrality
is always an attribute), so I added try
/except
here hoping to see if that might help pass the test. But it was probably not the issue and is not needed, correct? Is model.tolerance
not less than integrality tolerance if it is used? Any suggestion? Just want to make sure it is large enough when solving MILP and when integrality is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. We have simplified the tolerance interface in cobrapy. So setting the model.tolerance
attempts to set all available tolerances given by the solver. Thus it is also expected that the model.tolerance
value is the same one for integrality, feasiblity, etc. So yes, using model.tolerance
should do the right thing and also not raise any exceptions.
{r.reverse_variable: 1.0 for r in model.reactions}) | ||
|
||
iter = 0 | ||
while True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a potentially infinite loop. It'd be safer to introduce a function parameter max_iterations
(or similar name) with a default value and then run a loop instead.
while True: | |
for i in range(max_iterations): |
cobra/flux_analysis/loopless.py
Outdated
sol = model.optimize() | ||
|
||
if sol.status == "optimal": | ||
x = sol.fluxes.to_numpy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the following also works rather than making a copy?
x = sol.fluxes.to_numpy() | |
x = sol.fluxes.values |
cobra/flux_analysis/loopless.py
Outdated
sol = model.optimize() | ||
|
||
if sol.status == "optimal": | ||
y = sol.fluxes.to_numpy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
cobra/flux_analysis/loopless.py
Outdated
constr_proj.ub = bigM | ||
constr_proj.lb = eps | ||
constr_proj.ub = None | ||
sol = model.optimize() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
model.optimize()
may return an OptimizationError
, should that be handled?
large_model.solver = all_solvers | ||
# solve LPs | ||
rxns_in_cycles_lp = find_reactions_in_cycles(large_model) | ||
# solving MILP or Fast-SNP may not be feasible for some solvers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all_solvers
is a fixture that will test with all available solvers. Should be good as is.
3ef066f
to
900f132
Compare
I think I have addressed all comments and have restructured the code so that it is more pythonic. |
Sorry for the delay, just saw it and will go through it again. Will probably take me a little since it is a lot of code changes. From a quick look I don't see any changes to flux_variability_analysis as mentioned above and we will need more tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the long delay. Most of it being compliant with pep8 and numpy docstring format as well as rebasing on the latest devel branch. However, the method also does not seem to pass its own tests. Also the tests should also compare your implementation to the reference implementation (method="original" in add_loopless) for a smaller model.
Some functionality seems to overlap with the already existing fastcc function. In particular the find_active_reactions
function. It seems to be very similar things.
However, in_cycles
version looks very useful and it would be great to add it to FVA. Is it guranteed to remove all cycles from the solution?
I can help out with fixing the the style issues if you want but would leave the methods to you if that is all right.
|
||
|
||
def fastSNP(model, bigM=1e4, zero_cutoff=None, eps=1e-3, N=None): | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should follow the numpy docstring format. So it needs a short imperative description after """
.
x, y = None, None | ||
try: | ||
sol = model.optimize() | ||
except OptimizationError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OptimizationError
has not been imported. Also the handling does not seem to be correct. If there is no feasible solution you should probably raise an error as well.
@@ -0,0 +1,486 @@ | |||
|
|||
""" | |||
Find all active reactions by solving a single MILP problem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to specify what "active" means. There is already a find_blocked_reactions
but this module does not seem to do the opposite (identifying any reactions that can carry flux).
LOGGER = logging.getLogger(__name__) | ||
|
||
|
||
def find_active_reactions(model, bigM=10000, zero_cutoff=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming might be confusing. Maybe find_loopless_reactions
or something.
|
||
def find_active_reactions(model, bigM=10000, zero_cutoff=None, | ||
relax_bounds=True, solve="lp", verbose=False): | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the docstring is not in the right format. Needs a small description right after """"
.
model.add_cons_vars(constr_min_flux) | ||
|
||
n_lp_solved = 3 | ||
feas_tol = model.tolerance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable is never used.
constr_min_flux.lb = -min_flux | ||
constr_min_flux.ub = -min_flux | ||
constr_min_flux.lb = None | ||
feas_tol = model.tolerance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable is never used.
constr_proj.lb = None | ||
try: | ||
sol = model.optimize() | ||
except OptimizationError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
@@ -249,3 +270,156 @@ def loopless_fva_iter(model, reaction, solution=False, zero_cutoff=None): | |||
best = reaction.flux | |||
model.objective.direction = objective_dir | |||
return best | |||
|
|||
|
|||
def fastSNP(model, bigM=1e4, zero_cutoff=None, eps=1e-3, N=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should also live in find_active_reactions.py
.
benchmark(find_reactions_in_cycles, model) | ||
|
||
|
||
def test_find_active_reactions(model, all_solvers): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test fails for me with all solvers:
def test_find_reactions_in_cycles(large_model, all_solvers):
"""Test find_reactions_in_cycles."""
large_model.solver = all_solvers
# solve LPs
rxns_in_cycles_lp = find_reactions_in_cycles(large_model)
rxns_in_cycles = ['ABUTt2pp', 'ACCOAL', 'ACKr', 'ACS', 'ACt2rpp',
'ACt4pp', 'ADK1', 'ADK3', 'ADNt2pp', 'ADNt2rpp',
'ALATA_L', 'ALAt2pp', 'ALAt2rpp', 'ALAt4pp', 'ASPt2pp',
'ASPt2rpp', 'CA2t3pp', 'CAt6pp', 'CRNDt2rpp',
'CRNt2rpp', 'CRNt8pp', 'CYTDt2pp', 'CYTDt2rpp',
'FOMETRi', 'GLBRAN2', 'GLCP', 'GLCP2', 'GLCS1',
'GLCtex', 'GLCtexi', 'GLDBRAN2', 'GLGC', 'GLUABUTt7pp',
'GLUt2rpp', 'GLUt4pp', 'GLYCLTt2rpp', 'GLYCLTt4pp',
'GLYt2pp', 'GLYt2rpp', 'GLYt4pp', 'HPYRI', 'HPYRRx',
'ICHORS', 'ICHORSi', 'INDOLEt2pp', 'INDOLEt2rpp',
'INSt2pp', 'INSt2rpp', 'NAt3pp', 'NDPK1', 'PPAKr',
'PPCSCT', 'PPKr', 'PPM', 'PROt2rpp', 'PROt4pp',
'PRPPS', 'PTA2', 'PTAr', 'R15BPK', 'R1PK', 'SERt2rpp',
'SERt4pp', 'SUCOAS', 'THFAT', 'THMDt2pp', 'THMDt2rpp',
'THRt2rpp', 'THRt4pp', 'TRSARr', 'URAt2pp', 'URAt2rpp',
'URIt2pp', 'URIt2rpp', 'VALTA', 'VPAMTr']
> assert set(rxns_in_cycles_lp) == set(rxns_in_cycles)
E AssertionError: assert {'ABUTt2pp', ...'ACt4pp', ...} == {'ABUTt2pp', '...'ACt4pp', ...}
E Extra items in the left set:
E 'ICHORS_copy1'
E 'INSt2pp_copy2'
E 'ALAt2pp_copy1'
E 'ICHORS_copy2'
E 'GLYt2pp_copy2'
E 'GLCtex_copy2'...
E
E ...Full output truncated (43 lines hidden), use '-vv' to show
Add
find_active_reactions.py
and atest_find_active_reactions.py
.Had some discussion with @ChristianLieven and @Midnighter before. Said it might be nice to add these functions for finding active reactions (primarily for finding reactions in loops) which solve an MILP or a few LPs instead of doing FVA. From tests on a few different models, comparing to using
cobra.flux_analysis.find_blocked_reactions.py
, it can be 3 - 5x faster. Would be good to suggest ways to streamline the code and improve speed. I have been usingcobrapy
from time to time but not very into its infrastructure. The test may need changes too.