From 4321e4ef47c3061401052ad334b0785c43687bda Mon Sep 17 00:00:00 2001 From: SotirisTouliopoulos Date: Sat, 24 Aug 2024 21:55:08 +0300 Subject: [PATCH 01/18] Update blog --- _posts/2024-08-24-GSoC2024-dingo.md | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) create mode 100644 _posts/2024-08-24-GSoC2024-dingo.md diff --git a/_posts/2024-08-24-GSoC2024-dingo.md b/_posts/2024-08-24-GSoC2024-dingo.md new file mode 100644 index 0000000..c22187c --- /dev/null +++ b/_posts/2024-08-24-GSoC2024-dingo.md @@ -0,0 +1,27 @@ +--- +layout: single +title: "Enhancing dingo python package: from metabolic models reduction to pathways identification" +date: 2024-08-24 +author: Sotiris Touliopoulos +author_profile: true +read_time: true +comments: true +share: true +related: true +hidden: false +--- + + +# Enhancing dingo python package: from metabolic models reduction to pathways identification + +> #### A contribution for the Google Summer of Code 2024 program + +## Overall + +#### A summary of the implemented methods, merged into the dingo library: + +- preprocess for the reduction of metabolic models. +- inference of pairwise correlated reactions. +- visualization of a steady-states correlation matrix. +- clustering of a steady-states correlation matrix. +- construction of a weighted graph of the model's reactions with the correlation coefficients as weights. \ No newline at end of file From 9ddeb069080cacbb9bfd410f5764b01bb3d6ded6 Mon Sep 17 00:00:00 2001 From: SotirisTouliopoulos Date: Sat, 24 Aug 2024 22:39:48 +0300 Subject: [PATCH 02/18] Update blog --- _posts/2024-08-24-GSoC2024-dingo.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/_posts/2024-08-24-GSoC2024-dingo.md b/_posts/2024-08-24-GSoC2024-dingo.md index c22187c..8e8c7da 100644 --- a/_posts/2024-08-24-GSoC2024-dingo.md +++ b/_posts/2024-08-24-GSoC2024-dingo.md @@ -24,4 +24,9 @@ hidden: false - inference of pairwise correlated reactions. 
- visualization of a steady-states correlation matrix. - clustering of a steady-states correlation matrix. -- construction of a weighted graph of the model's reactions with the correlation coefficients as weights. \ No newline at end of file +- construction of a weighted graph of the model's reactions with the correlation coefficients as weights. + + +
+
+
\ No newline at end of file From 5b3dd1a3cad422990c1d64119d4aa98983d7cf62 Mon Sep 17 00:00:00 2001 From: Sotiris Touliopoulos <109972702+SotirisTouliopoulos@users.noreply.github.com> Date: Sun, 25 Aug 2024 01:40:35 +0300 Subject: [PATCH 03/18] Update 2024-08-24-GSoC2024-dingo.md --- _posts/2024-08-24-GSoC2024-dingo.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/_posts/2024-08-24-GSoC2024-dingo.md b/_posts/2024-08-24-GSoC2024-dingo.md index 8e8c7da..e95dc70 100644 --- a/_posts/2024-08-24-GSoC2024-dingo.md +++ b/_posts/2024-08-24-GSoC2024-dingo.md @@ -27,6 +27,4 @@ hidden: false - construction of a weighted graph of the model's reactions with the correlation coefficients as weights. -
-
-
\ No newline at end of file +![reduction_concept](https://github.com/SotirisTouliopoulos/dingo/blob/gh-pages/img/reduction.png) From f3c9572b55d946f8fcd0208ac8fc16b40cc22306 Mon Sep 17 00:00:00 2001 From: Sotiris Touliopoulos <109972702+SotirisTouliopoulos@users.noreply.github.com> Date: Sun, 25 Aug 2024 01:50:59 +0300 Subject: [PATCH 04/18] Update 2024-08-24-GSoC2024-dingo.md --- _posts/2024-08-24-GSoC2024-dingo.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-08-24-GSoC2024-dingo.md b/_posts/2024-08-24-GSoC2024-dingo.md index e95dc70..7f44ec1 100644 --- a/_posts/2024-08-24-GSoC2024-dingo.md +++ b/_posts/2024-08-24-GSoC2024-dingo.md @@ -27,4 +27,4 @@ hidden: false - construction of a weighted graph of the model's reactions with the correlation coefficients as weights. -![reduction_concept](https://github.com/SotirisTouliopoulos/dingo/blob/gh-pages/img/reduction.png) +![reduction_concept](https://raw.githubusercontent.com/SotirisTouliopoulos/dingo/blob/gh-pages/img/reduction.png) From 5d2d8f0b3b69594bc2ec7e2f777c937677a3d62e Mon Sep 17 00:00:00 2001 From: Sotiris Touliopoulos <109972702+SotirisTouliopoulos@users.noreply.github.com> Date: Sun, 25 Aug 2024 01:52:19 +0300 Subject: [PATCH 05/18] Update 2024-08-24-GSoC2024-dingo.md --- _posts/2024-08-24-GSoC2024-dingo.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-08-24-GSoC2024-dingo.md b/_posts/2024-08-24-GSoC2024-dingo.md index 7f44ec1..e95dc70 100644 --- a/_posts/2024-08-24-GSoC2024-dingo.md +++ b/_posts/2024-08-24-GSoC2024-dingo.md @@ -27,4 +27,4 @@ hidden: false - construction of a weighted graph of the model's reactions with the correlation coefficients as weights. 
-![reduction_concept](https://raw.githubusercontent.com/SotirisTouliopoulos/dingo/blob/gh-pages/img/reduction.png) +![reduction_concept](https://github.com/SotirisTouliopoulos/dingo/blob/gh-pages/img/reduction.png) From d391d7fe8c5dc1cb34067303104a29e5b67518cf Mon Sep 17 00:00:00 2001 From: SotirisTouliopoulos Date: Tue, 27 Aug 2024 02:18:27 +0300 Subject: [PATCH 06/18] Update blog --- _posts/2024-08-24-GSoC2024-dingo.md | 2 +- _posts/figures/graph_plot.html | 14 ++++++++++++++ 2 files changed, 15 insertions(+), 1 deletion(-) create mode 100644 _posts/figures/graph_plot.html diff --git a/_posts/2024-08-24-GSoC2024-dingo.md b/_posts/2024-08-24-GSoC2024-dingo.md index e95dc70..1be7e67 100644 --- a/_posts/2024-08-24-GSoC2024-dingo.md +++ b/_posts/2024-08-24-GSoC2024-dingo.md @@ -27,4 +27,4 @@ hidden: false - construction of a weighted graph of the model's reactions with the correlation coefficients as weights. -![reduction_concept](https://github.com/SotirisTouliopoulos/dingo/blob/gh-pages/img/reduction.png) +![graph](./figures/graph_plot.html) diff --git a/_posts/figures/graph_plot.html b/_posts/figures/graph_plot.html new file mode 100644 index 0000000..bfe1d07 --- /dev/null +++ b/_posts/figures/graph_plot.html @@ -0,0 +1,14 @@ + + + +
+
+ + \ No newline at end of file From a10d723b88c4376330b1bb0f2b87704a14383f9b Mon Sep 17 00:00:00 2001 From: Sotiris Touliopoulos <109972702+SotirisTouliopoulos@users.noreply.github.com> Date: Tue, 27 Aug 2024 02:33:17 +0300 Subject: [PATCH 07/18] Update 2024-08-24-GSoC2024-dingo.md --- _posts/2024-08-24-GSoC2024-dingo.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2024-08-24-GSoC2024-dingo.md b/_posts/2024-08-24-GSoC2024-dingo.md index 1be7e67..1f2afd4 100644 --- a/_posts/2024-08-24-GSoC2024-dingo.md +++ b/_posts/2024-08-24-GSoC2024-dingo.md @@ -27,4 +27,4 @@ hidden: false - construction of a weighted graph of the model's reactions with the correlation coefficients as weights. -![graph](./figures/graph_plot.html) +![graph](https://github.com/SotirisTouliopoulos/geomscale.github.io/blob/gsoc-2024-flux-sampling/_posts/figures/graph_plot.html) From dbb44c18749839b17f54ac0ff054e3be9de07c6d Mon Sep 17 00:00:00 2001 From: Sotiris Touliopoulos <109972702+SotirisTouliopoulos@users.noreply.github.com> Date: Tue, 27 Aug 2024 02:35:04 +0300 Subject: [PATCH 08/18] Update 2024-08-24-GSoC2024-dingo.md --- _posts/2024-08-24-GSoC2024-dingo.md | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/_posts/2024-08-24-GSoC2024-dingo.md b/_posts/2024-08-24-GSoC2024-dingo.md index 1f2afd4..910e5ce 100644 --- a/_posts/2024-08-24-GSoC2024-dingo.md +++ b/_posts/2024-08-24-GSoC2024-dingo.md @@ -14,17 +14,14 @@ hidden: false # Enhancing dingo python package: from metabolic models reduction to pathways identification -> #### A contribution for the Google Summer of Code 2024 program +> #### A contribution to the Google Summer of Code 2024 program -## Overall +## Overview #### A summary of the implemented methods, merged into the dingo library: -- preprocess for the reduction of metabolic models. -- inference of pairwise correlated reactions. -- visualization of a steady-states correlation matrix. 
-- clustering of a steady-states correlation matrix. -- construction of a weighted graph of the model's reactions with the correlation coefficients as weights. - - -![graph](https://github.com/SotirisTouliopoulos/geomscale.github.io/blob/gsoc-2024-flux-sampling/_posts/figures/graph_plot.html) +- Preprocess for the reduction of metabolic models. +- Inference of pairwise correlated reactions. +- Visualization of a steady-states correlation matrix. +- Clustering of a steady-states correlation matrix. +- Construction of a weighted graph of the model's reactions with the correlation coefficients as weights. From 14a5d4a825db9bf7e7afd7998d1772b39d6e2169 Mon Sep 17 00:00:00 2001 From: SotirisTouliopoulos Date: Tue, 27 Aug 2024 02:36:13 +0300 Subject: [PATCH 09/18] Update blog --- _posts/figures/graph_plot.html | 14 -------------- 1 file changed, 14 deletions(-) delete mode 100644 _posts/figures/graph_plot.html diff --git a/_posts/figures/graph_plot.html b/_posts/figures/graph_plot.html deleted file mode 100644 index bfe1d07..0000000 --- a/_posts/figures/graph_plot.html +++ /dev/null @@ -1,14 +0,0 @@ - - - -
-
- - \ No newline at end of file From ff5bc0e973a4bb5823df05cd1ffb0f297fcbd82d Mon Sep 17 00:00:00 2001 From: SotirisTouliopoulos Date: Thu, 29 Aug 2024 03:27:02 +0300 Subject: [PATCH 10/18] Update blog --- _posts/2024-08-24-GSoC2024-dingo.md | 32 ++++++++++++++++++++++++++--- 1 file changed, 29 insertions(+), 3 deletions(-) diff --git a/_posts/2024-08-24-GSoC2024-dingo.md b/_posts/2024-08-24-GSoC2024-dingo.md index 910e5ce..01e23c7 100644 --- a/_posts/2024-08-24-GSoC2024-dingo.md +++ b/_posts/2024-08-24-GSoC2024-dingo.md @@ -1,6 +1,6 @@ --- layout: single -title: "Enhancing dingo python package: from metabolic models reduction to pathways identification" +title: "Enhancing dingo: from metabolic models reduction to pathways identification" date: 2024-08-24 author: Sotiris Touliopoulos author_profile: true @@ -12,16 +12,42 @@ hidden: false --- -# Enhancing dingo python package: from metabolic models reduction to pathways identification +# Enhancing dingo: from metabolic models reduction to pathways identification > #### A contribution to the Google Summer of Code 2024 program + ## Overview -#### A summary of the implemented methods, merged into the dingo library: +#### A summary of the implemented methods, integrated into dingo: - Preprocess for the reduction of metabolic models. - Inference of pairwise correlated reactions. - Visualization of a steady-states correlation matrix. - Clustering of a steady-states correlation matrix. - Construction of a weighted graph of the model's reactions with the correlation coefficients as weights. + + +## Preprocess + +Large metabolic models contain numerous reactions and metabolites. +Sampling the flux space of such models requires significant computational +time due to the high dimensionsionality. Model preprocessing can mitigate +this issue by removing certain reactions, thus reducing the dimensional space. + +A `PreProcess` class was implemented. 
After initializing an object from this class, +users can call the `reduce` function to remove 3 types of reactions: + +- Blocked reactions: cannot carry a flux in any condition. +- Zero-flux reactions: cannot carry a flux while maintaining at least 90% of the maximum growth rate. +- Metabolically less-efficient reactions: require a reduction in growth rate if used. + +Users can choose to remove an additional set of reactions, by setting the `extend` parameter +of the `reduce` function to `True`. These reactions do not affect the value of the objective function when removed. + +Reduction with the `PreProcess` class has been tested with various, some of which are included in dingo’s publication article too. Figures below show the number of remained reactions, after calling the `reduce` function. + +
+
+
+
\ No newline at end of file From c6122676b78c1b3a980af816126bccd189feab47 Mon Sep 17 00:00:00 2001 From: SotirisTouliopoulos Date: Thu, 29 Aug 2024 03:41:49 +0300 Subject: [PATCH 11/18] Update blog --- _posts/2024-08-24-GSoC2024-dingo.md | 33 ++++++++++++++++++++++++++--- 1 file changed, 30 insertions(+), 3 deletions(-) diff --git a/_posts/2024-08-24-GSoC2024-dingo.md b/_posts/2024-08-24-GSoC2024-dingo.md index 01e23c7..853d081 100644 --- a/_posts/2024-08-24-GSoC2024-dingo.md +++ b/_posts/2024-08-24-GSoC2024-dingo.md @@ -48,6 +48,33 @@ of the `reduce` function to `True`. These reactions do not affect the value of t Reduction with the `PreProcess` class has been tested with various, some of which are included in dingo’s publication article too. Figures below show the number of remained reactions, after calling the `reduce` function.
-
-
-
\ No newline at end of file +

+

+ + + +## Correlated Reactions + +- Reactions in biochemical pathways can be positively correlated, negatively correlated, or uncorrelated. +- Positive correlation: if reaction A is active, then reaction B is also active and vice versa. +- Negative correlation: if reaction A is active, then reaction B is inactive and vice versa. +- Zero correlation: The status of reaction A is independent of the status of reaction B and vice versa. + +A `correlated_reactions` function that calculates reactions steady states using dingo's `PolytopeSampler` class and creates a correlation matrix was implemented. The correlation matrix is based on the pearson correlation coefficient between pairwise reactions. This function also calculates a copula indicator to filter correlations greater than the pearson cutoff. + +A `plot_corr_matrix` function to visualize the correlation matrix as a heatmap plot was implemented too. + +This figure illustrates a heatmap from a symmetrical correlation matrix without pearson or indicator filtering: + +
+

+
+ +This figure illistrates a heatmap from a triangular correlation matrix with both pearson and indicator filtering: + +
+

+
+ + +## Clustering \ No newline at end of file From 9dd10bdebae92934eb49e062a337d4425e611379 Mon Sep 17 00:00:00 2001 From: SotirisTouliopoulos Date: Sat, 7 Sep 2024 14:13:12 +0300 Subject: [PATCH 12/18] Update blog --- _posts/2024-08-24-GSoC2024-dingo.md | 59 ++++++++++++++++++++++++----- 1 file changed, 49 insertions(+), 10 deletions(-) diff --git a/_posts/2024-08-24-GSoC2024-dingo.md b/_posts/2024-08-24-GSoC2024-dingo.md index 853d081..047487f 100644 --- a/_posts/2024-08-24-GSoC2024-dingo.md +++ b/_posts/2024-08-24-GSoC2024-dingo.md @@ -12,7 +12,7 @@ hidden: false --- -# Enhancing dingo: from metabolic models reduction to pathways identification +# Enhancing dingo: from metabolic models reduction to pathways prediction > #### A contribution to the Google Summer of Code 2024 program @@ -30,10 +30,9 @@ hidden: false ## Preprocess -Large metabolic models contain numerous reactions and metabolites. -Sampling the flux space of such models requires significant computational -time due to the high dimensionsionality. Model preprocessing can mitigate -this issue by removing certain reactions, thus reducing the dimensional space. +- Large metabolic models contain numerous reactions and metabolites. +- Sampling the flux space of such models requires significant computational time due to the high dimensionsionality. +- Model preprocessing can mitigate this issue by removing certain reactions, thus reducing the dimensional space. A `PreProcess` class was implemented. After initializing an object from this class, users can call the `reduce` function to remove 3 types of reactions: @@ -45,7 +44,7 @@ users can call the `reduce` function to remove 3 types of reactions: Users can choose to remove an additional set of reactions, by setting the `extend` parameter of the `reduce` function to `True`. These reactions do not affect the value of the objective function when removed. 
-Reduction with the `PreProcess` class has been tested with various, some of which are included in dingo’s publication article too. Figures below show the number of remained reactions, after calling the `reduce` function. +Reduction with the `PreProcess` class has been tested with various models, some of which are included in dingo’s publication article too. Figures below show the number of remained reactions, after calling the `reduce` function.


@@ -65,16 +64,56 @@ A `correlated_reactions` function that calculates reactions steady states using A `plot_corr_matrix` function to visualize the correlation matrix as a heatmap plot was implemented too. This figure illustrates a heatmap from a symmetrical correlation matrix without pearson or indicator filtering: -


-This figure illistrates a heatmap from a triangular correlation matrix with both pearson and indicator filtering: - +This figure illustrates a heatmap from a triangular correlation matrix with both pearson and indicator filtering:


-## Clustering \ No newline at end of file +## Clustering + +- Clustering based on a dissimilarity matrix reveals groups of reactions with similar correlation values. +- Reactions within the same cluster may contribute to the same pathways. + +A `cluster_corr_reactions` function that hierarchically clusters a correlation matrix +alongside a `plot_dendrogram` function that plots a dendrogram were implemented. + +This figure illustrates a dendrogram created from a filtered correlation matrix: +
+

+
+ +Distinct clusters are observed. Graphs will reveal if these clusters interact with other clusters or reactions. + + +## Graphs + +- Graphs creation can reveal networks of correlated reactions, potentially corresponding to metabolic pathways. + +A `graph_corr_matrix` function that creates graphs from a correlation matrix +alongside a `plot_graph` function that plots the graphs were implemented. + +This figure illustrates a graph created from a correlation matrix without pearson or indicator filtering: +
+

+
+ +This figure illustrates a subgraph created from a filtered correlation matrix: +
+

+
+ +This subgraph has 9 nodes that correspond to 9 reactions close to each other in the topology of the E. coli core model. These reactions are: `PGI, G6PDH2R, PGL, GND, RPE, RPI, TKT1, TALA, TKT2`. + +The topology of these reactions can be seen in the figure below from `ESCHER`: +
+

+
+ +We observe that `PGI` seems to contribute to a different pathway. However it shares a common metabolite with `G6PDH2r`. +If we apply a stricter pearson cutoff (e.g., 0.99999), this reaction is removed from this subgraph, leaving only the remaining reactions. This is an important observation: looser cutoffs lead to wider sets of connected reactions, forming larger metabolic pathways. + From 6bd9d229f2317dbce292c7e892ca4e5256dddaab Mon Sep 17 00:00:00 2001 From: SotirisTouliopoulos Date: Tue, 1 Oct 2024 20:32:03 +0300 Subject: [PATCH 13/18] update blog --- _posts/2024-08-24-GSoC2024-dingo.md | 206 +++++++++++++++++++++------- 1 file changed, 153 insertions(+), 53 deletions(-) diff --git a/_posts/2024-08-24-GSoC2024-dingo.md b/_posts/2024-08-24-GSoC2024-dingo.md index 047487f..a82708a 100644 --- a/_posts/2024-08-24-GSoC2024-dingo.md +++ b/_posts/2024-08-24-GSoC2024-dingo.md @@ -1,6 +1,6 @@ --- layout: single -title: "Enhancing dingo: from metabolic models reduction to pathways identification" +title: "Enhancing dingo: from metabolic models reduction to pathways prediction" date: 2024-08-24 author: Sotiris Touliopoulos author_profile: true @@ -14,106 +14,206 @@ hidden: false # Enhancing dingo: from metabolic models reduction to pathways prediction -> #### A contribution to the Google Summer of Code 2024 program +> #### A contribution to the Google Summer of Code 2024 program. Mentors of this project: Vissarion Fisikopoulos, Apostolos Chalkis and Haris Zafeiropoulos. -## Overview -#### A summary of the implemented methods, integrated into dingo: +## Goal of the project -- Preprocess for the reduction of metabolic models. -- Inference of pairwise correlated reactions. -- Visualization of a steady-states correlation matrix. -- Clustering of a steady-states correlation matrix. -- Construction of a weighted graph of the model's reactions with the correlation coefficients as weights. 
+The goal of this project is to enhance dingo by incorporating pre- and post-sampling features to leverage the increased statistical value of flux sampling. +A preprocessing class, which reduces metabolic networks by removing specific reactions, is integrated into dingo, along with a post-processing function that calculates correlation metrics from pairwise reaction fluxes. -## Preprocess +Additionally, functions for clustering and generating graphs from the correlation matrix are implemented to group reactions and predict metabolic pathways. -- Large metabolic models contain numerous reactions and metabolites. -- Sampling the flux space of such models requires significant computational time due to the high dimensionsionality. -- Model preprocessing can mitigate this issue by removing certain reactions, thus reducing the dimensional space. +Separate functions for visualizing matrices, dendrograms, and graphs are also provided. -A `PreProcess` class was implemented. After initializing an object from this class, -users can call the `reduce` function to remove 3 types of reactions: + +## Preprocess for metabolic models reduction + +- Large metabolic models contain numerous reactions and metabolites. Sampling the flux space of such models requires significant computational time due to the high dimensionality of the convex polytope. +- Model preprocessing can mitigate this issue by removing specific reactions, thereby reducing the dimension of the flux space and the computational time. +- Reduced models have decreased complexity and can help researchers better understand basic principles of metabolism. The reduction process is automated and unbiased when a specific objective function is set. + +When an object is initialized from the `PreProcess` class, users can call the `reduce` function on this object to remove three types of reactions: - Blocked reactions: cannot carry a flux in any condition. 
- Zero-flux reactions: cannot carry a flux while maintaining at least 90% of the maximum growth rate. -- Metabolically less-efficient reactions: require a reduction in growth rate if used. +- Metabolically less-efficient reactions: require a reduction in growth rate if used [1]. + +Users can choose to remove an additional set of reactions by setting the `extend` parameter of the `reduce` function to `True`. These reactions do not affect the value of the objective function when removed. + +Code that illustrates how to use this class: +```python +cobra_model = load_json_model("ext_data/e_coli_core.json") -Users can choose to remove an additional set of reactions, by setting the `extend` parameter -of the `reduce` function to `True`. These reactions do not affect the value of the objective function when removed. +obj = PreProcess(cobra_model, + # tolerance value to identify significant flux changes + tol = 1e-6, + # boolean variable that defines whether to open all exchange reactions to very high flux ranges. + open_exchanges = False) -Reduction with the `PreProcess` class has been tested with various models, some of which are included in dingo’s publication article too. Figures below show the number of remained reactions, after calling the `reduce` function. +removed_reactions, reduced_dingo_model = obj.reduce(extend = False) +``` + +Reduction with the `PreProcess` class has been tested with various models from the BiGG database [2]. Figures below show the number of remaining reactions after applying the `reduce` function: 




+This process significantly reduces model complexity. In some cases, it yields core models, which are useful for examining essential metabolic pathways. + -## Correlated Reactions +## Identification of correlated reactions - Reactions in biochemical pathways can be positively correlated, negatively correlated, or uncorrelated. -- Positive correlation: if reaction A is active, then reaction B is also active and vice versa. -- Negative correlation: if reaction A is active, then reaction B is inactive and vice versa. -- Zero correlation: The status of reaction A is independent of the status of reaction B and vice versa. +- Positive correlation means that if reaction A is active, then reaction B is also active and vice versa. +- Negative correlation means that if reaction A is active, then reaction B is inactive and vice versa. +- Zero correlation means that the status of reaction A is independent of the status of reaction B and vice versa. + +Users can sample the flux space of a selected metabolic model using dingo's `PolytopeSampler` class. Then, the `correlated_reactions` function can help them identify possible correlations in pairs of reactions. This function generates a correlation matrix based on the pearson correlation coefficient of pairwise reactions fluxes. + +Users can apply cutoff values to filter the matrix, replacing all values that do not meet the threshold with 0. For instance, if a pearson cutoff is set to 0.80, all correlation values below this are replaced with 0. However, pearson is not the only available correlation metric. -A `correlated_reactions` function that calculates reactions steady states using dingo's `PolytopeSampler` class and creates a correlation matrix was implemented. The correlation matrix is based on the pearson correlation coefficient between pairwise reactions. This function also calculates a copula indicator to filter correlations greater than the pearson cutoff. +Users can also filter correlated reactions using copulas. 
If you're unfamiliar, copulas allow you to examine how two random variables (in this case, reaction fluxes) are related. You can read more about them [here](https://waterprogramming.wordpress.com/2017/11/11/an-introduction-to-copulas/). Below is an example plot of a copula showing the relationship between two random variables: -A `plot_corr_matrix` function to visualize the correlation matrix as a heatmap plot was implemented too. +![](https://www.researchgate.net/publication/362369405/figure/fig4/AS:11431281085484712@1663777261143/Distribution-of-five-types-of-copulas-a-PDF-of-Gaussian-Copula-b-PDF-of-t-Copula.jpg) -This figure illustrates a heatmap from a symmetrical correlation matrix without pearson or indicator filtering: +Copulas are a useful statistical tool for validating significant correlations. However, manually inspecting copula plots can introduce user bias and be time-consuming given the number of paired reactions. An alternative to manual curation is using a metric called the "indicator," which sums the probability values across each copula's diagonal and divides them. The value of the indicator, whether positive or negative, helps validate or reject a possible correlation. Users can set an indicator cutoff that functions similarly to the pearson cutoff. + +In addition to calculating correlations, a `plot_corr_matrix` function is available for visualizing the correlation matrix as a heatmap. This visualization helps users easily identify significant correlations. 
+ +Code that illustrates how to use these functions: +```python +dingo_model = MetabolicNetwork.from_json("ext_data/e_coli_core.json") + +sampler = PolytopeSampler(dingo_model) +steady_states = sampler.generate_steady_states() + +corr_matrix, indicator_dict = correlated_reactions( + steady_states, + pearson_cutoff = 0.99, + indicator_cutoff = 2, + # number of cells to compute the copula + cells = 10, + # value that defines the width of the copula’s diagonal + cop_coeff = 0.3, + # boolean variable that when True, keeps only the lower triangular matrix + lower_triangle = False) + +plot_corr_matrix(corr_matrix, + # list containing reactions’ names + reactions, + # parameter that defines image saving format + format = "svg") +``` + +Here’s an example of a heatmap created from a symmetrical correlation matrix without cutoffs:


-This figure illustrates a heatmap from a triangular correlation matrix with both pearson and indicator filtering: -
-

-
+We can focus on correlations from reactions that belong to the glycolytic and pentose pathways to get some insights from the heatmap. If you're unfamiliar with these reactions' topology, you can visit [ESCHER](https://escher.github.io/#/) [3] and load the core metabolism map of E. coli. +We observe strong pairwise correlations among all reactions in the pentose phosphate pathway (`G6PDH2R, PGL, GND, RPE, RPI, TKT1, TALA, TKT2`). However, for the glycolytic pathway (`PGI, PFK, FBA, TPI, GAPD, PGK, PGM, ENO, PYK`), two reactions, `PFK` and `PYK`, show decreased correlation with the others. -## Clustering +From the literature, it is known that `PFK` and `PYK` are catalyzed by enzymes that regulate glycolysis. These enzymes do not contribute to gluconeogenesis (the reverse pathway of glycolysis) because their reactions are unidirectional. When gluconeogenesis occurs, these enzymes have little or no activity, while other reactions function in reverse. This explains their lower correlation compared to other glycolytic reactions. -- Clustering based on a dissimilarity matrix reveals groups of reactions with similar correlation values. -- Reactions within the same cluster may contribute to the same pathways. +The gluconeogenic reactions that convert the products of `PFK` and `PYK` back into their substrates are `FBP` and `PPS`, respectively. These reactions may become active when there is an excess of `D-fructose-1,6-bisphosphate` or `pyruvate`, and the model must reduce their concentrations to achieve a steady-state condition. -A `cluster_corr_reactions` function that hierarchically clusters a correlation matrix -alongside a `plot_dendrogram` function that plots a dendrogram were implemented. +This example shows how examining the correlation matrix can help researchers study metabolic pathways. However, this process can be challenging in genome-scale models due to the large number of reactions and pathways. Clustering and graph analysis can help automate this process. 
-This figure illustrates a dendrogram created from a filtered correlation matrix: + +## Clustering of the correlation matrix + +- Clustering based on a dissimilarity matrix reveals groups of reactions with similar correlation values across the matrix. +- Reactions within the same cluster may contribute to the same pathways. + +The `cluster_corr_reactions` function hierarchically clusters the correlation matrix, and the `plot_dendrogram` function plots the resulting dendrogram. + +Code that illustrates how to use these functions: +```python +dissimilarity_matrix, labels, clusters = cluster_corr_reactions( + corr_matrix, + # list containing reactions’ names + reactions, + # defines the type of clustering linkage (options: single, average, complete, ward) + linkage = "ward", + # defines a threshold to cut the dendrogram at a specific height + t = 10.0, + # A boolean variable; if True, the dissimilarity matrix is calculated by subtracting absolute values from 1 + correction = True) + +plot_dendrogram(dissimilarity_matrix, + reactions, + # specifies whether reaction names will appear on the x-axis + plot_labels = True, + t = 10.0, + linkage = "ward") +``` + +Here is an example of a dendrogram created from a correlation matrix with a pearson cutoff of 0.9999:


-Distinct clusters are observed. Graphs will reveal if these clusters interact with other clusters or reactions. +We observe well-separated and distinct clusters. Some clusters contain reactions from the glycolytic and pentose phosphate pathways, as discussed earlier. Additionally, there are clusters with reactions from the citric acid cycle and other lesser-known pathways. + +In the cluster representing the pentose phosphate pathway, we notice that `PGI`, a glycolytic reaction, is included. This is likely because it shares a common metabolite (`D-glucose-6-phosphate`) with `G6PDH2r`, the first reaction in the pentose phosphate pathway. + +However, applying a stricter pearson cutoff (e.g., 0.99999) removes `PGI` from this cluster, leaving only the reactions of the pentose phosphate pathway. + +This is an early but promising indication that clustering the correlation matrix can reveal groups of reactions that contribute to the same pathways. + +In the future, other clustering methods such as `k-means`, `HDBSCAN`, and `biclustering` at the sample level will be explored to identify more specific clusters. Let me briefly explain: Some pathways have distinct phases, each serving a unique role in metabolism. Both glycolysis and the pentose phosphate pathways can be divided into two phases. + +For example, the pentose phosphate pathway has an oxidative and a non-oxidative phase. Reactions such as `G6PDH2R`, `PGL`, and `GND` belong to the oxidative phase, which produces `NADPH`. Hierarchical clustering may not distinguish these phases, but using alternative methods not solely based on the correlation matrix could reveal two distinct clusters representing the two phases. -## Graphs +## Graphs from the correlation matrix -- Graphs creation can reveal networks of correlated reactions, potentially corresponding to metabolic pathways. +- Creating graphs can reveal networks of correlated reactions, which may correspond to metabolic pathways. 
+- These graphs are generated from the correlation matrix, where reactions are represented as nodes and correlation coefficients as edges. -A `graph_corr_matrix` function that creates graphs from a correlation matrix -alongside a `plot_graph` function that plots the graphs were implemented. +The `graph_corr_matrix` function generates graphs from the correlation matrix, and the `plot_graph` function plots these graphs. -This figure illustrates a graph created from a correlation matrix without pearson or indicator filtering: +Code that illustrates how to use these functions: +```python +graphs, layouts = graph_corr_matrix(corr_matrix, + reactions, + # boolean value; if True, transforms the correlation matrix values into absolute values + correction = True, + # nested list containing sublists of reactions within the same cluster, created by the `cluster_corr_reactions` function. + clusters = clusters) + +plot_graph( + # graph object returned by the graph_corr_matrix function + graph, + # layout corresponding to the given graph object + layout) + +``` +Here’s an example of a graph created from a correlation matrix without cutoffs:


-This figure illustrates a subgraph created from a filtered correlation matrix: -
-

-
+The width of each edge represents the correlation strength, scaled from 0 to 1. In addition to the initial graph, users can create and plot subgraphs that represent reaction subnetworks. These subnetworks can serve as supplementary information to clustering results, allowing researchers to identify similarities and differences between the two methods and adjust the structure of desired clusters accordingly. -This subgraph has 9 nodes that correspond to 9 reactions close to each other in the topology of the E. coli core model. These reactions are: `PGI, G6PDH2R, PGL, GND, RPE, RPI, TKT1, TALA, TKT2`. +However, the insights gained from graphs extend beyond pathway prediction. The more connected a node (reaction) is, the more essential its role likely is in metabolism. Conversely, nodes with fewer connections may represent reactions that can be removed without significantly affecting the system. + + +## Conclusion + +Flux sampling offers significant statistical value, enhancing research in metabolic models. In this blog, we explored the pre- and post-sampling features integrated into the dingo package. From model reduction to pathway prediction, these features are designed to assist researchers in studying fundamental aspects of metabolic function. -The topology of these reactions can be seen in the figure below from `ESCHER`: -
-

-
-We observe that `PGI` seems to contribute to a different pathway. However it shares a common metabolite with `G6PDH2r`. -If we apply a stricter pearson cutoff (e.g., 0.99999), this reaction is removed from this subgraph, leaving only the remaining reactions. This is an important observation: looser cutoffs lead to wider sets of connected reactions, forming larger metabolic pathways. +## References +- [1] Lewis NE, Hixson KK, Conrad TM, et al. Omic data from evolved E. coli are consistent with computed optimal growth from genome‐scale models. Molecular Systems Biology. 2010;6(1). doi:https://doi.org/10.1038/msb.2010.47 +‌ +- [2] King ZA, Lu J, Dräger A, et al. BiGG Models: A platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Research. 2015;44(D1):D515-D522. doi:https://doi.org/10.1093/nar/gkv1049 +‌ +- [3] King ZA, Dräger A, Ebrahim A, Sonnenschein N, Lewis NE, Palsson BO. Escher: A Web Application for Building, Sharing, and Embedding Data-Rich Visualizations of Biological Pathways. Gardner PP, ed. PLOS Computational Biology. 2015;11(8):e1004321. doi:https://doi.org/10.1371/journal.pcbi.1004321 \ No newline at end of file From d262f4673761656d6c6baf2ba77dcf3c647e7dff Mon Sep 17 00:00:00 2001 From: SotirisTouliopoulos Date: Tue, 1 Oct 2024 20:47:18 +0300 Subject: [PATCH 14/18] update blog --- _posts/2024-08-24-GSoC2024-dingo.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/_posts/2024-08-24-GSoC2024-dingo.md b/_posts/2024-08-24-GSoC2024-dingo.md index a82708a..4888ed4 100644 --- a/_posts/2024-08-24-GSoC2024-dingo.md +++ b/_posts/2024-08-24-GSoC2024-dingo.md @@ -77,7 +77,7 @@ Users can sample the flux space of a selected metabolic model using dingo's `Pol Users can apply cutoff values to filter the matrix, replacing all values that do not meet the threshold with 0. For instance, if a pearson cutoff is set to 0.80, all correlation values below this are replaced with 0. 
However, pearson is not the only available correlation metric. -Users can also filter correlated reactions using copulas. If you're unfamiliar, copulas allow you to examine how two random variables (in this case, reaction fluxes) are related. You can read more about them [here]((https://waterprogramming.wordpress.com/2017/11/11/an-introduction-to-copulas/)). Below is an example plot of a copula showing the relationship between two random variables: +Users can also filter correlated reactions using copulas. If you're unfamiliar, copulas allow you to examine how two random variables (in this case, reaction fluxes) are related. You can read more about them [here](https://waterprogramming.wordpress.com/2017/11/11/an-introduction-to-copulas/). Below is an example plot of a copula showing the relationship between two random variables: ![](https://www.researchgate.net/publication/362369405/figure/fig4/AS:11431281085484712@1663777261143/Distribution-of-five-types-of-copulas-a-PDF-of-Gaussian-Copula-b-PDF-of-t-Copula.jpg) @@ -110,6 +110,8 @@ plot_corr_matrix(corr_matrix, format = "svg") ``` +The `e_coli_core.json` model we use represents the core metabolism of the E. coli. You can find the model and related data [here](http://bigg.ucsd.edu/models/e_coli_core). + Here’s an example of a heatmap created from a symmetrical correlation matrix without cutoffs:


From 32bb5294d5e4db39b73e852f62f7a5ac4721f9ee Mon Sep 17 00:00:00 2001 From: SotirisTouliopoulos Date: Tue, 1 Oct 2024 20:54:08 +0300 Subject: [PATCH 15/18] update blog --- _posts/2024-08-24-GSoC2024-dingo.md | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/_posts/2024-08-24-GSoC2024-dingo.md b/_posts/2024-08-24-GSoC2024-dingo.md index 4888ed4..1681049 100644 --- a/_posts/2024-08-24-GSoC2024-dingo.md +++ b/_posts/2024-08-24-GSoC2024-dingo.md @@ -15,7 +15,16 @@ hidden: false # Enhancing dingo: from metabolic models reduction to pathways prediction -> #### A contribution to the Google Summer of Code 2024 program. Mentors of this project: Vissarion Fisikopoulos, Apostolos Chalkis and Haris Zafeiropoulos. +
+
+ Sotiris Touliopoulos +
+ Contributor to Google Summer of Code 2024 with GeomScale +
+
+ + +> #### Mentors of this project: Vissarion Fisikopoulos, Apostolos Chalkis and Haris Zafeiropoulos. ## Goal of the project From 44433b17a591b6501c15ff244e9d18040b87ed7d Mon Sep 17 00:00:00 2001 From: SotirisTouliopoulos Date: Thu, 24 Oct 2024 17:12:47 +0300 Subject: [PATCH 16/18] update blog --- _posts/2024-08-24-GSoC2024-dingo.md | 111 +++++++++++++++++++++------- 1 file changed, 83 insertions(+), 28 deletions(-) diff --git a/_posts/2024-08-24-GSoC2024-dingo.md b/_posts/2024-08-24-GSoC2024-dingo.md index 1681049..e5352cb 100644 --- a/_posts/2024-08-24-GSoC2024-dingo.md +++ b/_posts/2024-08-24-GSoC2024-dingo.md @@ -29,7 +29,7 @@ hidden: false ## Goal of the project -The goal of this project is to enhance dingo by incorporating pre- and post-sampling features to leverage the increased statistical value of flux sampling. +The goal of this project is to enhance [dingo](https://github.com/GeomScale/dingo) by incorporating pre- and post-sampling features to leverage the increased statistical value of `flux sampling`. For an introduction to dingo and metabolic networks you can refer to this [blog](https://geomscale.github.io/metabolic-networks/). A preprocessing class, which reduces metabolic networks by removing specific reactions, is integrated into dingo, along with a post-processing function that calculates correlation metrics from pairwise reaction fluxes. @@ -40,32 +40,44 @@ Separate functions for visualizing matrices, dendrograms, and graphs are also pr ## Preprocess for metabolic models reduction -- Large metabolic models contain numerous reactions and metabolites. Sampling the flux space of such models requires a significant computational time due to the high dimensionsionality of the convex polytope. -- Model preprocessing can mitigate this issue by removing specific reactions, thereby reducing the dimensional space and computational time. -- Reduced models have decreased complexity and can help researchers better understand basic principles of metabolism. 
The reduction process is automated and unbiased when a specific objective function is set. +- Large metabolic models contain numerous reactions and metabolites. When sampling the flux space of such complex models computational intractability may occur [1] [2]. +- Model reduction can mitigate this issue by removing specific reactions, thereby decreasing model complexity and computational time. For an estimate of the relationship between number of reactions and computational time you can see figure 1 in dingo's article [3]. +- Reduced models have decreased complexity and can help researchers study basic principles of metabolism too [4]. +- Core models may also find applications in biotechnology, where minimal cells with reduced functionality are designed to boost the production of specific chemicals. https://doi.org/10.1016/j.compchemeng.2011.05.006 [1]. -When an object is initialized from the `PreProcess` class, users can call the `reduce` function on this object to remove three types of reactions: +The implemented [Preprocess](https://github.com/GeomScale/dingo/blob/develop/dingo/preprocess.py) class we present here, aims to reduce metabolic models by removing 3 types of reactions: - Blocked reactions: cannot carry a flux in any condition. - Zero-flux reactions: cannot carry a flux while maintaining at least 90% of the maximum growth rate. -- Metabolically less-efficient reactions: require a reduction in growth rate if used [1]. +- Metabolically less-efficient reactions: require a reduction in growth rate if used [5]. -Users can choose to remove an additional set of reactions, by setting the `extend` parameter of the `reduce` function to `True`. These reactions do not affect the value of the objective function when removed. +After an object is initialized from the `PreProcess` class, given a cobra model as input, users can call the `reduce` function on this object to remove the 3 types of reactions. 
This function sets the lower and upper bounds of these reactions to 0. -Code that illustrates how to use this class: +Users can also choose to remove an additional set of reactions, by setting the `extend` parameter of the `reduce` function to `True`. These reactions do not affect the value of the objective function when removed. The `reduce` function returns a reduced dingo model, so that the user can directly apply the classes and functions of the `dingo` package to this model. + +The additional reactions are removed based on a priority order until the growth flux is significantly altered. When this happens, the algorithm stops the reduction. The priority order is formed by calculating the sum of correlations each reaction has with the rest. Reactions with the smallest sum are removed first. The function that calculates the correlation metrics will be analyzed later in this blog. + +Now let's see how to use the `PreProcess` class and the `reduce` function: ```python +# load a cobra model cobra_model = load_json_model("ext_data/e_coli_core.json") -obj = PreProcess(cobra_model, +obj = PreProcess(# cobra model as input + cobra_model, # tolerance value to identify significant flux changes tol = 1e-6, # boolean variable that define whether to open all exchange reactions to very high flux ranges. open_exchanges = False) +# list of removed reactions, dingo reduced model removed_reactions, reduced_dingo_model = obj.reduce(extend = False) ``` -Reduction with the `PreProcess` class has been tested with various models from the BiGG database [2]. Figures below show the number of remained reactions, after applying the `reduce` function: +For more information on the `open_exchanges` variable you can refer to the [cobrapy](https://cobrapy.readthedocs.io/en/devel/autoapi/cobra/flux_analysis/index.html#cobra.flux_analysis.find_blocked_reactions) package. + +The `e_coli_core.json` model we use represents the core metabolism of the E. coli. 
You can find the model and related data [here](http://bigg.ucsd.edu/models/e_coli_core). + +Reduction with the `PreProcess` class has been tested with various models from the BiGG database [6]. Figures below show the number of remained reactions, after applying the `reduce` function:


@@ -86,24 +98,28 @@ Users can sample the flux space of a selected metabolic model using dingo's `Pol Users can apply cutoff values to filter the matrix, replacing all values that do not meet the threshold with 0. For instance, if a pearson cutoff is set to 0.80, all correlation values below this are replaced with 0. However, pearson is not the only available correlation metric. -Users can also filter correlated reactions using copulas. If you're unfamiliar, copulas allow you to examine how two random variables (in this case, reaction fluxes) are related. You can read more about them [here](https://waterprogramming.wordpress.com/2017/11/11/an-introduction-to-copulas/). Below is an example plot of a copula showing the relationship between two random variables: +Users can also filter correlated reactions using copulas. Copulas allow you to examine how two random variables (in this case, reaction fluxes) are related. You can read more about them [here](https://waterprogramming.wordpress.com/2017/11/11/an-introduction-to-copulas/). Below is an example plot of a copula showing the relationship between two random variables: ![](https://www.researchgate.net/publication/362369405/figure/fig4/AS:11431281085484712@1663777261143/Distribution-of-five-types-of-copulas-a-PDF-of-Gaussian-Copula-b-PDF-of-t-Copula.jpg) Copulas are a useful statistical tool for validating significant correlations. However, manually inspecting copula plots can introduce user bias and be time-consuming given the number of paired reactions. An alternative to manual curation is using a metric called the "indicator," which sums the probability values across each copula's diagonal and divides them. The value of the indicator, whether positive or negative, helps validate or reject a possible correlation. Users can set an indicator cutoff that functions similarly to the pearson cutoff. 
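The pearson cutoff described above can be illustrated with a small standalone sketch. It does not call dingo; the helper names `pearson` and `filtered_corr_matrix`, the toy flux samples, and the choice to compare the absolute value of each coefficient against the cutoff are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long flux samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant flux has no defined correlation; treated as 0 here
    return cov / math.sqrt(var_x * var_y)

def filtered_corr_matrix(samples, cutoff):
    """samples holds one list of sampled fluxes per reaction.

    Assumption: an entry is kept only if its absolute value reaches the cutoff,
    otherwise it is replaced with 0.
    """
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            r = pearson(samples[i], samples[j])
            matrix[i][j] = r if abs(r) >= cutoff else 0.0
    return matrix

# Toy values only, not sampled from a real model.
toy_samples = [
    [1.0, 2.0, 3.0, 4.0],    # reaction A
    [2.0, 4.0, 6.0, 8.0],    # reaction B, proportional to A
    [1.0, -1.0, 1.0, -1.0],  # reaction C, weakly related to A and B
]
toy_matrix = filtered_corr_matrix(toy_samples, cutoff=0.80)
```

With this toy input, the A and B pair survives the cutoff while every pair involving C is replaced with 0, which mirrors the filtering behaviour described for the pearson cutoff.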
-In addition to calculating correlations, a `plot_corr_matrix` function is available for visualizing the correlation matrix as a heatmap. This visualization helps users easily identify significant correlations. - -Code that illustrates how to use these functions: +The following code shows how you can sample steady states of a metabolic network: ```python dingo_model = MetabolicNetwork.from_json("ext_data/e_coli_core.json") sampler = PolytopeSampler(dingo_model) steady_states = sampler.generate_steady_states() +``` +And here you can see how to use the steady states to identify correlated reactions with the `correlated_reactions` function: +```python corr_matrix, indicator_dict = correlated_reactions( + # the reactions steady states steady_states, + # cutoff to remove correlation below this pearson value pearson_cutoff = 0.99, + # cutoff to remove correlations below this indicator value indicator_cutoff = 2, # number of cells to compute the copula cells = 10, @@ -111,24 +127,35 @@ corr_matrix, indicator_dict = correlated_reactions( cop_coeff = 0.3, # boolean variable that when True, keeps only the lower triangular matrix lower_triangle = False) +``` + +Apart from the correlation matrix (`corr_matrix`), this function also returns a dictionary object (`indicator_dict`). This dictionary contains the pearson and indicator values for each pair of reactions. -plot_corr_matrix(corr_matrix, +In addition to calculating correlations, a `plot_corr_matrix` function is available for visualizing the correlation matrix as a heatmap. This visualization helps users easily identify significant correlations. +```python +plot_corr_matrix( + # the correlation matrix + corr_matrix, # list containing reactions’ names reactions, # parameter that defines image saving format format = "svg") ``` -The `e_coli_core.json` model we use represents the core metabolism of the E. coli. You can find the model and related data [here](http://bigg.ucsd.edu/models/e_coli_core). 
- Here’s an example of a heatmap created from a symmetrical correlation matrix without cutoffs:


-We can focus on correlations from reactions that belong to the glycolytic and pentose pathways to get some insights from the heatmap. If you're unfamiliar with these reactions' topology, you can visit [ESCHER](https://escher.github.io/#/) [3] and load the core metabolism map of E. coli. +Each position in the heatmap represents the pearson correlation value between a set of reactions, with different rows and columns corresponding to different reactions. The color code indicates pearson correlation values on a scale from -1 to +1. Values near +1 (strong positive correlation) are shown in red, values near 0 in white, and values near -1 (strong negative correlation) in blue. -We observe strong pairwise correlations among all reactions in the pentose phosphate pathway (`G6PDH2R, PGL, GND, RPE, RPI, TKT1, TALA, TKT2`). However, for the glycolytic pathway (`PGI, PFK, FBA, TPI, GAPD, PGK, PGM, ENO, PYK`), two reactions, `PFK` and `PYK`, show decreased correlation with the others. +`Dingo` utilizes the `plotly` package for heatmap visualization, allowing you to easily find each correlation value by hovering over a specific position when the heatmap is loaded in a tab. + +Keep in mind that larger models may produce heatmaps where reactions names will not be visible. You don't have to zoom in the heatmap to identify the desired pearson values. The `indicator_dict` object contains all the information you need to identify correlations of interest. + +We can now examine the previous heatmap and focus on correlations from reactions that belong to the glycolytic and pentose pathways to get some insights. You can visit [ESCHER](https://escher.github.io/#/) [7] and load the core metabolism map of E. coli to examine the reactions topology. + +We observe strong pairwise correlations among all reactions in the pentose phosphate pathway (`G6PDH2R, PGL, GND, RPE, RPI, TKT1, TALA, TKT2`). 
However, for the glycolytic pathway (`PGI, PFK, FBA, TPI, GAPD, PGK, PGM, ENO, PYK`), two reactions, `PFK` and `PYK`, show decreased correlation with the others. Their corresponding correlation values are 0.81 and 0.30. From the literature, it is known that `PFK` and `PYK` are catalyzed by enzymes that regulate glycolysis. These enzymes do not contribute to gluconeogenesis (the reverse pathway of glycolysis) because their reactions are unidirectional. When gluconeogenesis occurs, these enzymes have little or no activity, while other reactions function in reverse. This explains their lower correlation compared to other glycolytic reactions. @@ -144,9 +171,10 @@ This example shows how examining the correlation matrix can help researchers stu The `cluster_corr_reactions` function hierarchically clusters the correlation matrix, and the `plot_dendrogram` function plots the resulting dendrogram. -Code that illustrates how to use these functions: +Code that illustrates how to use the `cluster_corr_reactions` function: ```python dissimilarity_matrix, labels, clusters = cluster_corr_reactions( + # the correlation matrix corr_matrix, # list containing reactions’ names reactions, @@ -156,12 +184,21 @@ dissimilarity_matrix, labels, clusters = cluster_corr_reactions( t = 10.0, # A boolean variable; if True, the dissimilarity matrix is calculated by subtracting absolute values from 1 correction = True) +``` + +In addition to the `dissimilarity_matrix` that is calculated by subtracting each pearson value from 1, two other objects are returned. The `labels` object is a list object, containing the index labels of the reactions that correspond to a specific cluster. The `clusters` object is a nested list containing sublists of reactions within the same cluster. 
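To make the relationship between the dissimilarity matrix and the nested `clusters` list more concrete, here is a minimal standalone sketch. It is not dingo's implementation: it assumes the dissimilarity is `1 - |r|`, and it swaps the `ward` linkage used in the example for plain single linkage, because cutting a single-linkage dendrogram at height `t` is the same as grouping reactions connected by pairs with dissimilarity at most `t`. The function names, reaction names and correlation values are hypothetical.

```python
def dissimilarity(corr_matrix):
    """Assumed definition: 1 minus the absolute correlation value."""
    return [[1.0 - abs(r) for r in row] for row in corr_matrix]

def single_linkage_clusters(dissim, names, t):
    """Group reactions whose chain of pairwise dissimilarities stays at or below t."""
    parent = list(range(len(names)))

    def find(i):
        # Follow parent links to the cluster representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if dissim[i][j] <= t:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values())  # nested list: one sublist per cluster

# Toy correlation values, not results from a sampled model.
toy_names = ["R1", "R2", "R3", "R4"]
toy_corr = [
    [1.00, 0.99, 0.98, 0.20],
    [0.99, 1.00, 0.99, -0.10],
    [0.98, 0.99, 1.00, 0.30],
    [0.20, -0.10, 0.30, 1.00],
]
toy_dissim = dissimilarity(toy_corr)
toy_clusters = single_linkage_clusters(toy_dissim, toy_names, t=0.05)
```

Here `toy_clusters` is a nested list with the three tightly correlated reactions in one sublist and the remaining reaction on its own, which is the same shape as the `clusters` object described above.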
-plot_dendrogram(dissimilarity_matrix, +Code that illustrates how to use the `plot_dendrogram` function: +```python +plot_dendrogram(# the dissimilarity matrix + dissimilarity_matrix, + # list with reactions names reactions, - # specifies whether reaction names will appear on the x-axis + # variable that specifies whether reaction names will appear on the x-axis plot_labels = True, + # threshold to cut the dendrogram t = 10.0, + # variable that defines the linkage type for clustering linkage = "ward") ``` @@ -170,7 +207,9 @@ Here is an example of a dendrogram created from a correlation matrix with a pear

-We observe well-separated and distinct clusters. Some clusters contain reactions from the glycolytic and pentose phosphate pathways, as discussed earlier. Additionally, there are clusters with reactions from the citric acid cycle and other lesser-known pathways. +The leaves of the dendrogram represent individual reactions. We observe several well-separated, distinct clusters of reactions, indicating that within each cluster, there are groups of reactions with similar flux distributions, while reactions outside the cluster exhibit significant differences. These distinct clusters may contain reactions that contribute to the same pathways. + +Some clusters contain reactions from the glycolytic and pentose phosphate pathways, as discussed earlier. Additionally, there are clusters with reactions from the citric acid cycle and other lesser-known pathways. In the cluster representing the pentose phosphate pathway, we notice that `PGI`, a glycolytic reaction, is included. This is likely because it shares a common metabolite (`D-glucose-6-phosphate`) with `G6PDH2r`, the first reaction in the pentose phosphate pathway. @@ -192,7 +231,9 @@ The `graph_corr_matrix` function generates graphs from the correlation matrix, a Code that illustrates how to use these functions: ```python -graphs, layouts = graph_corr_matrix(corr_matrix, +graphs, layouts = graph_corr_matrix(# the correlation matrix + corr_matrix, + # list with reactions names reactions, # boolean value; if True, transforms the correlation matrix values into absolute values correction = True, @@ -215,16 +256,30 @@ The width of each edge represents the correlation strength, scaled from 0 to 1. However, the insights gained from graphs extend beyond pathway prediction. The more connected a node (reaction) is, the more essential its role likely is in metabolism. Conversely, nodes with fewer connections may represent reactions that can be removed without significantly affecting the system. 
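The idea of reading connectivity from the graph can be sketched without any graph library. The snippet below is an illustration under stated assumptions, not the `graph_corr_matrix` implementation: the helper names `correlation_graph` and `degrees` are hypothetical, edge weights are taken as absolute correlation values, entries equal to 0 (for example after a cutoff) produce no edge, and the reaction names and values are toy data.

```python
def correlation_graph(corr_matrix, names):
    """Build an undirected weighted graph as a dict of neighbour -> weight dicts."""
    graph = {name: {} for name in names}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            weight = abs(corr_matrix[i][j])  # assumed: edge weight scaled from 0 to 1
            if weight > 0:
                graph[names[i]][names[j]] = weight
                graph[names[j]][names[i]] = weight
    return graph

def degrees(graph):
    """Number of neighbours per node, a simple measure of how connected it is."""
    return {node: len(neighbours) for node, neighbours in graph.items()}

# Toy filtered correlation matrix, not taken from a real model.
toy_names = ["R1", "R2", "R3", "R4"]
toy_corr = [
    [1.0, 0.9, -0.85, 0.4],
    [0.9, 1.0, 0.0, 0.0],
    [-0.85, 0.0, 1.0, 0.0],
    [0.4, 0.0, 0.0, 1.0],
]
toy_graph = correlation_graph(toy_corr, toy_names)
toy_degrees = degrees(toy_graph)
most_connected = max(toy_degrees, key=toy_degrees.get)
```

In this toy graph one node is linked to every other node while the rest have a single neighbour, so ranking nodes by degree is enough to single it out. On a real model, a weighted measure (for example the sum of edge weights per node) may be more informative than a plain neighbour count.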
+A potential example of an essential node is the one corresponding to the `NADH16` reaction. In addition to reactions that are topologically close to `NADH16`, this node is also linked to reactions from the glycolytic pathway, even though they are not directly connected in the topology. `NADH16` represents the enzyme `NADH dehydrogenase`, which produces the cofactor `NAD`. + +`NAD` is the oxidized form of `NADH`, with `NADH` being the reduced form. Catabolic pathways, like glycolysis, consume `NAD` and generate `NADH`. This creates a need for `NAD` to be regenerated to allow glycolysis and other catabolic processes to continue. Therefore, `NADH dehydrogenase` plays a critical role in the glycolytic pathway, even though it may not appear to be directly connected in the topology of the E. coli core model. + +In contrast, two nodes with strong connections to each other but weak connections to the rest are those corresponding to `FRD7` and `SUCDi`. These reactions form a loop in the topology, with the products of one serving as the substrates for the other. The activity of each reaction depends on the other, with minimal influence from the remaining reactions. + ## Conclusion -Flux sampling offers significant statistical value, enhancing research in metabolic models. In this blog, we explored the pre- and post-sampling features integrated into the dingo package. From model reduction to pathway prediction, these features are designed to assist researchers in studying fundamental aspects of metabolic function. +Flux sampling offers significant statistical value, enhancing research in metabolic models. In this blog, we explored the pre- and post-sampling features integrated into the `dingo` package. From model reduction to pathway prediction, these features are designed to assist researchers in studying fundamental aspects of metabolic function. ## References -- [1] Lewis NE, Hixson KK, Conrad TM, et al. Omic data from evolved E. 
coli are consistent with computed optimal growth from genome‐scale models. Molecular Systems Biology. 2010;6(1). doi:https://doi.org/10.1038/msb.2010.47 +- [1] Jonnalagadda, S., Balaji Balagurunathan and Srinivasan, R. (2011). Graph theory augmented math programming approach to identify minimal reaction sets in metabolic networks. Computers & Chemical Engineering, 35(11), pp.2366–2377. doi:https://doi.org/10.1016/j.compchemeng.2011.05.006. + +- [2] Erdrich, P., Steuer, R. and Steffen Klamt (2015). An algorithm for the reduction of genome-scale metabolic network models to meaningful core models. BMC Systems Biology, 9(1). doi:https://doi.org/10.1186/s12918-015-0191-x. + +‌- [3] Apostolos Chalkis, Vissarion Fisikopoulos, Tsigaridas, E. and Haris Zafeiropoulos (2024). dingo: a Python package for metabolic flux sampling. Bioinformatics Advances, 4(1). doi:https://doi.org/10.1093/bioadv/vbae037. + +‌- [4] ‌Meric Ataman, Hernandez, D.F., Georgios Fengos and Vassily Hatzimanikatis (2017). redGEM: Systematic reduction and analysis of genome-scale metabolic reconstructions for development of consistent core metabolic models. PLoS Computational Biology, 13(7), pp.e1005444–e1005444. doi:https://doi.org/10.1371/journal.pcbi.1005444. + +- [5] Lewis NE, Hixson KK, Conrad TM, et al. Omic data from evolved E. coli are consistent with computed optimal growth from genome‐scale models. Molecular Systems Biology. 2010;6(1). doi:https://doi.org/10.1038/msb.2010.47 ‌ -- [2] King ZA, Lu J, Dräger A, et al. BiGG Models: A platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Research. 2015;44(D1):D515-D522. doi:https://doi.org/10.1093/nar/gkv1049 +- [6] King ZA, Lu J, Dräger A, et al. BiGG Models: A platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Research. 2015;44(D1):D515-D522. doi:https://doi.org/10.1093/nar/gkv1049 ‌ -- [3] King ZA, Dräger A, Ebrahim A, Sonnenschein N, Lewis NE, Palsson BO. 
Escher: A Web Application for Building, Sharing, and Embedding Data-Rich Visualizations of Biological Pathways. Gardner PP, ed. PLOS Computational Biology. 2015;11(8):e1004321. doi:https://doi.org/10.1371/journal.pcbi.1004321 \ No newline at end of file +- [7] King ZA, Dräger A, Ebrahim A, Sonnenschein N, Lewis NE, Palsson BO. Escher: A Web Application for Building, Sharing, and Embedding Data-Rich Visualizations of Biological Pathways. Gardner PP, ed. PLOS Computational Biology. 2015;11(8):e1004321. doi:https://doi.org/10.1371/journal.pcbi.1004321 From 9071e7c828a18dee1c71fa8f248b41e10abcd86b Mon Sep 17 00:00:00 2001 From: SotirisTouliopoulos Date: Thu, 24 Oct 2024 17:20:24 +0300 Subject: [PATCH 17/18] update blog --- _posts/2024-08-24-GSoC2024-dingo.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/_posts/2024-08-24-GSoC2024-dingo.md b/_posts/2024-08-24-GSoC2024-dingo.md index e5352cb..f91988c 100644 --- a/_posts/2024-08-24-GSoC2024-dingo.md +++ b/_posts/2024-08-24-GSoC2024-dingo.md @@ -41,15 +41,15 @@ Separate functions for visualizing matrices, dendrograms, and graphs are also pr ## Preprocess for metabolic models reduction - Large metabolic models contain numerous reactions and metabolites. When sampling the flux space of such complex models computational intractability may occur [1] [2]. -- Model reduction can mitigate this issue by removing specific reactions, thereby decreasing model complexity and computational time. For an estimate of the relationship between number of reactions and computational time you can see figure 1 in dingo's article [3]. -- Reduced models have decreased complexity and can help researchers study basic principles of metabolism too [4]. +- Model reduction can mitigate this issue by removing specific reactions, thereby decreasing model complexity and computational time. 
For an estimate of the relationship between number of reactions and computational time you can see figure 1 in dingo's article [3]. +- Reduced models have decreased complexity and can help researchers study basic principles of metabolism too [4]. - Core models may also find applications in biotechnology, where minimal cells with reduced functionality are designed to boost the production of specific chemicals. https://doi.org/10.1016/j.compchemeng.2011.05.006 [1]. The implemented [Preprocess](https://github.com/GeomScale/dingo/blob/develop/dingo/preprocess.py) class we present here, aims to reduce metabolic models by removing 3 types of reactions: - Blocked reactions: cannot carry a flux in any condition. - Zero-flux reactions: cannot carry a flux while maintaining at least 90% of the maximum growth rate. -- Metabolically less-efficient reactions: require a reduction in growth rate if used [5]. +- Metabolically less-efficient reactions: require a reduction in growth rate if used [5]. After an object is initialized from the `PreProcess` class, given a cobra model as input, users can call the `reduce` function on this object to remove the 3 types of reactions. This function sets the lower and upper bounds of these reactions to 0. @@ -77,7 +77,7 @@ For more information on the `open_exchanges` variable you can refer to the [cobr The `e_coli_core.json` model we use represents the core metabolism of the E. coli. You can find the model and related data [here](http://bigg.ucsd.edu/models/e_coli_core). -Reduction with the `PreProcess` class has been tested with various models from the BiGG database [6]. Figures below show the number of remained reactions, after applying the `reduce` function: +Reduction with the `PreProcess` class has been tested with various models from the BiGG database [6]. Figures below show the number of remained reactions, after applying the `reduce` function:


@@ -153,7 +153,7 @@ Each position in the heatmap represents the pearson correlation value between a
 Keep in mind that larger models may produce heatmaps where reactions names will not be visible. You don't have to zoom in the heatmap to identify the desired pearson values. The `indicator_dict` object contains all the information you need to identify correlations of interest.
-We can now examine the previous heatmap and focus on correlations from reactions that belong to the glycolytic and pentose pathways to get some insights. You can visit [ESCHER](https://escher.github.io/#/) [7] and load the core metabolism map of E. coli to examine the reactions topology.
+We can now examine the previous heatmap and focus on correlations from reactions that belong to the glycolytic and pentose pathways to get some insights. You can visit [ESCHER](https://escher.github.io/#/) [7] and load the core metabolism map of E. coli to examine the reactions topology.
 We observe strong pairwise correlations among all reactions in the pentose phosphate pathway (`G6PDH2R, PGL, GND, RPE, RPI, TKT1, TALA, TKT2`). However, for the glycolytic pathway (`PGI, PFK, FBA, TPI, GAPD, PGK, PGM, ENO, PYK`), two reactions, `PFK` and `PYK`, show decreased correlation with the others. Their corresponding correlation values are 0.81 and 0.30.
@@ -274,12 +274,12 @@ Flux sampling offers significant statistical value, enhancing research in metabo
 - [2] Erdrich, P., Steuer, R. and Steffen Klamt (2015). An algorithm for the reduction of genome-scale metabolic network models to meaningful core models. BMC Systems Biology, 9(1). doi:https://doi.org/10.1186/s12918-015-0191-x.
-‌- [3] Apostolos Chalkis, Vissarion Fisikopoulos, Tsigaridas, E. and Haris Zafeiropoulos (2024). dingo: a Python package for metabolic flux sampling. Bioinformatics Advances, 4(1). doi:https://doi.org/10.1093/bioadv/vbae037.
-‌- [4] ‌Meric Ataman, Hernandez, D.F., Georgios Fengos and Vassily Hatzimanikatis (2017). redGEM: Systematic reduction and analysis of genome-scale metabolic reconstructions for development of consistent core metabolic models. PLoS Computational Biology, 13(7), pp.e1005444–e1005444. doi:https://doi.org/10.1371/journal.pcbi.1005444.
+‌- [4] ‌Meric Ataman, Hernandez, D.F., Georgios Fengos and Vassily Hatzimanikatis (2017). redGEM: Systematic reduction and analysis of genome-scale metabolic reconstructions for development of consistent core metabolic models. PLoS Computational Biology, 13(7), pp.e1005444–e1005444. doi:https://doi.org/10.1371/journal.pcbi.1005444.
-- [5] Lewis NE, Hixson KK, Conrad TM, et al. Omic data from evolved E. coli are consistent with computed optimal growth from genome‐scale models. Molecular Systems Biology. 2010;6(1). doi:https://doi.org/10.1038/msb.2010.47
+- [5] Lewis NE, Hixson KK, Conrad TM, et al. Omic data from evolved E. coli are consistent with computed optimal growth from genome‐scale models. Molecular Systems Biology. 2010;6(1). doi:https://doi.org/10.1038/msb.2010.47
 ‌
-- [6] King ZA, Lu J, Dräger A, et al. BiGG Models: A platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Research. 2015;44(D1):D515-D522. doi:https://doi.org/10.1093/nar/gkv1049
+- [6] King ZA, Lu J, Dräger A, et al. BiGG Models: A platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Research. 2015;44(D1):D515-D522. doi:https://doi.org/10.1093/nar/gkv1049
 ‌
-- [7] King ZA, Dräger A, Ebrahim A, Sonnenschein N, Lewis NE, Palsson BO. Escher: A Web Application for Building, Sharing, and Embedding Data-Rich Visualizations of Biological Pathways. Gardner PP, ed. PLOS Computational Biology. 2015;11(8):e1004321. doi:https://doi.org/10.1371/journal.pcbi.1004321
+- [7] King ZA, Dräger A, Ebrahim A, Sonnenschein N, Lewis NE, Palsson BO. Escher: A Web Application for Building, Sharing, and Embedding Data-Rich Visualizations of Biological Pathways. Gardner PP, ed. PLOS Computational Biology. 2015;11(8):e1004321. doi:https://doi.org/10.1371/journal.pcbi.1004321
From c6cb069d7d4c09c372f6db6b0a544c6c1b945c6a Mon Sep 17 00:00:00 2001
From: SotirisTouliopoulos
Date: Thu, 24 Oct 2024 17:22:27 +0300
Subject: [PATCH 18/18] fix references

---
 _posts/2024-08-24-GSoC2024-dingo.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/_posts/2024-08-24-GSoC2024-dingo.md b/_posts/2024-08-24-GSoC2024-dingo.md
index f91988c..a8d10a1 100644
--- a/_posts/2024-08-24-GSoC2024-dingo.md
+++ b/_posts/2024-08-24-GSoC2024-dingo.md
@@ -274,12 +274,13 @@ Flux sampling offers significant statistical value, enhancing research in metabo
 - [2] Erdrich, P., Steuer, R. and Steffen Klamt (2015). An algorithm for the reduction of genome-scale metabolic network models to meaningful core models. BMC Systems Biology, 9(1). doi:https://doi.org/10.1186/s12918-015-0191-x.
-‌- [3] Apostolos Chalkis, Vissarion Fisikopoulos, Tsigaridas, E. and Haris Zafeiropoulos (2024). dingo: a Python package for metabolic flux sampling. Bioinformatics Advances, 4(1). doi:https://doi.org/10.1093/bioadv/vbae037.
+- [3] Apostolos Chalkis, Vissarion Fisikopoulos, Tsigaridas, E. and Haris Zafeiropoulos (2024). dingo: a Python package for metabolic flux sampling. Bioinformatics Advances, 4(1). doi:https://doi.org/10.1093/bioadv/vbae037.
-‌- [4] ‌Meric Ataman, Hernandez, D.F., Georgios Fengos and Vassily Hatzimanikatis (2017). redGEM: Systematic reduction and analysis of genome-scale metabolic reconstructions for development of consistent core metabolic models. PLoS Computational Biology, 13(7), pp.e1005444–e1005444. doi:https://doi.org/10.1371/journal.pcbi.1005444.
+- [4] Meric Ataman, Hernandez, D.F., Georgios Fengos and Vassily Hatzimanikatis (2017). redGEM: Systematic reduction and analysis of genome-scale metabolic reconstructions for development of consistent core metabolic models. PLoS Computational Biology, 13(7), pp.e1005444–e1005444. doi:https://doi.org/10.1371/journal.pcbi.1005444.
 - [5] Lewis NE, Hixson KK, Conrad TM, et al. Omic data from evolved E. coli are consistent with computed optimal growth from genome‐scale models. Molecular Systems Biology. 2010;6(1). doi:https://doi.org/10.1038/msb.2010.47
-‌
+
 - [6] King ZA, Lu J, Dräger A, et al. BiGG Models: A platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Research. 2015;44(D1):D515-D522. doi:https://doi.org/10.1093/nar/gkv1049
-‌
+
 - [7] King ZA, Dräger A, Ebrahim A, Sonnenschein N, Lewis NE, Palsson BO. Escher: A Web Application for Building, Sharing, and Embedding Data-Rich Visualizations of Biological Pathways. Gardner PP, ed. PLOS Computational Biology. 2015;11(8):e1004321. doi:https://doi.org/10.1371/journal.pcbi.1004321
+