
Commit

Merge branch 'development' of https://github.com/opencobra/Medusa into development
gregmedlock committed Jul 28, 2020
2 parents 51cd01c + 6e91477 commit d4d85fb
Showing 17 changed files with 6,061 additions and 149 deletions.
760 changes: 760 additions & 0 deletions docs/benchmark_iter_gapfill.svg
1,709 changes: 1,709 additions & 0 deletions docs/benchmark_iterative_gapfill.ipynb

Large diffs are not rendered by default.

95 changes: 55 additions & 40 deletions docs/benchmarking.ipynb → docs/benchmark_mem_cpu.ipynb
@@ -7,13 +7,13 @@
"# Ensemble Size and Speed Benchmarking\n",
"\n",
"`Ensembles` are specifically designed for optimal usability, memory usage, and computational speed. In this tutorial we explore the size and speed related characteristics of `Ensembles` compared to using the equivalent individual models. We aim to begin to answer the following questions: \n",
"- How much RAM does an ensemble use when working with it compared to working with the equivalent individual models?\n",
"- How much memory is used to store ensembles compared to the equivalent individual models?\n",
"- How much memory does an ensemble use when working with it compared to working with the equivalent individual models?\n",
"- How much disk space is used to store ensembles compared to the equivalent individual models?\n",
"- How long does it take to run FBA for all members of an ensemble compared to the equivalent individual models?\n",
"\n",
"## Ensemble memory requirements during use and when saved\n",
"\n",
"`Ensembles` are structured to minimize the amount of RAM required when loaded and when being saved. One of the major challenges when working with ensembles of models is having all of the models readily available in RAM while conducting analyses. With efficient packaging of the features that are different between members of an ensemble, we were able to significantly reduce the amount of RAM and hard drive space required for working with ensembles of models. "
"`Ensembles` are structured to minimize the amount of memory required when loaded and when being saved. One of the major challenges when working with ensembles of models is having all of the models readily available in memory while conducting analyses. With efficient packaging of the features that are different between members of an ensemble, we were able to significantly reduce the amount of memory and hard drive space required for working with ensembles of models. "
]
},
{
@@ -26,6 +26,7 @@
"import os\n",
"import psutil\n",
"import medusa\n",
"import numpy\n",
"from medusa.test import create_test_ensemble"
]
},
@@ -38,7 +39,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"63.58 MB\n"
"57.82 MB\n"
]
}
],
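The cell above reports the memory footprint of the loaded ensemble by inspecting the Python process itself. A minimal sketch of that pattern, assuming `psutil` behaves as imported above and that `create_test_ensemble` accepts the test model name shown (a hypothetical argument for illustration):

```python
# Sketch: measure process RSS before and after loading an ensemble.
import os
import psutil
from medusa.test import create_test_ensemble

process = psutil.Process(os.getpid())
rss_before = process.memory_info().rss  # resident set size, in bytes

ensemble = create_test_ensemble("Staphylococcus aureus")  # hypothetical test id
rss_after = process.memory_info().rss

print("%.2f MB" % ((rss_after - rss_before) / 1024 ** 2))
```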
@@ -85,7 +86,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"19.23 MB\n"
"17.50 MB\n"
]
}
],
@@ -115,8 +116,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"19230.47 MB or\n",
"18.78 GB\n"
"17500.00 MB or\n",
"17.09 GB\n"
]
}
],
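The GB figure above is a straight extrapolation from the per-model footprint to 1000 individually loaded models; as a sketch of the arithmetic:

```python
# Extrapolate the per-model memory footprint to 1000 individually loaded models.
mem_per_model_mb = 17.50
n_members = 1000
total_mb = mem_per_model_mb * n_members
print("%.2f MB or" % total_mb)        # 17500.00 MB or
print("%.2f GB" % (total_mb / 1024))  # 17.09 GB
```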
@@ -150,7 +151,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"6.61 MB for a 1000 member ensemble\n"
"6.67 MB for a 1000 member ensemble\n"
]
}
],
@@ -174,9 +175,9 @@
"name": "stdout",
"output_type": "stream",
"text": [
"1.17 MB per model\n",
"1171.96 MB for 1000 individual model files.\n",
"1.14 GB for 1000 individual model files.\n"
"1.07 MB per model\n",
"1070.01 MB for 1000 individual model files.\n",
"1.04 GB for 1000 individual model files.\n"
]
}
],
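A hedged sketch of how the disk-space comparison above might be produced, assuming the ensemble is pickled to disk and individual members are exported as SBML with COBRApy (file names are illustrative):

```python
# Compare on-disk size of one pickled ensemble vs. 1000 SBML files.
import os
import pickle
from cobra.io import write_sbml_model

with open("ensemble.pickle", "wb") as outfile:
    pickle.dump(ensemble, outfile)
ensemble_mb = os.path.getsize("ensemble.pickle") / 1024 ** 2
print("%.2f MB for a 1000 member ensemble" % ensemble_mb)

# Writing one member as SBML approximates the size of each individual file.
ensemble.set_state(ensemble.members[0].id)
write_sbml_model(ensemble.base_model, "single_member.xml")
model_mb = os.path.getsize("single_member.xml") / 1024 ** 2
print("%.2f MB per model" % model_mb)
print("%.2f MB for 1000 individual model files." % (model_mb * 1000))
print("%.2f GB for 1000 individual model files." % (model_mb * 1000 / 1024))
```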
@@ -200,7 +201,7 @@
"source": [
"## Flux analysis speed testing\n",
"\n",
"Running FBA requires a relatively short amount of time to for a single model, however when working with ensembles of 1000s of models, the simple optimization problems can add up to significant amounts of time. Here we explore the expected timeframes for an ensemble and how that compares to using the equivalent number of individual models. It is important to note that during this benchmarking, we assume that the computer being used is capable to loading all individual modelings into the RAM, this may not be the case for many laptop computers. "
"Running FBA requires a relatively short amount of time for a single model, however when working with ensembles of 1000s of models, the simple optimization problems can add up to significant amounts of time. Here we explore the expected timeframes for FBA with an ensemble and how that compares to using the equivalent number of individual models. It is important to note that during this benchmarking, we assume that the computer being used is capable to loading all individual modelings into the RAM; this may not be the case for many modern laptop computers (e.g., ~16GB spare memory required)."
]
},
{
@@ -222,22 +223,25 @@
"name": "stdout",
"output_type": "stream",
"text": [
"1 processors: 142.41587114334106 seconds for entire ensemble\n",
"2 processors: 79.16171908378601 seconds for entire ensemble\n",
"4 processors: 44.92253303527832 seconds for entire ensemble\n",
"8 processors: 34.65370845794678 seconds for entire ensemble\n"
"1 processors: 87.24728102684021 seconds for entire ensemble\n",
"2 processors: 44.09945402145386 seconds for entire ensemble\n",
"3 processors: 32.84902577400207 seconds for entire ensemble\n",
"4 processors: 27.70060839653015 seconds for entire ensemble\n"
]
}
],
"source": [
"# Time required to run FBA on a 1000 member ensemble using the innate Medusa functions.\n",
"runtimes = {}\n",
"for num_processes in [1,2,4,8]:\n",
" t0 = time.time()\n",
" flux_balance.optimize_ensemble(ensemble, num_processes = num_processes)\n",
" t1 = time.time()\n",
" runtimes[num_processes] = t1-t0\n",
" print(str(num_processes) + ' processors: ' + str(t1-t0) + ' seconds for entire ensemble')"
"trials = 5\n",
"for num_processes in [1,2,3,4]:\n",
" runtimes[num_processes] = []\n",
" for trial in range(0,trials):\n",
" t0 = time.time()\n",
" flux_balance.optimize_ensemble(ensemble, num_processes = num_processes)\n",
" t1 = time.time()\n",
" runtimes[num_processes].append(t1-t0)\n",
" print(str(num_processes) + ' processors: ' + str(numpy.mean(runtimes[num_processes])) + ' seconds for entire ensemble')"
]
},
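Since `runtimes` now stores a list of trial times per processor count, the parallel speedup relative to a single process can be summarized directly; a small follow-up sketch:

```python
# Summarize mean runtime and speedup vs. the single-process baseline.
import numpy

baseline = numpy.mean(runtimes[1])
for num_processes in sorted(runtimes):
    mean_t = numpy.mean(runtimes[num_processes])
    print("%d processes: %.1f s (%.2fx speedup)" % (num_processes, mean_t, baseline / mean_t))
```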
{
@@ -249,43 +253,54 @@
"name": "stdout",
"output_type": "stream",
"text": [
"79.50 seconds for 1000 models\n"
"35.06 seconds for 1000 models\n",
"34.51 seconds for 1000 models\n",
"34.49 seconds for 1000 models\n",
"34.62 seconds for 1000 models\n",
"34.37 seconds for 1000 models\n",
"34.61 second average for 1000 models\n"
]
}
],
"source": [
"# Time required to run FBA on 1000 individual models using a single processor.\n",
"# This is the equivalent time that would be required if all 1000 models were pre-loaded in RAM.\n",
"\n",
"t_total = 0\n",
"for member in ensemble.members:\n",
" # Set the member state \n",
" ensemble.set_state(member.id)\n",
" # Start the timer to capture only time required to run FBA on each model\n",
" t0 = time.time()\n",
" solution = ensemble.base_model.optimize()\n",
" t1 = time.time()\n",
" t_total = t1-t0 + t_total\n",
"print(\"%.2f\" % (t_total) ,'seconds for 1000 models')"
"trial_total = []\n",
"for trial in range(0,trials):\n",
" t_total = 0\n",
" for member in ensemble.members:\n",
" # Set the member state\n",
" ensemble.set_state(member.id)\n",
" # Start the timer to capture only time required to run FBA on each model\n",
" t0 = time.time()\n",
" solution = ensemble.base_model.optimize()\n",
" t1 = time.time()\n",
" t_total = t1-t0 + t_total\n",
" print(\"%.2f\" % (t_total) ,'seconds for 1000 models')\n",
" trial_total.append(t_total)\n",
"print(\"%.2f\" % (numpy.mean(trial_total)) ,'second average for 1000 models')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Loading individual models is about twice as fast as using Medusa ensembles (ignoring the time it takes to load all of the models), however requires about 300 times as much RAM. "
"Using individual models stored in memory is faster than an equivalent ensemble with 1-2 processors, but Medusa is faster with an increasing number of processors. Keep in mind, however, that this comparison doesn't consider the time it takes to load all of the models (\\~200x faster in Medusa for an ensemble this size), make any modifications to the media conditions for an ensemble (one operation in Medusa; 1000 independent operations with individual models), and that using individual models requires far more memory (\\~300x in this case).\n",
"\n",
"This comparison also doesn't factor in the time required for the first optimization performed with any COBRApy model. When a model is optimized once, the solver maintains the solution as a starting point for future optimization steps, substantially reducing the time required for future simulations. Medusa intrinsically takes advantage of this by only using one COBRApy model to represent the entire ensemble; the solution is recycled from member to member during ensemble FBA in Medusa. In contrast, the first optimization step for every individual model loaded into memory will be more computationally expensive, as seen by the timing in the cell below."
]
},
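The warm-start effect described above is easy to observe directly. A minimal sketch, assuming `ensemble` is the test ensemble loaded earlier; absolute timings will vary with machine and solver:

```python
# Time a cold solve vs. a warm solve on the same COBRApy model.
import time

model = ensemble.base_model

t0 = time.time()
model.optimize()  # cold: the solver builds and solves the problem from scratch
t_first = time.time() - t0

t0 = time.time()
model.optimize()  # warm: the solver reuses the previous solution as a start
t_second = time.time() - t0

print("first solve: %.4f s, second solve: %.4f s" % (t_first, t_second))
```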
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"436.84 seconds for 1000 models\n"
"192.96 seconds for 1000 models\n"
]
}
],
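The source for the cell above is collapsed in this diff, but a cold-start loop consistent with its output might look like the following sketch; using `copy()` to discard the solver state so every member pays the first-solve cost is an assumption here:

```python
# Hypothetical cold-start timing: each copy solves without a warm basis.
import time

t_total = 0
for member in ensemble.members:
    ensemble.set_state(member.id)
    cold_model = ensemble.base_model.copy()  # fresh model, no cached solution
    t0 = time.time()
    cold_model.optimize()
    t_total += time.time() - t0
print("%.2f" % t_total, 'seconds for 1000 models')
```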
@@ -307,9 +322,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "medusa",
"display_name": "medusa_devel",
"language": "python",
"name": "medusa"
"name": "medusa_devel"
},
"language_info": {
"codemirror_mode": {
@@ -321,9 +336,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -25,7 +25,7 @@
# The short X.Y version
version = ''
# The full version, including alpha/beta/rc tags
release = '0.1.3'
release = '0.2.0'


# -- General configuration ---------------------------------------------------
8 changes: 4 additions & 4 deletions docs/creating_ensemble.ipynb
@@ -815,9 +815,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "medusa_dev_1",
"display_name": "medusa_devel",
"language": "python",
"name": "medusa_dev_1"
"name": "medusa_devel"
},
"language_info": {
"codemirror_mode": {
@@ -829,9 +829,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}
4 changes: 4 additions & 0 deletions docs/index.rst
@@ -50,6 +50,8 @@ Index
* :doc:`creating_ensemble`
* :doc:`simulating`
* :doc:`io`
* :doc:`benchmark_mem_cpu`
* :doc:`benchmark_iterative_gapfill`
* :doc:`faq`

.. toctree::
@@ -61,6 +63,8 @@
creating_ensemble
simulating
io
benchmark_mem_cpu
benchmark_iterative_gapfill
faq


