Merge pull request #45 from tldr-group/paper

Paper
tldr-group · Oct 10, 2024 · 6b315c6
2 parents 2643d1f + 2ba7f79
commit 6b315c6

Showing 16 changed files with 1,326 additions and 462 deletions.
diff --git a/CITATION.cff b/CITATION.cff
@@ -5,22 +5,26 @@ authors:
     orcid: 0000-0003-0142-8597
   - name: Ronan Docherty
     orcid: 0000-0002-7332-0924
+  - name: Steve Kench
+    orcid: 0000-0002-7263-6724
   - name: Samuel J. Cooper
     orcid: 0000-0003-4055-6903
-title: "Predicting Microstructural Representativity from a Single Image"
+title: "Prediction of Microstructural Representativity from a Single Image"
 doi: ARXIV_DOI
-url: "https://github.com/tldr-group/Representativity"
+url: "https://github.com/tldr-group/ImageRep"
 preferred-citation:
   type: article
   authors:
   - name: Amir Dahari
     orcid: 0000-0003-0142-8597
   - name: Ronan Docherty
     orcid: 0000-0002-7332-0924
+  - name: Steve Kench
+    orcid: 0000-0002-7263-6724
   - name: Samuel J. Cooper
     orcid: 0000-0003-4055-6903
   doi: ARXIV_DOI
   journal: "arXiV preprint"
   month: 8
-  title: "Predicting Microstructural Representativity from a Single Image"
+  title: "Prediction of Microstructural Representativity from a Single Image"
   year: 2024
diff --git a/README.md b/README.md
@@ -1,38 +1,32 @@
-# Representativity
-
-![Tests](https://github.com/tldr-group/Representativity/actions/workflows/tests.yml/badge.svg)
+# ImageRep
 
 [Try it out!](https://www.imagerep.io/)
 
-You take a micrograph of a material. You segment it, and measure the phase fractions. How sure are you that the phase fraction of the whole material is close to your measurements?  
-Here we define 'representativity' as [1]
-> A microstructure is $(c, d)$-property representative if the measured value of the microstructural property deviates by no more than $d\%$ from the bulk material property, with at least $c\%$ confidence. For example, if $(c,d)=(95,3)$, and the property is phase-fraction, this means we can be $95\%$ confident that the measured phase-fraction is within $3\%$ of the bulk material phase-fraction. 
-
-We introduce the 'ImageRep' model for performing fast phase-fraction representativity estimation from a single microstructural image. This is achieved by estimating the Two-Point Correlation (TPC) function of the image via the FFT. From the TPC the 'Integral Range' can be directly determined - the Integral Range has previously been determined using (slow) statistical methods. We then represent the image as binary squares of length 'Integral Range' which are samples from a Bernoulli distribution with a probability determined by the measured phase fraction. From this we can establish the uncertainty in the phase fraction in the image to a given confidence, **and** the image size that would be needed to meet a given target uncertainty.
+Here we introduce the 'ImageRep' method for fast phase fraction representativity estimation from a single microstructural image. This is achieved by calculating the Two-Point Correlation (TPC) function of the image, combined with a data-driven analysis of the [MicroLib](https://microlib.io/) dataset. By applying a statistical framework that utilizes both data sources, we can establish the uncertainty in the phase fraction in the image with a given confidence, **and** the image size that would be needed to meet a given target uncertainty. Further details are provided in our [paper](CITATION.cff).
 
-If you use this model in your research, [please cite us](CITATION.cff).
+If you use ImageRep in your research, [please cite us](CITATION.cff).
 
 ## Usage:
 
-This model can be used as python package - see [`example.ipynb`](example.ipynb) or via the [website (imagerep.io)](https://www.imagerep.io/).
+This method can be used via the [website (imagerep.io)](https://www.imagerep.io/) or as a Python package - see [`example.ipynb`](example.ipynb).
 
 <p align="center">
     <img src="https://sambasegment.blob.core.windows.net/resources/repr_repo_v2.gif">
 </p>
 
-NB: the website may run out of memory for large volumes (>1000x1000x1000) - if this happens run the model locally or contact us
+NB: the website may run out of memory for large volumes (>1000x1000x1000) - if this happens, run the method locally or contact us.
 
 ## Limitations:
 - **This is not the only source of uncertainty!** Other sources *i.e,* segmentation uncertainty, also contribute and may be larger
-- For multi-phase materials, this model estimates the uncertainty in phase-fraction of a single (chosen) phase, counting all the others as a single phase (*i.e,* a binary microstructure)
+- For multi-phase materials, this method estimates the uncertainty in phase-fraction of a single (chosen) phase, counting all the others as a single phase (*i.e,* a binary microstructure)
 - Not validated for for images smaller than 200x200 or 200x200x200
 - Not validated for large integral ranges/features sizes (>70 px) 
 - Not designed for periodic structures
 - 'Length needed for target uncertainty' is an intentionally conservative estimate - retry when you have measured the larger sample to see a more accurate estimate of that uncertainty
 
 ## Local Installation Instructions
 
-These instructions are for installing and running the model locally. They assume a UNIX enviroment (mac or linux), but adapting for Windows is straightforward. Note you will need 2 terminals, one for the frontend local server and one for the backend local server.
+These instructions are for installing and running the method locally. They assume a UNIX environment (macOS or Linux), but adapting for Windows is straightforward. Note you will need 2 terminals, one for the frontend local server and one for the backend local server.
 
 ### Preliminaries
 
@@ -51,7 +45,7 @@ git clone https://github.com/tldr-group/Representativity && cd Representativity
 pip install -e .
 ```
 
-**NOTE: this is all you need to do if you wish to use the model via the python package.** To run the website locally, follow the rest of the instructions.
+**NOTE: this is all you need to do if you wish to use the method via the python package.** To run the website locally, follow the rest of the instructions.
 
 2. With your virtual environment activated, and inside the `representativity/` directory, run
 
@@ -86,6 +80,8 @@ yarn && yarn start
 
 ## Testing Instructions
 
+![Tests](https://github.com/tldr-group/Representativity/actions/workflows/tests.yml/badge.svg)
+
 1. Run (with your virtual enviroment activated!)
 
 ```

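Review note: the updated README says the method starts from the Two-Point Correlation (TPC) function of the image. For readers who have not met the TPC before, the following is a minimal sketch of a periodic TPC computed with NumPy's FFT. The function name `two_point_correlation` is hypothetical and is not taken from the `representativity` package, whose own implementation may differ (for example in how it treats non-periodic boundaries).

```python
import numpy as np


def two_point_correlation(binary_img):
    """Periodic two-point correlation of a binary image via the FFT.

    Entry [r] is the probability that a pixel and the pixel displaced by r
    (with wrap-around) both belong to the phase labelled 1.
    """
    img = np.asarray(binary_img, dtype=float)
    spectrum = np.fft.fftn(img)
    # Wiener-Khinchin: the autocorrelation is the inverse FFT of the power spectrum.
    autocorr = np.fft.ifftn(spectrum * np.conj(spectrum)).real
    return autocorr / img.size


# Toy input (an assumption for illustration): uncorrelated noise at ~30% phase fraction.
rng = np.random.default_rng(0)
img = rng.random((64, 64)) < 0.3
tpc = two_point_correlation(img)

# At zero displacement the TPC equals the measured phase fraction.
print(np.isclose(tpc[0, 0], img.mean()))
```

The same function works unchanged on 3D volumes because `np.fft.fftn` transforms over all axes.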
diff --git a/paper_figures/SI_figures/porespy_ims.py b/paper_figures/SI_figures/porespy_ims.py
@@ -0,0 +1,58 @@
+import numpy as np
+import matplotlib.pyplot as plt
+import random
+from itertools import product
+import porespy as ps
+from representativity.validation import validation
+
+
+if __name__ == '__main__':
+    num_generators = 50
+    num_images = 5
+    generators_chosen = np.random.choice(num_generators, num_images, replace=False)
+    images = []
+    large_img_size = np.array([1000, 1000])
+    img_size = np.array([200, 200])
+    alpha = 1
+    ps_generators = validation.get_ps_generators()
+    rand_iter = 0
+    for generator, params in ps_generators.items():
+        for value_comb in product(*params.values()):
+            if rand_iter in generators_chosen:
+                args = {key: value for key, value in zip(params.keys(), value_comb)}
+                args = validation.factors_to_params(args, im_shape=large_img_size)
+                image = validation.get_large_im_stack(generator, large_img_size, 1, args)
+                image = image[0]
+                image = image[:img_size[0], :img_size[1]]
+                images.append(image)
+            rand_iter += 1
+    random.shuffle(images)
+
+    layers = num_images  # How many images should be stacked.
+    x_offset, y_offset = img_size[0]-25, 30  # Number of pixels to offset each image.
+
+    new_shape = ((layers - 1)*y_offset + images[0].shape[0],
+                (layers - 1)*x_offset + images[0].shape[1]
+                )  # (rows, columns) of the canvas that holds all offset images
+
+    stacked = np.zeros(new_shape)
+
+
+    for layer in range(layers):
+        cur_im = images[layer]
+        stacked[layer*y_offset:layer*y_offset + cur_im.shape[0],
+                layer*x_offset:layer*x_offset + cur_im.shape[1] 
+                ] += cur_im
+    stacked = 1 - stacked
+
+
+
+    # Create the PoreSpy images:
+    fig, ax_porespy_im = plt.subplots()
+    ax_porespy_im.imshow(stacked, vmin=0, vmax=1, cmap='gray', interpolation='nearest')
+    ax_porespy_im.set_title('(a)')
+    ax_porespy_im.axis('off') 
+
+    pos1 = ax_porespy_im.get_position() # get the original position
+    pos2 = [pos1.x0 - 0.15, pos1.y0+0, pos1.width+0.1, pos1.height+0.1] 
+    ax_porespy_im.set_position(pos2) 
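Review note: the canvas arithmetic in `porespy_ims.py` is easier to check on a toy case. The sketch below reproduces the offset-stacking step with small arrays; the helper name `stack_with_offsets` and the sizes used are assumptions for illustration and do not appear in the script.

```python
import numpy as np


def stack_with_offsets(images, x_offset, y_offset):
    """Paste equally sized 2D images onto one canvas, each shifted by a fixed offset."""
    layers = len(images)
    height, width = images[0].shape
    # The canvas must fit the last image, which starts (layers - 1) offsets in.
    canvas = np.zeros(((layers - 1) * y_offset + height,
                       (layers - 1) * x_offset + width))
    for layer, im in enumerate(images):
        top, left = layer * y_offset, layer * x_offset
        canvas[top:top + height, left:left + width] += im
    return canvas


# Three 4x4 images of ones, shifted 3 px right and 1 px down per layer.
images = [np.ones((4, 4)) for _ in range(3)]
stacked = stack_with_offsets(images, x_offset=3, y_offset=1)
print(stacked.shape)  # (6, 10)
```

Because the horizontal offset is smaller than the image width, neighbouring images overlap and their values add, so the sum can exceed 1 there. The script applies `1 - stacked` afterwards and passes `vmin=0, vmax=1` to `imshow`, which clips those overlap values when the figure is drawn.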
diff --git a/paper_figures/SI_figures/validation_tables.py b/paper_figures/SI_figures/validation_tables.py
@@ -0,0 +1,140 @@
+import json
+import numpy as np
+import matplotlib.pyplot as plt
+from functools import reduce
+from matplotlib.gridspec import GridSpec
+from matplotlib.font_manager import FontProperties
+
+def get_in_bounds_results(dims, name="porespy"):
+    # Load the data
+    validation_data_dir = 'representativity/validation/validation_w_real.json'
+    with open(validation_data_dir, "r") as file:
+        validation_data = json.load(file)
+    # Get the in-bounds results
+    in_bound_results = {}
+    for dim in dims:
+        dim_results = {
+            "one_im": [], "model_with_gmm": [], "model_wo_gmm": []}
+        for gen_name in validation_data[f"validation_{dim}"].keys():
+            # if there are more generators, they need to be added here:
+            if name == "porespy":
+                if not gen_name.startswith("blob") and not gen_name.startswith("frac"):
+                    continue
+            else:
+                if dim == "2D":
+                    if not gen_name.startswith("anode"):
+                        continue
+                else:
+                    if name == 'Targray':  # 3D
+                        if not gen_name.startswith("separator_Targray"):
+                            continue
+                    else:
+                        if not gen_name.startswith("separator_PP1615"):
+                            continue
+            gen_data = validation_data[f"validation_{dim}"][gen_name]
+            for run in gen_data.keys():
+                if not run.startswith("run"):
+                    continue
+                run_data = gen_data[run]
+                dim_results["one_im"].append(run_data["in_bounds_one_im"])
+                dim_results["model_with_gmm"].append(run_data["model_in_bounds"])
+                dim_results["model_wo_gmm"].append(run_data["model_wo_gmm_in_bounds"])
+        n_trials = len(dim_results["one_im"])
+        for res in dim_results.keys():
+            dim_results[res] = np.array([n_trials, np.array(dim_results[res]).sum()])
+        in_bound_results[dim] = dim_results
+    return in_bound_results
+
+def make_data(dim_res, order):
+    data = []
+    for key in order:
+        num_trials, num_in_bounds = dim_res[key]
+        row = [
+            f"{num_in_bounds}/{num_trials} = {num_in_bounds/num_trials*100:.2f}%",
+            "95%",
+            f"{np.abs(0.95-num_in_bounds/num_trials)*100:.2f}%",
+        ]
+        data.append(row)
+    return data
+
+def bold_min_value(table_data, table1, start_idx = 0):
+
+    absolute_errors = np.array([float(table_data[i][2][:-1]) for i in range(start_idx, start_idx+3)])
+    min_indices = np.where(absolute_errors==absolute_errors.min())[0]
+    for min_idx in min_indices:
+        imagerep_2d_cell_right = table1[(start_idx+min_idx+1, 2)]
+        imagerep_2d_cell_right.set_text_props(fontproperties=FontProperties(weight='bold'))
+        imagerep_2d_cell_left = table1[(start_idx+min_idx+1, -1)]
+        imagerep_2d_cell_left.set_text_props(fontproperties=FontProperties(weight='bold'))
+        # imagerep_2d_cell.set_facecolor('lightgreen')
+
+def make_table(dims, ax_table, in_bounds_res, title):
+
+    order = ["one_im", "model_wo_gmm", "model_with_gmm"]
+    dim_data = [make_data(in_bounds_res[dim], order) for dim in dims]
+    table_data = reduce(lambda x, y: x + y, dim_data)
+    # plt.figtext(0.415, 0.485, '(b)', ha='center', va='bottom', fontsize=12)
+    ax_table.axis('off')
+    colWidths = np.array([0.31, 0.14, 0.14])
+    colWidths /= colWidths.sum()
+    column_labels = ["Material's true phase fraction in the predicted bounds", "Confidence goal", "Absolute error"]
+
+    general_row_labels = ["Classical subdivision method", "ImageRep only std", "ImageRep"]
+    dim_row_labels = [[f"{general_row_labels[i]} ({dim})" for i in range(len(general_row_labels))] for dim in dims]
+    row_labels = reduce(lambda x, y: x + y, dim_row_labels)
+
+    table1 = ax_table.table(cellText=table_data, colLabels=column_labels, rowLabels=row_labels, loc='center', colWidths=colWidths)
+    for key, cell in table1.get_celld().items():
+        cell.set_text_props(ha='center', va='center')
+
+    title_len_addition = len(title) * 0.003
+    y_pos = 0.905 if len(dims) == 1 else 0.885
+    ax_table.text(-0.11-title_len_addition, y_pos, title, ha='left', va='top', transform=ax_table.transAxes)
+    # Find minimum error and highlight the corresponding cell in bold:
+    bold_min_value(table_data, table1)
+    if len(dims) > 1:
+        bold_min_value(table_data, table1, start_idx=3)
+
+def join_all_data(all_data):
+    dims = ["2D", "3D"]
+    res = {dim: {key: np.array([0,  0]) for key in all_data[0]["2D"].keys()} for dim in dims}
+    for dim in dims:
+        # add the results of the dimension together:
+        for i in range(len(all_data)):
+            if dim in all_data[i]:
+                for key in all_data[i][dim].keys():
+                    res[dim][key] += all_data[i][dim][key]
+    return res
+
+if __name__ == '__main__':
+    dims = ["2D", "3D"]
+
+    # Create a figure with 5 stacked subplots, one per table
+    # (4 material groups plus the combined "All materials" table):
+    col_width = 16
+    fig = plt.figure(figsize=(col_width, col_width/2.1))
+    gs = GridSpec(5, 1, height_ratios=[2, 1, 1, 1, 2])
+
+    subplot_args = (
+        [["2D", "3D"], "porespy", 'PoreSpy simulated materials'],
+        [["2D"], "anode", 'Solid Oxide Fuel Cell anode'],
+        [["3D"], "Targray", 'Targray separator'],
+        [["3D"], "PP1615", 'PP1615 separator']
+    )
+
+    all_data = []
+    for i, (dims, name, title) in enumerate(subplot_args):
+        in_bounds_res = get_in_bounds_results(dims=dims, name=name)
+        all_data.append(in_bounds_res)
+        make_table(dims, fig.add_subplot(gs[i]), in_bounds_res, title=title)
+
+    # Join all the data together:
+    all_data = join_all_data(all_data)
+    make_table(["2D", "3D"], fig.add_subplot(gs[4]), all_data, title="All materials")
+
+    # Adjust layout
+    plt.tight_layout()
+
+    # Save the figure, with high dpi
+    plt.savefig("paper_figures/output/SI_validation_tables.pdf", format='pdf', dpi=300)
+
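Review note: each table row produced by `make_data` compares the empirical coverage (how often the true phase fraction fell inside the predicted bounds) with the 95% confidence goal. The sketch below shows that arithmetic on its own. The helper name `coverage_row` is hypothetical, and the counts are made-up numbers, not results from the validation data.

```python
def coverage_row(num_in_bounds, num_trials, confidence_goal=0.95):
    """Format one table row: empirical coverage, the goal, and their absolute gap."""
    coverage = num_in_bounds / num_trials
    abs_error = abs(confidence_goal - coverage)
    return (
        f"{num_in_bounds}/{num_trials} = {coverage * 100:.2f}%",
        f"{confidence_goal * 100:.0f}%",
        f"{abs_error * 100:.2f}%",
    )


# Made-up counts: 930 of 1000 trials had the true value inside the bounds.
row = coverage_row(num_in_bounds=930, num_trials=1000)
print(row)  # ('930/1000 = 93.00%', '95%', '2.00%')
```

A smaller absolute error means the predicted bounds are better calibrated, which is why `bold_min_value` highlights the row with the minimum error in each group of three.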