In microscopic cancer images, the morphology of the epithelium (the tissue that lines the outer surfaces of organs and blood vessels throughout the body) plays a crucial role in predicting overall survival and outcome in cancer patients. This makes segmentation of epithelial cells from stromal tissue a crucial step in digital pathology. Manual segmentation of epithelia can be extremely cumbersome due to the sheer number of epithelial cells and their complex shapes (see the example below). This is a problem well suited to deep learning.
On the left is a raw ER+ BCa (estrogen receptor positive breast cancer) image. Dark purple regions are epithelium; pink and white regions are stroma. The image on the right is the corresponding mask, with epithelial cells in white and stroma in black. The mask is the result of manual segmentation by an expert pathologist. Our job is to segment the epithelial regions automatically using deep learning.

The training data consists of 42 images (1,000 × 1,000 pixels), one per patient.
For each patient, let's extract 100 patches of size 500 × 500. Before extracting patches, let's pad each image by 500 pixels so that regions close to the edges can also be trained on. The number of patches is limited by the computing resources available. The patch size is nearly twice the network's input size (256 × 256), so that random crops taken during augmentation cover substantially different regions of the same patch. This is important because we want the segmentation network to learn from patches that don't look alike.
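Concretely, the extraction step might look like the sketch below. This is NumPy only; the `reflect` padding mode, the random sampling strategy, and the function name are illustrative assumptions, not the exact original pipeline:

```python
import numpy as np

def extract_patches(img, mask, n_patches=100, patch_size=500, pad=500, seed=0):
    """Extract random patches from a padded image/mask pair (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Pad so that patches can be sampled right up to the original image edges.
    img = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    mask = np.pad(mask, ((pad, pad), (pad, pad)), mode="reflect")
    patches = []
    for _ in range(n_patches):
        y = rng.integers(0, img.shape[0] - patch_size)
        x = rng.integers(0, img.shape[1] - patch_size)
        patches.append((img[y:y + patch_size, x:x + patch_size],
                        mask[y:y + patch_size, x:x + patch_size]))
    return patches
```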
After extracting the patches from each patient's whole slide image, we will store them in PyTables, which is built on top of the HDF5 library (hierarchical data format) and the NumPy package. This allows patches to be read and augmented in parallel while the GPU trains. Reading patches from PyTables is much faster than reading them from individual image files. This is especially important because each patch is read once per epoch, over a hundred or more epochs of training.
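As a rough sketch, a PyTables store for the patches could be created like this. The file and array names, dtypes, and the `expectedrows` hint are assumptions for illustration:

```python
import tables
import numpy as np

patch_size = 500  # patches are stored at 500 x 500; 256 x 256 crops are taken at training time
patches = []      # fill with (img_patch, mask_patch) pairs from the extraction step above

with tables.open_file("epithelium_train.pytable", mode="w") as f:
    # Extendable arrays: the first dimension (0) grows as patches are appended.
    imgs = f.create_earray(f.root, "imgs", tables.UInt8Atom(),
                           shape=(0, patch_size, patch_size, 3),
                           expectedrows=4200)  # 42 patients x 100 patches
    masks = f.create_earray(f.root, "masks", tables.UInt8Atom(),
                            shape=(0, patch_size, patch_size),
                            expectedrows=4200)
    for img_patch, mask_patch in patches:
        imgs.append(img_patch[None, ...])    # append expects a leading batch dimension
        masks.append(mask_patch[None, ...])
```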
We will use the highly successful UNet model for segmentation. See the paper for more details on the UNet architecture. We will use per-pixel cross-entropy as the loss function.
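In PyTorch, for example, per-pixel cross-entropy over a two-class (epithelium vs. stroma) output can be set up as follows. The choice of PyTorch and the two-channel output convention are assumptions here, not stated in the original:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()  # averages the per-pixel loss over the whole batch

# logits: (batch, 2, 256, 256) raw UNet outputs, one channel per class
# targets: (batch, 256, 256) integer labels, 0 = stroma, 1 = epithelium
logits = torch.randn(8, 2, 256, 256, requires_grad=True)
targets = torch.randint(0, 2, (8, 256, 256))
loss = criterion(logits, targets)
loss.backward()
```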
Below is an example of the UNet's performance on a validation image. The Dice score is 0.78.
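For reference, the Dice score between a binary prediction and a ground-truth mask can be computed with a minimal NumPy sketch like this:

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)
```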
- Thresholding: Because of the high number of false positives, setting a higher threshold for the positive class could improve the Dice score (see the sketch after this list).
- More patches: Due to computational limitations, only 100 patches were extracted per patient/whole slide image. Extracting more patches should improve performance.
- Hard cases: UNet performance should benefit from training on images that are difficult to segment manually. For instance, low contrast between epithelium and stroma makes segmentation harder, so learning from such difficult cases should help. Additionally, the UNet seems to oversegment; it would be interesting to see its performance if we include images where the dominant component is stroma.
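A simple way to pick that threshold is to sweep candidate values over the validation predictions and keep the one with the best mean Dice score. This sketch assumes `probs` and `masks` are lists of per-image arrays (the UNet's per-pixel epithelium probabilities and the ground-truth masks) and reuses `dice_score` from above:

```python
import numpy as np

def best_threshold(probs, masks, candidates=np.arange(0.5, 0.95, 0.05)):
    """Return the threshold that maximizes mean Dice over validation pairs."""
    scores = [np.mean([dice_score(p > t, m) for p, m in zip(probs, masks)])
              for t in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]
```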