lidc-binary-classification

This repository contains code to pre-process the LIDC-IDRI dataset of CT scans with pulmonary nodules into a binary classification problem that is easy to use when learning deep learning.

Overview

The workflow consists of a few steps:

use the pylidc library to process image annotations and segmentations (identifying malignant vs benign nodules and their locations)
resample to 1 mm x 1 mm x 1 mm voxels and process the HU (Hounsfield unit) values from different scanners
export cropped regions around the nodules in two ways: as 3D cubes and as 2D slices
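
The resampling, HU handling and cropping steps can be sketched with NumPy alone. This is a minimal illustration, not the code from the notebook: the function names, the nearest-neighbour resampling, the HU window of -1000 to 400, the (z, y, x) axis order and the cube size are all assumptions chosen for the example.

```python
import numpy as np


def resample_nearest(volume, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Resample a 3D volume to new_spacing (in mm) by nearest-neighbour lookup."""
    spacing = np.asarray(spacing, dtype=float)
    new_spacing = np.asarray(new_spacing, dtype=float)
    # The physical extent stays fixed, so the voxel count scales with the spacing.
    new_shape = np.round(np.array(volume.shape) * spacing / new_spacing)
    new_shape = np.maximum(new_shape, 1).astype(int)
    # For each output voxel centre, look up the source voxel that contains it.
    idx = [
        np.minimum(
            np.floor((np.arange(n) + 0.5) * new_spacing[ax] / spacing[ax]).astype(int),
            volume.shape[ax] - 1,
        )
        for ax, n in enumerate(new_shape)
    ]
    return volume[np.ix_(*idx)]


def normalize_hu(volume, hu_min=-1000.0, hu_max=400.0):
    """Clip Hounsfield units to an (assumed) window and rescale to [0, 1]."""
    clipped = np.clip(volume.astype(float), hu_min, hu_max)
    return (clipped - hu_min) / (hu_max - hu_min)


def crop_cube(volume, center, size=32, pad_value=0.0):
    """Cut a size^3 cube around `center`, padding where it leaves the volume."""
    half = size // 2
    padded = np.pad(volume, half, mode="constant", constant_values=pad_value)
    # After padding by `half`, original index c sits at c + half, so the cube
    # that starts at c - half in the original starts at c in the padded array.
    z, y, x = (int(round(c)) for c in center)
    return padded[z:z + size, y:y + size, x:x + size]


def central_slices(cube, n_slices=3):
    """Take n_slices consecutive slices around the middle of the first axis."""
    mid = cube.shape[0] // 2
    start = mid - n_slices // 2
    return cube[start:start + n_slices]
```

A scan with 2 mm slice thickness would be passed as `resample_nearest(volume, spacing=(1, 1, 2))` if the slice axis is last; interpolating resamplers (for example from SciPy) are a common alternative to the nearest-neighbour lookup used here.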

Download scans

Download the original scans using the steps from this website: https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI

Setup python environment

download anaconda 3
create a new environment (e.g. conda create --name lidc)
install some packages

Note: scikit-image version 0.13 is required, because version 0.14 replaced measure.marching_cubes with measure.marching_cubes_lewiner, which breaks compatibility with pylidc (as of this writing).

conda install jupyter numpy pandas feather-format scikit-image=0.13

pip install pylidc pypng
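
After installing, the scikit-image version pin noted above can be checked from Python. The helper below is a small sketch; `matches_pin` is a hypothetical name, and the check only compares the major and minor version numbers.

```python
import re


def matches_pin(version, required=(0, 13)):
    """Return True if a version string such as '0.13.1' has the required major.minor."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) == required


if __name__ == "__main__":
    try:
        import skimage
    except ImportError:
        print("scikit-image is not installed in this environment")
    else:
        ok = matches_pin(skimage.__version__)
        status = "ok" if ok else "not 0.13.x, pylidc may break"
        print("scikit-image", skimage.__version__, status)
```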

Configure pylidc so that it knows where the scans are located by following these steps: https://pylidc.github.io/install.html
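
As far as I recall from the linked instructions, pylidc reads a small INI-style file with a [dicom] section containing a path entry (for example ~/.pylidcrc on Linux and macOS); treat the file name and keys as assumptions and check them against that page. A sketch that writes such a file, with `write_pylidc_config` being a hypothetical helper:

```python
import configparser
from pathlib import Path


def write_pylidc_config(dicom_root, config_path):
    """Write an INI file with a [dicom] section pointing at the DICOM folder."""
    config = configparser.ConfigParser()
    # Assumed keys: `path` is the LIDC-IDRI folder, `warn` toggles warnings.
    config["dicom"] = {"path": str(dicom_root), "warn": "True"}
    config_path = Path(config_path)
    with config_path.open("w") as handle:
        config.write(handle)
    return config_path
```

For example, `write_pylidc_config("/data/LIDC-IDRI", Path.home() / ".pylidcrc")`, where `/data/LIDC-IDRI` is a placeholder for wherever the downloaded scans live.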

Follow the notebook

Pre-processing: lidc-preprocessing.ipynb

Modeling example:

keras + tf CNN 3D: CNN_keras_3D.ipynb
keras + tf CNN 2D: CNN_keras_3D.ipynb

Issues

Currently, the code calls the pylidc function 'cluster_annotations' twice: once to create a DataFrame of annotations, and a second time to export the images. Since this function takes some time, this could be made more efficient.
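
One possible fix, sketched under the assumption that each scan object exposes a cluster_annotations() method as pylidc scans do: cluster every scan once, keep the result in a dictionary, and let both the DataFrame step and the export step read from it. The function names and the row layout are hypothetical.

```python
def cluster_all_once(scans, cluster_fn=None):
    """Cluster annotations once per scan and return {scan_key: clusters}."""
    if cluster_fn is None:
        # Assumption: each scan is a pylidc Scan with cluster_annotations().
        def cluster_fn(scan):
            return scan.cluster_annotations()
    return {key: cluster_fn(scan) for key, scan in scans.items()}


def annotation_rows(clusters_by_scan):
    """First consumer: one row per nodule, e.g. as input for a DataFrame."""
    return [
        {"scan": key, "nodule": i, "n_annotations": len(cluster)}
        for key, clusters in clusters_by_scan.items()
        for i, cluster in enumerate(clusters)
    ]
```

The image export would then iterate over the same `clusters_by_scan` dictionary instead of calling cluster_annotations() again.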

This is by no means an 'optimal' approach, in the sense that I have not experimented with pre-processing hyperparameters or model choices such as:

resampling size
'borderline malignancy' definition
output size
number of 2D slices
extensive CNN alterations

But it is enough to get a model running, as one can see from the provided examples. It should get you up to speed with using deep learning on actual medical images!
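
To make the 'borderline malignancy' choice listed above concrete: LIDC radiologists rate malignancy on a 1 to 5 scale, and one way to get a binary label is to take the median rating per nodule and treat a median of 3 as borderline. The rule below is an illustrative assumption, not necessarily the definition used in the notebook, and `binary_label` is a hypothetical helper.

```python
from statistics import median


def binary_label(ratings, borderline=3, keep_borderline_as=None):
    """Map the malignancy ratings (1-5) of one nodule to 1, 0, or a fallback.

    Returns 1 (malignant) if the median rating is above `borderline`,
    0 (benign) if it is below, and `keep_borderline_as` if it is equal.
    """
    score = median(ratings)
    if score > borderline:
        return 1
    if score < borderline:
        return 0
    return keep_borderline_as
```

With the default `keep_borderline_as=None`, borderline nodules can be filtered out before training; passing 0 or 1 instead assigns them to a class.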
