BDRAD Rad Classify

Rad Classify is a small Python library for quickly building classifiers for radiology reports. It leverages semantic dictionary mapping and fastText and is currently available as a black box classifier.

Installation

Rad Classify depends on fastText for Python, which needs to be installed manually. Instructions for installation can be found on the fastText GitHub page. The instructions are copied below for convenience:

$ git clone https://github.com/facebookresearch/fastText.git
$ cd fastText
$ pip install .

Once fastText is installed, you can install Rad Classify by cd'ing into the rad_classify directory and running

$ pip install -r requirements.txt

Rad Classify also depends on NLTK and scikit-learn, but pip should install these automatically from requirements.txt.

Usage

After installation, we can import the rad_classify module and start using its methods:

from rad_classify import EndToEndPreprocessor, get_reports_from_csv
from rad_classify.models import FastTextClassifier

path = "./data/rad_reports.csv"
reports, labels = zip(*get_reports_from_csv(path, report_col="Report Text", label_col="Label"))

get_reports_from_csv is a utility method for reading CSVs. The EndToEndPreprocessor performs all of the necessary preprocessing before the text is fed into the classifier. You can pass in paths to semantic dictionary files for performing semantic dictionary mapping. These files should be pickled dictionaries that map strings to their replacements. CLEVER and RadLex dictionaries are provided with the library.

preprocessor = EndToEndPreprocessor(replacement_file_path="./semantic_dictionaries/clever_replacements", radlex_path="./semantic_dictionaries/radlex_replacements", sections=["impression"])
processed_reports = preprocessor.transform(reports)
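
The semantic dictionary files are described above as pickled dictionaries that map strings to their replacements. The following is a minimal sketch of how a custom file in that shape might be produced with Python's standard pickle module. The example entries, the file name custom_replacements, and the helpers save_replacements and load_replacements are hypothetical and not part of Rad Classify. The exact key conventions the library expects (casing, multi-word phrases) are not documented here, so compare against the bundled CLEVER and RadLex files before relying on it.

```python
import os
import pickle
import tempfile

# Hypothetical entries: each key is a phrase to look for in a report,
# each value is the token that should replace it.
custom_replacements = {
    "no evidence of": "NEGEX",
    "cannot exclude": "UNCERTAIN",
    "pulmonary embolus": "pulmonary_embolism",
}


def save_replacements(replacements, path):
    """Pickle a str -> str mapping to `path` (hypothetical helper)."""
    for key, value in replacements.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("keys and values must both be strings")
    with open(path, "wb") as f:
        pickle.dump(replacements, f)


def load_replacements(path):
    """Read a pickled mapping back, e.g. to inspect a bundled dictionary."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Round-trip the dictionary through a temporary directory to check the format.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "custom_replacements")
    save_replacements(custom_replacements, out_path)
    restored = load_replacements(out_path)
    print(len(restored), "replacements written and read back")
```

A file written this way could then presumably be passed as replacement_file_path when constructing the preprocessor, assuming the library loads it with pickle as the description suggests.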

Note that you specify which section(s) of the radiology report to extract and use for classification. Options are impression, findings, and clinical_history. Once we've processed the reports, we can use fastText to classify them.

clf = FastTextClassifier()
clf.train(processed_reports, labels, dim=50, epoch=20, lr=0.05)

Now that we've trained the classifier, we can use it to classify other reports:

valid_path = "./data/validation_set.csv"
validation_reports, validation_labels = zip(*get_reports_from_csv(valid_path, report_col="Report Text", label_col="Label"))
processed_valid = preprocessor.transform(validation_reports)
predictions = clf.predict(processed_valid)
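
The validation labels loaded above can be compared against the predictions to get a rough sense of how well the classifier performs. The sketch below is one way to do that in plain Python. It assumes that predictions is a sequence with one label per report, in the same format as validation_labels. That is an assumption about the return value of predict, not something this README states. fastText itself prefixes labels with `__label__` by default, so some normalization may be needed first. The helper names accuracy and confusion_counts are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count (true_label, predicted_label) pairs to show where errors occur."""
    return Counter(zip(y_true, y_pred))


# Toy labels standing in for validation_labels and predictions.
example_true = ["positive", "negative", "negative", "positive"]
example_pred = ["positive", "negative", "positive", "positive"]

print("accuracy:", accuracy(example_true, example_pred))
for (true_label, pred_label), count in sorted(confusion_counts(example_true, example_pred).items()):
    print(f"true={true_label} predicted={pred_label}: {count}")
```

With the real data, the call would be accuracy(validation_labels, predictions). Since scikit-learn is already a dependency, sklearn.metrics.accuracy_score and sklearn.metrics.confusion_matrix are alternatives under the same assumption about label format.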
