Supervised machine learning model for radiotherapy incident learning

kildealab/NLP-ML-for-incident-learning

NLP-ML-for-incident-learning

This Python program combines Natural Language Processing (NLP) with supervised Machine Learning (ML) techniques to semi-automate the classification of radiotherapy incident reports. The scripts compile incident reports from the Canadian Institute for Health Information (CIHI) database and from the Safety and Incident Learning System (SaILS) database of the McGill University Health Centre (MUHC) in Montreal, Canada. The reports are processed with several NLP techniques and then used to train multiple machine learning models from the Scikit-Learn library. We extended the models to be multi-label compatible, and the final model generates a drop-down menu of label suggestions to assist incident investigators. See the reference section for more details.

Author

Felix Mathew
Contact email: [email protected]

Prerequisites

  • Scikit-Learn (v0.23.1)
  • SpaCy (v2.3.2)
  • google-trans-new (v1.1.9)
  • PyEnchant (v3.1.1)

Features:

Natural Language Processing (NLP):

  • French to English translation
  • Autocorrection
  • Stopword removal
  • Lemmatization
  • Entity replacement
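
As a rough illustration of two of the steps above (entity replacement and stopword removal), here is a minimal sketch using only the Python standard library. It is not the code in the src folder: the stopword list, the regular expressions, and the placeholder tokens (`ent_date`, `ent_dose`, `ent_number`) are assumptions made up for this example. Translation, autocorrection, and lemmatization are left out because they depend on google-trans-new, PyEnchant, and SpaCy.

```python
import re

# Hypothetical, tiny stopword list (an assumption; not the list used in the repository).
STOPWORDS = {"the", "a", "an", "was", "were", "to", "of", "on", "for", "and", "in"}

# Hypothetical entity patterns and placeholder tokens. Order matters: dates are
# replaced first so that their digits are not picked up by the later patterns.
ENTITY_PATTERNS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "ent_date"),
    (re.compile(r"\b\d+(?:\.\d+)?\s*c?gy\b", re.IGNORECASE), "ent_dose"),
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "ent_number"),
]


def replace_entities(text):
    """Replace ISO dates, doses in Gy/cGy, and bare numbers with placeholder tokens."""
    for pattern, token in ENTITY_PATTERNS:
        text = pattern.sub(token, text)
    return text


def remove_stopwords(text):
    """Lowercase the text, keep alphabetic/underscore tokens, and drop stopwords."""
    tokens = re.findall(r"[A-Za-z_]+", text)
    return [t.lower() for t in tokens if t.lower() not in STOPWORDS]


def preprocess(report):
    """Apply entity replacement, then stopword removal, and rejoin the tokens."""
    return " ".join(remove_stopwords(replace_entities(report)))


if __name__ == "__main__":
    print(preprocess("The patient was prescribed 50 Gy on 2020-03-15 for 25 fractions"))
    # prints: patient prescribed ent_dose ent_date ent_number fractions
```

Replacing specific values with generic tokens keeps the vocabulary small, so that reports differing only in a date or a dose value map to the same features.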

Machine Learning (ML) with Scikit-Learn:

  • One-hot encoding of the class labels
  • TF-IDF vectorization of the free-text data
  • Multi-label capability using multi-output methods
  • Extensive model evaluation
  • Custom scorer
  • 5-fold cross-validation
  • Grid search for hyperparameter tuning
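
The features above can be sketched end to end with Scikit-Learn as follows. This is a minimal sketch under stated assumptions, not the repository's training script: the reports and the three labels are invented toy data, `MultiLabelBinarizer` is one possible way to obtain a binary label matrix, `top_k_hit_rate` is a hypothetical stand-in for the custom scorer, and the grid over `C` is arbitrary. It wraps `LinearSVR` in `MultiOutputRegressor` so that each label gets a score, then ranks the scores to produce label suggestions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVR

# Made-up reports and labels for illustration only (not from CIHI or SaILS).
reports = [
    "wrong isocentre marked during ct simulation",
    "plan approved with incorrect dose prescription",
    "patient treated with wrong bolus thickness",
    "contours missing on planning ct scan",
    "treatment delivered to patient without imaging verification",
    "dose prescription changed after plan approval",
    "ct simulation performed with wrong immobilization",
    "bolus omitted during treatment delivery",
    "plan transferred with incorrect isocentre shift",
    "imaging verification skipped before treatment",
]
labels = [
    ["simulation"], ["planning"], ["delivery"], ["planning", "simulation"],
    ["delivery"], ["planning"], ["simulation"], ["delivery"],
    ["planning", "delivery"], ["delivery"],
]

# Encode each report's label set as a row of 0/1 indicators, one column per class.
binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(labels)


def top_k_hit_rate(y_true, y_pred, k=2):
    """Hypothetical custom score: share of reports with a true label in the top k."""
    y_true = np.asarray(y_true)
    top = np.argsort(y_pred, axis=1)[:, -k:]
    hits = [y_true[i, top[i]].any() for i in range(len(y_true))]
    return float(np.mean(hits))


# TF-IDF features feed one LinearSVR per label column (multi-output wrapper).
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svr", MultiOutputRegressor(LinearSVR(random_state=0, max_iter=10000))),
])

# Grid search over an arbitrary set of C values with 5-fold cross-validation.
param_grid = {"svr__estimator__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(
    pipeline, param_grid, scoring=make_scorer(top_k_hit_rate, k=2), cv=5
)
search.fit(reports, Y)


def suggest_labels(text, k=2):
    """Return the k labels with the highest predicted scores for one report."""
    scores = search.predict([text])[0]
    ranked = np.argsort(scores)[::-1][:k]
    return [str(binarizer.classes_[i]) for i in ranked]


if __name__ == "__main__":
    print(search.best_params_)
    print(suggest_labels("wrong bolus used during treatment"))
```

Ranking per-label regression scores, rather than thresholding them, always yields exactly `k` suggestions, which suits a drop-down menu that an investigator confirms or overrides.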

Instructions

The Linear SVR models that we trained and tuned on our radiation oncology incident reports are available in the out folder.

To develop a machine learning model on an entirely new dataset, follow these steps:

  1. Fill in the MUHC datafile and the CIHI datafile with the incident report data, following the templates provided.
  2. Run the Python files in the src folder in order.
  3. Retrieve the output files from the out folder.

License

This project is provided under the MIT license. See the LICENSE file for more info.

References

  1. Angers C, Brown R, Clark B, Renaud J, Taylor R, Wilkins A. SaILS: A Free & Open Source Tool for Incident Learning. Quebec City: Canadian Organization of Medical Physicists Winter School; 2014.
  2. Montgomery L, Fava P, Freeman CR, Hijal T, Maietta C, Parker W, et al. Development and implementation of a radiation therapy incident learning system compatible with local workflow and a national taxonomy. J Appl Clin Med Phys. 2018;19:259–270.