medical NLP project

Going to try and predict medical speciality based on medical transcript

Dataset found on kaggle. Cannot find the URL so uploaded it here.

Code implements a simple bag of words, from sklearn tutorial (https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html)

df.columns: Index(['index', 'description', 'medical_specialty', 'sample_name','transcription', 'keywords']

There are 39 medical specialities:

array(['Allergy / Immunology', 'Bariatrics', 'Cardiovascular / Pulmonary',
       'Dentistry', 'Urology', 'General Medicine', 'Surgery',
       'Speech - Language', 'SOAP / Chart / Progress Notes',
       'Sleep Medicine', 'Rheumatology', 'Radiology',
       'Psychiatry / Psychology', 'Podiatry', 'Physical Medicine - Rehab',
       'Pediatrics - Neonatal', 'Pain Management', 'Orthopedic',
       'Ophthalmology', 'Office Notes', 'Obstetrics / Gynecology',
       'Neurosurgery', 'Neurology', 'Nephrology', 'Letters',
       'Lab Medicine - Pathology', 'IME-QME-Work Comp etc.',
       'Hospice - Palliative Care', 'Hematology - Oncology',
       'Gastroenterology', 'ENT - Otolaryngology', 'Endocrinology',
       'Emergency Room Reports', 'Discharge Summary',
       'Diets and Nutritions', 'Dermatology',
       'Cosmetic / Plastic Surgery', 'Consult - History and Phy.',
       'Chiropractic'])

I'd like to classify medical speciality based on keywords. It's easy to change which speciality to keep using:

df = df[df['medical_specialty'].isin(['Neurosurgery','Neurology'])]

I used Gaussian NB, Multinomial NB, SVM from Linear model, SVM from SGDClassifier, and logistic regression from SGDClassifier both here.

Input data is discrete so Gaussian NB should perform badly because it assumes features of a continuous nature. Multinomial NB is its discrete classification equivalent.

Example 1: Classify between Neurosurgery and Neurology:

Example 2: Classify between all specialities:

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
bin		bin
README.md		README.md
ms_classification.py		ms_classification.py
mtsamples.csv		mtsamples.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

medical NLP project

Going to try and predict medical speciality based on medical transcript

About

Releases

Packages

Languages

Samy-mri/medicalNLP

Folders and files

Latest commit

History

Repository files navigation

medical NLP project

Going to try and predict medical speciality based on medical transcript

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages