Authors: Irene D'Onofrio and Mario Esposito
Aim: This project provides a robust benchmark of machine learning models for predicting recurrence of differentiated thyroid cancer (DTC). It uses nested cross-validation (nCV) for evaluation and SHAP (SHapley Additive exPlanations) analysis for interpretation.
Methods: The models tested were Support Vector Machine (SVM), XGBoost, Random Forest, Decision Tree, Logistic Regression, and Multi-Layer Perceptron. Each was evaluated with nCV: a stratified 5-fold outer loop estimated performance, and a 3-fold inner loop tuned hyperparameters. SHAP analysis was applied to the best-performing model (SVM) to assess feature importance and explain individual predictions.
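The nested scheme described in Methods can be sketched with scikit-learn. This is a minimal illustration, not the project's code. It assumes an SVM inside a scaling pipeline, a hypothetical grid of `C` and kernel values, MCC as the tuning metric, and synthetic data from `make_classification` standing in for the encoded DTC features. The helper `nested_cv_mcc` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def nested_cv_mcc(X, y, estimator, param_grid, outer_k=5, inner_k=3, seed=0):
    """Hypothetical helper: per-fold test MCC and the tuned params of each outer fold."""
    outer = StratifiedKFold(n_splits=outer_k, shuffle=True, random_state=seed)
    inner = StratifiedKFold(n_splits=inner_k, shuffle=True, random_state=seed)
    mcc = make_scorer(matthews_corrcoef)
    scores, best_params = [], []
    for train_idx, test_idx in outer.split(X, y):
        # Inner loop: tune hyperparameters using the outer training split only.
        search = GridSearchCV(estimator, param_grid, scoring=mcc, cv=inner)
        search.fit(X[train_idx], y[train_idx])
        # Outer loop: score the refit best model on the untouched test fold.
        y_pred = search.predict(X[test_idx])
        scores.append(matthews_corrcoef(y[test_idx], y_pred))
        best_params.append(search.best_params_)
    return np.array(scores), best_params


# Synthetic, imbalanced stand-in for the encoded DTC features (not real data).
X, y = make_classification(n_samples=383, n_features=16, weights=[0.7, 0.3], random_state=0)
svm = make_pipeline(StandardScaler(), SVC())
grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}  # assumed search space
scores, params = nested_cv_mcc(X, y, svm, grid)
print(f"MCC: {scores.mean():.2f} ± {scores.std():.2f}")
```

The inner search never sees an outer test fold, so tuning does not leak into the reported scores. Summarizing the outer-fold MCCs as mean ± standard deviation gives a figure in the same form used in the Results.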
Results: SVM, XGBoost, and Random Forest generalized best. SVM achieved the highest average Matthews correlation coefficient (MCC), 0.91 ± 0.04. SHAP analysis identified "Response" as the most influential feature and provided insight into misclassified cases.
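The sketch below assumes that MCC, ROC AUC, and PRC AUC are computed on each outer test fold and then summarized as mean ± standard deviation. It writes MCC out from confusion-matrix counts, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). PRC AUC is taken as the trapezoidal area under scikit-learn's precision-recall curve, though the project may use average precision instead. The helper names and the toy folds are hypothetical and are not project data.

```python
import numpy as np
from sklearn.metrics import auc, confusion_matrix, precision_recall_curve, roc_auc_score


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    # Convention: MCC is 0 when any marginal count is zero.
    return 0.0 if denom == 0 else float(tp * tn - fp * fn) / denom


def fold_metrics(y_true, y_pred, y_score):
    """MCC, ROC AUC and PRC AUC for one outer test fold."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    return {
        "MCC": mcc_from_counts(tp, tn, fp, fn),
        "ROC AUC": roc_auc_score(y_true, y_score),
        "PRC AUC": auc(recall, precision),  # trapezoidal area under the PR curve
    }


def summarize(per_fold):
    """Mean and population standard deviation (ddof=0) of each metric across folds."""
    return {
        k: (float(np.mean([f[k] for f in per_fold])), float(np.std([f[k] for f in per_fold])))
        for k in per_fold[0]
    }


# Toy outer folds: (true labels, hard predictions, positive-class scores).
folds = [
    ([0, 0, 1, 1], [0, 0, 1, 0], [0.1, 0.2, 0.9, 0.4]),
    ([0, 1, 1, 0], [0, 1, 1, 1], [0.2, 0.8, 0.7, 0.6]),
]
per_fold = [fold_metrics(*f) for f in folds]
for name, (mean, sd) in summarize(per_fold).items():
    print(f"{name}: {mean:.2f} ± {sd:.2f}")
```

MCC uses all four cells of the confusion matrix. It therefore stays informative when the classes are imbalanced, whereas accuracy can look high just by favouring the majority class.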
UC Irvine Machine Learning Repository: DTC Dataset
- Donated: 10/30/2023
- Description: 13 clinicopathologic features collected over 15 years, with a minimum 10-year follow-up per patient.
- Dataset Characteristics: Tabular
- Primary Task: Binary Classification
- Target label: Recurred/Not Recurred
- Instances: 383
- Suggested split: No
- Features: Age, Gender, Smoking, Hx Smoking, Hx Radiotherapy, Thyroid Function, Physical Examination, Adenopathy, Pathology, Focality, Risk, T, N, M, Stage, Response
- Reference: Springer Link
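Several of the features listed above (for example Risk, T, N, M, and Stage) have a natural clinical order, while others (for example Gender) are purely nominal. The pandas sketch below ranks the ordinal columns and one-hot encodes the remaining text columns. The category orderings in `ORDINAL_LEVELS` are assumptions based on ATA risk, TNM, and AJCC staging conventions, not values taken from the project. The function `encode_features` is hypothetical.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Hypothetical orderings for the ordinal features (ATA risk, TNM, AJCC stage).
# They are assumptions: check them against the values in the downloaded CSV.
ORDINAL_LEVELS = {
    "Risk": ["Low", "Intermediate", "High"],
    "T": ["T1a", "T1b", "T2", "T3a", "T3b", "T4a", "T4b"],
    "N": ["N0", "N1a", "N1b"],
    "M": ["M0", "M1"],
    "Stage": ["I", "II", "III", "IVA", "IVB"],
}


def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Map ordinal columns to integer ranks and one-hot encode the other text columns."""
    out = df.copy()
    for col, levels in ORDINAL_LEVELS.items():
        if col in out.columns:
            # Codes follow the declared order: 0, 1, ..., k-1 (-1 for unseen values).
            out[col] = pd.Categorical(out[col], categories=levels, ordered=True).codes
    nominal = [c for c in out.columns if is_string_dtype(out[c])]
    # drop_first=True keeps k-1 indicator columns per nominal feature.
    return pd.get_dummies(out, columns=nominal, drop_first=True, dtype=int)


# Toy rows (not real patients); the target label is left out before encoding.
toy = pd.DataFrame({
    "Age": [27, 52],
    "Gender": ["F", "M"],
    "Risk": ["Low", "High"],
    "Stage": ["I", "II"],
})
print(encode_features(toy))
```

Rank encoding keeps the clinical ordering in a single column. One-hot encoding every categorical feature is the usual alternative when no order should be imposed.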
Contents:
- Exploratory Data Analysis
  - Data downloading
  - Order categories (ordinal features)
  - Plot feature distributions
  - Plot feature distributions stratified by class (Recurred / Not Recurred)
- Feature Encoding
- Nested Cross-Validation (nCV)
  - Model hyperparameter spaces
  - Stratified 5-fold nCV (3-fold inner CV)
  - Save or import existing nCV_results
- Compare model metrics on the test folds
  - MCC, ROC AUC and PRC AUC
  - ROC and PRC curves
- SHAP analysis on SVM
  - SHAP on test data (computed inside the outer CV loop)
  - Save or import existing results
- SHAP Visualization
  - Global feature importance
  - SHAP values per feature (sample-wise)
  - Effect of feature values on prediction (sorted by average feature importance)
  - Misclassified samples
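In the outline above, SHAP values are computed on the test data inside the outer CV loop and then pooled for global importance and for inspecting misclassified samples. The sketch below assumes that step has already produced one positive-class SHAP matrix of shape (n_test, n_features) per outer fold, together with each fold's test indices and predictions. Such matrices could come, for example, from a model-agnostic explainer in the shap package. Global importance is taken as the mean absolute SHAP value per feature. `global_importance` and `misclassified` are hypothetical helpers, and the arrays are toy values, not project results.

```python
import numpy as np


def global_importance(fold_shap, feature_names):
    """Rank features by mean |SHAP| pooled over the test samples of all outer folds."""
    pooled = np.vstack(fold_shap)  # shape: (total test samples, n_features)
    mean_abs = np.abs(pooled).mean(axis=0)
    order = np.argsort(mean_abs)[::-1]  # most important first
    return [(feature_names[i], float(mean_abs[i])) for i in order]


def misclassified(fold_test_idx, fold_pred, y):
    """Original row indices whose out-of-fold prediction disagrees with the label."""
    idx = np.concatenate(fold_test_idx)
    pred = np.concatenate(fold_pred)
    return np.sort(idx[pred != y[idx]])


# Toy SHAP matrices (not project results): 3 features, 2 outer folds.
names = ["Response", "Age", "Risk"]
fold_shap = [
    np.array([[0.5, -0.1, 0.2], [-0.7, 0.0, 0.1]]),
    np.array([[0.6, 0.2, -0.3]]),
]
print(global_importance(fold_shap, names))

# Toy labels, outer-fold test indices and predictions for a 5-sample dataset.
y = np.array([0, 1, 1, 0, 1])
fold_test_idx = [np.array([0, 3]), np.array([1, 2, 4])]
fold_pred = [np.array([0, 1]), np.array([1, 0, 1])]
print(misclassified(fold_test_idx, fold_pred, y))
```

Each sample appears in exactly one outer test fold, so pooling the folds gives one out-of-fold SHAP row and one prediction per patient. The returned indices can then be used to pull up the SHAP rows of the misclassified cases.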