Final Project for MA463X: Data Analytics & Statistical Learning.
We analyzed the breast cancer data set available here. Our goal was to classify the given tumor cells as malignant or benign. We experimented with the following methods:
- K-Nearest Neighbors
- Linear Discriminant Analysis & Quadratic Discriminant Analysis (LDA & QDA)
- Logistic Regression
- Random Forest
- Bagging
- Boosting
Based on training and validation results, our final choice was bagging with logistic regression. This model achieved 96.70% accuracy on held-out (unseen) test data.
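For readers unfamiliar with the technique, the following is a minimal sketch of bagging with logistic regression, not the code used in this project. It assumes synthetic two-feature data rather than the breast cancer data set, uses only the Python standard library, and every function name and hyperparameter (`fit_logistic`, `fit_bagging`, `n_models=15`, the learning rate, and so on) is hypothetical and chosen for illustration. Each base model is fit on a bootstrap resample of the training set, and the ensemble predicts by majority vote.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent.

    Returns the weights as [bias, w1, w2, ...].
    """
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_one(w, x):
    # Classify as 1 when the predicted probability is at least 0.5.
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1 if sigmoid(z) >= 0.5 else 0


def fit_bagging(X, y, n_models=15, seed=0):
    """Train n_models logistic regressions, each on a bootstrap resample."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Sample n indices with replacement (the bootstrap step).
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_logistic([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, x):
    # Majority vote across the ensemble.
    votes = sum(predict_one(w, x) for w in models)
    return 1 if 2 * votes > len(models) else 0


def make_data(n, seed):
    """Synthetic stand-in data: two Gaussian clusters, one per class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 2.0 if label else -2.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


X_train, y_train = make_data(200, seed=1)
X_test, y_test = make_data(100, seed=2)

models = fit_bagging(X_train, y_train)
correct = sum(predict_bagging(models, x) == t for x, t in zip(X_test, y_test))
accuracy = correct / len(y_test)
print(f"test accuracy on synthetic data: {accuracy:.2%}")
```

The accuracy printed here describes the synthetic data only and says nothing about the 96.70% figure above. In practice a library implementation would likely be preferable to hand-written gradient descent.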
Our report, which goes into greater detail on our experiments, results, and analysis, can be found here.
Completed by Mike Giancola, Ranier Gran, Cassidy Litch, Charles Lovering, and Cuong Nguyen.
Instructor: Randy Paffenroth