Optimizing Heart Disease Classification: An Exploration of Multiple Algorithms and Hyperparameter Tuning

This project explores various machine learning algorithms and their hyperparameter tuning to classify heart disease. The goal is to find the best-performing model based on accuracy metrics by evaluating models like Logistic Regression, KNN, Naive Bayes, SVM, Decision Tree, Random Forest, AdaBoost, Gradient Boosting, and XGBoost.

Project Overview

Heart disease classification is a critical application of machine learning in the healthcare industry. This project focuses on optimizing multiple machine learning models to classify the presence of heart disease. The dataset used includes various health metrics such as age, sex, cholesterol levels, and more. The performance of different models is evaluated based on accuracy, both on cross-validation and test data.

Dataset

The dataset used in this project contains the following fields:

ID: Unique identifier for the dataset entry.
Age: Age of the patient.
Sex: Gender of the patient.
CP: Chest pain type.
Trestbps: Resting blood pressure.
Chol: Cholesterol level.
Fbs: Fasting blood sugar.
Restecg: Resting electrocardiographic results.
Thalch: Maximum heart rate achieved.
Exang: Exercise-induced angina.
Oldpeak: ST depression induced by exercise relative to rest.
Slope: The slope of the peak exercise ST segment.
Ca: Number of major vessels colored by fluoroscopy.
Thal: Thalassemia test result.
Num: Diagnosis of heart disease (target variable).

The dataset can be found here.

Objective

The main objective of this project is to identify the best machine learning model for classifying heart disease by experimenting with various algorithms and hyperparameter tuning. The best model is chosen based on cross-validation accuracy and test accuracy.

Methodology

Data Preprocessing: The dataset is cleaned, and features are standardized where necessary.
Model Training: Several machine learning models, including:
- Logistic Regression
- K-Nearest Neighbors (KNN)
- Gaussian Naive Bayes
- Support Vector Machine (SVM)
- Decision Tree
- Random Forest
- AdaBoost
- Gradient Boosting
- XGBoost
Hyperparameter Tuning: Each model's hyperparameters are tuned using grid search to optimize performance.
Model Evaluation: Models are evaluated based on accuracy during cross-validation and on the test set.

Features

Hyperparameter Tuning: Grid search is used to fine-tune the model parameters.
Cross-Validation: Used to evaluate the model's performance and reduce overfitting.
Model Comparison: Multiple models are compared to find the best-performing one.

Prerequisites

Python 3.x
Libraries:
- pandas
- numpy
- scikit-learn
- xgboost
- matplotlib
- seaborn

Setup and Installation

Clone the repository:

git clone https://github.com/your-username/heart-disease-classification.git
cd heart-disease-classification

Install the required dependencies:
```
pip install -r requirements.txt
```
Download the dataset:
- The dataset can be found here.
- Place the dataset.csv file in the project directory.
Run the main script:
```
python main.py
```

Results

Model	Cross-Validation Accuracy	Test Accuracy
Logistic Regression	0.5115	0.5109
Gradient Boosting	0.6396	0.5978
KNeighbors Classifier	0.5767	0.5870
Decision Tree Classifier	0.5840	0.5761
AdaBoost Classifier	0.6058	0.5978
Random Forest	0.6288	0.6739
XGBoost Classifier	0.6263	0.6413
Support Vector Machine	0.5877	0.5870
Naive Bayes Classifier	0.5780	0.5435

Best Model: XGBoost Classifier
- Cross-Validation Accuracy: 0.6263
- Test Accuracy: 0.6413

Usage

The main.py script handles the training, hyperparameter tuning, and evaluation of all the models. You can modify the script to change the dataset or the models being used.

License

This project is licensed under the Apache-2.0 License. See the LICENSE file for details.

Acknowledgments

Special thanks to the contributors of open-source machine learning libraries and the healthcare community for providing valuable insights and datasets to help with this analysis.

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
LICENSE		LICENSE
README.md		README.md
dataset.csv		dataset.csv
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Optimizing Heart Disease Classification: An Exploration of Multiple Algorithms and Hyperparameter Tuning

Project Overview

Dataset

Objective

Methodology

Features

Prerequisites

Setup and Installation

Results

Usage

License

Acknowledgments

About

Releases 1

Languages

License

skumbhar272002/heart-disease-classification

Folders and files

Latest commit

History

Repository files navigation

Optimizing Heart Disease Classification: An Exploration of Multiple Algorithms and Hyperparameter Tuning

Project Overview

Dataset

Objective

Methodology

Features

Prerequisites

Setup and Installation

Results

Usage

License

Acknowledgments

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 1

Languages