Heart Disease Analysis

Overview

This Python script analyzes the heart disease dataset from the UCI Machine Learning Repository. It covers data exploration, cleaning, statistical analysis, and visualization, using Pandas for data manipulation and Matplotlib for plotting.

Data Collection

The heart disease dataset is sourced from the UCI Machine Learning Repository. It contains features related to heart disease, such as age, sex, chest pain type, and cholesterol level. The script reads the data from the dataset URL into a Pandas DataFrame.
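The loading step might look like the sketch below. It assumes the headerless, comma-separated layout of the UCI "processed Cleveland" file, where missing values are written as "?"; the column names, the load_heart_data helper, and the two sample rows are illustrative assumptions, and the dataset URL is left out because this README does not state it.

```python
import io

import pandas as pd

# Assumed column names for the 14-column processed Cleveland file.
# "target" is a naming choice made here for the last column.
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]


def load_heart_data(source):
    """Read the headerless CSV from a URL, path, or file-like object.

    "?" entries are parsed as NaN so they can be detected later.
    """
    return pd.read_csv(source, header=None, names=COLUMNS, na_values="?")


# Two illustrative rows in the assumed layout (not taken from the script).
sample = io.StringIO(
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0\n"
    "67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,?,3.0,2\n"
)
df = load_heart_data(sample)
print(df.shape)
```

Passing the dataset URL instead of the in-memory sample should work the same way, since pandas.read_csv accepts URLs directly.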

Data Exploration and Cleaning

The script performs initial data exploration and cleaning: it checks each column for missing values and drops every row that contains one.
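A minimal sketch of that cleaning step, using a small made-up DataFrame in place of the real dataset. The column names are assumptions; only the "count missing values, then drop incomplete rows" approach comes from the description above.

```python
import numpy as np
import pandas as pd

# Toy data with two incomplete rows (values are illustrative only).
df = pd.DataFrame({
    "age": [63, 67, 41],
    "chol": [233, np.nan, 204],
    "ca": [0, 3, np.nan],
})

# Number of missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Drop every row that has at least one missing value.
cleaned = df.dropna().reset_index(drop=True)
print(len(df), "rows before,", len(cleaned), "rows after")
```

Dropping rows is the simplest option and suits a dataset with few gaps; it does discard the other values in those rows.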

Data Analysis

The script calculates summary statistics and a correlation matrix to understand the distribution of the data and the relationships between features.
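As a rough illustration, both results can be obtained with DataFrame.describe and DataFrame.corr. The toy values and column names below are assumptions chosen so the correlations are easy to check by eye.

```python
import pandas as pd

# Toy data: chol rises with age, thalach falls with age (illustrative only).
df = pd.DataFrame({
    "age": [40, 50, 60, 70],
    "chol": [200, 220, 240, 260],
    "thalach": [180, 170, 160, 150],
})

# Count, mean, std, min, quartiles, and max for each numeric column.
summary = df.describe()
print(summary)

# Pairwise Pearson correlation between the numeric columns.
corr = df.corr()
print(corr)
```

Values near 1 or -1 in the correlation matrix indicate a strong linear relationship; values near 0 indicate little linear relationship.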

Data Visualization with Matplotlib

The script creates various visualizations to further explore the dataset:

Distribution of age
Scatter plot of age vs. cholesterol
Correlation heatmap
Health risk score distribution
Scatter plot of age vs. health risk score
Histogram of age distribution for people with the target condition
Percentage of people with high cholesterol in each age group
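The sketch below shows how three of these plots could be produced with Matplotlib alone. The toy data, column names, bin count, and colour map are assumptions, and the health risk score plots are left out because this README does not define how the score is computed.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Toy data standing in for the cleaned dataset (illustrative values).
df = pd.DataFrame({
    "age": [35, 45, 45, 55, 55, 55, 65, 65],
    "chol": [200, 250, 210, 260, 270, 230, 300, 180],
    "target": [0, 1, 0, 1, 1, 0, 1, 0],
})
out_dir = tempfile.mkdtemp()  # temporary folder for the PNG files

# Distribution of age.
fig, ax = plt.subplots()
ax.hist(df["age"], bins=5, edgecolor="black")
ax.set_xlabel("Age")
ax.set_ylabel("Count")
ax.set_title("Distribution of Age")
hist_path = os.path.join(out_dir, "Distribution_of_Age.png")
fig.savefig(hist_path)
plt.close(fig)

# Scatter plot of age vs. cholesterol.
fig, ax = plt.subplots()
ax.scatter(df["age"], df["chol"])
ax.set_xlabel("Age")
ax.set_ylabel("Cholesterol")
ax.set_title("Age vs. Cholesterol")
scatter_path = os.path.join(out_dir, "Scatter_Plot_Age_vs_Cholesterol.png")
fig.savefig(scatter_path)
plt.close(fig)

# Correlation heatmap drawn with imshow.
corr = df.corr()
fig, ax = plt.subplots()
image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(image, ax=ax)
ax.set_title("Correlation Heatmap")
heatmap_path = os.path.join(out_dir, "Correlation_Heatmap.png")
fig.savefig(heatmap_path)
plt.close(fig)
```

The histogram for people with the target condition would follow the same pattern as the first plot, applied to a filtered selection such as df.loc[df["target"] == 1, "age"], assuming the condition is coded as 1.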

Additional Analysis

The script identifies the age at which the largest number of people have high cholesterol and creates a histogram of the age distribution for people with the target condition. It also calculates the percentage of people with high cholesterol in each age group and identifies the age group with the highest percentage.
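One way to express the high-cholesterol calculations is sketched below. The 240 mg/dL threshold, the age group edges, the column names, and the high_cholesterol_summary helper are all hypothetical; this README does not state which values the script uses.

```python
import pandas as pd

# Hypothetical settings; adjust to match the actual script.
HIGH_CHOL_THRESHOLD = 240          # assumed cut-off in mg/dL
AGE_BINS = [20, 40, 50, 60, 80]    # assumed age group edges
AGE_LABELS = ["20-39", "40-49", "50-59", "60-79"]


def high_cholesterol_summary(df):
    """Return (peak age, percentage per age group, top age group)."""
    is_high = df["chol"] > HIGH_CHOL_THRESHOLD

    # Age with the largest number of high-cholesterol rows.
    peak_age = df.loc[is_high, "age"].value_counts().idxmax()

    # Share of high-cholesterol rows in each age group, as a percentage.
    groups = pd.cut(df["age"], bins=AGE_BINS, labels=AGE_LABELS, right=False)
    pct = is_high.groupby(groups, observed=False).mean() * 100

    return peak_age, pct, pct.idxmax()


# Toy data (illustrative values only).
df = pd.DataFrame({
    "age": [35, 45, 45, 55, 55, 55, 65, 65],
    "chol": [200, 250, 210, 260, 270, 230, 300, 180],
})
peak_age, pct, top_group = high_cholesterol_summary(df)
print(peak_age)
print(pct.round(1))
print(top_group)
```

Taking the mean of a boolean Series gives the fraction of True values, which is why multiplying by 100 yields the percentage per group.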

Plot Images

The plot images are included in the repository:

Distribution_of_Age.png
Scatter_Plot_Age_vs_Cholesterol.png
Correlation_Heatmap.png
Health_Risk_Score_Distribution.png
Scatter_Plot_Age_vs_Health_Risk_Score.png
Age_Distribution_of_People_with_Target_Condition.png
Percentage_of_People_with_High_Cholesterol_in_Each_Age_Group.png

Usage

Install the required Python libraries, pandas and matplotlib (for example with pip install pandas matplotlib).
Run the Python script main.py.
View the generated visualizations to gain insights into the heart disease dataset.
