Heart Disease Analysis

Overview

This Python script analyzes the heart disease dataset from the UCI Machine Learning Repository. It covers data exploration, cleaning, statistical analysis, and visualization, using Pandas for data manipulation and Matplotlib for plotting.

Data Collection

The heart disease dataset is sourced from the UCI Machine Learning Repository. It contains features related to heart disease, such as age, sex, chest pain type, and cholesterol level. The script reads the data from its URL directly into a Pandas DataFrame.
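As a rough sketch of the loading step, the snippet below reads a headerless CSV into a DataFrame. The column names, the `load_heart_data` helper, and the in-memory sample rows are assumptions for illustration; the README does not state which file or column names `main.py` uses. The UCI files mark missing values with `?`, so the sketch maps that marker to NaN on read.

```python
import io

import pandas as pd

# Assumed column names: the README does not list them. These follow the
# 14 attributes commonly used with the processed UCI heart disease files.
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]


def load_heart_data(source):
    """Read a headerless CSV (URL, path, or file-like object) into a DataFrame."""
    # "?" marks a missing value in the UCI files, so treat it as NaN.
    return pd.read_csv(source, header=None, names=COLUMNS, na_values="?")


# Illustrative rows in the same layout, standing in for the real file.
sample = io.StringIO(
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0\n"
    "67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,3.0,3.0,2\n"
    "53.0,0.0,3.0,128.0,216.0,0.0,2.0,115.0,0.0,0.0,1.0,0.0,?,0\n"
)
df = load_heart_data(sample)
print(df.shape)
print(df.head())
```

Because `pd.read_csv` accepts a URL as well as a path or buffer, the same helper could be pointed at the dataset's URL instead of the sample.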

Data Exploration and Cleaning

The script performs initial exploration and cleaning: it checks each column for missing values and drops any rows that contain them.
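A minimal sketch of that check-and-drop step, using a small made-up DataFrame rather than the real dataset (the column names here are assumptions):

```python
import numpy as np
import pandas as pd

# Made-up sample with two incomplete rows.
df = pd.DataFrame({
    "age": [63, 67, 53, 41],
    "chol": [233.0, np.nan, 216.0, 204.0],
    "ca": [0.0, 3.0, np.nan, 0.0],
})

# Count missing values in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Drop every row that has at least one missing value.
clean = df.dropna().reset_index(drop=True)
print(len(df), "rows before,", len(clean), "rows after")
```

Dropping rows is the simplest option and is reasonable when few rows are affected; it does shrink the sample, so it is worth comparing the row counts before and after.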

Data Analysis

The script calculates summary statistics and a correlation matrix to understand the distribution of each feature and the relationships between features.
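One way to compute both with Pandas is `DataFrame.describe()` and `DataFrame.corr()`. The sketch below uses made-up values and assumed column names, so the numbers say nothing about the real dataset:

```python
import pandas as pd

# Made-up values chosen so the relationships are easy to see.
df = pd.DataFrame({
    "age": [40, 50, 60, 70],
    "chol": [200, 220, 240, 260],
    "thalach": [180, 160, 150, 120],
})

# Count, mean, std, min, quartiles, and max for each numeric column.
summary = df.describe()
print(summary)

# Pairwise Pearson correlation between numeric columns.
corr = df.corr()
print(corr)
```

In this sample `chol` rises linearly with `age`, so their correlation is 1, while `thalach` falls with `age` and correlates negatively.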

Data Visualization with Matplotlib

The script creates various visualizations to further explore the dataset:

  • Distribution of age
  • Scatter plot of age vs. cholesterol
  • Correlation heatmap
  • Health risk score distribution
  • Scatter plot of age vs. health risk score
  • Histogram of age distribution for people with the target condition
  • Percentage of people with high cholesterol in each age group
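The first three plots in the list above could be produced along these lines. This is a sketch on made-up data with assumed column names, not the code in `main.py`; it writes the images to a temporary directory and uses the non-interactive `Agg` backend so it runs without a display. The health risk score plots are left out because the README does not define how the score is calculated.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data; column names are assumptions.
df = pd.DataFrame({
    "age": [35, 42, 47, 51, 55, 58, 60, 63, 66, 70],
    "chol": [180, 210, 260, 245, 300, 230, 270, 280, 200, 255],
    "thalach": [185, 172, 168, 160, 150, 155, 140, 138, 130, 120],
})
out_dir = tempfile.mkdtemp()

# Distribution of age.
fig, ax = plt.subplots()
ax.hist(df["age"], bins=5, edgecolor="black")
ax.set_xlabel("Age")
ax.set_ylabel("Count")
ax.set_title("Distribution of Age")
fig.savefig(os.path.join(out_dir, "Distribution_of_Age.png"))
plt.close(fig)

# Scatter plot of age vs. cholesterol.
fig, ax = plt.subplots()
ax.scatter(df["age"], df["chol"])
ax.set_xlabel("Age")
ax.set_ylabel("Cholesterol")
ax.set_title("Age vs. Cholesterol")
fig.savefig(os.path.join(out_dir, "Scatter_Plot_Age_vs_Cholesterol.png"))
plt.close(fig)

# Correlation heatmap drawn with imshow.
corr = df.corr()
fig, ax = plt.subplots()
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax)
ax.set_title("Correlation Heatmap")
fig.savefig(os.path.join(out_dir, "Correlation_Heatmap.png"))
plt.close(fig)

saved = sorted(os.listdir(out_dir))
print(saved)
```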

Additional Analysis

The script finds the age with the largest number of people who have high cholesterol and plots a histogram of the age distribution for people with the target condition. It also calculates the percentage of people with high cholesterol in each age group and identifies the age group with the highest percentage.
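The age-group calculation can be sketched with `pd.cut` and `groupby`. The 240 mg/dL cutoff for "high cholesterol", the ten-year bins, and the column names are all assumptions here, since the README does not say what `main.py` uses; the data is made up.

```python
import pandas as pd

HIGH_CHOL_THRESHOLD = 240  # assumed cutoff in mg/dL, not taken from the script

# Made-up sample data.
df = pd.DataFrame({
    "age": [35, 38, 45, 47, 52, 55, 58, 58, 63, 66],
    "chol": [180, 250, 230, 260, 245, 300, 250, 270, 280, 200],
})
df["high_chol"] = df["chol"] > HIGH_CHOL_THRESHOLD

# Age with the most people above the threshold.
age_with_most_high = df.loc[df["high_chol"], "age"].value_counts().idxmax()
print("Age with most high-cholesterol cases:", age_with_most_high)

# Assign each person to a ten-year age group (left edge included).
df["age_group"] = pd.cut(
    df["age"],
    bins=[30, 40, 50, 60, 70],
    right=False,
    labels=["30-39", "40-49", "50-59", "60-69"],
)

# The mean of a boolean column is the fraction of True values.
pct_high = df.groupby("age_group", observed=True)["high_chol"].mean() * 100
top_group = pct_high.idxmax()
print(pct_high)
print("Age group with highest percentage:", top_group)
```

Comparing percentages rather than raw counts keeps a large age group from dominating simply because it has more people in it.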

Plot Images

The plot images are included in the repository:

  • Distribution_of_Age.png
  • Scatter_Plot_Age_vs_Cholesterol.png
  • Correlation_Heatmap.png
  • Health_Risk_Score_Distribution.png
  • Scatter_Plot_Age_vs_Health_Risk_Score.png
  • Age_Distribution_of_People_with_Target_Condition.png
  • Percentage_of_People_with_High_Cholesterol_in_Each_Age_Group.png

Usage

  1. Install the required Python libraries: pandas and matplotlib.
  2. Run the Python script main.py.
  3. View the generated visualizations to gain insights into the heart disease dataset.

Preview

Preview images: Figure_1, Figure_4, Figure_3, Figure_2