Welcome to the California Housing Data Analysis project! In this project, we delve into a comprehensive dataset containing information about housing in various block groups across California. The dataset provides metrics such as population, median income, median house value, and more.
The dataset includes the following features:

Geographical Information:
- Latitude
- Longitude

Housing Details:
- Housing Median Age
- Total Rooms
- Total Bedrooms
- Households

Population Details:
- Population

Income and Value Metrics:
- Median Income
- Median House Value

Categorical Information:
- Ocean Proximity

By data type, the features can be grouped as follows:

Nominal Features:
- Ocean Proximity

Ordinal Features:
- Housing Median Age

Continuous Features:
- Latitude
- Longitude
- Median House Value
- Median Income

Discrete Features:
- Population
- Households
- Total Rooms and Total Bedrooms
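A minimal sketch of loading the data and checking how pandas types each feature group. It assumes the file is a CSV named `housing.csv` with the snake_case headers used by the widely shared version of this dataset (`longitude`, `latitude`, `housing_median_age`, `total_rooms`, `total_bedrooms`, `population`, `households`, `median_income`, `median_house_value`, `ocean_proximity`). The file name, the `FEATURE_GROUPS` mapping and both helper functions are illustrative assumptions, not code taken from the notebook; latitude and longitude are grouped here as continuous coordinates.

```python
import pandas as pd

# Assumed column names (snake_case headers of the common housing.csv file).
# Adjust these lists if the notebook's copy of the data differs.
FEATURE_GROUPS = {
    "nominal": ["ocean_proximity"],
    "ordinal": ["housing_median_age"],
    "continuous": ["longitude", "latitude", "median_income", "median_house_value"],
    "discrete": ["total_rooms", "total_bedrooms", "population", "households"],
}


def load_housing(path="housing.csv") -> pd.DataFrame:
    """Read the CSV (a path or file-like object) and store the nominal column as categorical."""
    df = pd.read_csv(path)
    df["ocean_proximity"] = df["ocean_proximity"].astype("category")
    return df


def summarize_types(df: pd.DataFrame) -> dict:
    """Return {group: {column: dtype name}} for every grouped column present in df."""
    return {
        group: {col: str(df[col].dtype) for col in cols if col in df.columns}
        for group, cols in FEATURE_GROUPS.items()
    }


# Usage in the notebook, with the data file next to it:
# df = load_housing("housing.csv")
# summarize_types(df)
```

Note that `total_bedrooms` typically comes back as `float64` even though it is a count, because pandas stores its missing entries as `NaN`.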

The analysis walks through the following steps:

Exploratory Data Analysis (EDA):
- Calculate and visualize the average median income.
- Explore the distribution of housing median age.
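The two EDA steps could be sketched as below, assuming a DataFrame with the snake_case columns `median_income`, `housing_median_age` and `ocean_proximity`. The function names are hypothetical, and breaking the average median income down by ocean proximity (with the overall average as a reference line) is just one possible way to visualize it, not necessarily what the notebook does.

```python
import matplotlib.pyplot as plt
import pandas as pd


def average_median_income(df: pd.DataFrame) -> float:
    """Overall mean of median_income (pandas skips NaN by default)."""
    return float(df["median_income"].mean())


def plot_income_by_proximity(df: pd.DataFrame):
    """Bar chart of average median income per ocean_proximity category."""
    means = df.groupby("ocean_proximity", observed=True)["median_income"].mean()
    ax = means.sort_values().plot.bar(figsize=(7, 4), edgecolor="black")
    # Dashed reference line at the overall average across all block groups.
    ax.axhline(average_median_income(df), color="red", linestyle="--",
               label="overall average")
    ax.set_ylabel("Average median income")
    ax.legend()
    return ax


def plot_age_distribution(df: pd.DataFrame, bins: int = 20):
    """Histogram of housing_median_age; returns the Axes for further styling."""
    fig, ax = plt.subplots(figsize=(7, 4))
    ax.hist(df["housing_median_age"].dropna(), bins=bins, edgecolor="black")
    ax.set_xlabel("Housing median age")
    ax.set_ylabel("Number of block groups")
    return ax
```

In Jupyter, calling `plot_age_distribution(df)` in a cell renders the figure inline.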
Visualization:
- Examine the relationship between median income and median house values through visualization.
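A sketch of the income versus house value visualization, assuming columns named `median_income` and `median_house_value`. Reporting the Pearson correlation in the title is an added convenience, and the function names are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def income_value_correlation(df: pd.DataFrame) -> float:
    """Pearson correlation between median_income and median_house_value."""
    return float(df["median_income"].corr(df["median_house_value"]))


def plot_income_vs_value(df: pd.DataFrame):
    """Scatter plot; small, semi-transparent markers keep dense regions readable."""
    fig, ax = plt.subplots(figsize=(7, 5))
    ax.scatter(df["median_income"], df["median_house_value"], s=4, alpha=0.3)
    ax.set_xlabel("Median income")
    ax.set_ylabel("Median house value")
    ax.set_title(f"Pearson r = {income_value_correlation(df):.2f}")
    return ax
```

Pearson's r only captures the linear part of the relationship, so the scatter plot is still worth inspecting for curvature or capped values.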
Data Cleaning:
- Create a dataset by removing entries with a missing Total Bedrooms value.
- Create a dataset by filling in missing Total Bedrooms values with the column mean.
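Both cleaning strategies can be sketched in a few lines, assuming the column is named `total_bedrooms`; the helper names are hypothetical. Each function returns a new DataFrame, so the original data stays untouched for comparison.

```python
import pandas as pd


def drop_missing_bedrooms(df: pd.DataFrame) -> pd.DataFrame:
    """Dataset 1: rows whose total_bedrooms value is missing are removed."""
    return df.dropna(subset=["total_bedrooms"]).copy()


def fill_missing_bedrooms(df: pd.DataFrame) -> pd.DataFrame:
    """Dataset 2: missing total_bedrooms values are replaced by the column mean."""
    filled = df.copy()
    mean_bedrooms = filled["total_bedrooms"].mean()  # computed over non-missing rows only
    filled["total_bedrooms"] = filled["total_bedrooms"].fillna(mean_bedrooms)
    return filled
```

Dropping rows shrinks the dataset, while mean imputation keeps every row but pulls the filled entries toward the average; comparing the two is the point of building both.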
Statistical Analysis:
- Develop a user-defined function to calculate the median value for selected columns.
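A possible user-defined median function, written by hand (sort, then take the middle value) rather than delegating to pandas. The name `column_medians` and its dictionary return value are illustrative assumptions; missing values are skipped, matching pandas' default behavior.

```python
import math

import pandas as pd


def column_medians(df: pd.DataFrame, columns: list) -> dict:
    """Return {column: median} for the selected columns, ignoring missing values.

    The median is the middle sorted value, or the mean of the two middle
    values when the number of non-missing entries is even.
    """
    medians = {}
    for col in columns:
        values = sorted(v for v in df[col] if not pd.isna(v))
        n = len(values)
        if n == 0:
            medians[col] = math.nan
        elif n % 2 == 1:
            medians[col] = float(values[n // 2])
        else:
            medians[col] = float((values[n // 2 - 1] + values[n // 2]) / 2)
    return medians
```

For example, `column_medians(df, ["median_income", "housing_median_age"])` should agree with `df[["median_income", "housing_median_age"]].median()`, which makes a convenient sanity check.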
Geospatial Analysis:
- Plot latitude versus longitude to visualize the geographical distribution.
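A sketch of the geographic plot, assuming `longitude` and `latitude` columns. Longitude goes on the x-axis and latitude on the y-axis so the points roughly trace the outline of California; coloring the points by `median_house_value` is an optional extra, not something the task list specifies.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_locations(df: pd.DataFrame, color_by: str = "median_house_value"):
    """Scatter latitude against longitude, colored by the column named in color_by."""
    fig, ax = plt.subplots(figsize=(7, 6))
    points = ax.scatter(df["longitude"], df["latitude"], c=df[color_by],
                        cmap="viridis", s=4, alpha=0.5)
    fig.colorbar(points, ax=ax, label=color_by)
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    return ax
```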
Subset Creation:
- Form a dataset containing entries with 'Near Ocean' as the ocean proximity.
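A sketch of the subset step. It assumes the category is stored as `NEAR OCEAN` (upper case, as in the commonly distributed CSV), so the comparison ignores case and surrounding spaces to also match the 'Near Ocean' spelling used above. The function name is hypothetical.

```python
import pandas as pd


def near_ocean_subset(df: pd.DataFrame, label: str = "NEAR OCEAN") -> pd.DataFrame:
    """Rows whose ocean_proximity equals label, ignoring case and surrounding spaces."""
    normalized = df["ocean_proximity"].astype(str).str.strip().str.upper()
    return df[normalized == label.upper()].copy()
```

Converting with `astype(str)` first lets the same code work whether `ocean_proximity` is stored as plain strings or as a pandas categorical.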
Mean and Median Calculation:
- Calculate the mean and median of the median income for the 'Near Ocean' dataset.
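The mean and median for the near-ocean rows could be computed as follows, again assuming the category is stored as `NEAR OCEAN` and the income column is `median_income`; `near_ocean_income_stats` is a hypothetical name.

```python
import pandas as pd


def near_ocean_income_stats(df: pd.DataFrame) -> dict:
    """Mean and median of median_income over rows labelled NEAR OCEAN (case-insensitive)."""
    is_near_ocean = df["ocean_proximity"].astype(str).str.upper() == "NEAR OCEAN"
    income = df.loc[is_near_ocean, "median_income"]
    return {"mean": float(income.mean()), "median": float(income.median())}
```

A mean noticeably above the median would indicate a right-skewed income distribution within the subset, since a few high-income block groups pull the mean up.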
Feature Engineering:
- Introduce a new column, 'total_bedroom_size', derived from total bedrooms.
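The text does not say how 'total_bedroom_size' is defined, so the sketch below uses one hypothetical definition: `total_bedrooms` is split into three equal-frequency bins (terciles) labelled small, medium and large. The bin count, the labels and the function name are all assumptions for illustration.

```python
import pandas as pd


def add_total_bedroom_size(df: pd.DataFrame) -> pd.DataFrame:
    """Add a hypothetical total_bedroom_size column by tercile of total_bedrooms.

    pd.qcut picks the bin edges from the data's quantiles, so each label covers
    roughly a third of the rows; missing total_bedrooms values stay missing.
    """
    out = df.copy()
    out["total_bedroom_size"] = pd.qcut(out["total_bedrooms"], q=3,
                                        labels=["small", "medium", "large"])
    return out
```

Quantile bins avoid hard-coding thresholds; if the notebook uses fixed cut-offs instead, `pd.cut` with explicit `bins` would be the equivalent tool.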
The project progresses from data exploration to cleaning and advanced analysis, incorporating geospatial representation and feature engineering. Each step is documented with clear explanations and visualizations to facilitate understanding.
Before getting started, ensure you have the required Python libraries (plus Jupyter Notebook, which is used to run the analysis) installed by running the following command:
pip install pandas numpy seaborn matplotlib notebook
Clone this repository to your local machine by executing the following command in your terminal:
git clone https://github.com/SPARTANX21/Python---Understanding-California-Housing-Data-Set.git
Navigate to the project directory using the terminal:
cd Python---Understanding-California-Housing-Data-Set
Launch Jupyter Notebook to access the project files and execute the analysis:
jupyter notebook
Follow these steps to explore and analyze the California Housing Data Set efficiently. Should you have any questions or suggestions, feel free to reach out or open an issue on the repository.
Happy analyzing! 🏡📊