
Bhushan0097/02.CAPSTONE.ML.REGRESSION-BikeShareDemandPrediction

Bike Share Demand Prediction

Overview

This repository contains the code and resources for a supervised machine learning project aimed at predicting bike share demand in Seoul, South Korea. The dataset used is SeoulBikeData.csv.

Introduction

Bike share demand prediction is a critical aspect of urban transportation planning. This project focuses on using machine learning techniques to predict bike rental demand in Seoul, aiding in efficient resource allocation and city planning.

Dataset

The dataset SeoulBikeData.csv is included in the 📁 data directory. It contains information about bike rentals, including weather conditions, temperature, humidity, and other relevant features.

The SeoulBikeData.csv file contains the following columns:

  • Date: Year-Month-Day
  • Rented Bike count: Count of bikes rented at each hour
  • Hour: Hour of the day
  • Temperature: Temperature in Celsius
  • Humidity: %
  • Windspeed: m/s
  • Visibility: 10m
  • Dew point temperature: Celsius
  • Solar radiation: MJ/m2
  • Rainfall: mm
  • Snowfall: cm
  • Seasons: Winter, Spring, Summer, Autumn
  • Holiday: Holiday/No holiday
  • Functional Day: NonFunctional/Functional Day
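A minimal sketch of loading the data with Pandas is shown below. It reads a two-row inline sample rather than the real file, so the values, the shortened header spellings, and the helper name `load_bike_data` are illustrative assumptions; the actual CSV may use different headers or an encoding that needs to be passed to `read_csv`.

```python
import io

import pandas as pd

# Two made-up rows standing in for SeoulBikeData.csv. The header spellings
# and values are assumptions for illustration, not copied from the real file.
SAMPLE_CSV = """Date,Rented Bike Count,Hour,Temperature,Humidity,Seasons,Holiday,Functional Day
2017-12-01,254,0,-5.2,37,Winter,No holiday,Functional Day
2017-12-01,204,1,-5.5,38,Winter,No holiday,Functional Day
"""


def load_bike_data(source):
    """Read the CSV and parse the Date column into datetimes."""
    return pd.read_csv(source, parse_dates=["Date"])


df = load_bike_data(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
print(df.head())
```

To read the real file, a path such as `data/SeoulBikeData.csv` could be passed instead of the in-memory sample, assuming the file sits in the data directory as described above.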

Dependencies

The project is developed using Python and relies on the following libraries:

  • NumPy
  • Pandas
  • Matplotlib
  • Seaborn
  • Scikit-learn

Documentation

The project involves the following steps:

  1. Data Cleaning and Preparation
  2. Exploratory Data Analysis
  3. Visualization and Insights
  4. Hypothesis Testing
  5. Feature Engineering & Data Pre-processing
  6. ML Model Training, Implementation and Evaluation

Data Cleaning and Preparation

The first step in this project involves cleaning and preparing the data. This includes checking for missing data, removing duplicates, and converting data types. Some of the specific tasks involved in this step include:

  • Handling missing data
  • Removing duplicates
  • Converting data types
  • Time series analysis
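The tasks above could look roughly like the following sketch. It runs on a small made-up frame, and the median fill for missing temperatures, the derived `month` and `weekday` columns, and the daily aggregation are assumptions about one reasonable approach, not necessarily what the project notebook does.

```python
import numpy as np
import pandas as pd

# Small made-up frame: one exact duplicate row and one missing temperature.
raw = pd.DataFrame({
    "Date": ["2017-12-01", "2017-12-01", "2017-12-01", "2017-12-02"],
    "Hour": [0, 1, 1, 2],
    "Rented Bike Count": [254, 204, 204, 328],
    "Temperature": [-5.2, -5.5, -5.5, np.nan],
})


def clean(df):
    """Drop duplicates, convert types, fill gaps, and add date parts."""
    out = df.drop_duplicates().copy()
    # Convert the date strings to datetimes so date parts can be extracted.
    out["Date"] = pd.to_datetime(out["Date"])
    # Assumption: fill missing temperatures with the column median.
    out["Temperature"] = out["Temperature"].fillna(out["Temperature"].median())
    out["month"] = out["Date"].dt.month
    out["weekday"] = out["Date"].dt.day_name()
    return out.reset_index(drop=True)


cleaned = clean(raw)

# A simple time series view: total rentals per day.
daily = cleaned.groupby("Date")["Rented Bike Count"].sum()
print(cleaned)
print(daily)
```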

Exploratory Data Analysis

The next step in the project is to conduct exploratory data analysis. This involves examining the data to understand its distribution, central tendencies, and correlations between variables.
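As a rough illustration, summary statistics and pairwise correlations can be obtained with Pandas as sketched below. The data is synthetic, generated so that rentals rise with temperature; the column names and the relationship itself are assumptions made for the example, not findings from the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration): rentals increase with
# temperature and decrease slightly with humidity, plus random noise.
rng = np.random.default_rng(0)
n = 200
temperature = rng.uniform(-10, 35, n)
humidity = rng.uniform(20, 95, n)
rentals = 300 + 20 * temperature - 2 * humidity + rng.normal(0, 50, n)

eda_df = pd.DataFrame({
    "Temperature": temperature,
    "Humidity": humidity,
    "Rented Bike Count": rentals,
})

# Distribution and central tendency of each column.
summary = eda_df.describe()

# Pearson correlation between every pair of numeric columns.
corr = eda_df.corr()

print(summary)
print(corr.round(2))
```

The correlation matrix can also be passed to Seaborn's `heatmap` for a visual overview.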

Hypothesis Testing

Hypothesis testing is a statistical method used to make inferences about a population based on a sample of data. To test a hypothesis on the SeoulBikeData.csv dataset, start with a null hypothesis (H0) and an alternative hypothesis (H1), then use a statistical test to determine whether there is enough evidence to reject the null hypothesis in favor of the alternative.

Below is a general step-by-step guide to performing hypothesis testing on a dataset like SeoulBikeData.csv:

  1. Define the Hypotheses
  2. Choose a Significance Level (α)
  3. Select the Test
  4. Perform the Test
  5. Analyze the Results
  6. Draw Conclusions
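The six steps above can be sketched with a permutation test for a difference in means. This README does not say which tests the project uses, so the choice of test, the hypothesis about holidays, and the synthetic samples are all assumptions for illustration. The helper `permutation_test` is written for this example and is not part of any library.

```python
import numpy as np


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the group labels by permuting the pooled sample.
        shuffled = rng.permutation(pooled)
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Step 1: H0: mean hourly rentals are equal on holidays and non-holidays.
#         H1: the means differ.
# Synthetic samples (assumed values, not drawn from the real dataset).
rng = np.random.default_rng(42)
no_holiday = rng.normal(700, 150, 300)
holiday = rng.normal(500, 150, 60)

# Step 2: choose the significance level.
alpha = 0.05

# Steps 3 and 4: select and run the test.
observed, p_value = permutation_test(no_holiday, holiday)

# Steps 5 and 6: analyze the result and draw a conclusion.
reject_h0 = p_value < alpha
print(f"difference in means = {observed:.1f}, p = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```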

Feature Engineering & Data Pre-processing

  1. Handling Missing Values
  2. Handling Outliers
  3. Label Encoding
  4. Textual Data Preprocessing
  5. Feature Manipulation & Selection
  6. Data Transformation
  7. Data Scaling
  8. Dimensionality Reduction
  9. Data Splitting
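A few of these steps (encoding, splitting, and scaling) are sketched below on synthetic data. The column names, the 80/20 split, the one-hot encoding of Seasons, and the choice of `StandardScaler` are assumptions; the project may use different encoders, scalers, or split ratios.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (values are assumptions).
rng = np.random.default_rng(0)
n = 100
data = pd.DataFrame({
    "Temperature": rng.uniform(-10, 35, n),
    "Humidity": rng.uniform(20, 95, n),
    "Seasons": rng.choice(["Winter", "Spring", "Summer", "Autumn"], n),
    "Holiday": rng.choice(["Holiday", "No holiday"], n),
    "Rented Bike Count": rng.integers(0, 2000, n),
})

# Encoding: binary flag for Holiday, one-hot columns for Seasons.
data["Holiday"] = (data["Holiday"] == "Holiday").astype(int)
data = pd.get_dummies(data, columns=["Seasons"], dtype=int)

# Splitting: separate the features from the target, then hold out 20%.
X = data.drop(columns="Rented Bike Count")
y = data["Rented Bike Count"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling: fit on the training split only, so that no information from
# the test split leaks into the transformation.
numeric = ["Temperature", "Humidity"]
scaler = StandardScaler().fit(X_train[numeric])
X_train = X_train.copy()
X_test = X_test.copy()
X_train[numeric] = scaler.transform(X_train[numeric])
X_test[numeric] = scaler.transform(X_test[numeric])

print(X_train.head())
```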

ML Model Training and Evaluation

The dependent variable, Rented Bike Count, is continuous, so regression algorithms are used to train models that predict it.
The models are trained with the following ML algorithms:

  1. Ridge Regression (Linear Regression + L2 Regularization)
  2. Decision Tree Regression
  3. Random Forest Regression
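The sketch below trains the three model families with scikit-learn and reports train and test R² plus test-set MAE. It uses synthetic data, and the hyperparameters, the small `GridSearchCV` grid, and the 75/25 split are assumptions chosen to keep the example fast; they are not the settings used in this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem with a nonlinear signal (assumed data).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(400, 4))
y = 500 * np.sin(6 * X[:, 0]) + 300 * X[:, 1] + rng.normal(0, 30, 400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "Ridge Regression": Ridge(alpha=1.0),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=0),
    "Random Forest Regression": RandomForestRegressor(
        n_estimators=50, random_state=0
    ),
    # Hypothetical tuning grid; the grid used in the project is not stated.
    "Decision Tree Regression + GridSearchCV": GridSearchCV(
        DecisionTreeRegressor(random_state=0),
        param_grid={"max_depth": [3, 5, 8]},
        cv=3,
        scoring="r2",
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_pred = model.predict(X_test)
    results[name] = {
        "train_r2": r2_score(y_train, model.predict(X_train)),
        "test_r2": r2_score(y_test, test_pred),
        "mae": mean_absolute_error(y_test, test_pred),
    }

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

A large gap between train and test R², as an unpruned decision tree typically shows, is a sign of overfitting, which is one reason to tune depth-related hyperparameters.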

Result

Conclusion

| Model Name | Train r2_score | Test r2_score | MAE |
|---|---|---|---|
| Ridge Regression (Base Model) | 0.5969834427027465 | 0.5611509471571063 | 282.97650291199943 |
| Ridge Regression + GridSearchCV | 0.5969815828381584 | 0.5612334108757271 | 282.9539718021544 |
| Decision Tree Regression | 1.0 | 0.8453401099843291 | 133.34609878310667 |
| Decision Tree Regression + GridSearchCV | 0.9480705668434853 | 0.8723154222023835 | 125.69186170892704 |
| Random Forest Regression | 0.9881705009991666 | 0.9181275336765233 | 100.85899069434504 |
| Random Forest Regression + GridSearchCV | 0.9881705009991666 | 0.9181275336765233 | 100.85899069434504 |

About

AlmaBetter Capstone ML Regression Project
