Bike Share Demand Prediction

Overview

This repository contains the code and resources for a supervised machine learning project aimed at predicting bike share demand in Seoul, South Korea. The dataset used is SeoulBikeData.csv.

Introduction

Bike share demand prediction is a critical aspect of urban transportation planning. This project focuses on using machine learning techniques to predict bike rental demand in Seoul, aiding in efficient resource allocation and city planning.

Dataset

The dataset SeoulBikeData.csv is included in the 📁 data directory. It contains information about bike rentals, including weather conditions, temperature, humidity, and other relevant features.

The SeoulBikeData.csv file contains the following columns:

Date: Year-Month-Day
Rented Bike count: Count of bikes rented at each hour
Hour: Hour of the day
Temperature: Temperature in Celsius
Humidity: %
Windspeed: m/s
Visibility: 10m
Dew point temperature: Celsius
Solar radiation: MJ/m2
Rainfall: mm
Snowfall: cm
Seasons: Winter, Spring, Summer, Autumn
Holiday: Holiday/No holiday
Functional Day: NonFunctional/Functional Day
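
As a minimal sketch of how the file could be loaded with Pandas, the snippet below reads a CSV and parses the Date column. The helper name load_bike_data, the date format string, and the two sample rows are illustrative assumptions, not taken from the project notebook; the real file may need a different date format or an explicit encoding.

```python
import io

import pandas as pd


def load_bike_data(source, date_format="%Y-%m-%d"):
    """Read the bike-share CSV and parse the Date column (hypothetical helper)."""
    df = pd.read_csv(source)
    # The date format is an assumption based on the "Year-Month-Day" description
    df["Date"] = pd.to_datetime(df["Date"], format=date_format)
    return df


# Made-up in-memory sample using a subset of the columns listed above
sample_csv = io.StringIO(
    "Date,Rented Bike count,Hour,Temperature,Seasons,Holiday\n"
    "2017-12-01,254,0,-5.2,Winter,No holiday\n"
    "2017-12-01,204,1,-5.5,Winter,No holiday\n"
)
df = load_bike_data(sample_csv)
print(df.dtypes)
```

With the repository checked out, the same helper would be called with the path to SeoulBikeData.csv in the data directory instead of the in-memory sample.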

Dependencies

The project is developed using Python and relies on the following libraries:

NumPy
Pandas
Matplotlib
Seaborn
Scikit-learn

Documentation

The project involves the following steps:

Data Cleaning and Preparation
Exploratory Data Analysis
Visualization and Insights
Hypothesis Testing
Feature Engineering & Data Pre-processing
ML Model Training, Implementation and Evaluation

Data Cleaning and Preparation

The first step in this project is cleaning and preparing the data. The specific tasks in this step include:

Handling missing data
Removing duplicates
Converting data types
Time series analysis
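
The tasks above can be sketched as follows. This is an illustrative outline only: the helper name clean_bike_data, the median fill strategy, and the derived Month and Weekday columns are assumptions, and the sample rows are made up.

```python
import numpy as np
import pandas as pd


def clean_bike_data(df):
    """Drop duplicates, fill missing numeric values, and convert data types."""
    out = df.drop_duplicates().copy()
    # Fill numeric gaps with the column median (one of several reasonable choices)
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    # Convert types: dates to datetime, categorical text to the category dtype
    out["Date"] = pd.to_datetime(out["Date"], format="%Y-%m-%d")
    out["Seasons"] = out["Seasons"].astype("category")
    # Derive calendar features that support time series style analysis
    out["Month"] = out["Date"].dt.month
    out["Weekday"] = out["Date"].dt.day_name()
    return out.reset_index(drop=True)


# Made-up rows: one exact duplicate and one missing temperature
raw = pd.DataFrame({
    "Date": ["2017-12-01", "2017-12-01", "2017-12-02", "2017-12-02"],
    "Hour": [0, 0, 1, 2],
    "Temperature": [-5.2, -5.2, np.nan, -6.0],
    "Seasons": ["Winter"] * 4,
})
clean = clean_bike_data(raw)
print(clean)
```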

Exploratory Data Analysis

The next step in the project is to conduct exploratory data analysis. This involves examining the data to understand its distribution, central tendencies, and correlations between variables.
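
A small sketch of this kind of analysis with Pandas is shown below. The helper name summarize and the sample values are hypothetical; the sketch simply computes summary statistics and the correlation of each numeric feature with the rental count.

```python
import pandas as pd


def summarize(df, target="Rented Bike count"):
    """Return summary statistics and each numeric feature's correlation with the target."""
    numeric = df.select_dtypes(include="number")
    summary = numeric.describe()  # count, mean, std, min, quartiles, max
    corr_with_target = (
        numeric.corr()[target].drop(target).sort_values(ascending=False)
    )
    return summary, corr_with_target


# Made-up values chosen so the correlations are easy to read
sample = pd.DataFrame({
    "Rented Bike count": [100, 200, 300, 400],
    "Temperature": [0.0, 10.0, 20.0, 30.0],
    "Humidity": [80, 60, 40, 20],
})
summary, corr_with_target = summarize(sample)
print(summary)
print(corr_with_target)
```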

Hypothesis Testing

Hypothesis testing is a statistical method used to make inferences about a population based on a sample of data. To perform hypothesis testing on the SeoulBikeData.csv dataset, we start with a null hypothesis (H0) and an alternative hypothesis (H1), then use a statistical test to determine whether there is enough evidence to reject the null hypothesis in favor of the alternative.

Below is a general step-by-step guide to performing hypothesis testing on a dataset like SeoulBikeData.csv:

Define the Hypotheses
Choose a Significance Level (α)
Select the Test
Perform the Test
Analyze the Results
Draw Conclusions
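
The steps above can be illustrated with a permutation test written in NumPy. This is a sketch under stated assumptions, not the test used in the notebook: the hypotheses (H0: mean hourly demand is the same on holidays and non-holidays), the significance level of 0.05, the function name permutation_test, and the synthetic samples are all chosen for illustration.

```python
import numpy as np


def permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are exchangeable, so shuffle and re-split
        shuffled = rng.permutation(pooled)
        diff = shuffled[:a.size].mean() - shuffled[a.size:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the p-value strictly positive
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value


# Synthetic hourly rental counts (not real data)
rng = np.random.default_rng(1)
holiday = rng.normal(500, 100, size=50)
no_holiday = rng.normal(700, 100, size=50)

alpha = 0.05  # significance level
observed, p_value = permutation_test(holiday, no_holiday)
reject_h0 = p_value < alpha
print(f"difference in means: {observed:.1f}, p-value: {p_value:.4f}, reject H0: {reject_h0}")
```

A permutation test is used here because it needs only NumPy and makes few distributional assumptions; a parametric test such as a t-test would follow the same six steps.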

Feature Engineering & Data Pre-processing

Handling Missing Values
Handling Outliers
Label Encoding
Textual Data Preprocessing
Feature Manipulation & Selection
Data Transformation
Data Scaling
Dimensionality Reduction
Data Splitting
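
Three of the steps listed above (label encoding, data splitting, and data scaling) might look like the following with Scikit-learn. The synthetic frame, the 80/20 split, and the choice of StandardScaler are assumptions made for this sketch; the notebook may use different settings.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Synthetic stand-in for a few of the dataset's columns (not real data)
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "Hour": rng.integers(0, 24, n),
    "Temperature": rng.normal(12, 10, n),
    "Seasons": rng.choice(["Winter", "Spring", "Summer", "Autumn"], n),
    "Rented Bike count": rng.integers(0, 2000, n),
})

# Label encoding: map each season name to an integer code
df["Seasons"] = LabelEncoder().fit_transform(df["Seasons"])

X = df.drop(columns="Rented Bike count")
y = df["Rented Bike count"]

# Data splitting: hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Data scaling: fit on the training set only, then apply to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
print(X_train_scaled.shape, X_test_scaled.shape)
```

Fitting the scaler on the training rows alone avoids leaking test-set statistics into the model.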

ML Model Training and Evaluation

The dependent variable, Rented Bike Count, is a continuous variable. Hence, regression ML algorithms are used to train the model to predict the dependent variable.
The model is trained with the following ML algorithms:

Ridge Regression (Linear Regression + L2 Regularization)
Decision Tree Regression
Random Forest Regression
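
A minimal sketch of training and evaluating these three regressors with Scikit-learn is shown below. It runs on synthetic data, so the scores it prints are unrelated to the results reported for this project; the hyperparameters and the choice to compute MAE on the test split are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem with a nonlinear signal (not real data)
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 4))
y = 500 * np.sin(2 * np.pi * X[:, 0]) + 300 * X[:, 1] + rng.normal(0, 20, 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Ridge Regression": Ridge(alpha=1.0),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=0),
    "Random Forest Regression": RandomForestRegressor(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_pred = model.predict(X_test)
    results[name] = {
        "train_r2": r2_score(y_train, model.predict(X_train)),
        "test_r2": r2_score(y_test, test_pred),
        "mae": mean_absolute_error(y_test, test_pred),
    }

for name, scores in results.items():
    print(name, scores)
```

Comparing the train and test R2 for each model is a quick check for overfitting: an unpruned decision tree typically reaches a train R2 of 1.0 while scoring lower on held-out data.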

Result

Conclusion

Model Name	Train r2_score	Test r2_score	MAE
Ridge Regression (Base Model)	0.5969834427027465	0.5611509471571063	282.97650291199943
Ridge Regression + GridSearchCV	0.5969815828381584	0.5612334108757271	282.9539718021544
Decision Tree Regression	1.0	0.8453401099843291	133.34609878310667
Decision Tree Regression + GridSearchCV	0.9480705668434853	0.8723154222023835	125.69186170892704
Random Forest Regression	0.9881705009991666	0.9181275336765233	100.85899069434504
Random Forest Regression + GridSearchCV	0.9881705009991666	0.9181275336765233	100.85899069434504
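
For readers unfamiliar with the GridSearchCV rows in the table, the following sketch shows how a hyperparameter search over a decision tree could be set up. The parameter grid, the 5-fold cross-validation, and the synthetic data are assumptions for illustration and are not the settings used to produce the table.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (not real data)
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 4))
y = 500 * np.sin(2 * np.pi * X[:, 0]) + 300 * X[:, 1] + rng.normal(0, 20, 300)

# Hypothetical grid: limiting depth and leaf size regularizes the tree
param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 10],
}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="r2",
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

After fitting, search.best_estimator_ holds the tree refit on all the supplied data with the best parameter combination.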
