-
This project focuses on predicting bank loan approvals using machine learning techniques.
-
By analyzing historical loan data, we built and evaluated several models to identify key factors influencing loan approval decisions.
-
The Random Forest Classifier emerged as the best-performing model, significantly improving prediction accuracy.
- Data Loading and Initial Exploration
○ Imported necessary libraries, including pandas and google.colab.drive.
○ Loaded the dataset from Google Drive.
○ Displayed the first and last 5 rows of the dataset.
○ Checked the shape of the dataset.
○ Obtained information about the dataset, including the number of rows, columns, and data types of each column.
○ Identified columns with missing values and calculated the percentage of missing values.
- Data Cleaning
○ Dropped the Loan_ID column, as it was not required.
○ Dropped columns with missing values less than 5%.
○ Filled remaining missing values with appropriate strategies (mean, median, or mode).
- Data Preprocessing
○ Converted categorical variables into numerical values using label encoding.
○ Checked for and addressed multicollinearity using correlation matrix and VIF (Variance Inflation Factor).
- Model Building
○ Split the data into training and testing sets.
○ Trained and evaluated several models, including Logistic Regression, Decision Tree Classifier, Random Forest Classifier, K-Nearest Neighbors, and Support Vector Classifier.
○ Used GridSearchCV for hyperparameter tuning of the best performing models.
- Model Evaluation
○ Evaluated models using accuracy scores before and after hyperparameter tuning.
○ Chose the Random Forest Classifier based on performance metrics.
- Model Deployment
○ Saved the trained model using joblib.
○ Loaded the saved model and made predictions on new data.
○ Displayed the loan approval status for each prediction.