This project implements multiple linear regression (linear regression with several input variables) from scratch in Python. The goal is to predict a student's Performance Index from the following factors:
- Hours Studied
- Previous Scores
- Extracurricular Activities
- Sleep Hours
- Sample Question Papers Practiced
The dataset used for this project is available on Kaggle: Student Performance Dataset.
- Z-Score Normalization: Standardizes the data to center it around 0 and scale it to unit variance.
- Custom Gradient Descent: Performs weight and bias updates iteratively to minimize the Mean Squared Error (MSE).
- Cost Function: Evaluates the model's performance using MSE.
- Feature Engineering: Adds polynomial features to improve model accuracy.
- Visualization: Plots actual vs. predicted values to show how well the model fits.
- R² Score Calculation: Measures the goodness of fit of the regression model.
- Custom Predictions: Predicts the Performance Index for new input data.
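The training pipeline listed above (feature engineering, z-score normalization, then gradient descent on the MSE cost) can be sketched in a few lines of NumPy. This is a minimal sketch, not the notebook's code: the function names (`add_squared_features`, `zscore_normalize`, `compute_cost`, `gradient_descent`), the use of squared terms as the polynomial features, the learning rate, the iteration count, and the synthetic data are all assumptions for illustration.

```python
import numpy as np


def add_squared_features(X):
    """Hypothetical feature engineering: append the square of every column."""
    return np.hstack([X, X ** 2])


def zscore_normalize(X):
    """Center each column at 0 and scale it to unit variance (z-score)."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mu) / sigma, mu, sigma


def compute_cost(X, y, w, b):
    """Mean squared error of the linear model f(x) = X @ w + b."""
    errors = X @ w + b - y
    return np.mean(errors ** 2)


def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent on the MSE cost; returns w, b and the cost history."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    history = []
    for _ in range(num_iters):
        errors = X @ w + b - y              # prediction error for every sample
        dj_dw = (2 / m) * (X.T @ errors)    # dMSE/dw
        dj_db = (2 / m) * errors.sum()      # dMSE/db
        w -= alpha * dj_dw
        b -= alpha * dj_db
        history.append(compute_cost(X, y, w, b))
    return w, b, history


# Synthetic stand-in for the Kaggle data: 200 samples, 2 raw features.
rng = np.random.default_rng(0)
X_raw = rng.uniform(0, 10, size=(200, 2))
y = 3 * X_raw[:, 0] - 2 * X_raw[:, 1] + 5

X_feat = add_squared_features(X_raw)          # engineer features first...
X_norm, mu, sigma = zscore_normalize(X_feat)  # ...then normalize them
initial_cost = compute_cost(X_norm, y, np.zeros(X_norm.shape[1]), 0.0)
w, b, history = gradient_descent(X_norm, y)
print(f"Initial cost: {initial_cost:.4f}  Final cost: {history[-1]:.4f}")
```

Normalizing after the squared terms are added keeps the engineered columns on the same scale as the original ones, so a single learning rate works for every weight.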
- Python 3.x
- Libraries: numpy, pandas, matplotlib
- Clone this repository:
  git clone https://github.com/KartikAg13/student_performance_prediction.git
  cd student_performance_prediction
- Download the dataset from the Kaggle link and place it in the root folder.
- Run the Notebook: Open the Python notebook file in Jupyter or any compatible environment and execute the cells step by step.
- Make Predictions: Input new data in the format [Hours Studied, Previous Scores, Extracurricular Activities, Sleep Hours, Sample Question Papers Practiced] to predict the Performance Index.
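Because the model is trained on z-score-normalized features, a new input has to be scaled with the training set's mean and standard deviation before the learned weights are applied. The sketch below assumes exactly that. `predict_performance` is a hypothetical helper, and every parameter value in it is a made-up placeholder, not a value fitted to the Kaggle data. It also assumes Extracurricular Activities is encoded as 1 for Yes and 0 for No, as the example input suggests.

```python
import numpy as np


def predict_performance(x_new, w, b, mu, sigma):
    """Scale a raw input with the training-set statistics, then apply f(x) = x @ w + b."""
    x_norm = (np.asarray(x_new, dtype=float) - mu) / sigma  # reuse training mu/sigma
    return float(x_norm @ w + b)


# Placeholder parameters, NOT fitted to the Kaggle data. Feature order:
# [Hours Studied, Previous Scores, Extracurricular Activities, Sleep Hours,
#  Sample Question Papers Practiced]
mu = np.array([5.0, 70.0, 0.5, 6.5, 4.5])     # hypothetical training means
sigma = np.array([2.5, 17.0, 0.5, 1.5, 3.0])  # hypothetical training std devs
w = np.array([7.0, 17.0, 0.3, 0.8, 0.6])      # hypothetical learned weights
b = 55.0                                      # hypothetical learned bias

x_predict = np.array([6, 71, 1, 8, 2])  # Extracurricular Activities assumed: 1 = Yes
print(f"Predicted Performance Index: {predict_performance(x_predict, w, b, mu, sigma):.2f}")
```

Recomputing the mean and standard deviation from the single new sample would give zero variance, which is why the training statistics are passed in. If the feature engineering step adds polynomial terms, the same expansion would also have to be applied to `x_new` before scaling.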
- Initial Cost: The model's cost (MSE) before training.
- Final Cost: The reduced cost after applying gradient descent.
- R² Score: Indicates how well the regression model fits the data.
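The R² score compares the model's squared error with that of always predicting the mean: R² = 1 - SS_res / SS_tot. A score of 1 means a perfect fit and 0 means the model does no better than the mean. A minimal NumPy sketch follows. The function name `r2_score` is an assumption chosen for readability, and the sample values are illustrative only.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


y_true = np.array([50.0, 60.0, 70.0, 80.0])
print(r2_score(y_true, y_true))                        # perfect fit
print(r2_score(y_true, np.full(4, y_true.mean())))     # mean-only baseline
print(r2_score(y_true, np.array([52.0, 58.0, 71.0, 79.0])))
```

For the last call the residuals are -2, 2, -1 and 1, so SS_res = 10 and SS_tot = 500, giving R² = 1 - 10/500 = 0.98.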
- main.ipynb: Main implementation notebook.
- README.md: Project documentation.
Input:
x_predict = np.array([6, 71, 1, 8, 2])
Output:
Predicted Performance Index: 85.43
Contributions are welcome! Please fork this repository and submit a pull request with your changes.