Wine Quality Prediction using Linear Regression

This project predicts the quality of red wines from their physicochemical properties using linear regression. The dataset is sourced from Kaggle, a popular platform for data science competitions and datasets.

Dataset

The dataset used for this project can be found on Kaggle under the name "Red Wine Quality". It contains red wine samples with their corresponding quality ratings, along with the chemical features that describe the properties of each sample. The dataset can be downloaded from the following link: Kaggle Dataset

Context

The original data consist of two datasets, covering the red and white variants of the Portuguese "Vinho Verde" wine; this project uses the red one. For more details, consult the reference [Cortez et al., 2009]. Due to privacy and logistical issues, only physicochemical (input) and sensory (output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones).

Dependencies

To run this project, the following dependencies are required:

  • Python 3.6 or above
  • NumPy
  • Pandas
  • Matplotlib
  • Scikit-learn

You can install the required packages using pip:

pip install numpy pandas matplotlib scikit-learn

Usage

  1. Download the dataset from the provided Kaggle link and save it in the project directory.

  2. Run the Red_Wine_Quality.ipynb file, which contains the code for data preprocessing, model training, and evaluation.

  3. The notebook loads the dataset, preprocesses the data, splits it into training and testing sets, trains a linear regression model, and evaluates its performance.

  4. After training, the model predicts the wine quality for the testing set, and the notebook displays evaluation metrics such as mean absolute error, mean squared error, and R-squared score.
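The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper name `train_and_evaluate`, the 80/20 split, the random seed, and the CSV file name in the comment are all assumptions, and a small synthetic table stands in for the real data so the example runs without the download.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def train_and_evaluate(df, target="quality", test_size=0.2, seed=42):
    """Fit a linear regression on df and return the model plus test-set metrics."""
    X = df.drop(columns=[target])
    y = df[target]
    # Hold out part of the data so the metrics reflect unseen samples.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    metrics = {
        "mae": mean_absolute_error(y_test, y_pred),
        "mse": mean_squared_error(y_test, y_pred),
        "r2": r2_score(y_test, y_pred),
    }
    return model, metrics


# With the real data this might look like (file name is an assumption):
# df = pd.read_csv("winequality-red.csv")

# Synthetic stand-in data, NOT real wine measurements.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "alcohol": rng.uniform(8.0, 15.0, 200),
    "volatile acidity": rng.uniform(0.1, 1.2, 200),
})
df["quality"] = (
    3.0 + 0.3 * df["alcohol"] - 1.0 * df["volatile acidity"]
    + rng.normal(0.0, 0.5, 200)
)

model, metrics = train_and_evaluate(df)
print(metrics)
```

Fixing `random_state` keeps the train/test split reproducible between runs, which makes the reported metrics comparable.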

Input variables (based on physicochemical tests):

  1. fixed acidity
  2. volatile acidity
  3. citric acid
  4. residual sugar
  5. chlorides
  6. free sulfur dioxide
  7. total sulfur dioxide
  8. density
  9. pH
  10. sulphates
  11. alcohol

Output variable (based on sensory data): quality (score between 0 and 10)
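As a hedged sketch, the variables listed above could be separated into a feature matrix and a target like this. It assumes the CSV column names match the list exactly; the helper `split_features_target` is hypothetical, and the single row of zeros is a placeholder, not real data.

```python
import pandas as pd

# Assumed to match the CSV headers exactly.
FEATURES = [
    "fixed acidity",
    "volatile acidity",
    "citric acid",
    "residual sugar",
    "chlorides",
    "free sulfur dioxide",
    "total sulfur dioxide",
    "density",
    "pH",
    "sulphates",
    "alcohol",
]
TARGET = "quality"


def split_features_target(df):
    """Return (X, y), failing early if an expected column is absent."""
    missing = [c for c in FEATURES + [TARGET] if c not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return df[FEATURES], df[TARGET]


# Placeholder row used only to show the shapes involved.
row = {name: 0.0 for name in FEATURES}
row[TARGET] = 5
X, y = split_features_target(pd.DataFrame([row]))
print(X.shape, y.tolist())  # (1, 11) [5]
```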

Coefficient Values

[Image: coefficient values of the trained model]
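One way to tabulate coefficients by feature name is sketched below. The data here are synthetic and noise-free, built from known coefficients (0.4 and 1.5, intercept 2.0) purely so the output can be checked; they are not the values obtained on the wine dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic features, NOT real wine measurements.
rng = np.random.default_rng(1)
X = pd.DataFrame({
    "alcohol": rng.uniform(8.0, 15.0, 300),
    "sulphates": rng.uniform(0.3, 2.0, 300),
})
# Noise-free target, so the fit should recover the coefficients used here.
y = 2.0 + 0.4 * X["alcohol"] + 1.5 * X["sulphates"]

model = LinearRegression().fit(X, y)

# Pair each fitted coefficient with its column name.
coefficients = pd.DataFrame({"coefficient": model.coef_}, index=X.columns)
print(coefficients)
print("intercept:", model.intercept_)
```

Each coefficient is the expected change in the predicted quality for a one-unit increase in that feature with the others held fixed, so magnitudes are only comparable across features measured on similar scales.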

Regression Evaluation Metrics

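Here are three common evaluation metrics for regression problems, where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples.

Mean Absolute Error (MAE) is the mean of the absolute value of the errors:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

Mean Squared Error (MSE) is the mean of the squared errors:

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

Root Mean Squared Error (RMSE) is the square root of the mean of the squared errors:

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$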

Comparing these metrics:

  • MAE is the easiest to understand because it’s the average absolute error.
  • MSE is often preferred over MAE because squaring “punishes” larger errors more heavily, which tends to be useful in the real world.
  • RMSE is often preferred over MSE because it is interpretable in the “y” units.
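The comparison can be made concrete with a small NumPy sketch. The helper `regression_errors` and the toy scores are illustrative assumptions, not project results: both sets of predictions have the same MAE, but the one with a single large miss has a much higher MSE and RMSE.

```python
import numpy as np


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equal-length sequences."""
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(errors))
    mse = np.mean(errors ** 2)
    rmse = np.sqrt(mse)
    return float(mae), float(mse), float(rmse)


# Toy quality scores, made up for illustration.
y_true = [5, 6, 5, 7]
even = [6, 5, 6, 6]   # four errors of size 1
spiky = [5, 6, 5, 3]  # one error of size 4

print(regression_errors(y_true, even))   # (1.0, 1.0, 1.0)
print(regression_errors(y_true, spiky))  # (1.0, 4.0, 2.0)
```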

Results

The results of the wine quality prediction are displayed after model training and evaluation. They include the evaluation metrics and can be used to assess the accuracy of the linear regression model.

[Image: evaluation results]

License

The dataset used in this project is subject to the licensing terms provided by Kaggle. Please refer to the dataset's license for more details.

Contributing

Contributions to this project are welcome. If you find any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request.

Contact

For any questions or inquiries, please contact [[email protected]].