Wine Quality Prediction using Linear Regression

This project aims to predict the quality of wines based on various features using linear regression. The dataset used for this project is sourced from Kaggle, a popular platform for data science competitions and datasets.

Dataset

The dataset used for this project can be found on Kaggle under the name "Red Wine Quality". It contains red wine samples with their corresponding quality ratings. Each sample is described by physicochemical measurements (the inputs) and a sensory quality score (the output). The dataset can be downloaded from the following link: Kaggle Dataset

Context

The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones).

Dependencies

To run this project, the following dependencies are required:

Python 3.6 or above
NumPy
Pandas
Matplotlib
Scikit-learn

You can install the required packages using pip:

pip install numpy pandas matplotlib scikit-learn

Usage

Download the dataset from the provided Kaggle link and save it in the project directory.
Run the Red_Wine_Quality.ipynb file, which contains the code for data preprocessing, model training, and evaluation.
The notebook loads the dataset, preprocesses the data, splits it into training and testing sets, trains a linear regression model, and evaluates its performance.
After training, the model predicts the wine quality for the testing set and the notebook displays evaluation metrics such as mean absolute error, mean squared error, and R-squared score.
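
The steps above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the function name train_and_evaluate, the 80/20 split, and the random seed are assumptions, and no preprocessing beyond separating the quality column is shown.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def train_and_evaluate(df, target="quality", test_size=0.2, random_state=42):
    """Fit a linear regression on df and return the model and test-set metrics.

    Hypothetical helper: the split ratio and seed are illustrative choices.
    """
    # Separate the input columns from the quality score.
    X = df.drop(columns=[target])
    y = df[target]

    # Hold out part of the data for evaluation.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )

    # Train on the training split, then predict on the held-out split.
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    mse = mean_squared_error(y_test, y_pred)
    metrics = {
        "mae": mean_absolute_error(y_test, y_pred),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "r2": r2_score(y_test, y_pred),
    }
    return model, metrics
```

With the dataset saved in the project directory, a call might look like train_and_evaluate(pd.read_csv("winequality-red.csv")). The file name here is an assumption; use whatever name the downloaded CSV has.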

Input variables (based on physicochemical tests):

fixed acidity
volatile acidity
citric acid
residual sugar
chlorides
free sulfur dioxide
total sulfur dioxide
density
pH
sulphates
alcohol

Output variable (based on sensory data): quality (score between 0 and 10)

Coefficient Values
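
A fitted linear regression has one coefficient per input variable. Each coefficient is the change in predicted quality for a one-unit increase in that variable while the others are held fixed. The sketch below shows one way to list coefficients by name for a fitted scikit-learn model. The helper name coefficient_table is hypothetical, and the demonstration uses synthetic data, not wine measurements.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def coefficient_table(model, feature_names):
    """Return fitted coefficients labeled by feature name, largest magnitude first."""
    coefs = pd.Series(model.coef_, index=list(feature_names), name="coefficient")
    order = coefs.abs().sort_values(ascending=False).index
    return coefs.reindex(order)


# Synthetic demonstration only: these are not wine measurements or project results.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(100, 2)), columns=["x1", "x2"])
y_demo = 0.5 * X_demo["x1"] - 2.0 * X_demo["x2"] + 1.0
demo_model = LinearRegression().fit(X_demo, y_demo)
print(coefficient_table(demo_model, X_demo.columns))
```

For the wine model, the same helper would be called with the trained model and the names of the input columns.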

Regression Evaluation Metrics

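Here are three common evaluation metrics for regression problems. In the formulas, $y_i$ is the true quality score of sample $i$, $\hat{y}_i$ is the predicted score, and $n$ is the number of samples.

Mean Absolute Error (MAE) is the mean of the absolute value of the errors: $$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

Mean Squared Error (MSE) is the mean of the squared errors: $$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$

Root Mean Squared Error (RMSE) is the square root of the mean of the squared errors: $$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$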

Comparing these metrics:

MAE is the easiest to understand because it’s the average absolute error.
MSE is often preferred over MAE because squaring “punishes” larger errors more heavily, which tends to be useful in the real world.
RMSE is often preferred over MSE because RMSE is interpretable in the “y” units.
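
To make the comparison concrete, the three metrics can be computed directly from their definitions. The sketch below uses NumPy and made-up scores chosen for illustration (they are not results from this project). The function name regression_metrics is hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, and RMSE from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mae = float(np.mean(np.abs(errors)))  # mean of absolute errors
    mse = float(np.mean(errors ** 2))     # mean of squared errors
    rmse = float(np.sqrt(mse))            # square root of MSE, in the units of y
    return {"mae": mae, "mse": mse, "rmse": rmse}


# Made-up scores for illustration: the absolute errors are 0, 1, 0, and 3.
metrics = regression_metrics([5, 6, 7, 5], [5, 5, 7, 8])
print(metrics)
```

Here MAE is 1.0 and MSE is 2.5. The single error of 3 accounts for 9 of the 10 units of total squared error, which is the "punishing" effect described above. RMSE is about 1.58, expressed in quality points like the scores themselves.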

Results

The results of the wine quality prediction are displayed after the model training and evaluation. These results include the evaluation metrics and can be used to assess the accuracy and performance of the linear regression model.

License

The dataset used in this project is subject to the licensing terms provided by Kaggle. Please refer to the dataset's license for more details.

Contributing

Contributions to this project are welcome. If you find any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request.

Contact

For any questions or inquiries, please contact [m.arpanareddy18@gmail.com].
