wine-connoisseur

In this project I explore the Red Wine Quality dataset and try to predict wine quality from it.

As wine consumption has grown, the red wine industry has expanded rapidly in recent years. Producers now use product quality certification to promote their wines, but certification requires evaluation by human experts, which makes it slow and expensive.
Because human tasting is fairly subjective, this project aims to classify wine quality purely from the wine's chemical composition.

Dataset

The dataset records the chemical composition of about 1,600 red wine samples. Each row holds a quality score, which is the prediction target, along with columns describing the wine's chemistry:

Fixed Acidity
Volatile Acidity
Citric Acid
Residual Sugar
Chlorides
Free Sulfur Dioxide
Total Sulfur Dioxide
Density
pH
Sulphates
Alcohol
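As a minimal sketch of how these columns might be separated from the target, the helper below assumes the lower-case column names used in the public Red Wine Quality CSV (for example `free sulfur dioxide`) and a `quality` target column. The helper name `split_features_target` and the file path in the comment are hypothetical, not taken from the project's notebook.

```python
import pandas as pd

# The eleven chemical features listed above, spelled as in the public
# Red Wine Quality CSV (an assumption about this repo's data file).
FEATURES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]
TARGET = "quality"


def split_features_target(df: pd.DataFrame):
    """Return the feature matrix X and the quality labels y."""
    missing = set(FEATURES + [TARGET]) - set(df.columns)
    if missing:
        raise KeyError(f"missing columns: {sorted(missing)}")
    # Select the features in a fixed order so every model sees the same layout.
    return df[FEATURES], df[TARGET]


# Hypothetical usage (the file path is an assumption). The original UCI copy
# is semicolon-separated, so pass sep=";" for it; the Kaggle copy uses commas.
# df = pd.read_csv("path/to/winequality-red.csv")
# X, y = split_features_target(df)
```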

Training Methods

Initial Training

The initial training uses three models, which are then optimized further based on the results. The models are:

Random Forest Classifier: A random forest builds a set of decision trees, each trained on a randomly drawn (bootstrap) sample of the training set and considering a random subset of features at each split. It then aggregates the trees' predictions (a majority vote, or in scikit-learn an average of their class probabilities) to decide the final class of a test sample.
Stochastic Gradient Descent Classifier: Stochastic Gradient Descent (SGD) is a simple yet efficient optimization algorithm that finds the parameters (coefficients) minimizing a cost function, updating them from one training sample at a time. An SGD classifier uses it for discriminative learning of linear classifiers under convex loss functions, such as the hinge loss of a linear SVM or the log loss of logistic regression.
Support Vector Classifier: A support vector machine (SVM) is a supervised model designed for two-class problems: it finds the hyperplane that separates the classes with the widest margin, and a kernel allows that boundary to be non-linear in the original features. Multi-class problems such as wine quality are handled by combining several binary classifiers. Once trained on labeled samples, the model can classify new ones.
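The three baseline models can be sketched with scikit-learn as below. This is an illustration under stated assumptions, not the notebook's code: the 80/20 stratified split, `n_estimators=200`, and standardizing the features for the two scale-sensitive models are choices made here, and `baseline_scores` is a hypothetical helper. The demo uses synthetic data from `make_classification` so it runs without the dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def baseline_scores(X, y, random_state=42):
    """Fit the three baseline models and return each one's test-set accuracy."""
    # Hold out 20% of the rows, keeping class proportions equal in both parts.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    models = {
        # Trees split on per-feature thresholds, so no scaling is needed.
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=random_state),
        # SGDClassifier's default hinge loss makes it a linear SVM fitted by SGD;
        # gradient steps are scale-sensitive, so standardize the features first.
        "sgd": make_pipeline(StandardScaler(), SGDClassifier(random_state=random_state)),
        # SVC defaults to an RBF kernel, which is distance-based, so scale as well.
        "svc": make_pipeline(StandardScaler(), SVC()),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    # Synthetic stand-in for the wine data: 11 numeric features, like the real set.
    X, y = make_classification(n_samples=300, n_features=11, n_informative=5, random_state=0)
    print(baseline_scores(X, y))
```

With the real data, `X` and `y` would be the feature matrix and quality labels instead of the synthetic arrays.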

Improvements

With the baseline results in hand, we apply the following techniques to improve on them:

Grid Search: Grid search exhaustively evaluates every combination in a manually specified subset of the target algorithm's hyperparameter space and keeps the best-scoring one.
Cross Validation: We use cross-validation to evaluate the Random Forest results.
Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample. It has a single parameter, k, the number of groups (folds) the data is split into. Each fold is held out once for evaluation while the model trains on the other k - 1 folds, and the k scores are averaged. For this reason the procedure is often called k-fold cross-validation.
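Both techniques can be sketched with scikit-learn's `GridSearchCV` and `cross_val_score`. Applying the grid search to the SVC and the specific grid values are assumptions for illustration, as are the helper names `tune_svc` and `cv_random_forest`. For a classifier, passing an integer `cv` makes scikit-learn use stratified k-fold splits.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def tune_svc(X, y, cv=5):
    """Grid search over an illustrative SVC hyperparameter grid."""
    # make_pipeline names each step after its class in lower case, so the
    # SVC's parameters are addressed as "svc__<name>".
    pipe = make_pipeline(StandardScaler(), SVC())
    # A list of grids skips pointless combinations: gamma does not affect a linear kernel.
    param_grid = [
        {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
        {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.1, 1]},
    ]
    # Every combination is scored with cv-fold cross-validation; the best is kept.
    search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
    search.fit(X, y)
    return search.best_params_, search.best_score_


def cv_random_forest(X, y, k=10, n_estimators=200):
    """Mean and standard deviation of k-fold cross-validated forest accuracy."""
    rf = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    # Each of the k folds is held out once while the forest trains on the rest.
    scores = cross_val_score(rf, X, y, cv=k, scoring="accuracy")
    return scores.mean(), scores.std()


if __name__ == "__main__":
    # Synthetic stand-in for the wine features and labels.
    X, y = make_classification(n_samples=300, n_features=11, n_informative=5, random_state=0)
    print(tune_svc(X, y))
    print(cv_random_forest(X, y))
```

Keeping the scaler inside the pipeline means it is refit on each training fold, so statistics from the held-out fold do not leak into the scores.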

Requirements

Install the Python requirements from requirements.txt:
pip install -r requirements.txt

Credits

A lot of my code was inspired by this notebook.

aryankargwal/wine-connoisseur

Folders and files

Latest commit

History

Repository files navigation

wine-connoisseur

Dataset

Training Methods

Intial Training

Improvements

Requirements

Credits

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages