In this project, the task is to detect the sentiment of a given input text corpus. We evaluate the proposed algorithm on a large hotel review dataset: after training a classification model, we compute its accuracy on the test data. All hotel reviews are classified according to the ratings given by the customers. Finally, graphs are plotted to illustrate the results obtained.
The hotel review dataset was downloaded from Kaggle: https://www.kaggle.com/jiashenliu/515k-hotel-reviews-data-in-europe. It is stored in tabular form, with the rows as samples and the columns as features, and contains 515,378 samples described by 17 features (attributes). We start by loading the raw data. Each textual review is split into a positive part and a negative part. We concatenate the two parts into a single text, so that we start from the raw review alone, without any other information (such as which part was positive or negative).
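The loading and merging step can be sketched with Python's standard `csv` module. This is a minimal illustration, not the project's actual code. It assumes the Kaggle file uses the column names `Negative_Review`, `Positive_Review` and `Reviewer_Score`, and that an empty part is stored as the placeholder `No Negative` or `No Positive`. Both assumptions should be checked against the downloaded copy. Dropping those placeholders is a choice made here for illustration, and the helper names `combine_review` and `load_raw_reviews` are hypothetical.

```python
import csv
import io

# Placeholders assumed to mark an empty review part in the Kaggle file
# (assumption: verify against your copy of the dataset).
EMPTY_MARKERS = {"No Negative", "No Positive"}


def combine_review(negative: str, positive: str) -> str:
    """Join the negative and positive parts of one review into a single text."""
    parts = [p.strip() for p in (negative, positive)]
    # Drop empty parts and the assumed "No Negative"/"No Positive" placeholders.
    parts = [p for p in parts if p and p not in EMPTY_MARKERS]
    return " ".join(parts)


def load_raw_reviews(handle):
    """Yield (review_text, reviewer_score) pairs from a CSV file-like object.

    Column names are assumptions based on the public Kaggle dataset.
    """
    reader = csv.DictReader(handle)
    for row in reader:
        text = combine_review(row["Negative_Review"], row["Positive_Review"])
        yield text, float(row["Reviewer_Score"])


if __name__ == "__main__":
    # Tiny in-memory sample standing in for the real CSV file.
    sample = io.StringIO(
        "Negative_Review,Positive_Review,Reviewer_Score\n"
        "Room was small,Great location,7.5\n"
        "No Negative,Friendly staff,9.6\n"
    )
    for text, score in load_raw_reviews(sample):
        print(score, text)
```

With the real file, the same generator can be fed an open handle, e.g. `open("Hotel_Reviews.csv", newline="", encoding="utf-8")`, where the file name is an assumption. Streaming rows this way avoids holding all 515,378 reviews in memory at once. A pandas-based loader would be equally reasonable.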