
This project performs sentiment analysis on a Twitter dataset, aiming to classify tweets into positive, negative, or neutral sentiments. Sentiment analysis is crucial for understanding public opinion on various topics, brands, or events based on social media data.


AhmedAbdAlkreem/Sentiment-Analysis


Twitter Sentiment Analysis using BERT

Project Overview

This project performs sentiment analysis on a Twitter dataset from Kaggle. The main objective is to classify tweets as positive, negative, or neutral. It uses BERT (Bidirectional Encoder Representations from Transformers), a pre-trained Transformer model widely used for NLP tasks, and fine-tunes it for tweet classification.

The project also covers preprocessing of the raw text: removing special characters, tokenization, lemmatization, stopword removal, and lowercasing. The model is deployed with Streamlit, which provides a user-friendly web interface for real-time tweet sentiment analysis.

Project Features

Data Preprocessing:
    Removing Special Characters: Cleaning the dataset by removing unnecessary characters (e.g., hashtags, mentions, punctuation).
    Tokenization: Splitting text into tokens (individual words or phrases).
    Removing Stopwords: Eliminating common words that don’t contribute to sentiment (e.g., "the", "is", "in").
    Lemmatization: Converting words to their base or root form.
    Lowercasing: Converting all text to lowercase for uniformity.

Sentiment Classification using BERT:
    Fine-tuning the BERT model to predict tweet sentiment (positive, negative, or neutral).

Deployment:
    A Streamlit web application is created to provide a simple interface for users to input tweets and receive sentiment predictions in real-time.
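The preprocessing steps listed above can be sketched in plain Python as follows. This is a minimal illustration, not the repository's `preprocess.py`: the function name `preprocess_tweet` is hypothetical, regular expressions and whitespace splitting stand in for NLTK's tokenizer, `EXAMPLE_STOPWORDS` is a small hypothetical stopword set, and lemmatization is an optional callable that leaves tokens unchanged by default. It also assumes one reading of "removing hashtags": the `#` symbol is dropped but the hashtag's word is kept.

```python
import re
from typing import Callable, Iterable, List

# Small hypothetical stopword set for illustration; the project uses NLTK's list.
EXAMPLE_STOPWORDS = {"the", "is", "in", "a", "an", "and", "to", "of", "this", "it"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # http(s) links and bare www. links
MENTION_RE = re.compile(r"@\w+")                # @username mentions
NON_ALPHA_RE = re.compile(r"[^a-z\s]")          # anything that is not a-z or whitespace


def preprocess_tweet(
    text: str,
    stopwords: Iterable[str] = EXAMPLE_STOPWORDS,
    lemmatize: Callable[[str], str] = lambda token: token,
) -> List[str]:
    """Clean a raw tweet and return a list of normalized tokens."""
    text = text.lower()                  # lowercasing
    text = URL_RE.sub(" ", text)         # remove URLs
    text = MENTION_RE.sub(" ", text)     # remove @mentions
    text = text.replace("#", " ")        # drop '#' but keep the hashtag's word
    text = NON_ALPHA_RE.sub(" ", text)   # remove digits, punctuation, emoji
    tokens = text.split()                # simple whitespace tokenization
    stop = set(stopwords)
    # Remove stopwords, then map each remaining token to its base form.
    return [lemmatize(tok) for tok in tokens if tok not in stop]


print(preprocess_tweet("Loving the new #iPhone!!! @Apple https://t.co/xyz"))
# ['loving', 'new', 'iphone']
```

With NLTK installed and its `stopwords` and `wordnet` data downloaded, the real components can be passed in instead, for example `preprocess_tweet(text, stopwords=stopwords.words("english"), lemmatize=WordNetLemmatizer().lemmatize)` after `from nltk.corpus import stopwords` and `from nltk.stem import WordNetLemmatizer`.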

Dataset

The dataset used in this project is obtained from Kaggle. It consists of labeled tweets for sentiment analysis, with three sentiment categories: positive, negative, and neutral.

Kaggle Dataset Link: https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis

Dataset Structure:

Text: The tweet itself.
Sentiment: The label indicating the sentiment (positive, negative, neutral).
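To inspect the data before training, the labeled tweets can be loaded and the class balance checked. The sketch below uses Python's standard `csv` module rather than Pandas so that it is self-contained. It assumes a CSV file whose header contains the `Text` and `Sentiment` columns described above; the downloaded Kaggle files may be laid out differently, so the column names would need adapting. The helper names `load_tweets` and `label_distribution` are hypothetical.

```python
import csv
from collections import Counter
from typing import IO, List, Tuple

# The three sentiment classes described in this README.
LABELS = ("positive", "negative", "neutral")


def load_tweets(handle: IO[str]) -> List[Tuple[str, str]]:
    """Read (text, sentiment) pairs from a CSV with 'Text' and 'Sentiment' columns."""
    rows = []
    for record in csv.DictReader(handle):
        text = (record.get("Text") or "").strip()
        label = (record.get("Sentiment") or "").strip().lower()
        # Skip empty tweets and any label outside the three expected classes.
        if text and label in LABELS:
            rows.append((text, label))
    return rows


def label_distribution(rows: List[Tuple[str, str]]) -> Counter:
    """Count how many tweets carry each sentiment label."""
    return Counter(label for _, label in rows)
```

A strongly imbalanced distribution is worth knowing about before reading the accuracy figure, since accuracy alone can hide poor performance on a minority class.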

Technology Stack

Programming Language: Python
Libraries:
    Pandas: For data manipulation and preprocessing.
    NLTK: For text processing tasks such as stopword removal, tokenization, and lemmatization.
    Transformers (Hugging Face): For implementing the BERT model.
    Streamlit: For deploying the application and creating an interactive web interface.
    Matplotlib, Seaborn: For data visualization.

Setup and Installation

To run this project locally, follow these steps:

1. Clone the repository:

```bash
git clone https://github.com/AhmedAbdAlkreem/Sentiment-Analysis.git
cd Sentiment-Analysis
```
2. Download the dataset:

Download the dataset from Kaggle and place it in the project's data folder. You can use the Kaggle API to download it directly:

```bash
kaggle datasets download -d jp797498e/twitter-entity-sentiment-analysis -p data --unzip
```
3. Preprocess the dataset:

Run the preprocessing script to clean and prepare the data for training:

```bash
python preprocess.py
```

This script performs the following tasks:

Remove special characters, URLs, and mentions.
Tokenize text.
Remove stopwords.
Apply lemmatization.
Convert text to lowercase.
4. Train the BERT model:

Once the data is preprocessed, train the BERT model on it:

```bash
python train.py
```

This script fine-tunes the pre-trained BERT model on the Twitter dataset.

5. Run the Streamlit web application:

After the model is trained, you can run the Streamlit app for real-time sentiment analysis:

```bash
streamlit run app.py
```

This will launch a web interface where you can input tweets and receive sentiment predictions.
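Behind such an interface, a fine-tuned BERT classifier returns one raw score (logit) per sentiment class, and the app has to turn those scores into a label. The sketch below shows that final step in plain Python under stated assumptions: the label order in `ID2LABEL` is hypothetical (the real order is whatever mapping `train.py` used), and the names `softmax` and `predict_sentiment` are illustrative, not taken from `app.py`.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical class-index-to-label mapping; must match the one used in training.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_sentiment(
    logits: Sequence[float], id2label: Dict[int, str] = ID2LABEL
) -> Dict[str, object]:
    """Turn one tweet's classifier logits into a label and per-class probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)  # index of highest probability
    return {
        "label": id2label[best],
        "confidence": probs[best],
        "scores": {id2label[i]: p for i, p in enumerate(probs)},
    }


print(predict_sentiment([0.1, 0.2, 2.5])["label"])  # positive
```

If the app uses a Hugging Face sequence-classification model (an assumption, not confirmed by this README), the logits for a single tweet can be obtained with `model(**tokenizer(text, return_tensors="pt")).logits[0].tolist()` and passed straight to `predict_sentiment`.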

Results

The BERT model achieves an accuracy of 0.76 (76%) on the test set, outperforming traditional machine learning approaches thanks to its ability to capture the contextual meaning of words in tweets. Detailed evaluation metrics such as precision, recall, and F1-score are available in the training logs.
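Accuracy is the fraction of test tweets whose predicted label matches the true label, so 0.76 means 76 out of every 100 tweets are classified correctly. Precision, recall, and F1 are computed per class. The plain-Python sketch below shows how these figures are derived from true and predicted labels; it is an illustration, not the project's evaluation code, and the function name `classification_metrics` is hypothetical. Libraries such as scikit-learn (`sklearn.metrics.classification_report`) report the same metrics.

```python
from collections import Counter
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, object]:
    """Compute accuracy plus per-class precision, recall and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    true_pos = Counter(t for t, p in pairs if t == p)  # correct predictions per class
    pred_counts = Counter(y_pred)                      # how often each class was predicted
    true_counts = Counter(y_true)                      # how often each class really occurs
    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        precision = true_pos[label] / pred_counts[label] if pred_counts[label] else 0.0
        recall = true_pos[label] / true_counts[label] if true_counts[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    return {"accuracy": accuracy, "per_class": per_class}


report = classification_metrics(
    ["positive", "negative", "neutral", "positive"],
    ["positive", "negative", "positive", "neutral"],
)
print(report["accuracy"])  # 0.5
```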

Conclusion

This project demonstrates the application of state-of-the-art NLP models (BERT) for sentiment analysis on social media data, leveraging both text preprocessing and deep learning. The deployment using Streamlit provides a simple interface for real-time sentiment predictions, making it a useful tool for businesses or researchers to gauge public sentiment.
