Twitter Sentiment Analysis using BERT

Project Overview

This project is aimed at performing sentiment analysis on a Twitter dataset from Kaggle. The main objective is to classify tweets into positive, negative, or neutral sentiments. The project uses BERT (Bidirectional Encoder Representations from Transformers), a state-of-the-art model for NLP tasks, to achieve high accuracy.

The project also includes the preprocessing of raw text data, such as removing special characters, tokenization, lemmatization, removing stopwords, and converting text to lowercase. The model is deployed using Streamlit, enabling a user-friendly web interface for real-time tweet sentiment analysis.

Project Features

Data Preprocessing:
    Removing Special Characters: Cleaning the dataset by removing unnecessary characters (e.g., hashtags, mentions, punctuation).
    Tokenization: Splitting text into tokens (individual words or phrases).
    Removing Stopwords: Eliminating common words that don’t contribute to sentiment (e.g., "the", "is", "in").
    Lemmatization: Converting words to their base or root form.
    Lowercasing: Converting all text to lowercase for uniformity.
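
The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual script: the function name `preprocess`, the tiny `STOPWORDS` set, and the `lemmatize` hook are assumptions made to keep the example self-contained. In the project itself, NLTK would supply the stopword list and the lemmatizer.

```python
import re
from typing import Callable

# Tiny stand-in stopword list (assumption); NLTK's English list is much larger.
STOPWORDS = {"the", "is", "in", "a", "an", "and", "to", "of", "it", "this"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # applied after lowercasing


def preprocess(text: str, lemmatize: Callable[[str], str] = lambda w: w) -> list[str]:
    """Clean one tweet and return its tokens.

    `lemmatize` is a hypothetical hook: pass a real lemmatizer function
    to reduce tokens to their base form. The default leaves tokens unchanged.
    """
    text = text.lower()                 # lowercase first so stopword lookup matches
    text = URL_RE.sub(" ", text)        # drop URLs
    text = MENTION_RE.sub(" ", text)    # drop @mentions
    text = NON_LETTER_RE.sub(" ", text) # drop '#', punctuation, and digits
    tokens = text.split()               # whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [lemmatize(t) for t in tokens]


tokens = preprocess("The new phone is AMAZING!!! @Apple #love https://t.co/xyz")
```

Lowercasing runs first here so that the later steps only need to handle lowercase text; the order in the list above describes the tasks, not a required sequence.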

Sentiment Classification using BERT:
Fine-tuning the BERT model to predict tweet sentiment (positive, negative, or neutral).
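
As a rough sketch of how a three-class BERT classifier could be set up with Hugging Face Transformers: the checkpoint name `bert-base-uncased`, the label order, and the helper name `load_bert_classifier` are assumptions for illustration, and the project's notebook may differ.

```python
# Assumed label order; the real order depends on how labels were encoded for training.
LABELS = ["negative", "neutral", "positive"]
ID2LABEL = dict(enumerate(LABELS))
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def load_bert_classifier(checkpoint: str = "bert-base-uncased"):
    """Load a tokenizer and a BERT model with a new 3-way classification head.

    The checkpoint name is an assumption; any BERT checkpoint on the
    Hugging Face Hub could be substituted.
    """
    # Imported lazily so the label mapping above works without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(LABELS),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model
```

The classification head returned this way is randomly initialized, so the model only produces meaningful predictions after fine-tuning on the labeled tweets.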

Deployment:
A Streamlit web application is created to provide a simple interface for users to input tweets and receive sentiment predictions in real-time.

Dataset

The dataset used in this project is obtained from Kaggle. It consists of labeled tweets for sentiment analysis, with three sentiment categories: positive, negative, and neutral.

Kaggle Dataset Link: https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis

Dataset Structure:

    Text: The tweet itself.
    Sentiment: The label indicating the sentiment (positive, negative, neutral).
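
A minimal sketch of reading data in this shape, assuming a CSV file with a header row named `Text` and `Sentiment`. The downloaded Kaggle file may use different or missing headers, so the column names, the helper `load_labeled_tweets`, and the sample rows here are all illustrative.

```python
import csv
import io
from collections import Counter

VALID_LABELS = {"positive", "negative", "neutral"}


def load_labeled_tweets(csv_file):
    """Return (text, label) pairs, skipping rows with empty text or unknown labels."""
    rows = []
    for row in csv.DictReader(csv_file):
        text = (row.get("Text") or "").strip()
        label = (row.get("Sentiment") or "").strip().lower()
        if text and label in VALID_LABELS:
            rows.append((text, label))
    return rows


# Made-up sample rows standing in for the real file.
sample = io.StringIO(
    "Text,Sentiment\n"
    "I love this game,Positive\n"
    "Worst update ever,Negative\n"
    "Patch notes are out,Neutral\n"
    ",Positive\n"  # empty text: skipped
)
tweets = load_labeled_tweets(sample)
label_counts = Counter(label for _, label in tweets)
```

Checking `label_counts` before training is a quick way to spot class imbalance or unexpected label values.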

Technology Stack

Programming Language: Python

Libraries:
    Pandas: For data manipulation and preprocessing.
    NLTK: For text processing tasks such as stopword removal, tokenization, and lemmatization.
    Transformers (Hugging Face): For implementing the BERT model.
    Streamlit: For deploying the application and creating an interactive web interface.
    Matplotlib, Seaborn: For data visualization.

Setup and Installation

To run this project locally, follow these steps:

1. Clone the repository:

    git clone https://github.com/your-username/twitter-sentiment-analysis-bert.git
    cd twitter-sentiment-analysis-bert

2. Download the Dataset:

Download the dataset from Kaggle and place it in the project's data folder. You can use the Kaggle API to download it directly:

    kaggle datasets download -d <dataset-name>

3. Preprocessing the Dataset:

Run the preprocessing script to clean and prepare the data for training:

    python preprocess.py

This script performs the following tasks:

    Remove special characters, URLs, and mentions.
    Tokenize text.
    Remove stopwords.
    Apply lemmatization.
    Convert text to lowercase.

4. Train the BERT Model:

Once the data is preprocessed, train the BERT model on the dataset:

    python train.py

This script will fine-tune the pre-trained BERT model on the Twitter dataset.

5. Run the Streamlit Web Application:

After the model is trained, you can run the Streamlit app for real-time sentiment analysis:

    streamlit run app.py

This will launch a web interface where you can input tweets and receive sentiment predictions.
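
A minimal sketch of what such an app could look like, not the project's actual `app.py`. The label order, the helper names, and the `score_tweet` callable (anything that maps a tweet to one raw score per class, such as a wrapper around the fine-tuned model) are assumptions for illustration.

```python
import math
from typing import Callable, Sequence

LABELS = ("negative", "neutral", "positive")  # assumed label order


def softmax(logits: Sequence[float]) -> list[float]:
    """Convert raw class scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits: Sequence[float]) -> tuple[str, float]:
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def render_app(score_tweet: Callable[[str], Sequence[float]]) -> None:
    """Draw the page; `score_tweet` is a hypothetical hook for the model."""
    # Imported lazily so the helpers above can be used without Streamlit installed.
    import streamlit as st

    st.title("Tweet Sentiment Analysis")
    tweet = st.text_area("Enter a tweet")
    if st.button("Analyze") and tweet.strip():
        label, confidence = top_label(score_tweet(tweet))
        st.write(f"Sentiment: {label} ({confidence:.0%} confidence)")
```

In a real app file, `render_app` would be called at module level with a function that runs the fine-tuned model, since Streamlit executes the script from top to bottom on every interaction.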

Results

The BERT model achieves an accuracy of 0.76 (76%) on the test set, outperforming traditional machine learning approaches due to its ability to capture the contextual meaning of words in tweets. Detailed evaluation metrics such as precision, recall, and F1-score are available in the training logs.
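
For reference, the metrics named above can be computed from true and predicted labels as in the sketch below. The function name `evaluate` and the toy labels are illustrative only; they are not results from this project.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return (accuracy, per-class precision/recall/F1) for two label lists."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed an example of the true class

    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, report


# Toy example: one neutral tweet is misclassified as positive.
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "positive", "positive"]
accuracy, report = evaluate(y_true, y_pred)
```

Per-class scores matter here because overall accuracy can hide a class, often neutral, that the model rarely predicts correctly.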

Conclusion

This project demonstrates the application of state-of-the-art NLP models (BERT) for sentiment analysis on social media data, leveraging both text preprocessing and deep learning. The deployment using Streamlit provides a simple interface for real-time sentiment predictions, making it a useful tool for businesses or researchers to gauge public sentiment.

AhmedAbdAlkreem/Sentiment-Analysis
