This project is part of the ID2221 Data-Intensive Computing course. It involves designing and implementing a real-time data streaming system for sentiment analysis on political tweets. The system uses Hugging Face's Transformers library to perform sentiment analysis and Apache Spark to process the tweet stream.
This repository contains a demo of a data-intensive application: a complete data pipeline that ingests data from a source, processes it, and stores the results in a database. The application is implemented in Python and uses the following technologies:
- Apache Kafka as the message broker
- Apache Spark for data processing
- Hugging Face Transformers for sentiment analysis
- MongoDB as the database
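The pipeline described above has four stages: broker, stream processor, model, and database. The dependency-free Python sketch below illustrates that flow under explicit assumptions that are not taken from this repository. `queue.Queue` stands in for a Kafka topic, small fixed-size batches stand in for Spark micro-batches, the hypothetical `toy_sentiment` keyword scorer stands in for the transformer model, and a plain list stands in for the MongoDB collection. All names (`Tweet`, `produce`, `consume_micro_batch`, `process_and_store`, `run_pipeline`) are illustrative only.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict, List


# Assumption: a minimal tweet record; the real schema is not described here.
@dataclass
class Tweet:
    id: int
    text: str


# Hypothetical stand-in for the transformer model: a keyword scorer that returns
# a dict shaped like a single Transformers pipeline result ({"label": ...}).
POSITIVE_WORDS = {"good", "great", "win", "support"}
NEGATIVE_WORDS = {"bad", "corrupt", "lose", "against"}


def toy_sentiment(text: str) -> Dict[str, str]:
    words = text.lower().split()  # whitespace tokenization only; punctuation is kept
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return {"label": "POSITIVE"}
    if score < 0:
        return {"label": "NEGATIVE"}
    return {"label": "NEUTRAL"}


def produce(topic: "queue.Queue[Tweet]", tweets: List[Tweet]) -> None:
    """Ingest: publish raw tweets to the broker (here, an in-memory FIFO queue)."""
    for tweet in tweets:
        topic.put(tweet)


def consume_micro_batch(topic: "queue.Queue[Tweet]", max_batch: int) -> List[Tweet]:
    """Read up to max_batch tweets without blocking, loosely mimicking a micro-batch."""
    batch: List[Tweet] = []
    while len(batch) < max_batch:
        try:
            batch.append(topic.get_nowait())
        except queue.Empty:
            break
    return batch


def process_and_store(
    batch: List[Tweet],
    classify: Callable[[str], Dict[str, str]],
    collection: List[dict],
) -> None:
    """Process: classify each tweet. Store: append one document per tweet."""
    for tweet in batch:
        result = classify(tweet.text)
        collection.append(
            {"tweet_id": tweet.id, "text": tweet.text, "sentiment": result["label"]}
        )


def run_pipeline(
    tweets: List[Tweet],
    classify: Callable[[str], Dict[str, str]] = toy_sentiment,
    max_batch: int = 2,
) -> List[dict]:
    topic: "queue.Queue[Tweet]" = queue.Queue()
    collection: List[dict] = []  # stand-in for a MongoDB collection
    produce(topic, tweets)
    while True:
        batch = consume_micro_batch(topic, max_batch)
        if not batch:  # topic drained; a real streaming job would keep waiting
            break
        process_and_store(batch, classify, collection)
    return collection


if __name__ == "__main__":
    sample = [
        Tweet(1, "Great debate win"),
        Tweet(2, "Corrupt and bad policy"),
        Tweet(3, "Vote today"),
    ]
    for doc in run_pipeline(sample):
        print(doc)
```

Because the classifier is passed in as a callable, the toy scorer could in principle be replaced by a real model, for example by wrapping `transformers.pipeline("sentiment-analysis")` as `lambda text: clf(text)[0]`, since that pipeline returns a list of `{"label", "score"}` dicts for a single string. In a Spark deployment, the per-batch step would more naturally sit in a `foreachBatch` callback on a Structured Streaming query that reads from Kafka, with each batch written via PyMongo's `insert_many`. This is a suggestion, not necessarily how this project wires the components together.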