Fintech Project 2 - Analysis Of Crypto Pricing and Ukraine War Twitter Sentiment
An analysis of Twitter data gathered from Ukraine and crypto queries. The data was cleaned and then run through sentiment analysis to look for relationships between crypto prices and Twitter sentiment on war and crypto topics.
See installation guide below for specifics on setting up your environment.
The data source for the sentiment analysis is the Twitter Search API, specifically the /2/tweets/search endpoint.
- To run this application, you will need to open an account on the Twitter Developer platform to obtain a bearer token. See config_example.py for how to stage your bearer token.
- Modify the query
- By default, data_input.py grabs the past 7 days of data, which is the limit of the API
- query.py contains the TwitterQuery class for managing the querying and data scraping/prep
- utils.py wraps the cleaning steps, such as:
- Case normalization/ standardizing text
- Removing Unicode characters (punctuation, emojis, URLs and @mentions)
- Removing hyperlinks, marks and styles
- Removing stopwords (words that don't add value)
- Stemming / Lemmatizing text
- Tokenizing tweet text
- The resulting data is saved in .csv format in the ./data/ folder, e.g. 2022323_144.csv
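The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of utils.py: the function name `clean_tweet`, the regular expressions, and the tiny stopword set are all assumptions made for the example. It uses only the standard library, and it leaves out stemming/lemmatizing, which the project would presumably handle with nltk.

```python
import re
import string

# Hypothetical, deliberately tiny stopword set; nltk's stopword corpus
# would be the more complete choice in the real pipeline.
STOPWORDS = {"a", "an", "and", "the", "is", "in", "of", "on", "to", "for"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Drops emoji and every other non-ASCII character. Note that this also
# discards non-Latin text such as Cyrillic.
NON_ASCII_RE = re.compile(r"[^\x00-\x7f]")


def clean_tweet(text):
    """Return a list of cleaned, lowercased tokens for one tweet."""
    text = text.lower()                      # case normalization
    text = URL_RE.sub(" ", text)             # remove hyperlinks
    text = MENTION_RE.sub(" ", text)         # remove @mentions
    text = NON_ASCII_RE.sub(" ", text)       # remove emoji / non-ASCII
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()                    # whitespace tokenization
    return [tok for tok in tokens if tok not in STOPWORDS]


print(clean_tweet("BTC is UP 5% on the news! 🚀 https://t.co/abc @someone #crypto"))
# ['btc', 'up', '5', 'news', 'crypto']
```

URLs and mentions are removed before punctuation so that the patterns can still match on `://` and `@`.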
BTC price data comes from the FTX BTC feed.
NOTE: Data Prep details in slide presentation pages 4 and 5.
Analysis Steps:
- sentiment.py analyzes the CSV data and generates a sentiment inference CSV
- consolidate_data.py reads in the inference CSV data and the BTC data, then generates training data and plots
- keras_train.py consumes train_dataset.csv
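As a rough illustration of the consolidation step, the sketch below averages sentiment per hour and joins it to hourly BTC prices. Everything here is an assumption for the sake of the example: the hourly granularity, the POS/NEU/NEG label-to-score mapping, the function names, and the input shapes are not taken from consolidate_data.py. It uses only the standard library so it runs without the project dependencies; the real script would more likely use pandas.

```python
from collections import defaultdict
from datetime import datetime

# Assumed mapping from sentiment labels to numeric scores.
LABEL_SCORE = {"POS": 1.0, "NEU": 0.0, "NEG": -1.0}


def hour_bucket(iso_timestamp):
    """Truncate an ISO 8601 timestamp (no 'Z' suffix) to the start of its hour."""
    dt = datetime.fromisoformat(iso_timestamp)
    return dt.replace(minute=0, second=0, microsecond=0)


def consolidate(tweets, prices):
    """Join mean hourly sentiment with hourly BTC prices.

    tweets: iterable of (iso_timestamp, label) pairs
    prices: dict mapping hour-start datetime -> price
    Returns (hour, mean_score, tweet_count, price) rows sorted by hour,
    keeping only hours that appear in both inputs.
    """
    scores = defaultdict(list)
    for timestamp, label in tweets:
        scores[hour_bucket(timestamp)].append(LABEL_SCORE[label])

    rows = []
    for hour in sorted(scores):
        if hour in prices:
            values = scores[hour]
            rows.append((hour, sum(values) / len(values), len(values), prices[hour]))
    return rows


# Made-up values, for illustration only.
tweets = [
    ("2022-03-23T14:05:00", "POS"),
    ("2022-03-23T14:40:00", "NEG"),
    ("2022-03-23T15:10:00", "POS"),
    ("2022-03-23T16:00:00", "NEU"),  # no matching price, so it is dropped
]
prices = {datetime(2022, 3, 23, 14): 100.0, datetime(2022, 3, 23, 15): 101.0}

for row in consolidate(tweets, prices):
    print(row)
```

Keeping the tweet count next to the mean score makes it easier to spot hours where the average rests on very few tweets.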
NOTE: Model details contained in slide presentation page 6.
This project uses Python 3.7 and the following modules:
- time
- datetime
- re
- pandas
- numpy
- keras
- tensorflow
- wordcloud
- nltk
- json
- folium
- requests
- pysentimiento
- rfc3339
- tqdm
- iso8601
You will need Python 3.7 for this application to run. An easy way to install Python 3.7 is to download and install Anaconda. After installing Anaconda, open a terminal/command prompt, set up a Python 3.7 environment, and then activate it like so:
# creating a python 3.7 environment
# name can be any friendly name to refer to your environment, eg 'dev'
conda create --name dev python=3.7 anaconda
# activating the environment
conda activate dev
Next, use pip to install the required modules from the list above (time, datetime, re and json ship with Python and need no installation):
# installing required modules
$ pip install pandas
$ pip install numpy
$ etc...
You are now ready to run the program!
IMPORTANT NOTE: Twitter API Usage
You must sign up for a Twitter API key in order to authenticate and fetch Twitter data.
See config_example.py for how to stage your Twitter API bearer token.
Also, allow time to apply for an academic Twitter API key rather than relying on the free tier. Academic access offers significantly higher usage limits and data granularity; the free tier limits the amount of data you can pull.
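For orientation, here is a minimal sketch of how a bearer-token request against the search API could be assembled. It is not the project's TwitterQuery class: the function name `build_search_request`, the example query, and the chosen fields are hypothetical, and the URL assumes the v2 recent search endpoint (`/2/tweets/search/recent`), which is the one that matches the 7-day window. The sketch only builds the request pieces and makes no network call.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the v2 "recent search" endpoint, which covers the last 7 days.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_request(bearer_token, query, days=7, now=None):
    """Return (url, headers, params) for a search over the past `days` days."""
    now = now or datetime.now(timezone.utc)
    # Stay one minute inside the window so the start time is not
    # already too old by the time the request reaches the API.
    start = now - timedelta(days=days) + timedelta(minutes=1)
    headers = {"Authorization": f"Bearer {bearer_token}"}
    params = {
        "query": query,
        "start_time": start.strftime("%Y-%m-%dT%H:%M:%SZ"),  # RFC 3339, UTC
        "max_results": 100,  # recent search accepts 10 to 100 per page
        "tweet.fields": "created_at,lang",
    }
    return SEARCH_URL, headers, params


url, headers, params = build_search_request(
    "YOUR_BEARER_TOKEN",  # placeholder; load the real token from your config
    "(ukraine OR bitcoin) lang:en -is:retweet",
    now=datetime(2022, 3, 23, 12, 0, 0, tzinfo=timezone.utc),
)
print(params["start_time"])  # 2022-03-16T12:01:00Z
```

With the requests module from the list above, the call itself would then look something like `requests.get(url, headers=headers, params=params)`, with further pages fetched by following the `next_token` value in the response metadata.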
WordCloud: generates a word cloud visual from query data.
Peter Morales
Shivangi Gupta
Jaime Aranda
David Lopez
MIT