This project was done primarily to shed more light on the responses of President Trump and former Vice President Biden
during the presidential debates of September 29 and October 22, 2020.
The data for this entire project was scraped from rev.com.
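The scraping step can be sketched in a few lines with requests and BeautifulSoup. This is a hedged illustration, not the project's actual code: the function names are mine, and any URL passed in is a placeholder for a rev.com transcript page.

```python
# Illustrative sketch of the scraping step (function names are mine,
# not the project's actual code).
import requests
from bs4 import BeautifulSoup

def parse_transcript(html):
    """Pull the text of every <p> tag from a transcript page."""
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text(strip=True) for p in soup.find_all("p")]

def fetch_transcript(url):
    """Download a transcript page and return its paragraph text."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return "\n".join(parse_transcript(resp.text))
```
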
The methodology for this project includes the following:
- Web scraping: via the requests and BeautifulSoup libraries
- Data cleaning and pre-processing
- Speech-percentage computation
- Lexical-diversity analysis
- TF-IDF computation
- Text tokenization
- Stopword removal: via the nltk library
- Punctuation removal: via the string library
- Text lemmatization: via nltk's WordNetLemmatizer
- Sentiment analysis: via the Microsoft Azure Text Analytics client
- Key-phrase extraction: via the Microsoft Azure Text Analytics client
- Bayesian inference
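Two of the simpler computations above, speech percentage and lexical diversity, can be sketched in a few lines. The function names are illustrative, not the project's actual methods:

```python
# Minimal sketches of two metrics from the list above
# (illustrative names, not the project's actual functions).

def lexical_diversity(text):
    """Ratio of unique words to total words -- higher means a richer vocabulary."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def speech_percentage(speaker_words, total_words):
    """Share of the debate's total words spoken by one candidate."""
    return 100.0 * speaker_words / total_words

sample = "the economy the jobs the people the economy"
print(round(lexical_diversity(sample), 3))  # 4 unique / 8 total = 0.5
print(speech_percentage(30, 120))           # 25.0
```
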
* The first debate was on September 29, 2020. It was moderated by Chris Wallace
of Fox News.
* The second debate was originally scheduled for October 15 but was cancelled
due to Trump's bout of COVID-19 and held a week later, after his
'rather theatrical and spectacular' recovery. This debate was moderated by
Kristen Welker of NBC News.
* Kindly see the analysis within the analysis_dir folder,
paying attention to the defined methods and the intuition behind the analysis.
* From this exercise, I have been able to compare and contrast Trump's and Biden's
language styles, sentiments, responses to key questions, and understanding.
* This project gives the American public, and the world at large, rare insights
into the lexical signatures and language structures of President Trump and Vice President Biden.
* The project also explains some of their responses in key areas such as racism, the US economy,
health care, jobs/wages/taxes, and the American people.
As a top writer in Artificial Intelligence, I have taken the time to explain my findings in a conversational and less technical manner for all. Kindly read the post in the Towards AI publication on Medium.
To follow along, kindly install/import the following libraries:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pywaffle import Waffle
from PIL import Image
import nltk
from nltk import word_tokenize
nltk.download('stopwords')  # For stopword removal
nltk.download('punkt')      # For tokenization
from nltk.corpus import stopwords
from nltk import WordNetLemmatizer  # To lemmatize sentences
nltk.download('wordnet')
from nltk.stem.porter import PorterStemmer
from wordcloud import WordCloud
import spacy
from collections import defaultdict, Counter
from bs4 import BeautifulSoup
import requests
import string
import math
```
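With these imports in place, the cleaning step (lowercasing, punctuation removal via the string module, stopword filtering) can be sketched as below. A tiny hard-coded stopword set stands in for the full nltk list so the example is self-contained, and the function name is illustrative:

```python
# Minimal cleaning sketch. A tiny hard-coded stopword set stands in for
# nltk.corpus.stopwords to keep the example self-contained; the project
# itself uses the full nltk list.
import string

STOPWORDS = {"the", "a", "an", "and", "is", "to", "of"}

def clean(text):
    # Lowercase, strip punctuation, then drop stopwords
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [w for w in text.split() if w not in STOPWORDS]

print(clean("The economy, and the jobs!"))  # ['economy', 'jobs']
```
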
This project and all its resources are covered by the MIT license in the root directory.