marcospaulo429/tweets-processor

Tweets Processor

Introduction

The use of artificial intelligence in financial markets grows every year because these algorithms can support decision-making, trend prediction, and sentiment analysis. In the context of sentiment analysis, this research aims to build an efficient pipeline for preprocessing tweets about stocks listed on the Brazilian stock exchange, preparing them for a later stage to be developed in the master's research of co-author Leandro Araújo ("Language model for the Brazilian stock market: An approach based on sentiment analysis using the BERTimbau model").

Objective

Test and analyze natural language preprocessing techniques on Portuguese-language tweets about the Brazilian financial market.

Methods

1- Lowercase.
2- Remove spam (example: "I beg for help").
3- Remove URLs.
4- Remove RT.
5- Remove emails.
6- Remove mentions (@).
7- Remove hashtags.
8- Remove \n.
9- Remove numbers.
10- Remove unnecessary symbols.
11- Remove tweets that are not in Brazilian Portuguese.
12- Normalize text.
13- Remove accents.
14- Remove stopwords.
15- Remove swear words.
16- Stemming.
17- Tokenization, TF-IDF and Bag of Words.
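
The rule-based steps above (lowercasing, removal of URLs, RT, emails, mentions, hashtags, newlines, numbers, symbols, accents and stopwords, then tokenization and a bag of words) could be sketched as follows. This is a minimal illustration using only the Python standard library, not the repository's actual implementation: the function names, the regular expressions, and the tiny stopword list are all assumptions made for the example. Spam filtering, language detection, normalization, swear-word removal, stemming, and TF-IDF are left out because they depend on external resources not described here.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical patterns; the project's real rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
RT_RE = re.compile(r"\brt\b")          # applied after lowercasing
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
NUMBER_RE = re.compile(r"\d+")
SYMBOL_RE = re.compile(r"[^\w\s]")

# Tiny illustrative stopword list (unaccented, since accents are
# stripped before filtering); a real pipeline would use a full list.
STOPWORDS = {"a", "o", "de", "da", "do", "e", "que", "em", "para"}


def strip_accents(text):
    """Decompose characters and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_tweet(text):
    """Apply the rule-based steps in order and return a list of tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = RT_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)     # emails before mentions, both use "@"
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = text.replace("\n", " ")
    text = NUMBER_RE.sub(" ", text)
    text = SYMBOL_RE.sub(" ", text)
    text = strip_accents(text)
    # Whitespace tokenization followed by stopword filtering.
    return [tok for tok in text.split() if tok not in STOPWORDS]


def bag_of_words(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


if __name__ == "__main__":
    tweet = "RT @fulano: PETR4 subiu 3% hoje! https://t.co/abc #bolsa\nAção em alta"
    tokens = clean_tweet(tweet)
    print(tokens)
    print(bag_of_words(tokens))
```

Note that removing numbers, as the step list prescribes, also strips the digit from ticker symbols such as PETR4, so the order and scope of that step is a design choice worth checking against the downstream model.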
