Using traditional machine learning and neural networks, we built a model that can identify different types of fake Twitter users, such as fake followers and spammers.

bandytan/Fake-Twitter-Account-Detection

Fake Twitter Account Detection

The Problem

Bots are a familiar presence on Twitter. An estimated 20% to 29% of US content on Twitter is generated by bots (Varanasi, 2022). Some of these bots are harmless, but others engage in fraud, broadly defined as wrongful or criminal deception intended to result in financial or personal gain. Examples include manipulating election votes (Metz, 2020) and running cryptocurrency scams (Perez, 2022). These fraudulent bots need to be detected quickly and dealt with before they cause further harm to users.

Our Solution

Twitter currently culls 1 million bot accounts per day (Sutcliffe, 2022). This is far from enough, as bots continue to plague the platform. Twitter itself admits that fraudulent bot detection is a highly complex and nuanced problem (Twitter, 2021). We therefore propose a data-driven approach that combines traditional machine learning and neural networks to handle the uncertainty and complexity of fraudulent bot detection. For simplicity, we refer to fraudulent bots as "bots" throughout the report.

Introduction to the Code

  1. scrape_profile_pic.ipynb
    • Scrapes the profile pictures of Twitter users
  2. Data Cleaning.ipynb
    • Cleans the data produced by scrape_profile_pic.ipynb, e.g. by removing invalid rows and columns
  3. Get Face.ipynb
    • Reads the profile pictures of Twitter users and detects whether a face is present
  4. Graph.ipynb
    • Creates the reciprocity feature for each user, based on a graph structure
  5. Feature Engineering.ipynb
    • Performs feature engineering on the files generated by Data Cleaning.ipynb, Get Face.ipynb, and Graph.ipynb
  6. All notebooks in /Traditional Models and /Neural Networks
    • Each notebook trains a different ML/NN model and evaluates its performance
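
As a rough illustration of the reciprocity feature mentioned for Graph.ipynb, the sketch below computes, for each user, the share of accounts they follow that follow them back. This is only one common definition of reciprocity and may differ from what the notebook actually does; the function name `reciprocity` and the `(follower, followee)` edge format are assumptions made for this example, not taken from the repository.

```python
from collections import defaultdict


def reciprocity(edges):
    """Return {user: fraction of the user's followees who follow back}.

    `edges` is an iterable of (follower, followee) pairs. This input format
    is an assumption for illustration, not the repository's data layout.
    Users who follow nobody do not appear in the result.
    """
    following = defaultdict(set)
    for follower, followee in edges:
        if follower != followee:  # ignore self-loops
            following[follower].add(followee)

    scores = {}
    for user, followees in following.items():
        # Count followees whose own followee set contains this user.
        mutual = sum(1 for other in followees if user in following.get(other, ()))
        scores[user] = mutual / len(followees)
    return scores


# Hypothetical toy graph: a and b follow each other, a also follows c, d follows a.
toy_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "a")]
scores = reciprocity(toy_edges)
```

In the toy graph, one of the two accounts that `a` follows reciprocates, so `a` scores 0.5, while `d` follows an account that does not follow back and scores 0.0. A low score of this kind is the sort of signal often associated with fake followers, though how the feature is used in the models is not stated here.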