Welcome to the Opt Out Tools (OOT) Machine Learning R&D repository. It contains the research and production code we use to build a machine learning model for the automatic detection of online misogyny on Twitter.
A first version of this model is currently used in the Opt Out browser extension. The extension itself is in alpha and is available for download from the Firefox add-ons library. A data statement for the dataset used for the first version of the model can be found on OOT's website.
Please read the CONTRIBUTING.md file in this repository to learn how you can contribute.
This repository has two purposes:
- Researching the automatic detection of online misogyny, i.e. exploring hate speech datasets and experimenting with machine learning algorithms.
- Building a machine learning model for the browser extension based on our research.
├── .circleci <- Folder containing the CircleCI configuration file for this repository.
├── .github/ISSUE_TEMPLATE <- Folder containing templates to create different types of issues for this
│ repository.
├── data <- Folder for copying the OOT dataset and for documenting other datasets that
│ tackle the problem of misogyny/hate speech and their labeling process.
├── docs <- Folder containing the files necessary to produce documentation with
│ Sphinx.
├── models <- Folder for saving trained and serialized models fit for production.
├── notebooks <- Folder for saving Jupyter notebooks.
├── reports <- Folder for saving reports generated with Sphinx (HTML, PDF,
│ LaTeX, etc.).
├── src <- Folder containing the source code to train models. The source code currently
│ │ runs preprocessing pipelines, error analysis scripts and acceptance criteria
│ │ scripts.
│ └── text <- Folder containing the utility modules for text processing in the pipeline.
├── stages <- Folder containing the files necessary to run the machine learning pipeline.
├── tests <- Folder for saving tests for the machine learning pipeline to make sure that
│ the source code works as expected.
├── .flake8 <- Linter file necessary to format code to the OOT standards.
├── .pre-commit-config.yaml <- List of the scripts run at the pre-commit stage.
├── .pylintrc <- Linter file necessary to format code to the OOT standards.
├── CONTRIBUTING.md <- Instructions on how to contribute to this repository.
├── Dvcfile <- Default stage (i.e. evaluation stage) for the machine learning pipeline.
├── LICENSE <- File containing the license for use of this repository.
├── README.md <- General information about this repository.
├── mypy.ini <- Configuration file for static type checking of the Python code with mypy.
├── opt_out_logo.png <- Logo used in the README of this repository.
├── requirements.txt <- Requirements file for reproducing the analysis environment.
└── setup.py <- Configuration file for the source code.
This repository is managed by the Opt Out Tools data team. If you have any questions, please reach out to one of the following team members on GitHub:
- Andrada: andra-pumnea
- Verena: Ver2307
We use CircleCI for CI/CD. You can always check if anything is broken in the repository in this section.
NOTE: We do not currently have an automated model deployment mechanism.
Please note that this repository is part of the Opt Out Tools project, which is released with a Contributor Code of Conduct. By participating in this project, you agree to abide by its terms.
Project structure based on the cookiecutter data science project template. #cookiecutterdatascience