konkinit/topic_modeling

Topic Modeling with BERTopic

ToDo

  • Write Unit Tests ...

Project Description

The project combines BERTopic and Streamlit to provide an interface for end-to-end topic modeling. Instructions and further details are provided in the app.

[Figure: archi_plot]

Getting Started

  • Through the Docker image: the image extends a PyTorch image and installs RAPIDS packages such as cuML and cuDF in an additional layer. RAPIDS is a suite of open GPU data science packages, so GPU hardware is required to run the container. Install the nvidia-container-toolkit first, then:
docker pull kidrissa/bertopicapp:latest
docker run --gpus all -p 8501:8501 -d kidrissa/bertopicapp:latest 
  • Through repo cloning (preferably on a Linux-based OS):
git clone https://github.com/konkinit/topic_modeling.git
cd topic_modeling/
bash package_installing.sh
streamlit run ./src/frontend/Onboarding.py
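
For the Docker route above, a quick prerequisite check before pulling the image can save a failed start. The sketch below only reports which commands are missing from PATH. It assumes that the presence of nvidia-smi indicates an installed NVIDIA driver, and it does not verify that the nvidia-container-toolkit itself is configured. The helper name missing_tools is illustrative and not part of this repository.

```shell
#!/bin/sh
# Print the name of every given command that is not found on PATH.
# (missing_tools is a hypothetical helper, not shipped with this repo.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# docker runs the container; nvidia-smi is assumed to signal a working GPU driver.
missing="$(missing_tools docker nvidia-smi)"
if [ -n "$missing" ]; then
  echo "Missing prerequisites:" $missing
else
  echo "docker and nvidia-smi found on PATH"
fi
```

If both tools are found, the docker pull and docker run commands above are the next step; once the container is up, the app is served on the port published by -p 8501:8501.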

Continuous Integration

A continuous integration (CI) workflow with two main jobs runs on every push to the main branch:

  • Pytest collects the tests from the tests folder and runs them
  • If the tests pass, a Docker image is built and pushed to Docker Hub
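
The two jobs above can be approximated locally with a short script. This is a sketch and not the repository's actual workflow file: it assumes a Dockerfile at the repository root, reuses the image name from the pull command above, and defaults to a dry run (DRY_RUN=1, an assumed safety default) that only prints the commands. The names run and ci_pipeline are illustrative.

```shell
#!/bin/sh
# Local approximation of the CI flow: test first, then build and push.
IMAGE="${IMAGE:-kidrissa/bertopicapp:latest}"  # image name from the pull command
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Each step runs only if the previous one succeeded, mirroring
# "if the tests pass, the image is built and pushed".
ci_pipeline() {
  run python -m pytest tests/ &&
    run docker build -t "$IMAGE" . &&   # assumes a Dockerfile in the repo root
    run docker push "$IMAGE"
}

ci_pipeline
```

Running it with DRY_RUN=0 executes the steps for real; the push step then requires a prior docker login with write access to the image repository.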

Citation

@article{grootendorst2022bertopic,
  title={BERTopic: Neural topic modeling with a class-based TF-IDF procedure},
  author={Grootendorst, Maarten},
  journal={arXiv preprint arXiv:2203.05794},
  year={2022}
}
