This project focuses on text summarization. It uses Natural Language Processing (NLP) and Hugging Face Transformers to condense lengthy documents into concise, meaningful summaries, and it includes pipelines for data ingestion and preprocessing, model fine-tuning, and performance evaluation.
A FastAPI-based API exposes the summarization service so that other applications can integrate with it easily.
- Extractive Summarization: Identifies and extracts the key sentences of the original text, highlighting the most important information while preserving context.
- Abstractive Summarization: Generates new sentences that capture the essence of the original text, using Hugging Face transformer models (e.g., T5, BART) to produce coherent, human-like summaries.
- API: A FastAPI service for real-time summarization and easy integration into other applications.
- Programming Language: Python
- Libraries: Hugging Face Transformers, NLTK, spaCy
- Web Framework: FastAPI, Streamlit (for an optional user interface)
- Models: Pre-trained transformer models like T5 and BART for abstractive summarization.
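The README does not say which algorithm the extractive summarizer uses to pick key sentences. As a hedged illustration of the general idea, the sketch below implements a simple frequency-based scorer in plain Python: each sentence is scored by the average document frequency of its non-stopword terms, and the top `k` sentences are returned in their original order. The stopword list, the regex sentence splitter, and the function names (`split_sentences`, `tokenize`, `extractive_summary`) are assumptions for this sketch; a real pipeline would more likely rely on the NLTK or spaCy tokenizers listed above.

```python
import re
from collections import Counter

# Small illustrative stopword list (assumption); NLTK and spaCy ship fuller lists.
STOPWORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "for",
    "is", "are", "was", "were", "it", "this", "that", "with", "as", "by",
}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ".", "!" or "?" when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s.strip() for s in parts if s.strip()]


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, k: int = 2) -> str:
    """Hypothetical helper: return the k highest-scoring sentences in document order."""
    sentences = split_sentences(text)
    if len(sentences) <= k:
        return " ".join(sentences)

    # Term frequencies over the whole document.
    freq = Counter(tokenize(text))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        # Average (not total) frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the top k, then restore original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    top = sorted(ranked[:k])
    return " ".join(sentences[i] for i in top)


if __name__ == "__main__":
    doc = (
        "Transformers changed NLP. Summarization models condense long text. "
        "Extractive summarization selects key sentences from the text. "
        "The weather was pleasant."
    )
    print(extractive_summary(doc, k=2))
```

Returning the selected sentences in document order rather than score order keeps the summary readable, which is one way of "maintaining context" in an extractive approach.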
This project can work with any text data, including:
- Articles
- Research papers
- News
For testing purposes, sample datasets are provided in the `data/` directory.
- Extractive Summarization: Delivers summaries with an average ROUGE score of 0.75.
- Abstractive Summarization: Produces natural, concise summaries using fine-tuned transformer models.
- Add support for multilingual text summarization.
- Incorporate real-time web scraping to summarize dynamic content.
- Improve model performance by fine-tuning on domain-specific data.
- Extend the API to support batch summarization.
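The extractive result reported above is an average ROUGE score of 0.75, but the README does not state which ROUGE variant (ROUGE-1, ROUGE-2 or ROUGE-L) or which reference summaries produced it. As a hedged illustration of what such a score measures, the sketch below computes ROUGE-1 F1 (clipped unigram overlap between a candidate and a reference summary) and averages it over candidate/reference pairs. The function names and the lowercase alphanumeric tokenization are assumptions; established implementations such as Google's `rouge-score` package apply their own tokenization and optional stemming, so this is not necessarily how the reported figure was computed.

```python
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    # Simplified tokenization: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1 between one candidate summary and one reference summary."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    # Clipped overlap: each word counts at most min(candidate count, reference count) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def average_rouge1(pairs: list[tuple[str, str]]) -> float:
    """Mean ROUGE-1 F1 over (candidate, reference) pairs; 0.0 for an empty list."""
    scores = [rouge1_f1(c, r) for c, r in pairs]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    print(rouge1_f1("the cat sat on the mat", "the cat is on the mat"))
```

Precision rewards short candidates and recall rewards long ones; the F1 combination balances the two, which is why summarization results are usually reported as ROUGE F1 rather than either component alone.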
Transforming Text into Insights with Hugging Face and FastAPI!