This project applies abstractive summarization to conversational text by fine-tuning the Pegasus model on the SAMSum dataset. SAMSum dialogues are informal and interactive, which poses challenges that summarization models trained on news or other formal text do not typically address. The pipeline preprocesses SAMSum to match Pegasus's input requirements and then fine-tunes the model, combining hyperparameter optimization and transfer learning with a training regimen focused on preserving conversational context. Summaries are evaluated with objective metrics (ROUGE-N, ROUGE-L, and BLEU) and with subjective human assessments of coherence, relevance, and readability. The project reports an improvement in summarizing conversational text, with the measured ROUGE scores listed in the results table below, and aims to show how a pretrained summarizer can be adapted to a domain-specific setting.
Frontend URL: https://text-summarization-psi.vercel.app/
- Next.js 14
- TypeScript
- TailwindCSS
- shadcn/ui
- Python
- DVC
- MLflow
- Hugging Face Transformers
- FastAPI
- OOP Principles
- Fine-tuned Pegasus model to summarize conversational texts with high accuracy.
- Special focus on maintaining the integrity and context of the original conversation in the summaries.
- Training on the SAMSum dataset, so the model is tuned specifically for dialogue-based text.
- Testing with both objective metrics and subjective human assessments to check summary quality.
- A web frontend built with Next.js and TailwindCSS for ease of use and accessibility.
- Extensive logging and custom exception handling to monitor the application, detect issues, and keep the user experience smooth. Log records cover errors, warnings, and system events, giving developers the detail they need to troubleshoot and improve the system.
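The logging and custom exception handling described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the names `SummarizerError`, `get_logger`, and `safe_word_count` are hypothetical and chosen only for this example.

```python
import logging
import sys
from typing import Optional


class SummarizerError(Exception):
    """Hypothetical application error that records where the failure occurred."""

    def __init__(self, message: str, cause: Optional[Exception] = None):
        super().__init__(message)
        self.cause = cause
        # If we are inside an `except` block, capture file and line of the active exception.
        _, _, tb = sys.exc_info()
        if tb is not None:
            self.location = f"{tb.tb_frame.f_code.co_filename}:{tb.tb_lineno}"
        else:
            self.location = "unknown"

    def __str__(self) -> str:
        return f"{self.args[0]} (at {self.location})"


def get_logger(name: str = "summarizer") -> logging.Logger:
    """Return a logger that writes timestamped records to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("[%(asctime)s] %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


def safe_word_count(text) -> int:
    """Toy pipeline step: count words, logging success and wrapping failures."""
    logger = get_logger()
    try:
        count = len(text.split())
        logger.info("counted %d words", count)
        return count
    except AttributeError as exc:
        err = SummarizerError("input must be a string", exc)
        logger.error("%s", err)
        raise err from exc
```

Calling `safe_word_count(None)` logs an error record and raises `SummarizerError`, whose message includes the file and line where the original `AttributeError` was raised.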
The SAMSum dataset, central to this project, is a collection of conversational texts designed specifically for summarization tasks. It includes a variety of dialogue-based texts, providing a rich resource for training models to understand and summarize conversational nuances effectively.
Dataset Link: https://huggingface.co/datasets/samsum
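To make the data format concrete, here is a small sketch of a record in the shape used by SAMSum, which has `id`, `dialogue`, and `summary` fields. The record below is hand-written for illustration and is not taken from the dataset, and `split_turns` is a hypothetical helper that assumes each line has the form `Speaker: utterance`.

```python
from typing import List, Tuple

# Hand-written record in the shape of a SAMSum example (not real dataset content).
record = {
    "id": "example-0",
    "dialogue": (
        "Anna: Are we still on for lunch?\r\n"
        "Ben: Yes, 12:30 at the usual place.\r\n"
        "Anna: Great, see you there!"
    ),
    "summary": "Anna and Ben will meet for lunch at 12:30 at the usual place.",
}


def split_turns(dialogue: str) -> List[Tuple[str, str]]:
    """Split a dialogue into (speaker, utterance) pairs, one per non-empty line."""
    turns = []
    for line in dialogue.splitlines():  # handles both "\n" and "\r\n" separators
        if not line.strip():
            continue
        speaker, sep, utterance = line.partition(": ")
        if not sep:
            # No "Speaker: " prefix: keep the text and leave the speaker empty.
            speaker, utterance = "", line
        turns.append((speaker.strip(), utterance.strip()))
    return turns
```

Splitting on the first `": "` only keeps times such as `12:30` inside the utterance intact.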
Pegasus is an advanced text summarization model developed by Google and available on Hugging Face. It uses a Transformer-based architecture, specifically optimized for abstractive text summarization tasks.
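- Pre-training Technique: Pegasus is pre-trained with a gap-sentence generation objective, in which whole sentences are masked out of a document and the model learns to generate them from the remaining text. This improves its ability to produce coherent and concise summaries.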
- Abstractive Summarization: Unlike extractive models, Pegasus paraphrases and condenses the original text, providing more fluent and readable summaries.
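The gap-sentence idea can be illustrated with a toy sketch. It is a simplification written for this README, not the Pegasus pre-training code: sentences are scored by unigram-overlap F1 against the rest of the document, the top-scoring ones are replaced by a mask placeholder, and the masked sentences become the generation target. The function names and the `[MASK1]` placeholder string are assumptions for illustration; the actual token depends on the tokenizer.

```python
import re
from collections import Counter
from typing import List, Tuple

MASK = "[MASK1]"  # placeholder; the real vocabulary entry depends on the tokenizer


def _tokens(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def _unigram_f1(candidate: List[str], reference: List[str]) -> float:
    """F1 over clipped unigram overlap between two token lists."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def gap_sentence_example(sentences: List[str], n_gaps: int = 1) -> Tuple[str, str]:
    """Mask the n_gaps sentences most similar to the rest of the document."""
    scores = []
    for i, sentence in enumerate(sentences):
        rest = [t for j, other in enumerate(sentences) if j != i for t in _tokens(other)]
        scores.append(_unigram_f1(_tokens(sentence), rest))
    # Pick the highest-scoring sentences, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:n_gaps])
    source = " ".join(MASK if i in chosen else s for i, s in enumerate(sentences))
    target = " ".join(sentences[i] for i in chosen)
    return source, target


doc = [
    "Anna invited Ben to lunch.",
    "Ben and Anna will meet for lunch at noon.",
    "They will meet at the cafe.",
]
source, target = gap_sentence_example(doc)
```

Here the middle sentence shares the most words with the other two, so it is masked in `source` and becomes the `target` the model would be trained to generate.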
The pretrained checkpoint described here has been fine-tuned on the CNN/DailyMail dataset, a collection of news articles, which makes it particularly effective for summarizing journalistic content.
- Objective Metrics: The model's performance is evaluated using metrics like ROUGE-N, ROUGE-L, and BLEU.
- Subjective Assessments: In addition to objective metrics, subjective human assessments are used to ensure the quality of summaries in terms of coherence, relevance, and readability.
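For readers unfamiliar with the objective metrics, the sketch below computes simplified ROUGE-N and ROUGE-L F1 scores with the standard library. It is an illustration of what the metrics measure, assuming lowercase whitespace tokenization and no stemming; it is not the evaluation code used in this project, and scores from a full ROUGE implementation can differ.

```python
from collections import Counter
from typing import List


def _f1(match: int, cand_len: int, ref_len: int) -> float:
    if match == 0:
        return 0.0
    precision = match / cand_len
    recall = match / ref_len
    return 2 * precision * recall / (precision + recall)


def _ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """F1 over clipped n-gram overlap between candidate and reference."""
    cand = _ngrams(candidate.lower().split(), n)
    ref = _ngrams(reference.lower().split(), n)
    match = sum((cand & ref).values())
    return _f1(match, sum(cand.values()), sum(ref.values()))


def rouge_l(candidate: str, reference: str) -> float:
    """F1 based on the longest common subsequence of tokens."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    prev = [0] * (len(ref) + 1)  # LCS lengths for the previous candidate token
    for tok in cand:
        curr = [0]
        for j, ref_tok in enumerate(ref, 1):
            if tok == ref_tok:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return _f1(prev[-1], len(cand), len(ref))


candidate = "anna and ben meet for lunch"
reference = "anna will meet ben for lunch"
scores = {
    "rouge1": rouge_n(candidate, reference, 1),
    "rouge2": rouge_n(candidate, reference, 2),
    "rougeL": rouge_l(candidate, reference),
}
```

ROUGE-1 and ROUGE-2 reward shared words and word pairs regardless of position, while ROUGE-L rewards words that appear in the same order, which is why the three values differ for the same pair of sentences.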
Pegasus is well suited to news summarization and to other domains that need a concise representation of textual content.
For more details, visit the Hugging Face Pegasus model page.
- Update config.yaml
- Update secrets.yaml [Optional]
- Update params.yaml
- Update the entity
- Update the configuration manager in src config
- Update the components
- Update the pipeline
- Update the main.py
- Update the dvc.yaml
- Update app.py
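The entity and configuration manager steps in this workflow can be sketched as below. Every name here (`DataIngestionConfig`, `ConfigurationManager`, the `data_ingestion` section and its keys) is hypothetical and only illustrates the pattern; the plain dictionaries stand in for the parsed contents of config.yaml and params.yaml, and the repository's actual classes may look different.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict


@dataclass(frozen=True)
class DataIngestionConfig:
    """Entity: a typed, read-only view of one section of the configuration."""
    root_dir: Path
    source_url: str
    local_data_file: Path


class ConfigurationManager:
    """Turns raw configuration dictionaries into typed entities for the components."""

    def __init__(self, config: Dict[str, Any], params: Dict[str, Any]):
        # In a real project these would be loaded from config.yaml and params.yaml.
        self.config = config
        self.params = params

    def get_data_ingestion_config(self) -> DataIngestionConfig:
        section = self.config["data_ingestion"]
        return DataIngestionConfig(
            root_dir=Path(section["root_dir"]),
            source_url=section["source_url"],
            local_data_file=Path(section["local_data_file"]),
        )


# Hypothetical values, shown only to demonstrate usage.
example_config = {
    "data_ingestion": {
        "root_dir": "artifacts/data_ingestion",
        "source_url": "https://example.com/data.zip",
        "local_data_file": "artifacts/data_ingestion/data.zip",
    }
}
manager = ConfigurationManager(config=example_config, params={})
ingestion_config = manager.get_data_ingestion_config()
```

Keeping the entity frozen means a component receives its settings once and cannot modify them by accident, which is one reason the workflow updates the entity before the components and the pipeline.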
Clone the repository
git clone https://github.com/Priyanshu9898/Text-Summarization.git
cd Text-Summarization
cd backend
python -m venv env
env\Scripts\activate
pip install -r requirements.txt
python app.py
python main.py
cd frontend
npm install
npm run dev
npm run build
| Model | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum |
|---|---|---|---|---|
| Pegasus | 0.024595 | 0.0 | 0.024311 | 0.024418 |