Text summarization

This project applies abstractive summarization to conversational text by fine-tuning the Pegasus model on the SAMSum dataset. The SAMSum corpus is informal and interactive in style, which poses challenges that standard summarization models typically do not address. The approach preprocesses SAMSum to match Pegasus's input requirements and then fine-tunes the model with hyperparameter optimization, adaptive transfer learning, and a training regimen focused on preserving conversational context. The model is evaluated with objective metrics (ROUGE-N, ROUGE-L, and BLEU) and with subjective human assessments of the summaries' coherence, relevance, and readability. The reported results show a notable improvement in summarizing conversational text over existing benchmarks, contributing to dialogue summarization and to model adaptability in domain-specific contexts.

Frontend URL: https://text-summarization-psi.vercel.app/

Backend URL:

Technology Stack

Frontend:

Next.js 14
TypeScript
TailwindCSS
shadcn/ui

Backend:

Python
DVC
MLflow
Hugging Face Transformers
FastAPI
OOP Principles

Project's Features

Advanced Summarization Capabilities:
Fine-tuned Pegasus model to summarize conversational texts with high accuracy.
Contextual Understanding:
Special focus on maintaining the integrity and context of the original conversation in the summaries.
Optimized for Conversational Data:
Utilization of the SAMSum dataset ensures the model is specifically trained for dialogue-based texts.
Comprehensive Model Evaluation:
Rigorous testing using both objective metrics and subjective human assessments to ensure the quality of summaries.
User-Friendly Interface:
Incorporation of a frontend with technologies like Next.js and TailwindCSS for ease of use and accessibility.
Logging and Custom Exception Handling:
The system incorporates extensive logging and custom exception handling mechanisms to monitor the application's performance, detect issues, and ensure a seamless user experience. These mechanisms provide detailed information on errors, warnings, and system events, enabling developers to troubleshoot and improve the system continuously.
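
The logging and exception-handling feature described above can be illustrated with a minimal sketch. It uses only the Python standard library; the names `SummarizationError`, `get_logger`, and `summarize_safely` are hypothetical and are not taken from the project's source, and the truncation inside `summarize_safely` is a stand-in for the real model call.

```python
import logging
import sys


class SummarizationError(Exception):
    """Hypothetical custom exception that records the pipeline stage that failed."""

    def __init__(self, message: str, stage: str):
        super().__init__(message)
        self.stage = stage  # e.g. "prediction" or "data_ingestion"

    def __str__(self) -> str:
        return f"[{self.stage}] {self.args[0]}"


def get_logger(name: str = "summarizer") -> logging.Logger:
    """Return a logger that writes timestamped records to stdout."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(levelname)s | %(name)s | %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def summarize_safely(text: str) -> str:
    """Placeholder for a model call; wraps failures in SummarizationError."""
    logger = get_logger()
    try:
        if not text.strip():
            raise ValueError("input text is empty")
        logger.info("Summarizing %d characters", len(text))
        return text[:50]  # stand-in for the real model output (assumption)
    except ValueError as exc:
        logger.error("Summarization failed: %s", exc)
        raise SummarizationError(str(exc), stage="prediction") from exc
```

Chaining with `raise ... from exc` keeps the original traceback attached, so the log line and the raised exception both point back to the underlying cause.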

Dataset Information

The SAMSum dataset, central to this project, is a collection of conversational texts designed specifically for summarization tasks. It includes a variety of dialogue-based texts, providing a rich resource for training models to understand and summarize conversational nuances effectively.

Dataset Link: https://huggingface.co/datasets/samsum

Pegasus Model on Hugging Face (CNN/DailyMail)

Overview

Pegasus is an advanced text summarization model developed by Google and available on Hugging Face. It uses a Transformer-based architecture, specifically optimized for abstractive text summarization tasks.

Unique Features

Pre-training Technique: Pegasus is pre-trained with a novel gap-sentence objective, enhancing its ability to generate coherent and concise summaries.
Abstractive Summarization: Unlike extractive models, Pegasus paraphrases and condenses the original text, providing more fluent and readable summaries.
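
To make the gap-sentence idea concrete, here is a toy sketch of how one pre-training example could be built: one sentence is removed from the input, replaced with a mask token, and used as the generation target. This is a simplification under stated assumptions, not Pegasus's actual pipeline. The sentence splitter is naive, the word-overlap score is swapped in for the ROUGE-based sentence selection used in the original work, and `<mask_1>` is assumed to be the sentence-mask token of the Hugging Face Pegasus tokenizer.

```python
import re
from typing import List, Tuple

MASK_SENT = "<mask_1>"  # assumed sentence-mask token (see lead-in)


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def overlap_score(sentence: str, others: List[str]) -> float:
    """Fraction of the sentence's distinct words that also occur in the other sentences."""
    words = set(re.findall(r"\w+", sentence.lower()))
    rest = set(re.findall(r"\w+", " ".join(others).lower()))
    return len(words & rest) / len(words) if words else 0.0


def make_gap_sentence_example(text: str) -> Tuple[str, str]:
    """Mask the sentence that overlaps most with the rest; return (masked_input, target)."""
    sentences = split_sentences(text)
    scores = [
        overlap_score(s, sentences[:i] + sentences[i + 1:])
        for i, s in enumerate(sentences)
    ]
    target_idx = scores.index(max(scores))  # first sentence wins ties
    masked = sentences.copy()
    masked[target_idx] = MASK_SENT
    return " ".join(masked), sentences[target_idx]


if __name__ == "__main__":
    doc = "The cat sat. The cat sat on the mat. Dogs bark."
    masked_input, target = make_gap_sentence_example(doc)
    print(masked_input)
    print(target)
```

The model is then trained to generate the removed sentence from the masked input, which resembles producing a summary from a document.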

Fine-Tuning on CNN/DailyMail

The model has been fine-tuned on the CNN/DailyMail dataset, a collection of news articles, making it particularly effective for summarizing journalistic content.

Performance and Evaluation

Objective Metrics: The model's performance is evaluated using metrics like ROUGE-N, ROUGE-L, and BLEU.
Subjective Assessments: In addition to objective metrics, subjective human assessments are used to ensure the quality of summaries in terms of coherence, relevance, and readability.
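
As an illustration of what ROUGE-N measures, the sketch below computes n-gram precision, recall, and F1 between a reference summary and a candidate summary. It is a simplified version that assumes whitespace tokenization with no stemming, so its numbers will not exactly match an established ROUGE package; the function name `rouge_n` and the sample sentences are illustrative only.

```python
from collections import Counter


def rouge_n(reference: str, candidate: str, n: int = 1) -> dict:
    """Simplified ROUGE-N: clipped n-gram overlap between candidate and reference."""

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    overlap = sum((ref & cand).values())  # Counter intersection clips repeated n-grams
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    reference = "amanda baked cookies for jerry"
    candidate = "amanda baked cookies"
    print(rouge_n(reference, candidate, n=1))
    print(rouge_n(reference, candidate, n=2))
```

For this pair, every candidate unigram appears in the reference (precision 1.0), while only three of the five reference unigrams are covered (recall 0.6).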

Applications

Ideal for applications in news summarization and other domains requiring concise representation of textual content.

Further Information

For more details, visit the Hugging Face Pegasus model page.

Workflows

Update config.yaml
Update secrets.yaml [Optional]
Update params.yaml
Update the entity
Update the configuration manager in src config
Update the components
Update the pipeline
Update main.py
Update dvc.yaml
Update app.py
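
The workflow above follows a layered pattern: raw configuration feeds a typed entity, a configuration manager builds the entity, a component does the work, and a pipeline wires them together. The sketch below shows one way these pieces could relate. All class names, keys, and paths are hypothetical, and a plain dictionary stands in for the parsed contents of config.yaml so that the example runs without a YAML library.

```python
from dataclasses import dataclass
from pathlib import Path

# Stand-in for the parsed config.yaml (hypothetical keys and values).
CONFIG = {
    "data_ingestion": {
        "root_dir": "artifacts/data_ingestion",
        "dataset_name": "samsum",
    }
}


@dataclass(frozen=True)
class DataIngestionConfig:
    """Entity: a typed, read-only view of one config section."""
    root_dir: Path
    dataset_name: str


class ConfigurationManager:
    """Builds entities from the raw configuration dictionary."""

    def __init__(self, config: dict = CONFIG):
        self.config = config

    def get_data_ingestion_config(self) -> DataIngestionConfig:
        section = self.config["data_ingestion"]
        return DataIngestionConfig(
            root_dir=Path(section["root_dir"]),
            dataset_name=section["dataset_name"],
        )


class DataIngestion:
    """Component: performs one unit of work using its entity."""

    def __init__(self, config: DataIngestionConfig):
        self.config = config

    def describe(self) -> str:
        # A real component would download data; this one only reports its plan.
        return f"Would download '{self.config.dataset_name}' into {self.config.root_dir.as_posix()}"


def run_data_ingestion_pipeline() -> str:
    """Pipeline: wires the configuration manager to the component."""
    config = ConfigurationManager().get_data_ingestion_config()
    return DataIngestion(config).describe()


if __name__ == "__main__":
    print(run_data_ingestion_pipeline())
```

Under this layout, main.py would call each pipeline stage in order, and dvc.yaml would declare the same stages so that DVC can track their inputs and outputs.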

How to run Backend?

STEPS:

Clone the repository

git clone https://github.com/Priyanshu9898/Text-Summarization.git

STEP 01- Create a Python environment after opening the repository

cd Text-Summarization
cd backend
python -m venv env

On Windows:

env\Scripts\activate

On macOS or Linux:

source env/bin/activate

STEP 02- Install the requirements

pip install -r requirements.txt

STEP 03- Run the FastAPI backend

python app.py

STEP 04- Run the Training Pipeline

python main.py

How to run Frontend?

STEP 01- Go to the frontend directory

cd frontend

STEP 02- Install the dependencies

npm install

STEP 03- Run the Next.js development server

npm run dev

STEP 04- Build the frontend

npm run build

Pegasus Model Performance

Model     ROUGE-1    ROUGE-2   ROUGE-L    ROUGE-Lsum
Pegasus   0.024595   0.0       0.024311   0.024418
