This project focuses on Arabic text summarization using transformers. The generated summary is then evaluated, together with its original text, through Arabic classification and clustering algorithms to check whether the meaning of the summary matches the original text.
The classification task is article classification across 5 topics.
The clustering task clusters the same Arabic articles into the same 5 topics.
- nltk
- scipy
- pickle
- transformers==4.19.2
- tensorflow-gpu==2.9.1
- numpy
- pandas
- re
- time
- PyQt5
- pyarabic
- farasapy
- functools
- operator
- emoji
- string
- sklearn
- plotly
- Arabic News Articles:
  - For classification and clustering.
  - 45,000 articles with 7 different topics.
- WikiLingua:
  - For summarization.
  - ~40,000 Arabic articles with their summaries (a loading sketch follows this list).
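A minimal sketch of loading the Arabic WikiLingua split, assuming the Hugging Face `datasets` package (which is not in the requirements list; the summarization notebook instead reads the manually downloaded files via `read_text`):

```python
# Hypothetical alternative to the manual download: pull the Arabic split of
# WikiLingua from the Hugging Face Hub. Requires `pip install datasets`.
from datasets import load_dataset

wiki = load_dataset("wiki_lingua", "arabic")  # assumed config name

example = wiki["train"][0]
print(example["article"]["document"][0])  # first article section
print(example["article"]["summary"][0])   # its reference summary
```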
The project folder contains the following files:
- summarization.ipynb
- inference.py
- class_clust.ipynb
- class_clust_infer.py
- MainWindow.py
- Arabic_stop_words.txt
- champion_models.pickle
- objects.pickle
It also contains two folders, one holding the WikiLingua datasets and one holding the training checkpoints. All of these files and folders are described below.

**summarization.ipynb**

This notebook contains all the steps of the summarization algorithm.
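A hedged sketch of the core summarization step, assuming a Hugging Face seq2seq model under TensorFlow; the checkpoint name below is a placeholder, not necessarily the model used in the notebook:

```python
from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

MODEL_NAME = "UBC-NLP/AraT5-base"  # placeholder Arabic seq2seq checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = TFAutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME, from_pt=True)

def summarize(text: str, max_len: int = 128) -> str:
    """Generate an abstractive summary for one Arabic article."""
    inputs = tokenizer(text, return_tensors="tf", truncation=True, max_length=512)
    ids = model.generate(inputs["input_ids"], max_length=max_len, num_beams=4)
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```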
**class_clust.ipynb**

This notebook contains all the steps of building the classification and clustering models.
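A minimal sketch of the modelling steps, assuming scikit-learn estimators; the actual "champion" models selected in the notebook may differ:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

def build_models(texts, labels, n_topics=5):
    """Fit a TF-IDF vectorizer, a topic classifier, and a topic clusterer."""
    vectorizer = TfidfVectorizer(max_features=50_000)
    X = vectorizer.fit_transform(texts)                              # document-term matrix
    classifier = LogisticRegression(max_iter=1000).fit(X, labels)    # supervised topics
    clusterer = KMeans(n_clusters=n_topics, random_state=42).fit(X)  # unsupervised topics
    return vectorizer, classifier, clusterer
```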
**inference.py**

This file contains the inference code for summarization; it returns the summary of the input text to the GUI.
**class_clust_infer.py**

This file contains the inference code for classification and clustering. It is imported by the GUI code file and returns the class and cluster names of the original and summarized texts, in addition to their similarity score.
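A sketch of what this module might expose, assuming the pickle stores the three objects as a tuple (the real layout may differ) and using cosine similarity over TF-IDF vectors for the similarity score:

```python
import pickle
from sklearn.metrics.pairwise import cosine_similarity

def load_champions(path="champion_models.pickle"):
    # Assumed layout: (vectorizer, classifier, clusterer).
    with open(path, "rb") as f:
        return pickle.load(f)

def evaluate_pair(vectorizer, classifier, clusterer, original, summary):
    """Predict class/cluster for both texts and score their similarity."""
    X = vectorizer.transform([original, summary])
    classes = classifier.predict(X)   # topic labels for [original, summary]
    clusters = clusterer.predict(X)   # cluster ids for [original, summary]
    similarity = cosine_similarity(X[0], X[1])[0, 0]
    return classes, clusters, similarity
```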
**MainWindow.py**

This is the GUI code file used for inference.
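For orientation, a toy PyQt5 window showing how an input box, a button, and an output label could be wired to the inference functions; MainWindow.py's actual layout and logic differ:

```python
import sys
from PyQt5.QtWidgets import (QApplication, QLabel, QPushButton,
                             QTextEdit, QVBoxLayout, QWidget)

class DemoWindow(QWidget):
    def __init__(self):
        super().__init__()
        self.input = QTextEdit()
        self.output = QLabel("Summary appears here")
        button = QPushButton("Summarize")
        button.clicked.connect(self.on_click)
        layout = QVBoxLayout(self)
        for widget in (self.input, button, self.output):
            layout.addWidget(widget)

    def on_click(self):
        # The real app would call the summarization/classification inference here.
        self.output.setText(self.input.toPlainText()[:100])

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = DemoWindow()
    window.show()
    sys.exit(app.exec_())
```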
**Arabic_stop_words.txt**

This text file is used during preprocessing in the summarization notebook.
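A small sketch of how such a stop-word list is typically applied, with simplified whitespace tokenization (the notebook's preprocessing is more involved):

```python
def load_stop_words(path="Arabic_stop_words.txt"):
    """Read one stop word per line into a set."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def remove_stop_words(text, stop_words):
    """Drop stop words from a whitespace-tokenized text."""
    return " ".join(token for token in text.split() if token not in stop_words)
```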
**champion_models.pickle**

This pickle file contains the TF-IDF vectorizer, the champion classifier, and the champion clusterer, which are loaded in class_clust_infer.py. It is large, so you can click here to download it.
**objects.pickle**

This pickle file contains the trained tokenizers loaded in inference.py. It is large, so you can click here to download it.
**WikiLingua dataset folder**

This folder contains the WikiLingua Arabic datasets. You can click here to download it.
If you download it, you won't need to run `read_text(dir_path, fin)` in the summarization notebook.
**Checkpoints folder**

This folder is generated by the summarization notebook to save checkpoints during training, and it is also loaded in inference.py so the latest checkpoint can be used directly for inference. You can click here to download it.
If you download it, you won't need to run the Training steps section in the summarization notebook.
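A hedged sketch of restoring the latest checkpoint for inference, assuming the notebook saved with `tf.train.Checkpoint`; the directory name is a placeholder:

```python
import tensorflow as tf

def restore_latest(model, ckpt_dir="checkpoints/"):
    """Restore the most recent training checkpoint into `model`, if one exists."""
    ckpt = tf.train.Checkpoint(model=model)
    latest = tf.train.latest_checkpoint(ckpt_dir)  # None when no checkpoint found
    if latest:
        ckpt.restore(latest).expect_partial()
    return latest
```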