Automatic Text Summarizer Using Wu-Palmer Measure and Topological Sentence Selection (V 0.1)

This code is based on an unfinished paper of mine that uses the Wu-Palmer measure, a version of topological sorting, and topological sentence selection to summarize a corpus. I've written a short script to demonstrate the methodology used in the paper.
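For readers unfamiliar with the Wu-Palmer measure: it scores two concepts in a taxonomy as 2 * depth(LCS) / (depth(a) + depth(b)), where LCS is their deepest common ancestor. The sketch below computes it over a tiny hand-made taxonomy. The PARENT table and the function names are illustrative assumptions and are not taken from v01.py, which may well obtain the measure from WordNet through NLTK instead.

```python
# Toy taxonomy mapping child -> parent. Hypothetical data, for illustration only.
PARENT = {
    "animal": "entity",
    "mammal": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "bird": "animal",
}

def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_of_a = set(path_to_root(a))
    # Walking up from b, the first node shared with a is the deepest common ancestor.
    lcs = next(n for n in path_to_root(b) if n in ancestors_of_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(wu_palmer("dog", "cat"))   # 0.75
print(wu_palmer("dog", "bird"))  # 4/7, roughly 0.571
```

Closely related concepts (dog, cat) share a deep ancestor and score near 1; distant ones score lower.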

Paper Methodology

Coming Soon.

Pre-Requisites

The following Python libraries must be installed on your computer.

NLTK

You can install NLTK via pip by executing: 
pip install nltk

Summa

Alongside our summarizer script, I've attached a summarizer script from Summa, which uses TextRank. We used it to compare the results generated by TextRank with those of our script.

You can install Summa via pip by executing:
pip install summa

Pre-Processing

Before running our summarizer script, you need to run the pre-processor script (which I haven't written yet). The pre-processor script performs the following tasks:

1. Remove stop words from the corpus using a standard list.
2. Prune punctuation and numeric values, as they do not affect the quality of sentence selection.
3. Remove symbolic short forms such as Mr., Ms., Dr., Rs., &, %, $, etc.
4. Expand textual short forms such as It's, That's, What's, etc.
5. Form a list of sentences from the corpus once steps 1 to 4 are complete.
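The five steps above can be sketched with the standard library alone. This is a minimal illustration, not the pre-processor itself: the preprocess helper, the tiny stop-word set, and the contraction table are all assumptions made for the example. The steps run in a slightly different order than the list, because sentences have to be split before their punctuation is stripped.

```python
import re

# Illustrative subsets only; a real run would use a fuller stop-word list
# (for example the one shipped with NLTK).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in", "and", "it", "that"}
SHORT_FORMS = ["Mr.", "Ms.", "Dr.", "Rs.", "&", "%", "$"]
CONTRACTIONS = {"it's": "it is", "that's": "that is", "what's": "what is"}

def preprocess(corpus):
    """Return a list of cleaned sentences (hypothetical helper, not from v01.py)."""
    text = corpus
    # Step 3: drop symbolic short forms first so "Mr." does not end a sentence.
    for form in SHORT_FORMS:
        text = text.replace(form, " ")
    # Step 4: expand textual short forms, ignoring case.
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text, flags=re.IGNORECASE)
    sentences = []
    # Step 5: split on sentence-ending punctuation.
    for raw in re.split(r"[.!?]+", text):
        # Step 2: keep letters and whitespace only (drops punctuation and digits).
        letters_only = re.sub(r"[^A-Za-z\s]", " ", raw)
        # Step 1: remove stop words.
        words = [w for w in letters_only.split() if w.lower() not in STOP_WORDS]
        if words:
            sentences.append(" ".join(words))
    return sentences

print(preprocess("Dr. Rao paid $40 to Mr. Sen. It's a fair price!"))
# ['Rao paid Sen', 'fair price']
```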

Running the Script

You can run the script by typing the following in a terminal:

python v01.py

Selecting the Corpus

I've already attached a few manually pre-processed text files in the 'Corpus-Collection' folder. However, you can use any passage of your choice, as long as Python can read it into a string. To do so, edit the line:

file=open('Corpus-Collection/text4.txt','r')
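The line above leaves the file handle open. If you adapt the script, a with-block is a slightly safer way to read the corpus into a string. The load_corpus helper and the UTF-8 encoding below are assumptions for illustration and are not part of v01.py.

```python
from pathlib import Path

def load_corpus(path):
    """Read a corpus file into a single string (hypothetical helper)."""
    # The with-block closes the file even if reading fails.
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()

# Only attempt the read when the sample file is actually present.
if Path("Corpus-Collection/text4.txt").exists():
    corpus = load_corpus("Corpus-Collection/text4.txt")
    print(len(corpus), "characters loaded")
```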

Changing the Percentage of Summarization

By default, percentage_summarization is set to 0.5, indicating 50% summarization. You can change this factor by editing the line:

percentage_summarization=0.5
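How the fraction maps to a sentence count is not spelled out here. One plausible reading, sketched below, is that it is the share of sentences kept in the summary. The sentences_to_keep helper and the round-up rule are assumptions for illustration, not behaviour confirmed from v01.py.

```python
import math

def sentences_to_keep(total_sentences, percentage_summarization=0.5):
    """Number of sentences a summary would keep under the assumed reading."""
    if not 0 < percentage_summarization <= 1:
        raise ValueError("percentage_summarization must be in (0, 1]")
    # Round up so that a non-empty corpus always yields at least one sentence.
    return math.ceil(total_sentences * percentage_summarization)

print(sentences_to_keep(10))       # 5
print(sentences_to_keep(7, 0.25))  # 2
```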
