LibIndic N-gram Generator

An n-gram generator for Indic languages.

What is an n-gram?

An n-gram model is a type of probabilistic model for predicting the next item in a sequence. n-grams are used in various areas of statistical natural language processing and genetic sequence analysis.

An n-gram is a subsequence of n items from a given sequence. The items in question can be phonemes, syllables, letters, words or base pairs according to the application.
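The definition above can be illustrated with a minimal sliding-window sketch. It uses only the Python standard library and is not the libindic implementation; the function name `ngrams` is a hypothetical helper chosen for this example.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def ngrams(items: Sequence[T], n: int = 2) -> List[Tuple[T, ...]]:
    """Return every contiguous run of n items, in order (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # A sequence of length L has L - n + 1 windows, and none when L < n.
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# The items can be letters of a string...
letters = ngrams("gram", 2)
# ...or words in a list.
words = ngrams("the quick brown fox".split(), 3)

print(letters)  # [('g', 'r'), ('r', 'a'), ('a', 'm')]
print(words)    # [('the', 'quick', 'brown'), ('quick', 'brown', 'fox')]
```

The same function works for any sequence type, which mirrors the point that the items may be letters, words, or other units depending on the application.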

An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram" (or, less commonly, a "digram"); size 3 is a "trigram"; and size 4 or more is simply called an "n-gram".
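To make the "probabilistic model for predicting the next item" idea concrete, here is a small sketch of a bigram model built from raw counts. It is an illustration under simple assumptions (maximum-likelihood estimates, no smoothing) and is not part of the libindic API; `train_bigram_model` and `next_token_probabilities` are hypothetical names.

```python
from collections import Counter, defaultdict


def train_bigram_model(tokens):
    """Count how often each token follows each preceding token."""
    follow_counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follow_counts[prev][nxt] += 1
    return follow_counts


def next_token_probabilities(model, prev):
    """Estimate P(next | prev) as count(prev, next) / count(prev, anything)."""
    counts = model.get(prev, Counter())
    total = sum(counts.values())
    # An unseen context has no counts, so return an empty distribution.
    return {tok: c / total for tok, c in counts.items()} if total else {}


tokens = "a b a c a b".split()
model = train_bigram_model(tokens)
probs = next_token_probabilities(model, "a")
print(probs)  # "b" follows "a" twice and "c" once, so b is 2/3 and c is 1/3
```

A practical model would usually add smoothing so that unseen pairs do not get zero probability; that is left out here to keep the sketch short.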

Installation

Clone the repository: git clone https://github.com/libindic/indicngram.git
Change to the cloned directory: cd indicngram
Run setup.py to create an installable source distribution: python setup.py sdist
Install it using pip: pip install dist/libindic-ngram*.tar.gz

Usage

Input Parameters: Text and value of N (default value 2)
Output: List of grams


>>> from libindic.ngram import Ngram
>>> ngram_generator = Ngram()
>>> ngram_generator.letterNgram(<text>, <window size>)

Example

>>> from libindic.ngram import Ngram
>>> ngram_generator = Ngram()
>>> text = "Languages"
>>> grams = ngram_generator.letterNgram(text, 3)
>>> print(grams)
['Lan', 'ang', 'ngu', 'gua', 'uag', 'age', 'ges']
>>> for gram in grams:
...     print("".join(gram))

Lan
ang
ngu
gua
uag
age
ges
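The example above uses Latin text, where each letter is a single Unicode code point. In Indic scripts, vowel signs and similar marks are separate code points, so plain string slicing can split a mark from its base consonant. The sketch below is a standalone illustration using only the standard library, not libindic's method; `clusters` and `cluster_ngrams` are hypothetical helpers. It groups combining marks with the preceding character before windowing, and it deliberately ignores virama-joined conjuncts, which need a proper syllabifier.

```python
import unicodedata


def clusters(text):
    """Group each base character with the combining marks that follow it."""
    out = []
    for ch in text:
        # Unicode general categories starting with "M" are combining marks.
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def cluster_ngrams(text, n=2):
    """Build n-grams over mark-aware clusters instead of raw code points."""
    units = clusters(text)
    return ["".join(units[i:i + n]) for i in range(len(units) - n + 1)]


# The Malayalam word "Malayalam", written with escapes to avoid encoding issues.
word = "\u0d2e\u0d32\u0d2f\u0d3e\u0d33\u0d02"

print(len(word))            # 6 code points
print(len(clusters(word)))  # 4 clusters
print(cluster_ngrams(word, 2))
```

Windowing over clusters keeps each vowel sign attached to its consonant, so no gram starts with a dangling mark.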

Tests

Run the tests with: python setup.py test

Read the docs for more.

