LibIndic's shingling module generates word shingles (a w-shingling) from a text: the set of distinct runs of consecutive words of a given window size. It is built on top of the N-Gram module.
- Clone the repository
git clone https://github.com/libindic/shingling.git
- Change to the cloned directory
cd shingling
- Run setup.py to create installable source
python setup.py sdist
- Install using pip
pip install dist/libindic-shingling*.tar.gz
>>> from libindic.shingling import Shingling
>>> instance = Shingling()
>>> shinglings = instance.wshingling(u"ഇത് ഒരു നല്ല കാര്യമാണ് ഇത് ഒരു", window_size=2)
>>> for shingling in shinglings:
...     print(" ".join(shingling))
...
ഇത് ഒരു
ഒരു നല്ല
നല്ല കാര്യമാണ്
കാര്യമാണ് ഇത്
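Note that the repeated pair "ഇത് ഒരു" at the end of the input appears only once in the output. To illustrate the idea behind that result, here is a minimal pure-Python sketch of word shingling. It is not the module's actual implementation: the function name `word_shingles`, whitespace tokenization, and first-seen ordering are all assumptions made for this example.

```python
def word_shingles(text, window_size=4):
    """Return the distinct word n-grams (shingles) of `text`.

    Hypothetical helper for illustration only; it is not part of
    libindic. Words are split on whitespace, and each distinct
    shingle is kept once, in the order it first appears.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")

    words = text.split()
    seen = set()
    shingles = []

    # Slide a window of `window_size` words across the token list.
    # If the text has fewer words than the window, the range is empty.
    for start in range(len(words) - window_size + 1):
        shingle = tuple(words[start:start + window_size])
        if shingle not in seen:  # drop repeated shingles
            seen.add(shingle)
            shingles.append(shingle)

    return shingles


if __name__ == "__main__":
    for shingle in word_shingles("ഇത് ഒരു നല്ല കാര്യമാണ് ഇത് ഒരു", window_size=2):
        print(" ".join(shingle))
```

Under these assumptions, the sketch yields four shingles for the sentence used above, because the fifth window repeats the first one and is dropped.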
For more details, read the docs.