Bengali is one of the most morphologically rich languages and it has lots of inflectional and derivational variant forms of a word. Because of that it is quite complicated to determine the stem of word.
Therefore our main focus was to program a light-weight library to determine a identical word for same type of inflected word as stem to run on any model.
- using pip
pip install bangla_stemmer
- using git
git clone https://github.com/Fatick-DevStudio/Bangla-stemmer.git
cd Bangla-stemmer
python setup.py install
Example 01:
from bangla_stemmer.stemmer import stemmer wordlist = ['কবিরগুলিকে', 'আমাকে', 'নামাবার'] stmr = stemmer.BanglaStemmer() stm = stmr.stem(word) print(stm)
output: ['কবির', 'আমা', 'নামা']
Example 01:
from bangla_stemmer.stemmer.stemmer import BanglaStemmer word = 'কবিরগুলিকে' stm = BanglaStemmer().stem(word) print(stm)
output: ['কবির', 'আমা', 'নামা']
The grammar rules applied here are based on the algorithm of Rafi Kamal. However some major and required modification has been made in this library in order to make it compatible with any model.
Reference: ⮑ BanglaKit