MeCab
This page describes how to interact with MeCab and how to get it running.
MeCab is a word segmentation tool for Japanese and can be found at:
Like most academic software, MeCab has a few rough edges, but with some knowledge of software porting we will get you up and running in a jiffy. We will even make sure it runs from its own directory without requiring root access (MeCab strongly prefers to live in /usr/local, but we will dodge that).
These instructions assume that we are installing the 0.98 version of MeCab and the 2.7.0 version of the IPA dictionary.
Get the following files:
- mecab (SourceForge Link)
- mecab-ipadic (SourceForge Link)
- mecab-python (SourceForge Link)
Create a directory and extract the source code.
mkdir mecab
cd mecab
mv ${PATH_TO_MECAB_DOWNLOADS}/mecab-*.tar.gz ./
find . -name '*.tar.gz' | xargs -n 1 tar xfz
We will install MeCab inside this directory, so we need a local directory
to place it in.
mkdir local
Now, we configure, compile and install MeCab.
(cd mecab-0.98 && env PATH="${PATH}:`pwd`/../local/bin" \
./configure --prefix=`pwd`/../local \
&& make install clean)
Then the same for the dictionaries.
(cd mecab-ipadic-2.7.0-20070801 && env PATH="${PATH}:`pwd`/../local/bin" ./configure --prefix=`pwd`/../local --with-charset=utf8 && make install clean)
Do a quick test run with the MeCab binary.
echo '鴨かも?' | local/bin/mecab
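For scripting, the output of the binary can also be captured and post-processed. The sketch below assumes MeCab's default output format, in which each token is printed as the surface form, a tab, and a comma-separated feature list, and each sentence ends with a line reading EOS. The helper names run_mecab and parse_mecab_output are hypothetical, the code assumes Python 3, and the sample line in the usage note is illustrative rather than captured from this install.

```python
import subprocess


def run_mecab(text, mecab="local/bin/mecab"):
    """Pipe UTF-8 text through the locally installed mecab binary.

    Assumes the dictionary was built with --with-charset=utf8.
    """
    result = subprocess.run(
        [mecab],
        input=text.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8")


def parse_mecab_output(output):
    """Turn default-format mecab output into (surface, features) pairs."""
    tokens = []
    for line in output.splitlines():
        # Skip blank lines and the end-of-sentence marker.
        if not line or line == "EOS":
            continue
        surface, _, features = line.partition("\t")
        tokens.append((surface, features.split(",")))
    return tokens
```

With the install from this page in place, parse_mecab_output(run_mecab('鴨かも?')) would return one pair per token. The parser can also be tried on its own with an illustrative line such as "すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ\nEOS\n".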
Now we only have to build the Python SWIG bindings.
(cd mecab-python-0.98 && env PATH="${PATH}:`pwd`/../local/bin" python setup.py build_ext --inplace)
Then try out the bindings, but first patch test.py,
since it lacks an encoding declaration.
sed -i -e '2i# -*- coding: utf-8 -*-' mecab-python-0.98/test.py
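The in-place insert syntax used above is specific to GNU sed. On systems where it is unavailable, roughly the same patch can be applied from Python. This is a minimal sketch, and the helper name add_encoding_line is hypothetical. It inserts the declaration as the second line of the file and does nothing if the declaration is already present in the first two lines.

```python
def add_encoding_line(path, declaration="# -*- coding: utf-8 -*-"):
    """Insert an encoding declaration as line 2 of the file at path.

    Returns True if the file was changed, False if the declaration
    was already present in the first two lines.
    """
    with open(path, "rb") as handle:
        lines = handle.read().splitlines(True)  # keep line endings

    decl = declaration.encode("ascii")
    if any(line.strip() == decl for line in lines[:2]):
        return False

    # Make sure the first line is terminated before inserting after it.
    if lines and not lines[0].endswith(b"\n"):
        lines[0] += b"\n"

    lines.insert(1, decl + b"\n")
    with open(path, "wb") as handle:
        handle.writelines(lines)
    return True
```

Calling add_encoding_line("mecab-python-0.98/test.py") from the top-level mecab directory should have the same effect as the sed command.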
Then we are ready to go.
(cd mecab-python-0.98/ && env LD_LIBRARY_PATH=`pwd`/../local/lib python test.py)
Just remember that LD_LIBRARY_PATH has to point at local/lib whenever you use the bindings, so set it programmatically in the environment that launches Python.
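One way to handle LD_LIBRARY_PATH from code is to start the script that imports the bindings in a child process with an adjusted environment. The dynamic loader reads the variable when a process starts, so changing os.environ inside an already running interpreter is generally too late for the import. The following is a minimal sketch, and the helper names env_with_mecab_lib and run_with_mecab are hypothetical.

```python
import os
import subprocess
import sys


def env_with_mecab_lib(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    lib_dir = os.path.abspath(lib_dir)
    existing = env.get("LD_LIBRARY_PATH", "")
    if existing:
        env["LD_LIBRARY_PATH"] = lib_dir + os.pathsep + existing
    else:
        env["LD_LIBRARY_PATH"] = lib_dir
    return env


def run_with_mecab(script, lib_dir="local/lib"):
    """Run a Python script in a child process that can locate libmecab."""
    return subprocess.call(
        [sys.executable, script],
        env=env_with_mecab_lib(lib_dir),
    )
```

For example, run_with_mecab("test.py", lib_dir="../local/lib") called from inside mecab-python-0.98 mirrors the env invocation above.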