ninjin edited this page May 16, 2011 · 8 revisions

MeCab

This page describes how to interact with MeCab and how to get it running.

About

MeCab is a word segmentation and part-of-speech tagging tool for Japanese and can be found at:

http://mecab.sourceforge.net/

Like most academic software, MeCab has a few rough edges, but with a little knowledge of software porting we will get you up and running in a jiffy. We will even make sure it runs from its own directory without requiring root access (MeCab has a strong desire to live in /usr/local, but we will dodge that).

Instructions

These instructions assume that we are installing version 0.98 of MeCab and its Python bindings, and version 2.7.0 (release 20070801) of the IPA dictionary.

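Get the following files (the source archives whose directory names appear in the steps below):

mecab-0.98.tar.gz
mecab-ipadic-2.7.0-20070801.tar.gz
mecab-python-0.98.tar.gz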

Create a directory and extract the source code.

mkdir mecab
cd mecab
mv ${PATH_TO_MECAB_DOWNLOADS}/mecab-*.tar.gz ./
find . -name '*.tar.gz' | xargs -n 1 tar xfz

We will install MeCab inside this directory, so we need a local subdirectory to serve as the installation prefix.

mkdir local

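Now we configure, compile and install MeCab. Each command runs in a subshell, so after the cd, `pwd` is the source directory and `pwd`/../local is the local directory created above.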

(cd mecab-0.98 && env PATH="${PATH}:`pwd`/../local/bin" ./configure --prefix=`pwd`/../local && make install clean)

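Then do the same for the dictionary. Its configure script looks up mecab-config, which we just installed into local/bin, so that directory is appended to the PATH.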

(cd mecab-ipadic-2.7.0-20070801 && env PATH="${PATH}:`pwd`/../local/bin" \
    ./configure --prefix=`pwd`/../local --with-charset=utf8 && make install clean)
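If you would rather script the two builds, the following Python 3 sketch mirrors the configure/make steps above. It assumes it is run from the mecab working directory with the archives extracted under the directory names used above; build_plan and run_plan are hypothetical helper names, not part of MeCab. Using absolute paths sidesteps the `pwd` quoting entirely.

```python
import os
import subprocess

# Source directories and extra configure flags, taken from the steps above.
PACKAGES = [
    ("mecab-0.98", []),
    ("mecab-ipadic-2.7.0-20070801", ["--with-charset=utf8"]),
]


def build_plan(root):
    """Return (cwd, argv, env) triples mirroring the configure/make steps."""
    root = os.path.abspath(root)
    prefix = os.path.join(root, "local")
    env = dict(os.environ)
    # Append local/bin so the dictionary's configure can find mecab-config.
    parts = [p for p in (env.get("PATH", ""), os.path.join(prefix, "bin")) if p]
    env["PATH"] = os.pathsep.join(parts)
    plan = []
    for name, extra in PACKAGES:
        src = os.path.join(root, name)
        plan.append((src, ["./configure", "--prefix=" + prefix] + extra, env))
        plan.append((src, ["make", "install", "clean"], env))
    return plan


def run_plan(plan):
    """Execute each step in order, raising CalledProcessError on failure."""
    for cwd, argv, env in plan:
        # On POSIX, a relative argv[0] such as ./configure resolves against cwd.
        subprocess.check_call(argv, cwd=cwd, env=env)
```

For instance, run_plan(build_plan(".")) from the mecab directory should be roughly equivalent to the two subshell commands; build_plan on its own only returns the steps, so you can inspect them before running anything.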

Do a quick test run with the MeCab binary. It should print one token per line, followed by EOS.

echo '鴨かも?' | local/bin/mecab
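To consume this output from a program, a minimal Python 3 sketch is to pipe text through the binary and split each line at the tab into the surface form and its comma-separated features. It assumes the default output format (one token per line, then an EOS line) and that features contain no quoted commas, which as far as we know holds for the IPA dictionary; parse_mecab_output and run_mecab are hypothetical names.

```python
import subprocess


def parse_mecab_output(text):
    """Split MeCab's default output into (surface, features) pairs.

    Each token line looks like 'surface<TAB>feature1,feature2,...';
    a line reading 'EOS' marks the end of a sentence and is skipped.
    """
    tokens = []
    for line in text.splitlines():
        if not line or line == "EOS":
            continue
        surface, _, feature = line.partition("\t")
        tokens.append((surface, feature.split(",")))
    return tokens


def run_mecab(sentence, binary="local/bin/mecab"):
    """Pipe one sentence through the locally installed binary (path assumed)."""
    result = subprocess.run([binary], input=sentence, capture_output=True,
                            encoding="utf-8", check=True)
    return parse_mecab_output(result.stdout)
```

With the IPA dictionary, the first feature of each token should be its part of speech, so run_mecab('鴨かも?')[0][1][0] would give the part of speech of the first token.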

Now we only have to build the Python SWIG bindings. Their setup.py also calls mecab-config, so local/bin goes on the PATH once more.

(cd mecab-python-0.98 && env PATH="${PATH}:`pwd`/../local/bin" python setup.py build_ext --inplace)

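Then try out the bindings, but first patch test.py: it contains Japanese text but lacks a coding declaration, which Python 2 requires for non-ASCII source files. The sed command inserts one as line 2.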

sed -i -e '2i# -*- coding: utf-8 -*-' mecab-python-0.98/test.py
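The sed invocation relies on GNU sed (BSD sed expects an argument after -i). A portable alternative is this Python 3 sketch, which inserts the same declaration as line 2 unless line 1 or 2 already carries one, using the pattern given in PEP 263. The function names are our own.

```python
import io
import re

# PEP 263: a coding declaration on line 1 or 2 must match this pattern.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")
COOKIE = "# -*- coding: utf-8 -*-\n"


def add_coding_cookie(lines):
    """Return lines with a UTF-8 declaration as line 2 if none is present."""
    if any(CODING_RE.match(line) for line in lines[:2]):
        return list(lines)
    # Same position as sed's '2i': after the first line (typically a shebang).
    return lines[:1] + [COOKIE] + lines[1:]


def patch_file(path):
    """Rewrite a UTF-8 source file in place with the declaration added."""
    with io.open(path, encoding="utf-8") as f:
        lines = f.readlines()
    with io.open(path, "w", encoding="utf-8") as f:
        f.writelines(add_coding_cookie(lines))
```

Running patch_file("mecab-python-0.98/test.py") twice is harmless: the second call finds the declaration and leaves the file unchanged.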

Then we are ready to go.

(cd mecab-python-0.98/ && env LD_LIBRARY_PATH=`pwd`/../local/lib python test.py)
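Once test.py runs, your own code can walk the node list much as test.py does. This sketch assumes the bindings expose Tagger.parseToNode and nodes with surface, feature, stat and next attributes; the stat values for the BOS/EOS sentinels are taken from mecab.h. iter_nodes is a hypothetical helper, and the tagger is passed in so the function can be exercised without MeCab installed.

```python
# Node.stat values from mecab.h: MECAB_BOS_NODE = 2, MECAB_EOS_NODE = 3.
BOS_EOS = (2, 3)


def iter_nodes(tagger, text):
    """Yield (surface, feature) for each token of text, skipping BOS/EOS.

    Assumes tagger behaves like MeCab.Tagger: parseToNode(text) returns the
    BOS node of a linked list whose last node's next attribute is None.
    """
    node = tagger.parseToNode(text)
    while node:
        if node.stat not in BOS_EOS:
            yield node.surface, node.feature
        node = node.next
```

For example, with tagger = MeCab.Tagger("") and LD_LIBRARY_PATH set as in the command above, list(iter_nodes(tagger, sentence)) should give one (surface, feature) pair per token; under the Python 2 bindings of this era, sentence should be a UTF-8 encoded byte string.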

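Just remember that LD_LIBRARY_PATH must include local/lib whenever you use the bindings, since libmecab is installed outside the system library path. Set it programmatically before the Python process starts (for example in a wrapper script), because the dynamic linker reads it only at start-up.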
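One way to do this from Python itself, sketched here in Python 3 for glibc-based Linux, is to re-execute the interpreter with local/lib prepended to LD_LIBRARY_PATH before importing MeCab. Both function names and the example path are hypothetical, and the re-exec assumes the program was started as python script.py.

```python
import os
import sys


def with_lib_dir(env, lib_dir):
    """Return a copy of env with lib_dir prepended to LD_LIBRARY_PATH."""
    new_env = dict(env)
    old = new_env.get("LD_LIBRARY_PATH")
    new_env["LD_LIBRARY_PATH"] = lib_dir + os.pathsep + old if old else lib_dir
    return new_env


def ensure_lib_dir(lib_dir):
    """Re-exec this interpreter once with lib_dir on LD_LIBRARY_PATH.

    The dynamic linker reads LD_LIBRARY_PATH when the process starts, so
    assigning os.environ afterwards does not help a later 'import MeCab'.
    """
    lib_dir = os.path.abspath(lib_dir)
    current = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    if lib_dir in current:
        return  # already set up; importing MeCab should now work
    os.execve(sys.executable, [sys.executable] + sys.argv,
              with_lib_dir(os.environ, lib_dir))
```

Call ensure_lib_dir("/path/to/mecab/local/lib") at the very top of your script, before import MeCab; on the second pass the directory is already present and the function returns immediately.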
