Building a Lucene index for a Word2Vec model

Pre-trained word2vec models are often distributed in .bin format, so first convert the existing model into .tsv format. You can do this with one of the existing conversion tools.
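If no conversion tool is at hand, the conversion can be sketched in plain Python. This is a minimal sketch, not the tool referred to above. It assumes the standard word2vec binary layout (an ASCII header line with vocabulary size and dimension, then each word followed by a space and its little-endian float32 components). The function names `read_word2vec_bin` and `convert_bin_to_tsv` are hypothetical, and the output layout (word, then one tab-separated column per component) is an assumption, since the exact columns IndexW2V expects are not stated here.

```python
import struct


def read_word2vec_bin(path):
    """Yield (word, vector) pairs from a word2vec binary file."""
    with open(path, "rb") as f:
        # Header line: "<vocab_size> <dimension>\n" in ASCII.
        vocab_size, dim = map(int, f.readline().split())
        fmt = "<%df" % dim              # little-endian float32 components
        nbytes = struct.calcsize(fmt)
        for _ in range(vocab_size):
            # Each word is a byte string terminated by a single space.
            chars = bytearray()
            while True:
                ch = f.read(1)
                if not ch:
                    raise EOFError("unexpected end of file while reading a word")
                if ch == b" ":
                    break
                if ch != b"\n":         # skip the newline written after the previous vector
                    chars.extend(ch)
            data = f.read(nbytes)
            if len(data) != nbytes:
                raise EOFError("unexpected end of file while reading a vector")
            yield chars.decode("utf-8", errors="replace"), struct.unpack(fmt, data)


def convert_bin_to_tsv(bin_path, tsv_path):
    """Write one line per word: word<TAB>v1<TAB>v2... (assumed layout). Returns the word count."""
    count = 0
    with open(tsv_path, "w", encoding="utf-8", newline="\n") as out:
        for word, vector in read_word2vec_bin(bin_path):
            out.write(word + "\t" + "\t".join("%.6f" % x for x in vector) + "\n")
            count += 1
    return count
```

A call such as `convert_bin_to_tsv("model.bin", "model.tsv")` (both file names are placeholders) streams the model word by word, so the whole vocabulary never has to fit in memory.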

With IndexW2V you can build a Lucene index for a word2vec model from raw text.

Possible parameters are:

-input (required)

Path of the raw text file containing the model

-index (required)

Path of the index file

Example command:

sh target/appassembler/bin/IndexW2V -input GoogleNews-vectors-negative300.txt -index lucene.GoogleNews.index

Search the index for embeddings

With SearchW2V you can look up the embedding of a term in the index.

Possible parameters are:

-index (required)

Path of the index file

-term (required)

Term whose embedding should be returned

Example command:

sh target/appassembler/bin/SearchW2V -index lucene.GoogleNews.index -term "hello"
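To sanity-check what SearchW2V returns, the same term can be looked up directly in the text file that was indexed. The following is a minimal sketch and not part of the tool: `lookup_vector` is a hypothetical helper, and it assumes a tab-separated layout with the word in the first column and the vector components in the remaining columns.

```python
def lookup_vector(tsv_path, term):
    """Return the vector stored for `term` in a tab-separated text file, or None if absent."""
    with open(tsv_path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            # First column is the word; the rest are the vector components (assumed layout).
            if fields[0] == term:
                return [float(x) for x in fields[1:]]
    return None
```

This scans the file line by line, so every lookup costs a full pass in the worst case. Avoiding that linear scan is the reason for building the index in the first place.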
