Demo examples for linguistics in Lucene and Solr
The demo consists of the following modules:
- lucene-analyzer-example
- opennlp-example
- solr-multilang-example
Each example can be run as described below.
The Lucene analyzer example consists of two demos, AnalyzerExampleTest and FrenchSynonymExampleTest.
Run both demos with mvn test.
$ cd lucene-analyzer-example
$ mvn test
The demos can be run individually as well. For example:
$ mvn -Dtest=AnalyzerExampleTest test
The OpenNLP example demonstrates sentence segmentation, tokenization, person name extraction, and part-of-speech tagging.
Execute the following commands to run the examples.
$ cd opennlp-example
$ mvn -Dget-models test
The Solr multilanguage example demonstrates language detection on documents from several Wikipedia language editions.
Download and unpack Solr (we are using 4.2.1 in this example)
$ cd solr-multilang-example
$ tar zxvf solr-4.2.1.tgz
Copy the demo schema.xml and solrconfig.xml to Solr's example config as follows
$ cp conf/schema.xml conf/solrconfig.xml solr-4.2.1/example/solr/collection1/conf/
Start up Solr
$ cd solr-4.2.1/example
$ java -jar start.jar
In a different directory, post the Wikipedia documents
$ ./posh.sh
The following query gives an overview of the documents now searchable from the various Wikipedia language editions
The following query gives the distribution of detected languages
The following query gives the distribution of languages detected in the Japanese Wikipedia
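As a rough sketch, language distributions like these can be obtained with Solr facet queries against the select handler. The port 8983 and the core name collection1 are the defaults of the Solr 4.x example distribution. The field names `language` and `edition` are hypothetical placeholders, not taken from the demo; substitute the fields actually defined in conf/schema.xml.

```shell
#!/bin/sh
# Select handler of the default example core (Solr 4.x example defaults).
SOLR_SELECT="http://localhost:8983/solr/collection1/select"

# Build a facet-only query URL: rows=0 suppresses the document list,
# so the response carries only the hit count and the per-value facet counts.
#   $1 = query string
#   $2 = field to facet on
facet_url() {
    printf '%s?q=%s&rows=0&wt=json&facet=true&facet.field=%s\n' \
        "$SOLR_SELECT" "$1" "$2"
}

# Distribution of detected languages over all indexed documents.
# "language" is an assumed field name.
facet_url '*:*' 'language'

# The same distribution restricted to one Wikipedia edition.
# "edition" and its value "ja" are assumed as well.
facet_url 'edition:ja' 'language'
```

The script only prints the query URLs. With Solr running, a URL can be passed to curl, for example `curl -s "$(facet_url '*:*' language)"`.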
Contact us on [email protected] if you have questions or problems.