This repository has been archived by the owner on Oct 22, 2021. It is now read-only.

Use enwiki data set #4

Open: tonysun83 wants to merge 3 commits into master


tonysun83 (Contributor) commented May 8, 2019

This performance test loads the enwiki data set specified by docs.file in the content-source.properties file. DocMaker (https://lucene.apache.org/core/8_0_0/benchmark/org/apache/lucene/benchmark/byTask/feeds/DocMaker.html), together with its content source, parses the XML files and creates ready-to-index documents. Any Wikipedia dump downloaded from https://dumps.wikimedia.org/enwiki/20190501/ can be used, as long as docs.file in the properties file points to it.
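A minimal pre-flight check for the properties file might look like the sketch below. It assumes content-source.properties uses Lucene benchmark's own keys (docs.file, and optionally content.source naming the ContentSource class); the helper class ContentSourceProps and its default of EnwikiContentSource are hypothetical and not part of this PR. It only reads and validates the file with the JDK and does not call into lucene-benchmark.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical helper: loads content-source.properties and checks docs.file before a run. */
final class ContentSourceProps {
    /** Assumed default: Lucene benchmark's reader for Wikipedia XML dumps. */
    static final String DEFAULT_SOURCE =
        "org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource";

    private ContentSourceProps() {}

    /** Reads the file, validates docs.file, and fills in content.source if it is missing. */
    static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(file)) {
            props.load(in);
        }
        String docsFile = props.getProperty("docs.file");
        if (docsFile == null || docsFile.trim().isEmpty()) {
            throw new IllegalArgumentException("docs.file is not set in " + file);
        }
        // Relative paths resolve against the JVM's working directory.
        Path dump = Paths.get(docsFile.trim()).toAbsolutePath();
        if (!Files.isRegularFile(dump)) {
            throw new IllegalArgumentException("docs.file does not exist: " + dump);
        }
        // Store the absolute path so later steps do not depend on the working directory.
        props.setProperty("docs.file", dump.toString());
        if (props.getProperty("content.source") == null) {
            props.setProperty("content.source", DEFAULT_SOURCE);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "content-source.properties");
        Properties props = load(file);
        System.out.println("content.source = " + props.getProperty("content.source"));
        System.out.println("docs.file      = " + props.getProperty("docs.file"));
    }
}
```

Running it as `java ContentSourceProps path/to/content-source.properties` fails fast with a readable message when docs.file is missing or points at a file that was never downloaded, instead of failing partway through indexing.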

We can override getContentSource to use other content sources (https://lucene.apache.org/core/7_3_1/benchmark/org/apache/lucene/benchmark/byTask/feeds/ContentSource.html) for different benchmarking needs.
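The PR does not show the signature of getContentSource; in the actual test it probably returns a Lucene ContentSource instance. To keep the idea runnable without lucene-benchmark on the classpath, the sketch below uses hypothetical classes (WikiBenchmark, EnwikiBenchmark, LineDocBenchmark) whose getContentSource() returns the fully qualified class name that Lucene's content.source property expects. A subclass overrides only that hook, and the rest of the configuration stays shared.

```java
import java.util.Properties;

/** Hypothetical base class: the benchmark asks getContentSource() which feed to index. */
abstract class WikiBenchmark {
    // Default hook: Lucene benchmark's English Wikipedia reader.
    protected String getContentSource() {
        return "org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource";
    }

    /** Builds the shared configuration; subclasses only change the content source. */
    final Properties buildConfig(String docsFile) {
        Properties props = new Properties();
        props.setProperty("content.source", getContentSource());
        props.setProperty("docs.file", docsFile);
        // Read the input once instead of restarting from the beginning when it is exhausted.
        props.setProperty("content.source.forever", "false");
        return props;
    }
}

/** Default benchmark: indexes the enwiki XML dump. */
class EnwikiBenchmark extends WikiBenchmark {}

/** Variant that reads a one-document-per-line file (e.g. written by Lucene's WriteLineDocTask). */
class LineDocBenchmark extends WikiBenchmark {
    @Override
    protected String getContentSource() {
        return "org.apache.lucene.benchmark.byTask.feeds.LineDocSource";
    }
}
```

Keeping the hook to a single overridable method means a new benchmark variant is one small subclass, and docs.file keeps its meaning across sources.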

I left the lucene-test-framework library in our pom.xml because some older performance benchmarks still rely on it. We should remove it once we have fully transitioned to the Lucene benchmark data set.

Next steps:

  1. Use a different content source
  2. Run queries with https://lucene.apache.org/core/7_4_0/benchmark/org/apache/lucene/benchmark/byTask/feeds/EnwikiQueryMaker.html
