Java performance improvements from Apache Lucene #195
Thanks, mostly this looks good but I've spotted some issues and noted them above (mostly minor - the incorrect implementation of It would also be helpful to update |
I tried to keep it simple here, but doing a rough measurement of `time make check_java` x 3 gives the idea:

```
before:
real 0m24.713s
user 0m39.545s
sys 0m1.962s

real 0m24.992s
user 0m39.570s
sys 0m2.082s

real 0m24.166s
user 0m38.705s
sys 0m1.842s

after:
real 0m20.365s
user 0m31.869s
sys 0m1.781s

real 0m20.593s
user 0m32.733s
sys 0m1.858s

real 0m20.576s
user 0m32.992s
sys 0m1.763s
```

This mini-benchmark of the tests isn't indicative of typical performance, it is bottlenecked by JVM startup cost for each language.
I updated the TestApp in edaa2fa with some timing data... I will follow up on your other comments one by one. Thank you for looking into this.
…tions when running tests
Thanks for the updates - I'm happy to merge now (and I can then tweak so |
Yes if you could help with the conditional emit of |
Done in 34f3612.
It'd be good to aim to have Lucene able to use an unmodified version of Snowball, not just because all Java users could then benefit from improvements, but also because it has caused confusion for multiple users over the years who've generated Java code for a new stemmer with upstream Snowball and then tried to add it to their Lucene source tree.
I will also document the new Java 7 requirement (I agree it's not problematic but we should state version requirements up front).
This is long overdue...

- `char[]` to reduce allocations and overhead.
- `MethodHandle.invokeExact()` instead of reflection.

The changes mean that users will need Java 7 at a minimum, but since Java 7 is long past EOL, I don't think it will cause anyone grief.
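
To illustrate the move from reflection to `MethodHandle.invokeExact()` mentioned above, here is a minimal sketch. It is not the code from this PR: `ToyRules`, `endsWithS`, and `MethodHandleDemo` are hypothetical names invented for the example, and only the `java.lang.invoke` and `java.lang.reflect` calls are real JDK APIs (available since Java 7).

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

// Hypothetical stand-in for a class with routines that get looked up by name.
class ToyRules {
    // Hypothetical routine; not part of Snowball.
    boolean endsWithS(String word) {
        return word.endsWith("s");
    }
}

class MethodHandleDemo {
    // The handle is resolved once; its type is (ToyRules, String) -> boolean.
    static final MethodHandle ENDS_WITH_S;
    static {
        try {
            ENDS_WITH_S = MethodHandles.lookup().findVirtual(
                    ToyRules.class, "endsWithS",
                    MethodType.methodType(boolean.class, String.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Reflection: arguments travel in an Object[] and the result comes back boxed.
    static boolean viaReflection(ToyRules rules, String word) {
        try {
            Method m = ToyRules.class.getDeclaredMethod("endsWithS", String.class);
            return (Boolean) m.invoke(rules, word);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    // invokeExact: the static argument types and the cast on the result must
    // match the handle's type exactly, so no boxing or argument array is needed.
    static boolean viaMethodHandle(ToyRules rules, String word) {
        try {
            return (boolean) ENDS_WITH_S.invokeExact(rules, word);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        ToyRules rules = new ToyRules();
        System.out.println(viaReflection(rules, "cats"));   // prints "true"
        System.out.println(viaMethodHandle(rules, "cat"));  // prints "false"
    }
}
```

Because `invokeExact` is strict about types, passing the receiver as `Object` or dropping the `(boolean)` cast would fail at run time with a `WrongMethodTypeException`; that strictness is the price of avoiding the conversions that reflection performs.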
Users can still use `setCurrent(String) ... stem() ... getCurrent()`, but this adds support for a higher-performance approach: `setCurrent(char[], int), getCurrentBuffer(), stem() ... getCurrentBuffer()/getCurrentBufferLength()`
This avoids many per-word object allocations:

- `String` for every input word.
- `byte[]/char[]` of that `String` they were forced to create.
- `char[]` for that StringBuilder.
- `String` when they retrieve the result.

When indexing documents, all these per-word allocations put too much pressure on garbage collection and bottleneck performance.
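
As a rough sketch of the buffer-based calling pattern described above, the example below pushes every word through one reusable `char[]`. `ToyBufferStemmer` is a hypothetical stand-in that only mirrors the method names quoted in this description; its trailing-`s` rule is an assumption made for the demo and is not a real Snowball algorithm, and the real generated stemmers may manage their buffer differently.

```java
// Hypothetical stand-in mirroring the method names quoted in the description.
class ToyBufferStemmer {
    private char[] current = new char[0];
    private int length;

    // Adopt the caller's buffer directly: no copy and no String.
    void setCurrent(char[] text, int length) {
        this.current = text;
        this.length = length;
    }

    char[] getCurrentBuffer() {
        return current;
    }

    int getCurrentBufferLength() {
        return length;
    }

    // Toy rule (assumption for the demo): drop a single trailing 's'.
    boolean stem() {
        if (length > 1 && current[length - 1] == 's') {
            length--;
            return true;
        }
        return false;
    }
}

class BufferReuseDemo {
    // Stems space-separated words in 'text' using one reusable buffer.
    static String stemAll(char[] text) {
        ToyBufferStemmer stemmer = new ToyBufferStemmer();
        char[] buffer = new char[32];             // reused for every word
        StringBuilder out = new StringBuilder();  // only used to display results
        int start = 0;
        for (int i = 0; i <= text.length; i++) {
            if (i == text.length || text[i] == ' ') {
                int len = i - start;
                if (len > 0) {
                    if (len > buffer.length) {
                        buffer = new char[len];   // grow only for unusually long words
                    }
                    System.arraycopy(text, start, buffer, 0, len);
                    stemmer.setCurrent(buffer, len);
                    stemmer.stem();
                    if (out.length() > 0) {
                        out.append(' ');
                    }
                    // Read the result in place instead of building a String per word.
                    out.append(stemmer.getCurrentBuffer(), 0,
                            stemmer.getCurrentBufferLength());
                }
                start = i + 1;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stemAll("cats dogs fish".toCharArray())); // prints "cat dog fish"
    }
}
```

The sketch fetches the buffer again after `stem()` rather than assuming it is still the array that was passed in, following the call order given in the description.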