-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LANG-1675 - Improve performance of StringUtils.join for primitives #812
LANG-1675 - Improve performance of StringUtils.join for primitives #812
Conversation
Reading through this, I wonder whether we should deprecate StringJoiner?
…On Sat, Oct 9, 2021 at 12:55 PM Hubert ***@***.***> wrote:
This PR replaces StringJoiner to StringBuilder in StringUtils.join methods.
The change is limited to methods which use primitives as input only.
More
Please find more details here:
1. https://issues.apache.org/jira/browse/LANG-1675
2.#784 <#784>
JMH with example results
JMH tests - StringBuilder vs StringJoiner may be found in dedicated
repository:
https://github.com/HubertWo/apache-commons-lang-jmh
------------------------------
You can view, comment on, or merge this pull request online at:
#812
Commit Summary
- String.join - boolean - use StringBuilder
<db16c58>
- Replaced StringJoiner by StringBuilder
<e31a723>
- Fix: formatting, removed condition in for
<1fe3e64>
- Merge branch 'apache:master' into fix/LANG-1675_string_join_refactor
<f2f6cd9>
File Changes
- *M* src/main/java/org/apache/commons/lang3/StringUtils.java
<https://github.com/apache/commons-lang/pull/812/files#diff-dc0df3da8c3d510794ee168c3bbe26fb19707732391dbd9e8a19d26033df5eb7>
(96)
Patch Links:
- https://github.com/apache/commons-lang/pull/812.patch
- https://github.com/apache/commons-lang/pull/812.diff
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#812>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAIBUGKXBC2IUTE2Z7IGCG3UGANTNANCNFSM5FVFOPNA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
--
Look, that's why there's rules, understand? So that you think before
you break 'em.
-- (Terry Pratchett, Thief of Time)
|
agree. |
Really? How are you going to do that? It's part of the JDK... |
yep,that is the only thing stopping me from deprecate it lol |
If the benchmarks are correct, the difference in performance is huge. I also did some research and looks like StringBuilder is much better choice in this case. |
What do the POMs in https://github.com/HubertWo/apache-commons-lang-jmh require Java 17? I'd be much easier to test on Java 8 without forcing these settings on to the tester by default. |
@HubertWo May you please update the project to remove the Java 17 requirement and make the benchmarks runnable from the command line? I'd like to just run a Maven command (however you want to set it up) and then post my results here so we can compare results. |
Thank you for your time. Please find run instructions and example results here:
I have also added GitHub Actions which run both benchmarks agains Java 8. Logs from GitHub run:
|
Hi @HubertWo Thank you for your update. I just pulled from your repo and saw:
This is confusing to me. Why are these files here? I want the SNAPSHOT in my local Maven repository to be used. Otherwise, do I have to copy the jar and pom over to this snapshot folder every time I touch sources in Commons Lang? |
JAR was added only to run benchmarks on GitHub actions. (https://github.com/HubertWo/apache-commons-lang-jmh/blob/main/.github/workflows/maven.yml) You may ignore both files. |
This is what I see on Microsoft Windows [Version 10.0.19042.1288], Java 8, and 11: StringJoinerBenchmark.stringJoinerPrimitiveInt on Java 8
StringBuilderBenchmark.stringBuilderPrimitiveInt on Java 8
To summarize Java 8:
StringJoinerBenchmark.stringJoinerPrimitiveInt on Java 11
StringBuilderBenchmark.stringBuilderPrimitiveInt on Java 11
To summarize Java 11:
To summarize Java 8 and 11:
These results look mixed to me: The PR implementation overlaps the current code accounting for the margin of errors. Can anyone reproduce the results? Or any result? I did not run Java 17 since I won't be able to use it in production for a while. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a look at the linked repository, cloned it, tried to run without success. Had another look at the code, and realized the dependencies in the pom.xml
were used in Lang too. So I copied the StringJoiner test to the master
branch and ran that test first.
My environment:
Apache Maven 3.8.2 (ea98e05a04480131370aa0c110b8c54cf726c06f)
Maven home: /opt/apache-maven-3.8.2
Java version: 11.0.11, vendor: Ubuntu, runtime: /usr/lib/jvm/java-11-openjdk-amd64
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.4.0-90-generic", arch: "amd64", family: "unix"
And class annotations I used in my tests.
@BenchmarkMode(Mode.Throughput) // Then saved again with .AverageTime to re-run tests
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
StringJoiner + master
Using master
on the following commit.
commit 7f9472e8989a859b7b4a3394486c7064cb3d11af (HEAD -> master, upstream/master, upstream/HEAD, dependabot/maven/com.puppycrawl.tools-checkstyle-9.1)
Author: Gary Gregory <[email protected]>
Date: Thu Nov 18 22:12:49 2021 -0500
Fix Javadoc typo.
First thing I did was to edit the pom.xml
to run just that one test.
diff --git a/pom.xml b/pom.xml
index 021e056d9..b48d6c750 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1022,7 +1022,7 @@
<id>benchmark</id>
<properties>
<skipTests>true</skipTests>
- <benchmark>org.apache</benchmark>
+ <benchmark>org.apache.commons.lang3.StringJoinerBenchmarkTest</benchmark>
</properties>
<build>
<plugins>
A copied the StringJoiner
test editing package name, adding the license header, and adding more JMH annotations. First to run with average time, and then with the throughput.
Average time results:
(...)
Result "org.apache.commons.lang3.StringJoinerBenchmarkTest.stringJoinerPrimitiveBoolean":
93.846 ±(99.9%) 6.002 ns/op [Average]
(min, avg, max) = (84.979, 93.846, 112.687), stdev = 8.013
CI (99.9%): [87.844, 99.848] (assumes normal distribution)
(...)
Result "org.apache.commons.lang3.StringJoinerBenchmarkTest.stringJoinerPrimitiveInt":
110.469 ±(99.9%) 2.159 ns/op [Average]
(min, avg, max) = (107.508, 110.469, 120.933), stdev = 2.883
CI (99.9%): [108.310, 112.629] (assumes normal distribution)
(...)
Result "org.apache.commons.lang3.StringJoinerBenchmarkTest.stringJoinerPrimitiveLong":
109.523 ±(99.9%) 1.101 ns/op [Average]
(min, avg, max) = (107.619, 109.523, 111.960), stdev = 1.470
CI (99.9%): [108.421, 110.624] (assumes normal distribution)
# Run complete. Total time: 00:25:05
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark Mode Cnt Score Error Units
StringJoinerBenchmarkTest.stringJoinerPrimitiveBoolean avgt 25 93.846 ± 6.002 ns/op
StringJoinerBenchmarkTest.stringJoinerPrimitiveInt avgt 25 110.469 ± 2.159 ns/op
StringJoinerBenchmarkTest.stringJoinerPrimitiveLong avgt 25 109.523 ± 1.101 ns/op
Throughput results:
(...)
Result "org.apache.commons.lang3.StringJoinerBenchmarkTest.stringJoinerPrimitiveBoolean":
0.011 ±(99.9%) 0.001 ops/ns [Average]
(min, avg, max) = (0.010, 0.011, 0.012), stdev = 0.001
CI (99.9%): [0.011, 0.012] (assumes normal distribution)
(...)
Result "org.apache.commons.lang3.StringJoinerBenchmarkTest.stringJoinerPrimitiveInt":
0.009 ±(99.9%) 0.001 ops/ns [Average]
(min, avg, max) = (0.007, 0.009, 0.010), stdev = 0.001
CI (99.9%): [0.008, 0.009] (assumes normal distribution)
(...)
Result "org.apache.commons.lang3.StringJoinerBenchmarkTest.stringJoinerPrimitiveLong":
0.009 ±(99.9%) 0.001 ops/ns [Average]
(min, avg, max) = (0.008, 0.009, 0.009), stdev = 0.001
CI (99.9%): [0.008, 0.009] (assumes normal distribution)
# Run complete. Total time: 00:25:05
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark Mode Cnt Score Error Units
StringJoinerBenchmarkTest.stringJoinerPrimitiveBoolean thrpt 25 0.011 ± 0.001 ops/ns
StringJoinerBenchmarkTest.stringJoinerPrimitiveInt thrpt 25 0.009 ± 0.001 ops/ns
StringJoinerBenchmarkTest.stringJoinerPrimitiveLong thrpt 25 0.009 ± 0.001 ops/ns
StringBuilder + the branch for this pull request
Did the same thing using the branch for this pull request, but with the code for testing StringBuilder
with JMH.
commit f2f6cd9b0580f25a3a5447da092ac30642abd2aa (HEAD -> pr-812)
Merge: 1fe3e64bc a3a064587
Author: Hubert <[email protected]>
Date: Sat Oct 9 12:54:37 2021 +0200
Merge branch 'apache:master' into fix/LANG-1675_string_join_refactor
diff --git a/pom.xml b/pom.xml
index 64e3e8d85..498e44f4f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1005,7 +1005,7 @@
<id>benchmark</id>
<properties>
<skipTests>true</skipTests>
- <benchmark>org.apache</benchmark>
+ <benchmark>org.apache.commons.lang3.StringBuilderBenchmarkTest</benchmark>
</properties>
<build>
<plugins>
Average time results:
(...)
Result "org.apache.commons.lang3.StringBuilderBenchmarkTest.stringBuilderPrimitiveBoolean":
49.887 ±(99.9%) 0.542 ns/op [Average]
(min, avg, max) = (49.086, 49.887, 52.508), stdev = 0.723
CI (99.9%): [49.345, 50.429] (assumes normal distribution)
(...)
Result "org.apache.commons.lang3.StringBuilderBenchmarkTest.stringBuilderPrimitiveInt":
36.461 ±(99.9%) 0.505 ns/op [Average]
(min, avg, max) = (35.892, 36.461, 37.850), stdev = 0.674
CI (99.9%): [35.956, 36.966] (assumes normal distribution)
(...)
Result "org.apache.commons.lang3.StringBuilderBenchmarkTest.stringBuilderPrimitiveLong":
37.222 ±(99.9%) 0.157 ns/op [Average]
(min, avg, max) = (36.922, 37.222, 37.824), stdev = 0.209
CI (99.9%): [37.065, 37.379] (assumes normal distribution)
(...)
# Run complete. Total time: 00:25:05
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark Mode Cnt Score Error Units
StringBuilderBenchmarkTest.stringBuilderPrimitiveBoolean avgt 25 49.887 ± 0.542 ns/op
StringBuilderBenchmarkTest.stringBuilderPrimitiveInt avgt 25 36.461 ± 0.505 ns/op
StringBuilderBenchmarkTest.stringBuilderPrimitiveLong avgt 25 37.222 ± 0.157 ns/op
Throughput results:
(...)
Result "org.apache.commons.lang3.StringBuilderBenchmarkTest.stringBuilderPrimitiveBoolean":
0.020 ±(99.9%) 0.001 ops/ns [Average]
(min, avg, max) = (0.020, 0.020, 0.020), stdev = 0.001
CI (99.9%): [0.020, 0.020] (assumes normal distribution)
(...)
Result "org.apache.commons.lang3.StringBuilderBenchmarkTest.stringBuilderPrimitiveInt":
0.026 ±(99.9%) 0.001 ops/ns [Average]
(min, avg, max) = (0.024, 0.026, 0.028), stdev = 0.001
CI (99.9%): [0.025, 0.027] (assumes normal distribution)
(...)
Result "org.apache.commons.lang3.StringBuilderBenchmarkTest.stringBuilderPrimitiveLong":
0.026 ±(99.9%) 0.001 ops/ns [Average]
(min, avg, max) = (0.025, 0.026, 0.027), stdev = 0.001
CI (99.9%): [0.026, 0.026] (assumes normal distribution)
# Run complete. Total time: 00:25:05
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark Mode Cnt Score Error Units
StringBuilderBenchmarkTest.stringBuilderPrimitiveBoolean thrpt 25 0.020 ± 0.001 ops/ns
StringBuilderBenchmarkTest.stringBuilderPrimitiveInt thrpt 25 0.026 ± 0.001 ops/ns
StringBuilderBenchmarkTest.stringBuilderPrimitiveLong thrpt 25 0.026 ± 0.001 ops/ns
TL;DR:
Before
StringJoinerBenchmarkTest.stringJoinerPrimitiveBoolean avgt 25 93.846 ± 6.002 ns/op
StringJoinerBenchmarkTest.stringJoinerPrimitiveInt avgt 25 110.469 ± 2.159 ns/op
StringJoinerBenchmarkTest.stringJoinerPrimitiveLong avgt 25 109.523 ± 1.101 ns/op
StringJoinerBenchmarkTest.stringJoinerPrimitiveBoolean thrpt 25 0.011 ± 0.001 ops/ns
StringJoinerBenchmarkTest.stringJoinerPrimitiveInt thrpt 25 0.009 ± 0.001 ops/ns
StringJoinerBenchmarkTest.stringJoinerPrimitiveLong thrpt 25 0.009 ± 0.001 ops/ns
After
StringBuilderBenchmarkTest.stringBuilderPrimitiveBoolean avgt 25 49.887 ± 0.542 ns/op
StringBuilderBenchmarkTest.stringBuilderPrimitiveInt avgt 25 36.461 ± 0.505 ns/op
StringBuilderBenchmarkTest.stringBuilderPrimitiveLong avgt 25 37.222 ± 0.157 ns/op
StringBuilderBenchmarkTest.stringBuilderPrimitiveBoolean thrpt 25 0.020 ± 0.001 ops/ns
StringBuilderBenchmarkTest.stringBuilderPrimitiveInt thrpt 25 0.026 ± 0.001 ops/ns
StringBuilderBenchmarkTest.stringBuilderPrimitiveLong thrpt 25 0.026 ± 0.001 ops/ns
Conclusion
I think the StringBuilder
version was consistently faster, at least on my JVM 11, Ubuntu Linux. Looking at the code, I thought it would be faster to check the index of the for-loop to avoid adding something that would be later removed, but looks like this could be a follow-up.
+1
Thanks!
Bruno
p.s.: Friday 10 PM, so hopefully I didn't confuse the metrics, I think I got higher throughput with StringBuilder
and lower average time with StringBuilder
, but feel free to correct me if I read the results incorrectly
p.p.s: Sorry, couldn't test with other OS's or JVM's 👍
return joiner.toString(); | ||
return stringBuilder | ||
.deleteCharAt(stringBuilder.length() - 1) | ||
.toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I thought instead of deleteChartAt
using an if
statement in the for
above, to avoid adding the last delimiter
would improve more the performance. For a follow-up discussion I think.
@kinow Your results are not that helpful IMO because you're are not measuring useful runs, you've effectively invented new tests and new methodology :( I was able to follow the instructions from @HubertWo without issue, please see his docs. My impression is that you are measuring so little that there is too much risk for noise: In your tests, you are measuring throughput in hundredths of operations per nanoseconds. In @HubertWo's setup, you measure millions of operations per second. Unless you are running on some LN2-cooled CPU and the rest of us on potatoes, I'm not sure how to account for the difference aside from simply running the tests in a completely different way. As soon as you start editing the sources and changing the JMH annotations, making a simple comparison is hopeless. Please do take the time to review @HubertWo's docs, it worked after I took the time to follow his readme, nothing special to do. Otherwise, do post what your issue is, I am sure we will all benefit from learning how to deal with JMH better ;) |
@garydgregory will try running the original tests again. I used the test settings of an existing jmh benchmark in lang, though no idea if that's being actually used. |
Thank you @kinow ! It will be simpler to compare our results with hardware and OS hopefully the only difference. |
I have a Windows 10 here too, but rarely use it. I will try to run the tests on Ubuntu LTS and on Win 10. JVM 8 and 11 only would be enough? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had given up after a SNAPSHOT couldn't be located, but then just had to mvn install
this PR's branch to get the SNAPSHOT installed in my local Maven repo. Here's the results on Ubuntu LTS.
JVM 11
First JVM 11, where I think StringBuilder
is the winner with 24_967_220.771
throughput versus StringJoiner
's 8_685_054.944
.
Apache Maven 3.8.2 (ea98e05a04480131370aa0c110b8c54cf726c06f)
Maven home: /opt/apache-maven-3.8.2
Java version: 11.0.11, vendor: Ubuntu, runtime: /usr/lib/jvm/java-11-openjdk-amd64
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.4.0-90-generic", arch: "amd64", family: "unix"
StringJoiner
Result "com.github.hubertwo.acljmh.StringJoinerBenchmark.stringJoinerPrimitiveInt":
8685054.944 ±(99.9%) 143639.898 ops/s [Average]
(min, avg, max) = (8276817.474, 8685054.944, 8968956.809), stdev = 191755.154
CI (99.9%): [8541415.046, 8828694.843] (assumes normal distribution)
# Run complete. Total time: 00:08:21
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark Mode Cnt Score Error Units
StringJoinerBenchmark.stringJoinerPrimitiveInt thrpt 25 8685054.944 ± 143639.898 ops/s
StringBuilder
Result "com.github.hubertwo.acljmh.StringBuilderBenchmark.stringBuilderPrimitiveInt":
24967220.771 ±(99.9%) 770683.808 ops/s [Average]
(min, avg, max) = (21267937.022, 24967220.771, 25980909.331), stdev = 1028840.835
CI (99.9%): [24196536.963, 25737904.579] (assumes normal distribution)
# Run complete. Total time: 00:08:21
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark Mode Cnt Score Error Units
StringBuilderBenchmark.stringBuilderPrimitiveInt thrpt 25 24967220.771 ± 770683.808 ops/s
JVM 16
Now with JVM 16. I think StringBuilder
is a winner here too with 23_751_776.524
, while StringJoiner
had 8508216.567
throughput.
Apache Maven 3.8.2 (ea98e05a04480131370aa0c110b8c54cf726c06f)
Maven home: /opt/apache-maven-3.8.2
Java version: 16.0.1, vendor: Private Build, runtime: /usr/lib/jvm/java-16-openjdk-amd64
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.4.0-90-generic", arch: "amd64", family: "unix"
StringJoiner
Result "com.github.hubertwo.acljmh.StringJoinerBenchmark.stringJoinerPrimitiveInt":
8508216.567 ±(99.9%) 126690.808 ops/s [Average]
(min, avg, max) = (8046279.518, 8508216.567, 8834546.942), stdev = 169128.605
CI (99.9%): [8381525.759, 8634907.376] (assumes normal distribution)
# Run complete. Total time: 00:08:21
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark Mode Cnt Score Error Units
StringJoinerBenchmark.stringJoinerPrimitiveInt thrpt 25 8508216.567 ± 126690.808 ops/s
StringBuilder
Result "com.github.hubertwo.acljmh.StringBuilderBenchmark.stringBuilderPrimitiveInt":
23751776.524 ±(99.9%) 956416.445 ops/s [Average]
(min, avg, max) = (20958707.696, 23751776.524, 25285000.397), stdev = 1276788.591
CI (99.9%): [22795360.079, 24708192.969] (assumes normal distribution)
# Run complete. Total time: 00:08:21
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark Mode Cnt Score Error Units
StringBuilderBenchmark.stringBuilderPrimitiveInt thrpt 25 23751776.524 ± 956416.445 ops/s
So for now keeping my +1 based on these results for Ubuntu LTS, especially since they were consistent between JVM 11 and JVM 16. Happy to include other JVM's or OS'es in the tests too, if needed.
Bruno
I think so. That is what I used. |
Thanks for the update @kinow This pattern does not make sense to me:
First Why not skip the first part by simply saying:
? Also for the char and boolean versions, it seems like the builder size can be simply computed because each element's String version is a fixed size (assuming 5 for boolean). For ints and longs it might not be worth it as you'd have to assume the max size. |
f2f6cd9
to
2c6dd0e
Compare
…r - char, boolean join methods
@garydgregory, @kinow thank you for testing! Regarding code suggestions. I've changed |
This PR replaces StringJoiner by StringBuilder in StringUtils.join methods.
The change is limited to methods which use primitives as input only.
More
Please find more details here:
JMH with example results
JMH tests - StringBuilder vs StringJoiner may be found in dedicated repository. Full logs, code and results included.
https://github.com/HubertWo/apache-commons-lang-jmh