Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spliterators for sorted collections cause inefficient stream operations when natural ordering is used #6187

Open
kilink opened this issue Sep 28, 2022 · 3 comments · May be fixed by #6188
Open
Labels
P4 no SLO package=collect type=performance Related to performance

Comments

@kilink
Copy link

kilink commented Sep 28, 2022

The Spliterators returned by sorted collections such as ImmutableSortedSet and RegularImmutableAsList result in unnecessary sorting occurring when a sort operation is performed on their associated Stream. The root cause is that the
Spliterators' getComparator() implementations always return a Comparator instance, even when natural ordering is used.

According to the Javadoc, null should be returned by getComparator() to indicate the source items are sorted in natural order.

You can see here that the SORTED flag gets unset when a spliterator has the SORTED characteristic but does not return a null comparator. And here is where the missed optimization happens due to that.

The issue is trivially reproducible by doing the following, and stepping through the code in SortedOps where the expected no-op / optimization would ideally happen:

ImmutableSortedSet<Integer> sortedSet = ImmutableSortedSet.of(1, 2, 3, 4);
// sorted should be a no-op, but isn't because ImmutableSortedSet's Spliterator does not return a null comparator.
sortedSet.stream().sorted().collect(Collectors.toList());
kilink added a commit to kilink/guava that referenced this issue Sep 28, 2022
Spliterators returned by CollectSpliterators.indexed and ImmutableSortedSet.spliterator
violated the Spliterator API by not returning null in getComparator when the source
items are naturally sorted. The effect was that sort operations on Streams backed by
these Spliterators were not optimized away, resulting in additional unnecessary sorting.

Fixes google#6187.
@amalloy amalloy added package=collect type=performance Related to performance P4 no SLO labels Sep 30, 2022
@kevinb9n
Copy link
Contributor

kevinb9n commented Oct 15, 2022

Nice find, and thanks for the deep links.

It's of course generally impossible to know whether a Comparator is actually implementing natural order, but we could at least special-case Comparator.naturalOrder(). @lowasser any thoughts?

We should search for a JDK issue; given that it introduced that naturalOrder comparator I think it should do this trick itself. I also think the Spliterator javadoc could use a tweak so as not to imply the implementor is contractually bound to return null.

@kevinb9n
Copy link
Contributor

Oh funny, I didn't bother mentioning it could special-case Ordering.natural() too, because we hope users are moving off of that, but that might be what our spliterators are returning at the moment.

kilink added a commit to kilink/guava that referenced this issue Oct 18, 2022
Spliterators returned by CollectSpliterators.indexed and ImmutableSortedSet.spliterator
violated the Spliterator API by not returning null in getComparator when the source
items are naturally sorted. The effect was that sort operations on Streams backed by
these Spliterators were not optimized away, resulting in additional unnecessary sorting.

Fixes google#6187.
@kilink
Copy link
Author

kilink commented Oct 18, 2022

I agree that the JDK ought to handle this check itself, although it wouldn't help Guava at the moment due to Ordering.natural(). I'm assuming the null means natural ordering contract is a holdover from SortedMap / SortedSet.

cpovirk added a commit to jspecify/jdk that referenced this issue Jun 7, 2023
This has always struck me as a weird feature, but people use it, and it
"works" in the sense of "does not cause NPE" (though it may cause
CCE...). I see both calls to `sort(null)` and calls like
`sort(priorityQueue.comparator())` (which _might_ be `null`;
[example](https://github.com/google/guava/blob/e82e2a2c07c68108f318958ee0355cc835c97743/guava-testlib/src/com/google/common/collect/testing/testers/SortedSetNavigationTester.java#L57)).
And while I prefer a world in which methods like `comparator()` never
return `null`, as we arranged for [in
`SortedMultiset`](https://guava.dev/SortedMultiset#comparator()), there
are apparently [downsides](google/guava#6187)
to using a `Comparator` that implements natural order rather than using
`null`.

Is any of that convincing? :) This is definitely a case in which I can
see how eisop would want to stay on its own path. My main motivation for
doing this now is that I hear that the current signature is causing us
trouble during Java->Kotlin transpilation. I can get more details.
cpovirk added a commit to jspecify/jdk that referenced this issue Jun 7, 2023
* Annotate the parameter of `List.sort` as `@Nullable`.

This has always struck me as a weird feature, but people use it, and it
"works" in the sense of "does not cause NPE" (though it may cause
CCE...). I see both calls to `sort(null)` and calls like
`sort(priorityQueue.comparator())` (which _might_ be `null`;
[example](https://github.com/google/guava/blob/e82e2a2c07c68108f318958ee0355cc835c97743/guava-testlib/src/com/google/common/collect/testing/testers/SortedSetNavigationTester.java#L57)).
And while I prefer a world in which methods like `comparator()` never
return `null`, as we arranged for [in
`SortedMultiset`](https://guava.dev/SortedMultiset#comparator()), there
are apparently [downsides](google/guava#6187)
to using a `Comparator` that implements natural order rather than using
`null`.

Also, Werner points us to `Arrays.sort`.

(#13)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P4 no SLO package=collect type=performance Related to performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants