-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize map unions to avoid building long lists #14215
Conversation
lib/elixir/lib/module/types/descr.ex
Outdated
|
||
defp trivial_subtype?(%{} = left, %{} = right) | ||
when map_size(left) == 1 and map_size(right) == 1 do | ||
case {left, right} do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went for an even simpler version with just shallow comparisons (except for the structural comparison on top)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion, let's go with the regular subtyping? We apply this on very few cases and once we start having structs nested inside structs, this will make a difference?
In other words, let's go with the "slow" but more general and less code version and we optimize it again in the future? Especially to avoid getting the logic wrong here... :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes total sense, especially since
a) we'll bail at the first non-match
b) we avoid building lists and make a bunch of stuff cheaper downstream so the extra precision might pay
c) it's easier to optimize based on actually slow projects, so deferring this one seems like it's a good idea
And perhaps we'll be able to make subtype?
cheaper in the future and bail early in some cases or something, in which case the whole typing would benefit.
lib/elixir/lib/module/types/descr.ex
Outdated
strategy when strategy != nil <- map_union_optimization_strategy(tag1, pos1, tag2, pos2) do | ||
case strategy do | ||
:all_equal -> | ||
dnf1 | ||
|
||
:any_map -> | ||
[{:open, %{}, []}] | ||
|
||
{:one_key_difference, key, v1, v2} -> | ||
new_pos = Map.put(pos1, key, union(v1, v2)) | ||
[{tag1, new_pos, []}] | ||
|
||
:left_subtype_of_right -> | ||
dnf2 | ||
|
||
:right_subtype_of_left -> | ||
dnf1 | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we can encapsulate this and use it on map_normalize
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you mean instead of the map_non_negated_fuse
/ fusible_maps
/ map_non_negated_fuse_pair
part right?
Seems it's doing very similar stuff, let's see!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I can also be the one giving that a try. We can merge this once the comments above are addressed, then I can work on the map_normalize and you work on tuples? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I got it actually! Will push in a sec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't spend time on renaming but I think we can probably use the term fuse
/ fusion
etc more for consistency.
Feel free to rename as you see fit!
Beautifully written and well done! 😍 |
|> :maps.next() | ||
|> do_map_union_optimization_strategy(pos2, :all_equal) | ||
end | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not handle cases where the open map on the left has some extra fields that are set to if_set(term())
, and the right map is closed.
Example:
assert union(
open_map(a: if_set(term())),
closed_map([])
) == open_map(a: if_set(term()))
Similarly, what if we have a larger (in size) open map as pos1
, but which is a supertype of pos2
? Then the only tried strategy will be l.1340 which leads to :left_subtype_of_right
.
Example:
assert union(
open_map(a: if_set(term()), b: number()),
open_map(b: integer())
) == open_map(a: if_set(term()), b: number())
I don't think those are case that necessarily need to be covered, but adding those tests to highlight it would prevent us discovering this again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I was totally forgetting about if_set
.
I don't think those are case that necessarily need to be covered
Yeah this is an optimization supposed to deal with some "obvious" cases that happen frequently, so it might be OK not to catch all cases (we're not dealing with negs either).
But in this case it might be possible to implement in the current pass with something like:
- if one key is only on the side of the supertype and its value is if_set, continue inferring this supertype relation
- if we can switch to the supertype strategy, do it
- otherwise bail
The map size issue is a real problem though... Perhaps by changing the internal representation to store if_set
as part of a different map, we can easily compute the size of the required map, and have a separate pass for optional keys?
This might be overkill for this particular use case, but if this new representation can simplify a bunch of other places such as subtyping etc it might be worth considering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The map size issue is a real problem though...
I think it is fine because our goal is to traverse the smallest map for performance. The full algorithm does require traversing both sides but the point here is precisely to not implement the full algorithm. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
I was more thinking if we get bottlenecks in the future due to if_set
and if there was a way to optimize them in other parts too, but it's best to avoid speculation and to wait if real world slow/pathological cases to show up, and iterate then.
Co-authored-by: Guillaume Duboc <[email protected]>
@gldubc thanks a lot for the review 💜 @josevalim will merge this and open one for tuples this weekend. |
Fixes #14203
Replaces #14205
Only implemented maps for now, once approved I'm happy to copy it for tuples.
We do only one pass at keys, and are able to switch the strategy in the middle (including just bailing if a key is missing or an expectation is not met).
It also fixes the bamboo example.
Perf-wise it seems to not spend a lot of time at all in the new code:
Profiling details