Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate struct update syntax, such as %URI{x | ...} #13974

Closed
josevalim opened this issue Nov 6, 2024 · 30 comments
Closed

Deprecate struct update syntax, such as %URI{x | ...} #13974

josevalim opened this issue Nov 6, 2024 · 30 comments

Comments

@josevalim
Copy link
Member

josevalim commented Nov 6, 2024

Instead, we should prefer pattern matching. With the type system, we already get the benefits of checking if fields exist or not, in a more reliable and extensible way.

Here is an example from LiveView:

for entry <- conf.entries do
  %UploadEntry{entry | preflighted?: entry.preflighted? || entry.ref in refs}
end

However, the code above does not catch typos on entry.ref. This code, however, would:

for %UploadEntry{} = entry <- conf.entries do
  %{entry | preflighted?: entry.preflighted? || entry.ref in refs}
end

Since we attach type information to variables when they are defined.

@stevejonesbenson
Copy link

Is there a discussion about this anywhere? I don't see anything in the elixir-lang-core mailing list but I could have missed it.

I know this isn't the place to comment on the issues validity and I 100% trust the core team whatever you decide - however I would like to see any discussion on the issue to see the different points of view and reasoning if possible.

For what its worth I like the current syntax given that structs are made to feel slightly special than normal maps and I think the status quo is quite pleasant when you are returning an updated struct at the end of a function, It is immediately obvious that you are returning a struct without looking up at what the entry variable is from the example above.

Saying that, I am speaking only from the developers POV and not from the compiler which this issue seems primarily concerned with.

Sorry to add this here but I'm not sure where else to comment 🫡

@josevalim
Copy link
Member Author

It is completely fine to discuss it here. :)

For what its worth I like the current syntax given that structs are made to feel slightly special than normal maps and I think the status quo is quite pleasant when you are returning an updated struct at the end of a function, It is immediately obvious that you are returning a struct without looking up at what the entry variable is from the example above.

While I do agree this is true, it is worth noting this is pretty much a special case of structs. You don't denote in return types when you are returning an integer, updating a tuple, etc. Typically in Elixir, you specify the types of variables in patterns and guards, and not throughout your function.

Other than that, the rationale is pretty much in the issue description. All cases where you use the update syntax, you will be better served by pattern matching on it instead.

@whatyouhide
Copy link
Member

Just wanted to add here that doing this with mix format --migrate should be trivial, so "fixing" this warning will be a piece of cake 🍰

@stevejonesbenson
Copy link

That's a good point and you make it well - Thank you

@LostKobrakai
Copy link
Contributor

LostKobrakai commented Nov 12, 2024

Is there really a need to deprecate the struct update syntax? I can see the point of prefering pattern matching, but it doesn't feel like a great justification for getting rid of the struct update at the same time. Things are easily visible here in this short example, but consider 20ish lines between the for and the struct update and suddenly it's no longer so obvious that this is mapping struct to updated struct vs. struct to some random map derived from the entry. Removing the signaling of "this is an UploadEntry struct" feels like a step backwards for people reading code, even if technically it wouldn't matter.

@josevalim
Copy link
Member Author

I'd say so, because, if our official advice is to use the pattern matching syntax 99% of the cases, there is no reason to keep a pattern in the language that we wouldn't recommend.

As said above, I understand the signaling can be helpful but it is also inconsistent. Structs are the only data structure that get those and only for updates. You don't signal integers, tuples, struct access, etc. And it will always do a worse job than pattern matching.

@LostKobrakai
Copy link
Contributor

LostKobrakai commented Nov 12, 2024

Structs are the only data structure that get those and only for updates.

I can kinda see that, but also should we deprecate struct! then as well? Or at least replace existing instances of struct updates with struct! if it is decided to not deprecate struct!?

And it will always do a worse job than pattern matching.

I don't necessarily agree with that, given I don't see a pattern matching as an alternative to the struct update syntax. To me %URI{uri | …} vs. %{map | …} is more like choosing between Keyword.update! vs. Map.update! or using MapSet apis instead of manually messing with the set internals.

I guess I'd be fine with deprecating the native syntax for struct updates, but if that happens imo the proper replacement would be struct!/2 not nothing.

@josevalim
Copy link
Member Author

josevalim commented Nov 12, 2024

To me %URI{uri | …} vs. %{map | …} is more like choosing between Keyword.update! vs. Map.update! or using MapSet apis instead of manually messing with the set internals.

So a more apt comparison is int + int and float + float, which also do not differentiate themselves, or any other polymorphic function call. Also note we don't make a syntax distinction either between map.key and struct.field.

I can kinda see that, but also should we deprecate struct! then as well? Or at least replace existing instances of struct updates with struct! if it is decided to not deprecate struct!?

struct! is also used for dynamic fields, so there is no plan to deprecate it. You could use struct!(struct, foo: 123) for updating it, although currently that is done at runtime. However, we could convert struct! to a macro and make it emit more optimal code if the keys are statically known.

@sabiwara
Copy link
Contributor

However, the code above does not catch typos on entry.ref. This code, however, would:

for %UploadEntry{} = entry <- conf.entries do

Just mentioning for the record, but this change is not strictly equivalent and can possibly accidentally filter non-matching entries (e.g. nils) instead of raising. One might want to use Enum.map/2 here to use the pattern-matching version?

@zachallaun
Copy link
Contributor

zachallaun commented Nov 13, 2024

Is there a reason that this example...

for entry <- conf.entries do
  %UploadEntry{entry | preflighted?: entry.preflighted? || entry.ref in refs}
end

cannot provide useful typing information for expressions inside the block? It seems to me that, in order to succeed, entry must be an %UploadEntry{} in the current block, so the above is equivalent to:

for entry <- conf.entries do
  %UploadEntry{} = entry
  %UploadEntry{entry | preflighted?: entry.preflighted? || entry.ref in refs}
end

If that "phantom match" could be created automatically, it would find typos in entry.ref.

I'll acknowledge that this is only one example and this code could be modified such that you get more/better errors by pattern matching where entry is assigned, but this would be something and might lessen the "sting" of keeping the struct update syntax around.

@josevalim
Copy link
Member Author

josevalim commented Nov 13, 2024

Your proposal above would work for that example but what if someone wrote this:

for entry <- conf.entries do
  preflighted? = entry.preflighted? || entry.ref in refs
  %UploadEntry{entry | preflighted?: preflighted?}
end

What do we rewrite it to? If we do this:

for entry <- conf.entries do
  preflighted? = entry.preflighted? || entry.ref in refs
  %UploadEntry{} = entry
  %UploadEntry{entry | preflighted?: preflighted?}
end

Then we still wouldn't have caught a bug if entry.ref had a typo. Perhaps we hoist it all the way up? But that would require a separate pass in the type system only to pre-assign structs updates to variables. Do we want to make compilation 2-5% slower because of this single feature?

The simplest way to reason about the type system is that types are assigned to variables when they are defined. When we have type signatures, that makes total sense and it should just work as you expect, because the signatures will provide the type. But all Elixir code written today relies on inference and doing any sort of refinement during inference, which is partially what you are suggesting, is expensive because you need to either hoist it or go back and retype all prior uses of entry. If we don't do that, then it is inconsistent, you get different typing errors depending on where you use the update syntax.

All in all, it is much simpler to say "variables get their types when they are defined" and those are the patterns we should promote.*

We also need to accept that, as we work on the type system, we will discover Elixir idioms which are just not useful if we had types. Some struct features were designed exactly because we didn't have a type system. And other features are too lax. For example, the Kernel.struct/2 function allows anyone to create a struct without enforcing keys... which is equivalent to saying that all structs today may have nullable fields, which is not a desirable long term property.

Overall, we want to keep the deprecations too a minimum, but I think we have to revisit our practices to make the best use of our new tools as we move forward.

*There is a feature we will add to the type system called occurrence tying, which allows us to refine types in if, cond, case, etc. But the difference is exactly that those structs introduce a new scope, where the type is refined, instead of hoisting or refine the typing within an existing one.

@zachallaun
Copy link
Contributor

Then we still wouldn't have caught a bug if entry.ref had a typo. Perhaps we hoist it all the way up? But that would require a separate pass in the type system only to pre-assign structs updates to variables. Do we want to make compilation 2-5% slower because of this single feature?

I was indeed considering hoisting it to the top of the scope (or just after entry = ... if it's defined that way).

I do agree that a separate pass just to pre-assign structs, resulting in a 2-5% slow down, would not be worth it. I know there's a delicate balance between inference quality and expense, so introducing things like this may not be worth it. If there were other cases, however, where inference could be improved by a separate pass, then the cost/benefit might change, but I'm not sure that those cases exist.

Overall, we want to keep the deprecations too a minimum, but I think we have to revisit our practices to make the best use of our new tools as we move forward.

Agreed 👍. To be clear, though I do like the update syntax occasionally for the various reasons that others have stated, I'm not opposed to this change.

Another thought that crossed my mind is that, in the future when typing information is exposed through some compiler API, editors will be able to re-expose this information through things like inlay hints.

@whatyouhide whatyouhide changed the title Deprecate struct update syntax, such as %URI{x | ...} Deprecate struct update syntax, such as %URI{x | ...} Nov 14, 2024
@Coffei
Copy link

Coffei commented Nov 22, 2024

One difference between the original examples is if you mistype the field in the updated struct. The struct update syntax produces a compilation error, while the pattern match only produces a warning, if I am updating a field that doesn't exist. That can worse the experience for projects not using mix compile --warnings-as-errors.

Would changing these warnings to errors be a part of the deprecation to keep the same developer experience?

@josevalim
Copy link
Member Author

I would say having them as warnings is a benefit, as it allows compilation to continue and find other warnings, rather than stopping midway. In fact, we generally do our best to emit warnings and only emit errors when we cannot produce an actual artifact.

@Coffei
Copy link

Coffei commented Nov 25, 2024

I've been thinking about it some more, and we maybe missed the point of my comment a little. It's not really important whether it's an error or a warning technically. I feel it's the difference in the strength of feedback that makes the new proposed syntax worse. A typo in a field seems like a situation where the compiler knows the code will not work. And if it knows that I feel like it's good the compilation will fail and not let me run the code. I don't really care whether we use errors or warnings for that purpose.

But using warnings without any other mechanism than --warnings-as-errors seems lacking, since there are bunch of other warnings that are not as important. So mixing warnings like unused variables, which are non threatening, with warnings that essentially say "this code will not work and will crash in runtime" seems a bit backwards. People have to choose between failing for every little violation and not having any check at all. Maybe we already have some warnings like that, idk.

So maybe this is not a point against this proposed deprecation, but rather a suggestion to have some multi-level warnings some of which could fail the compilation by default (but not in the same way as errors, as you mention above), if the compiler is sure the code is wrong. I am sure what is the right solution. But I know I myself would prefer using the old syntax for this very reason.

@josevalim
Copy link
Member Author

josevalim commented Nov 25, 2024

It is very important to not break the user flow. Phoenix compiles during requests, tests compile before running. Being able to run and debug a program, even if it says that it will error, is better than aborting the user process and leave them to "fight the compiler" (or the type checker) as the only option available. Forcing compilation to fail can be very disruptive to the developer experience. If you really want that, then set elixirc_options: [warnings_as_errors: true] in your def project. You don't have to only pass it via CLI.

@jonatanklosko
Copy link
Member

A typo in a field seems like a situation where the compiler knows the code will not work. And if it knows that I feel like it's good the compilation will fail and not let me run the code.

This assumes that the field is missing because of a typo, but that may not be the only case. Imagine that we remove a field from a struct, while working on a subset of the code. In such case it can be useful to run the app and tests, without going and fixing all other places where the field is already used.

@josevalim
Copy link
Member Author

Originally I wrote this:

The simplest way to reason about the type system is that types are assigned to variables when they are defined.

But this is no longer true as of #14145. We do perform single pass inference now, which means we can find some bugs on the update syntax (but not all). Take this code:

def bad_code(x) do
  x.w
  %Point{x | x: :zero}
end

where x is a Point struct with fields x, y, and z. The code above has an error because x does not have a w field. With #14145, it will emit the following warning:

incompatible types in struct update:

    %Point{x | x: :zero}

expected type:

    dynamic(%Point{x: term(), y: term(), z: term()})

but got type:

    dynamic(%{..., w: term()})

where "x" was given the type:

    # type: dynamic(%{..., w: term()})
    # from: types_test.ex:915
    x.w

However, if we defined the struct upfront in pattern matching, we get this error instead:

unknown key .w in expression:

    x.w

the given type does not have the given key:

    dynamic(%Point{x: term(), y: term(), z: term()})

where "x" was given the type:

    # type: dynamic(%Point{})
    # from: types_test.ex:913
    x = %Point{}

So while we alleviate a bit the reasons for deprecating the syntax, it may be that finding more errors with better warnings are still enough reasons to discourage/deprecate it. Thoughts?

@LostKobrakai
Copy link
Contributor

So while we alleviate a bit the reasons for deprecating the syntax, it may be that finding more errors with better warnings are still enough reasons to discourage/deprecate it.

I'd be curious how those "more errors" look like specifically. Because the better error on the example don't really come from deprecating the update syntax, but from making me add the struct on the pattern match, which I can do independently from the update syntax as well. Adding that pattern match I guess would be a quite natural reaction to seeing the typesystem consider x as dynamic(%{..., w: term()}) in the example.

@josevalim
Copy link
Member Author

The idea of the deprecation, since the beginning, is to nudge people towards pattern matching. The reason why you would use %Point{x | ...} in the first place is because you want to assert x is a struct, so why not do it at the time the variable is defined? And I'd say deprecating %Point{x | ...} will be a stronger nudge than expecting people to see the type warning and realizing that pattern matching would lead to be better errors.

For completeness, here is an example of a warning that would not be caught with the current inference but would with pattern matching:

def absolute_z(condition, point) do
  point.w

  if condition do
    %Point{point | z: abs(point.z)}
  else
    point
  end
end

Any conditional is problematic because we can only assume that point is a %Point{} in a specific branch, so we cannot propagate the inference up. If you did %Point{} = point, there is no ambiguity. Inference will always be incomplete, it is a (very) nice to have to make our lives easier.

@zachdaniel
Copy link
Contributor

zachdaniel commented Jan 14, 2025 via email

@josevalim
Copy link
Member Author

We already have such syntax, it is %{x | ...}. You simply remove the struct name. :)

@zachdaniel
Copy link
Contributor

zachdaniel commented Jan 14, 2025 via email

@sabiwara
Copy link
Contributor

so why not do it at the time the variable is defined?

I was a bit skeptical at first for the deprecation but this is a very convincing argument for me. +1 for being more assertive early on.

I can imagine a couple of cases which can't be trivially be rewritten with pattern matching or might lose readablily, none are blockers IMHO but bringing it up for the sake of discussion:

  • & %Point{&1 | ...} (easy one, just use fn)
  • for p <- points, do: %Point{p | ...} - more tricky due to pattern filtering (more a strict comprehension issue):
    • for %Point{} = p <- points, do: ... instead of being assertive, we're silently filtering incorrect data
    • for p <- points, %Point{} = p, do: ... assertive as we want, but not a great syntax
    • Enum.map/2 but this won't work for more complex comprehensions
  • if p, do: %Point{p | ...} or p && %Point{p | ...} => easy enough to rewrite with case but slightly more verbose

@josevalim
Copy link
Member Author

I'd say the comprehension is the only tricky one, as it can lead to a subtle change of behaviour. And I would say if you wrote & %Point{&1 | ...}, then that code deserves to be rewritten regardless of this discussion. :D

@stevejonesbenson
Copy link

Originally I wrote this:

The simplest way to reason about the type system is that types are assigned to variables when they are defined.

But this is no longer true as of #14145. We do perform single pass inference now, which means we can find some bugs on the update syntax (but not all). Take this code:

def bad_code(x) do
  x.w
  %Point{x | x: :zero}
end

I think I somewhat missed the crux of the rationale previously (or maybe just forgot), it seems like the issue (from the types side) would be also solved by adding the pattern per:

def bad_code(%Point{} = x) do
  x.w
  %Point{x | x: :zero}
end

Personally I would use both already and my inclination is that removing one does not naturally lead someone to adding the other unless they understand why. Could it be solved with a warning or lsp message or a credo rule? Could it lead to a situation where the below code is produced by a user instead? On the face of it this would seem to be an even worse outcome for the code in question.

def bad_code(x) do
  x.w
  %{x | x: :zero}
end

I take your point about the signalling being inconsistent with the primitive types and I fully agree however I would like to add a slightly different perspective which is that in the trenches one typically treats each struct as their own individual types and from that angle the stats would most likely swing the other way where the the number of types in a given codebase that have return type signalling available (Point Coordinate User Car Job Part Booking Comment etc.) far outnumber those that do not (integer tuple etc.).

Saying all this that I fully understand your position of not wanting to leave potential foot-guns laying around but I would be remiss if I didn't give the struct update syntax the defense it deserves. This feels like less of a foot gun and more of a "by the way you would get better type inference here if you added the pattern" which (i think) would be true for all functions that take structs as args, not just those that update them per the examples above.

Either way thanks for everything and I trust the core team no matter which direction you choose, Cheers

Tl;dr: Can we have our cake && eat it?

@Moosieus
Copy link
Contributor

Moosieus commented Jan 17, 2025

I think the struct update syntax is worth not-deprecating so long as it doesn't pose an obstruction to future work. Struct update syntax like %Point{x | x: ___} can serve as a desirable assertion of a function's return type when refactoring. No matter what I change, I expect x at this point (usually a return) to be a %Point{} struct.

@zachdaniel
Copy link
Contributor

zachdaniel commented Jan 17, 2025

Since we attach type information to variables when they are defined.

Should perhaps this be addressed ultimately in the type system?

for entry <- conf.entries do
  %UploadEntry{entry | preflighted?: entry.preflighted? || entry.rfe in refs}
end

There is theoretically enough information there to determine that entry.ref in refs is not a valid thing to say, just not necessarily in a single pass.

Just a thought that if there is a possibility that the type system will one day be "smart enough" to catch that, then we may be removing the syntax unnecessarily.

@josevalim
Copy link
Member Author

josevalim commented Jan 18, 2025

Should perhaps this be addressed ultimately in the type system?

I have already address this in a later reply: #13974 (comment)

I think the struct update syntax is worth not-deprecating so long as it doesn't pose an obstruction to future work.

Right, the only reason it is being deprecated is because it poses obstruction to future work and code quality. The update syntax will give you fewer guarantees than pattern matching and, while we can keep on improving things, that will always be the case unless we start introducing false positives in dynamic code (i.e. warn about typing violations that won't happen in runtime, which is generally a worse idea). See the comment linked above and this one.

In other words, we can confidently say that pattern matching on the struct will always be better than the struct update syntax.

@josevalim
Copy link
Member Author

Another way to look at it is: statically typed languages have you declare types when the variable is defined, not when the variable is used. The struct update syntax is all about the latter. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

10 participants