Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deterministic runtime representation for extensible types and exceptions #6570

Conversation

diogomqbm
Copy link
Contributor

Attempts to solve #6569

@zth
Copy link
Collaborator

zth commented Jan 17, 2024

This is awesome! We're taking it a bit easy right now after shipping v11, but hoping we can get around to reviewing this soon.

@zth
Copy link
Collaborator

zth commented Jan 25, 2024

@diogomqbm I haven't touched this part of the compiler yet, so bear with me for some stupid questions:

  1. I assume this bug is something you discovered in your code base? If so, did you try this fix on your code base, and that fixed the problem?
  2. Am I right in understanding that what matters here is only stuff happening inside the same ReScript file?

@diogomqbm
Copy link
Contributor Author

diogomqbm commented Jan 27, 2024

@zth No worries, and thank you for taking a look 🙏

  1. I assume this bug is something you discovered in your code base? If so, did you try this fix on your code base, and that fixed the problem?

Yes. We discovered this bug while working on the Walnut codebase. And yes, we fixed it by having a transformer build pass that removed the calls to Caml_exceptions.create in the runtime and generated the value during the build. Exactly what we do with this change but during the build time instead. And yes, it fixed our issue.

  1. Am I right in understanding that what matters here is only stuff happening inside the same ReScript file?

I'm not sure I completely understand this question. What do you mean by "only stuff happening"?
What matters here is that the value for the variant is generated at the compile time instead of in the runtime. It doesn't matter if it's in the same ReScript file or not. If you're asking about the counter, it's global, just like the runtime one. So, it increments for each new exception or extension created during the compile time.

@zth
Copy link
Collaborator

zth commented Jan 28, 2024

From what I understand of exceptions and extensible variants, this looks fine to me. But I'd like to see what @cristianoc thinks too.

For posterity: As I tested this, I noticed a few things not working that might be cool to address in the future to enhance extensible variants more. Although I'm not 100% sure all of them makes sense, I'll list them anyway:

  • Now that we have a pure string representation that's determined at compile time, it'd be nice if the compiler could inline that string when switching on extensible variants. Right now, an import to each module with the extensible variant constructors is added, to look up the value for the variant constructor id (that was previously generated at runtime, but is now a static string). This is inefficient/unnecessary and it breaks code splitting. I assume there's some optimization that needs to be applied for that to work, that previously wasn't applied because the runtime mattered.
  • @tag for the full variant definition, and @as for each variant type added, does not have any effect, like they do for regular variants. This might not matter because you rarely need that type of control over extensible variants, and I'm not sure @tag would make sense since the runtime representation relies on the exn encoding, but just pointing it out.

@cristianoc
Copy link
Collaborator

I wonder what the original intentions were. Looks like the number was assigned at runtime intentionally. If so, it would be great to understand the intentions to be more confident in the change.
Eg the first thing that comes to mind is that perhaps the desired invariant was to keep exceptions globally unique. So that one would not catch an unintended exception by accident.
(At compile time, only one file is compiled at one given time, so a global counter does not seem possible).

@diogomqbm
Copy link
Contributor Author

I wonder what the original intentions were. Looks like the number was assigned at runtime intentionally. If so, it would be great to understand the intentions to be more confident in the change.
Eg the first thing that comes to mind is that perhaps the desired invariant was to keep exceptions globally unique. So that one would not catch an unintended exception by accident.

Yeah, same. I think that for exceptions it does make sense to have the current implementation. But when thinking about extensible types, I don't think it does. I agree that it would be ideal to know why the original implementation was like this and what can we do to maintain consistency between two different contexts. Being dependent on the runtime is not good enough because then you depend on how or which order that the modules are loaded.

@cristianoc
Copy link
Collaborator

Looking at the details of issue #6569 again, I'm not sure I understand the problem experienced.
Moving to separate files, the generated code looks like this:

var A = require("./A.js");
var B = require("./B.js");

function handleRequest(req) {
  if (req.RE_EXN_ID === A.FromModuleA) {
    return "received from A";
  }
  if (req.RE_EXN_ID === B.FromModuleB) {
    return "received from B";
  }
...

So the generated code does not seem depend on the actual value used in module A or B. Even if they were to be loaded in different order, the runtime representation would be different, but the code behaves in the same way as it uses whatever value is in that module (e.g. A.FromModuleA).

Perhaps the issue reports a simplified example? I can't see a problem in the example shown there.

@diogomqbm
Copy link
Contributor Author

diogomqbm commented Jan 29, 2024

Even if they were to be loaded in different order, the runtime representation would be different, but the code behaves in the same way as it uses whatever value is in that module (e.g. A.FromModuleA).

@cristianoc I'm sorry if I was not clear enough in the issue description. The issue happened for us with message passing between two different runtimes/contexts where the message was an extensible type.

So, while they both refer to the same variable, this variable is created in different contexts with different values. E.g. while we're pattern matching on A.FromModuleA and sending A.FromModuleA on the other side, it occurred that the runtime value in one case was A.FromModuleA/1 where the other was A.FromModuleA/2. Hence, they would never match in the runtime.

Is it clear? I'm happy to discuss this in a more synchronous forum if needed. I'm available on the discord development server as well.

@cristianoc
Copy link
Collaborator

Thanks for providing the context.
I think it makes sense to generate the value statically for extensible variants.
I was unsure about exceptions, but looking at the generated code, it's always prefixed with the current module name, which is the file name, which should be unique or one would get a build error.
So perhaps even statically evaluating always gives a unique value.

In fact, I'm not sure I can even produce an example where the number is required in order to disambiguate values.

@cristianoc
Copy link
Collaborator

cristianoc commented Jan 29, 2024

Looks like functors are what require the usage of numbers to disambiguate:

let F = (M:{}) => { exception E }
module M1 = F({})
module M2 = F({})

Technically, I think a counter per-instance is what's required as a minimum for this to be correct in all cases.
So each time exception E is evaluated, a new number is produced. If one could do that, instead of a global counter shared by all, then it would be robust to different loading order, and correct for multiple invocations of a functor.

@diogomqbm
Copy link
Contributor Author

I think it makes sense to generate the value statically for extensible variants.

As far as I know, there's no representation difference between the extensible variant type and exceptions in OCaml. So, it looks like we'd need to make changes in lambda if we were to create this distinction between them. Or we agree to generate them both at compile time with a random aspect to them. Maybe instead of adding the counter, we could have a random number attached to the string.

In fact, I'm not sure I can even produce an example where the number is required in order to disambiguate values.

Yes, I also tried to find an example where the number would be needed but failed to do so. However, I thought of keeping it as a safety measure since I didn't know all the details of the first implementation.

@cristianoc
Copy link
Collaborator

Technically, not sure it can be done at compile time (with random or other methods), given that a functor is a function that can be invoked several times.

@cristianoc
Copy link
Collaborator

Would a counter per-string take care of the loading order issue?

@diogomqbm
Copy link
Contributor Author

Technically, not sure it can be done at compile time (with random or other methods), given that a functor is a function that can be invoked several times.

Indeed, when thinking about functors it only works if they're on the same file. Consider the following example:

module M = (A: {}) => {
  exception E
}

module M1 = M({})
module M2 = M({})

Generated:

// Generated by ReScript, PLEASE EDIT WITH CARE

function M(A) {
  var E = "Functor.M(A).E/1";
  return {
          E: E
        };
}

var E = "Functor.M(A).E/2";

var M1 = {
  E: E
};

var E$1 = "Functor.M(A).E/3";

var M2 = {
  E: E$1
};

export {
  M ,
  M1 ,
  M2 ,
}
/* No side effect */

As you can see it works for the same file, but now if we have another file calling M, we'd have:

// Generated by ReScript, PLEASE EDIT WITH CARE

import * as Functor from "./Functor.res.mjs";

var M3 = Functor.M({});

console.log({
      RE_EXN_ID: M3.E
    });

export {
  M3 ,
}
/* M3 Not a pure module */

which means that M3.E == "Functor.M(A).E/1", not the desired outcome. I wonder if we could isolate this case and treat it in a more specific way.

Would a counter per-string take care of the loading order issue?

@cristianoc Could you elaborate more on this?

@cristianoc
Copy link
Collaborator

cristianoc commented Jan 30, 2024

Would a counter per-string take care of the loading order issue?

@cristianoc Could you elaborate more on this?

The idea would be to modify slightly the runtime so instead of keeping just an integer, the counter would be a dict, to store a different counter for each string.
So with files A and B one would get say strings "A.X" and "B.Y" and the counter would return 1 in each case. So whether A is loaded first, or B is loaded first, the result will be the same.
But in the case of functor application as in the examples above, the counter would go beyond 1, preserving the semantics.

@diogomqbm
Copy link
Contributor Author

diogomqbm commented Jan 30, 2024

@cristianoc interesting, I think that would work! I'm going to submit an implementation later and also write a test for it so we make sure that a) it's not dependent on the order we load them and b) that we can cover functors by generating different values. Thank you!

@diogomqbm diogomqbm force-pushed the feat/deterministic-runtime-representation-for-exceptions branch from 1ad0ace to 02f5770 Compare January 31, 2024 20:08
@diogomqbm
Copy link
Contributor Author

diogomqbm commented Jan 31, 2024

@cristianoc @zth just pushed a new implementation following the idea mentioned here. So, basically we have an idMap where we keep track of all exceptions and every time we try to create a new exception with an already existing name we pop up the counter. Meaning that if you have exceptions with different names, you shouldn't have the reported order issue.

The only edge case I can see now is if you're using functors with exceptions you might stumble into this problem. But at least it's more narrow now. WDYT?

@zth
Copy link
Collaborator

zth commented Jan 31, 2024

Great work @diogomqbm ! I'll let @cristianoc comment on the implementation. With a little luck this fix can go in v11.1.

@cristianoc
Copy link
Collaborator

Looks great.
I think we can go ahead.

@cristianoc
Copy link
Collaborator

cristianoc commented Feb 1, 2024

For posterity, the corner case left would involve eg creating an exception E in the body of a functor.
Then calling that same functor from 2 different files to get 2 different copies of E.
Then make use of the two different exceptions called E in the same code path. And send such values over between different executions of the program (with different loading order).

@zth
Copy link
Collaborator

zth commented Feb 1, 2024

@diogomqbm a changelog and then we're good to go. Great work, well done! 🎉

@zth
Copy link
Collaborator

zth commented Feb 5, 2024

Ping @diogomqbm (soon time for another RC, would be good to get this in)

@diogomqbm diogomqbm force-pushed the feat/deterministic-runtime-representation-for-exceptions branch from 02f5770 to ebe2dd7 Compare February 5, 2024 22:15
@diogomqbm
Copy link
Contributor Author

@zth sorry for the delay. I got COVID last week and things have been bumpy since then 😅

Anyway, just pushed the changelog. One thing that came to mind is that maybe we don't want to expose idMap and the Map module in Caml_exceptions. So, I also added that an interface file so we don't expose those.

@zth
Copy link
Collaborator

zth commented Feb 6, 2024

@zth sorry for the delay. I got COVID last week and things have been bumpy since then 😅

Anyway, just pushed the changelog. One thing that came to mind is that maybe we don't want to expose idMap and the Map module in Caml_exceptions. So, I also added that an interface file so we don't expose those.

I hope you feel better soon! Covid is still pretty nasty.

Good catch! One final thing that I forgot about - could you change the base branch and rebase to 11.0_release?

@diogomqbm diogomqbm changed the base branch from master to 11.0_release February 6, 2024 15:11
@diogomqbm diogomqbm force-pushed the feat/deterministic-runtime-representation-for-exceptions branch from 627da1e to d0dc3a7 Compare February 6, 2024 15:12
@diogomqbm
Copy link
Contributor Author

could you change the base branch and rebase to 11.0_release?

done ✅

@zth zth enabled auto-merge (squash) February 6, 2024 15:44
@zth zth self-requested a review February 6, 2024 15:45
@zth zth merged commit 7d1fed8 into rescript-lang:11.0_release Feb 6, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants