Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify note about eager vs. lazy evaluation #901

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

catamorphism
Copy link
Collaborator

Closes #784

Copy link
Member

@aphillips aphillips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start. A couple of minor comments.

Comment on lines 61 to 62
> Users may write custom functions that have observable side effects.
> Lazy evaluation may involve evaluating the same expression multiple times
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good practice to avoid the normative words like 'may' in non-normative contexts like this.

Suggested change
> Users may write custom functions that have observable side effects.
> Lazy evaluation may involve evaluating the same expression multiple times
> Users or implementations can provide functions that have observable side effects.
> Lazy evaluation might involve evaluating the same _expression_ multiple times

Notice that I removed the word "custom" (which might be presumed?) and bring in implementations.

Consider adding a health warning to this paragraph?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the "may". What do you mean by "health warning"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By health warning, something like: It's a bad idea to create selectors or formatter functions with such side effects

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I'm not sure I would agree across the board. For example, you might want a currentTime function that returns the system clock time. This doesn't have "side effects" as such, but returns a different value when it's called at different times, so:

.local $time = {:currentTime}
{{ {$time} {$time} }}

could have a different result depending on whether the implementation evaluates the RHS of the variable time once or twice.

(The jargon is that currentTime is not referentially transparent.)

Perhaps some language saying that care should be taken when defining such functions and they should be avoided when possible?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we already had something in the spec about side effects, but the closest we come to is

Function handler access to the formatting context MUST be minimal and read-only,
and execution time SHOULD be limited.

which relies on an implementation choosing to call "everything that a function can access" as its formatting context, because we've not explicitly prohibited access outside it.

That is to say, I agree with @aphillips that we ought to have stronger language to recommend against functions that mutate anything as a side effect.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some more language to (a) clarify that the problem also occurs with functions that don't mutate anything, but that depend on external mutable state; and (b) recommend against functions that mutate anything observably.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...

side effect == touches the resolved value?

Lots of functions have "side effects" if that's the case. @eemeli your suggestion of limiting the resolved value of :date and :time would be a side effect if we were to implement it. currentTime is a good example. random is another. Some operations, such as case-folding, alter the value in an irreversible way. Other operations, such as unfloating a time value, change the meaning.

Our design tries to be immutable, but we're actually mutable to the depth of one expression. Chaining .local or reannotating a placeholder can make these side-effects more visible.

Perhaps this paragraphs wants to be more like:

Evaluating an expression can affect the resolved value associated with
a given variable.
When combined with eager vs. lazy evaluation,
in which the same expression might be evaluated multiple times,
the side-effects of the function on the resolved value
might alter the result of formatting the message.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side effect == touches the resolved value?

Lots of functions have "side effects" if that's the case. @eemeli your suggestion of limiting the resolved value of :date and :time would be a side effect if we were to implement it.

Defining the resolved value returned by a function handler is not a side effect. Modifying the operand would be a side effect. But a function handler will quite rarely have a reason to return its input operand; in general, it will instead create a new resolved value and return that.

currentTime is a good example. random is another. Some operations, such as case-folding, alter the value in an irreversible way. Other operations, such as unfloating a time value, change the meaning.

A reasonable function handler for any of these would not have any side effects, but would instead create a new resolved value and return it.

Our design tries to be immutable, but we're actually mutable to the depth of one expression. Chaining .local or reannotating a placeholder can make these side-effects more visible.

Being able to use a resolved value as an operand does not introduce mutability.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The phrase "side effect" is clearly confusing, so I rewrote the paragraph without using it. Please take another look!

> that returns the current time and date.)
> In some environments, users or implementations can provide functions
> that mutate state that is external to the message formatter
> (for example, deleting a file in the filesystem). This is not recommended.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"not recommended" turns out to be a 2119 keyword 😉

Is this a likely example? Maybe a better example would be modifying the system or application locale (in some environments, setting the locale is not thread-safe)

Suggested change
> (for example, deleting a file in the filesystem). This is not recommended.
(for example, setting or modifying the system locale).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually... I think the problem here is the audience for your addition. This turns out not to be in a note block. It's in a normative "IMPORTANT" section. It should be written in a normative way or it should be in an informative NOTE inside the IMPORTANT block.

I'd actually do the former, replacing this paragraph with something like:

Implementations and users creating custom functions SHOULD avoid
creating function handlers that mutate external program state
or which depend on external mutable state in a way that would
result in different resolved values for a variable depending on
whether a given expression is evaluated at most once (call-by-need)
or multiple times (call-by-name).

The :currentDateTime is a good example of a function that implementation might very well want to provide and which produces a different result every time you call it (but at least a few ms, right?). Users of that function would need to know that they can't rely on it producing the exact same number each time they call it (the system clock is running) or that they can't rely on it to not provide the same value every time in a given formatting call (the system clock is running, but the function only evaluates it once).

The warning about mutating program state doesn't seem to belong with eager/lazy? Can you think of an example of a real function that an implementation might provide (which isn't a security exploit)? I mean, my "set locale" suggestion above might quality in C, for example. That would certainly affect the formatting of the message (unless the "formatting context" was caching the locale). It would also be cause for spanking the developer who created/installed it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I follow, but I'm a little unsure as to what you're suggesting the normative change should be. If it's "SHOULD avoid...", as in your suggested wording, doesn't that suggest there shouldn't be a currentDateTime or random function? But I can imagine these functions being useful, it's just that it may be surprising that different implementations with different evaluation strategies would handle them differently. Do we want to say to not provide functions like these, or just to be judicious about it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried splitting the note into a normative part and an editorial "note" part, maybe that's closer to what you have in mind?

@macchiati
Copy link
Member

macchiati commented Oct 9, 2024 via email

@macchiati
Copy link
Member

macchiati commented Oct 9, 2024 via email

Comment on lines +69 to +70
> Lazy evaluation might involve evaluating the same _expression_ multiple times
> (call-by-name) or evaluating every expression at most once (call-by-need).
Copy link
Collaborator

@gibson042 gibson042 Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think it's a mistake to allow the same expression to ever evaluate to a different value inside a single message formatting operation. This is something that CEL (to pick a roughly analogous technology) gets very right.

spec/formatting.md Outdated Show resolved Hide resolved
@eemeli
Copy link
Collaborator

eemeli commented Oct 10, 2024

Should we include some language encouraging implementations to sandbox or otherwise restrict the access of function handlers to external state?

Co-authored-by: Addison Phillips <[email protected]>
@catamorphism
Copy link
Collaborator Author

Per the 2024-10-21 call, I added stricter language saying that call-by-name implementations are not allowed because of the presence of impure functions. Please take another look!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Spec language on eager vs. lazy evaluation should be clarified
5 participants