String Interpolation #528
Comments
One advantage of this approach is that it doesn't do any dynamic dispatch or anything special; it is just an extension of the current implicits and static overloading. We could even allow multiple parameters and have format specifiers for number precision, etc.
In response to the question of starting character vs. delimiter, when choosing
I like the idea of no starting character, and I do think braces are a good choice, but I wonder if we could still use the shorter ` for times when you are just using an identifier. \ doesn't make sense in those situations, since \n could mean either a newline or interpolation of the variable named n. Here is what that might look like:
With proper syntax highlighting, or maybe indenting the whole match, it might look good, but it seems a bit strange to me, especially since blocks in Koka use
I see what you're getting at. In the context of string interpolation you are more likely to want to put everything on a single line, increasing the chance that braces are needed to delimit blocks. That indeed may conflict visually with the interpolation syntax. A small mitigation might be to escape
Using escaped braces like \{ and \} or prefix characters solves only the syntactic problem, not the visual one. What about using
The problem with
For simple identifiers an
And using it as a start character might not be too bad.
Well, I think the fewer characters used, the better.
What about making it configurable per interpolator? This would allow users to optimize for whatever kind of strings they are generating, and would make this feature very helpful for embedded template languages.
@chtenb I like the idea. Maybe we could have a definition like:
Of course, this means these definitions would need to be at the top of files, like infix declarations, since we might want to resolve them prior to parsing. For infix operators, though, we transform into an intermediate representation and resolve after parsing. @kuchta @chtenb
Of course, then, what would a DSL for HTML look like in Koka?
Or, maybe a bit better formatted:
At that point we almost need to auto-infer which prefix tag to use for plain strings "" when a string is passed into a function needing a particular type:
Obviously this looks really nice the non-inverted way with HTML, but I personally think it looks really nice the other way for more general expressions. Additionally, you can argue that you really don't want to be doing this with strings anyway for HTML (there is no static string in those examples, just nested div calls), so you can still omit the string interpolation and have the following API, which would work for generating either strings or ASTs.
The one difficulty with this API is that you really don't want to build up the subpieces of the tree and then do a bunch of string appends. You'd rather generate from the outside in and append directly to a string builder. You could create an intermediate AST, but that wastes time. Or the vector API in #527 sort of supports the above already via the
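A minimal sketch of the outside-in approach, written in Python rather than Koka: each element writes its opening tag, lets its children write into the same buffer, and then writes its closing tag, so no subtree is ever materialized as a separate string and concatenated afterwards. `element`, `text`, and `render_html` are hypothetical names, and text is not HTML-escaped in this sketch.

```python
import io
from typing import Callable

# A child is anything that can write itself into the shared buffer.
Child = Callable[[io.StringIO], None]

def element(tag: str, *children: Child) -> Child:
    """Hypothetical element builder: writes outside-in into one buffer."""
    def write(out: io.StringIO) -> None:
        out.write(f"<{tag}>")
        for child in children:
            child(out)  # children append directly, no intermediate strings
        out.write(f"</{tag}>")
    return write

def text(s: str) -> Child:
    def write(out: io.StringIO) -> None:
        out.write(s)  # no HTML escaping in this sketch
    return write

def render_html(root: Child) -> str:
    out = io.StringIO()
    root(out)
    return out.getvalue()

page = element("div", element("div", text("hello")), text("world"))
print(render_html(page))  # <div><div>hello</div>world</div>
```

The nested `element` calls mirror the nested div calls above; only the final `getvalue()` produces a string.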
I quite often think of Koka as one of the best languages for the web, because its syntax allows dashes in identifiers. I imagine a future where I can write JSX-like expressions in it. Writing HTML in a template language (even with tagged strings) never felt very pleasant to me, and it would miss a lot of opportunities the React world has already realized...
With configurable delimiters you could even do lisp/scheme style quoting. :)
Yes, I think configurable delimiters are probably the best way to go about it. I was also thinking about inverted parentheses 🙌🏻, but they could be even more visually distracting, since we are used to interpreting them in a certain way (I mean in contexts where non-inverted ones also appear).
With configurable delimiters it would probably be wise to have the escape character
Regarding current syntax, it should be even possible to write something like this, right?
With all nested blocks treated as trailing lambdas. IMHO this is vastly superior syntax to something like JSX. It matches HTML quite nicely, but doesn't suffer from having to close the tags... I'm just not sure whether named parameters have to come last, and what about the trailing lambda? The documentation doesn't show how to write a function consuming it...
@TimWhiting: "You could create an intermediate AST, but that wastes time." Not if that's what you'll need to update the DOM on the client...
@kuchta Of course. I was thinking about string interpolation specific to this issue, not saying that an AST is bad, but one heavily used case for advanced string interpolation would be a server-side renderer, and especially if you do not use the AST in any way and just plan to convert it to a string, it seems a bit wasteful. Yes, your syntax with trailing lambdas would be a great way to build an AST; unfortunately we need #491. Currently named parameters have to come last, including after trailing lambdas (since, I think, they just get desugared to the last parameter). With the change in the PR you could make the trailing lambda a positional argument. Alternatively, we could adjust the desugarer to put trailing lambdas after all positional arguments, but before named ones.
@TimWhiting Exactly, and those advanced server-side renderers might want features like React Server Components (RSC), for which some form of (build-time) code transformation (compilation) would probably be needed anyway. Aren't trailing lambdas always a positional argument, if, as you say, they must come before named ones? But then it can't have a default value. That's unfortunate...
Back to the issue, though: I realized a major flaw with allowing user-configurable interpolation delimiters. Due to nested strings / interpolation, you have to resolve this at lexing time; otherwise you cannot find the end of the string! This means we would not be able to lex and parse in parallel, and would need to lex just the imports first, lexing the rest of the body later, once we know the delimiters to use for any prefixed string. This is not only complex, but a lot more work than the original proposal, which could be desugared directly in the parser. As much as I would like to see user-configurable delimiters, it seems that, at least for now, we would need to settle on what to use, though ultimately the decision rests with Daan. It seems like most of us are interested in trying out Daan has more important issues he would like to work on first, I think (specifically a robust async library, HTTP/S, TCP, and other I/O).
Yeah, not surprising :) The grammar essentially becomes configurable through a language construct. Perhaps instead of ` the
I thought it wouldn't be so easy, since practically nobody is using it. I'll leave some prior art here that led me to angle brackets. Unix man page syntax was probably the first place I encountered them, and to this day most commands (not just Unix ones) use them as placeholders for required arguments.
@TimWhiting Wow, I'm really looking forward to it. 🤗 Yesterday I found out that my Koka installation is quite outdated, since the Homebrew channel is probably no longer maintained...
So here is a radical idea: just don't use delimiters (if we are going to have a delimiter, why not use the normal string delimiter, the double quote?). An interpolation is then just a sequence of expressions beginning with a tagged string, with or without spaces between them.
Maybe an auto-formatter with some basic rules could make this look nice. I think it would still be good to have a desugaring rule that nested strings in an interpolation inherit the same tag as their parent, unless specified otherwise.
I realize the HTML example isn't necessarily the best one, especially since I just noticed I messed up the syntax anyway, but it still illustrates the point and gives something to consider when talking about indentation / formatting. This is not a totally crazy idea. Dart has 'adjacent string literals', which allow you to split string literals across multiple lines for better readability and to avoid very long lines; they are basically implicit concatenation. Dart differs in that it also has 'normal' interpolation. I'll clarify that I'd still like to see "an identifier &ident is cool" for simple concatenation.
Interesting. This makes me think of function application in Haskell and PureScript, where you pass a bunch of primitive values into a function without using parentheses and commas. In fact, PureScript does not have special string interpolation syntax, and the function In your example,
Why not use join right away? 🙂
But I like it. It's general and minimal, Koka style 🕺🏼
The main difference is that it is not an Koka already has parameters separated by whitespace (trailing lambda arguments). But they are clearly delimited by indentation and the @kuchta By the way:
That would not be possible for the general ?debug example at the top, where you mix types; it would be weird for arrays to allow mixed types like this only in special situations. This is how it would look using the double quote as our 'non-delimiter'.
Of course if two overloads could match we might want some way of distinguishing which one to use. Either we need #531 or we could allow something strange.
@TimWhiting I don't know if I would call it argument separation by whitespace if there can be just one trailing argument, but making them optional would leave that decision to the author... I see where you are heading, Tim. Yes, it definitely has its uses...
You can actually have multiple trailing arguments:
gets desugared to
Or slightly elongated, and with explicit
Or
Oh, true... You are right. But wouldn't it then be more consistent to also allow non-trailing arguments to be separated by whitespace? All separated by whitespace, with trailing arguments delimited by indentation and non-trailing ones by parentheses...
Whitespace separation is hard to do for general arguments, especially prior to type checking and with operators: For example
Does this mean
I guess we could allow both, but the error messages would have to be really good to help people find where to put the parentheses they forgot because they thought they didn't need them. And maybe a more extended discussion on this topic should go somewhere other than the string interpolation issue. Trailing lambdas don't have this problem because they must start with For string interpolation we can get around this issue by requiring a "" between adjacent non-string parts, or because there is a clearer expectation to delimit individual interpolated parts with
This conversation leads me to another possible way to think about it. We'd like interpolation to be flexible, both syntactically and semantically, but we also want to keep the language grammar simple and be able to desugar it early in the compilation process.
Something we haven't discussed in the context of string interpolation is formatting, where you specify the formatting of arguments via a format specifier. For an example, see https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/tokens/interpolated
I think formatting is the easy bit.
Since you can overload the As far as metaprogramming, let's discuss that in a new issue, #536.
@TimWhiting I haven't commented on this because everything is already said in the next paragraph, except maybe one thing. If you put it this way, the second example is definitely more natural, but being able to use both syntaxes would be great for a DSL like a shell, which I'm quite interested in...
Just trying to get some ideas out of my head and put them down somewhere, so this is just a general proposal for string interpolation and tagged strings, which exist in several languages, notably Python's f-strings: `f"{something}"`. Koka has the advantage of potentially being more user-configurable and customizable due to the name-based overloading we have.
Along the lines of #527, I think it would be good to allow prefixed string literals, commonly called tagged string literals, that allow user customization of string interpolation.
For example a debug interpolator:
Which could be different than another interpolator.
I think we also need a string-builder / buffer type that allocates more memory than needed and can add or concatenate efficiently, including optimizations for the C backend that avoid creating an intermediate Koka string when concatenating (or at least make those strings constant or static, including a static reference count).
The desugaring of string interpolation would then start by calling `tag/string-interpolate-start()` with a simple string `""` if there is no string prior to the interpolation, and then continue by calling `tag/string-interpolate-value(intermediate, value)` or `tag/string-interpolate-string(intermediate, str)` on the subsequent pieces (relying on type-based overloading for different values), with a `tag/string-literal-finish(intermediate)` call at the end that converts the intermediate value back to a string. I'm not sure whether the intermediate type should always be a `string-builder`, or whether we just desugar it and expect the designer of each interpolator to make sure it type checks correctly. I'm inclined towards the more flexible option, so that the intermediate could be a `rope`, a `string-buffer`, or any other data structure. This way the only change to Koka is the desugaring in the parser.
There might also be a use for not automatically finishing the literal (such as when you want to build more pieces onto it, but not in the same expression), or if you want to reuse the buffers directly for file or network I/O instead of creating an intermediate string. I don't have any specific ideas on what the default would be or how this could be configured.
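To make the start/value/string/finish protocol concrete, here is a small model of what the parser might emit for a hypothetical `debug` tag. It is written in Python rather than Koka: `functools.singledispatch` stands in for Koka's type-based overloading, and a plain list stands in for the `string-builder` intermediate. The names `debug_start`, `debug_string`, `debug_value`, `debug_finish`, and `show_debug`, and the brace syntax in the comment, are illustrative only, not an existing API.

```python
from functools import singledispatch

# Per-type rendering for the hypothetical `debug` tag; singledispatch
# stands in for Koka's static, type-based overloading.
@singledispatch
def show_debug(value: object) -> str:
    return repr(value)  # fallback for types without a specific overload

@show_debug.register
def _(value: str) -> str:
    return f'"{value}"'  # interpolated strings are shown quoted

@show_debug.register
def _(value: float) -> str:
    return f"{value:.2f}"  # floats get a fixed precision

# The four desugaring hooks; the intermediate is a list used as a builder.
def debug_start() -> list[str]:
    return []

def debug_string(acc: list[str], s: str) -> list[str]:
    acc.append(s)  # literal pieces are appended verbatim
    return acc

def debug_value(acc: list[str], value: object) -> list[str]:
    acc.append(show_debug(value))  # values go through the overloads
    return acc

def debug_finish(acc: list[str]) -> str:
    return "".join(acc)  # convert the intermediate back to a string

# What a hypothetical debug"x = {x}, name = {name}" might desugar to:
x, name = 1.5, "koka"
result = debug_finish(
    debug_value(
        debug_string(
            debug_value(debug_string(debug_start(), "x = "), x),
            ", name = "),
        name))
print(result)  # x = 1.50, name = "koka"
```

Because every hook takes and returns the intermediate, a tag author could swap the list for a rope or a buffer without changing what the parser emits.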
I think we should start with disallowing mixing raw strings with interpolation.
As far as the syntax itself, there are all sorts of common syntaxes.
Typically there is a start character (e.g. #, $, `, @, %, or \), followed by delimiters of some sort, {}, with any expression allowed in between. Dart also allows omitting the delimiters for simple identifiers (the identifier ends at the first non-identifier character). Of course, it is an error if the identifier is not in scope. Other languages opt for no start character and only require delimiters. I prefer having the option of delimiters for longer expressions and a start character for identifiers.
It is preferable to use a start character that doesn't occur often in strings, since every literal occurrence has to be escaped. It is hard to know what that choice should be, and different situations might have different needs. However, `, to me, seems like a less commonly used character in strings. I can see #, $, and % often being used when interpolating number values, and @ often appears in emails or mentions. Swift uses \. It might be worth allowing different interpolators to define their own escape character, similar to the infix operator notation Koka supports, but it might also be worth restricting it to a specific character or set. In particular, making this configurable would be terrible for syntax highlighting and grammars for IDEs (though many IDEs also support semantic highlighting via a language server, which can add highlighting that cannot be expressed in limited syntax rules).
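A sketch of how a lexer might split a string body into literal and interpolated pieces under this kind of scheme, written in Python. It assumes `$` as the start character (no character has been chosen, so it is a parameter), `{...}` delimiters for arbitrary expressions, bare identifiers of letters, digits, and underscores otherwise, and a backslash to escape the start character. `split_template` is a hypothetical name. It only counts braces, so a nested string literal containing `}` inside an interpolation would break it; handling that properly is what ties nested strings to the lexer.

```python
from typing import List, Tuple

Piece = Tuple[str, str]  # ("lit", text) or ("expr", source text)

def split_template(src: str, start: str = "$") -> List[Piece]:
    """Split a string body into literal and interpolated pieces."""
    pieces: List[Piece] = []
    lit: List[str] = []
    i = 0
    while i < len(src):
        c = src[i]
        if c == "\\" and src[i + 1 : i + 2] == start:
            lit.append(start)  # escaped start character stays literal
            i += 2
            continue
        if c != start:
            lit.append(c)
            i += 1
            continue
        if lit:  # flush pending literal text before the interpolation
            pieces.append(("lit", "".join(lit)))
            lit = []
        i += 1
        if src[i : i + 1] == "{":
            # Delimited form: scan to the matching close brace.
            depth, j = 1, i + 1
            while j < len(src) and depth:
                if src[j] == "{":
                    depth += 1
                elif src[j] == "}":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unterminated interpolation")
            pieces.append(("expr", src[i + 1 : j - 1]))
            i = j
        else:
            # Bare form: the identifier ends at the first non-identifier char.
            if not (src[i : i + 1].isalpha() or src[i : i + 1] == "_"):
                raise ValueError(f"expected identifier or '{{' after {start!r}")
            j = i + 1
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            pieces.append(("expr", src[i:j]))
            i = j
    if lit:
        pieces.append(("lit", "".join(lit)))
    return pieces

print(split_template("hi $name, total ${a + b}!"))
# [('lit', 'hi '), ('expr', 'name'), ('lit', ', total '), ('expr', 'a + b'), ('lit', '!')]
```

Passing `start="`"` gives the backtick variant; the rest of the scanner is unchanged.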
UPDATE:
A consensus has sort of emerged among the participants of the discussion that no particular set of delimiters works really well, so a more drastic, though arguably simpler, proposal evolved: we just allow adjacent expressions starting with a tagged string.
With this change the above example changes as follows:
We also discussed that `&` looks nice when referring to a single identifier, as it is reminiscent of taking a reference to a variable (i.e. referring to it). So an alternate look at the example is this:
You could add a literal '&' by escaping it with a backslash.
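One way the `&ident` shorthand and its backslash escape could be handled, sketched in Python with a regular expression over a literal segment and a dictionary standing in for the lexical scope. The identifier pattern (letters, digits, and underscores, with inner dashes as in Koka names like `user-name`) is an assumption, a bare `&` not followed by an identifier is left as is, and values are converted with `str()` for simplicity, whereas the proposal would hand them to the tag's overloads. `expand_refs` is a hypothetical name.

```python
import re
from typing import Mapping

# Either an escaped ampersand, or '&' followed by an identifier whose dashes
# must be followed by more identifier characters (e.g. 'user-name').
_REF = re.compile(r"\\&|&([A-Za-z_][A-Za-z0-9_]*(?:-[A-Za-z0-9_]+)*)")

def expand_refs(literal: str, scope: Mapping[str, object]) -> str:
    """Replace each &ident with its value from scope; a backslash-& yields '&'."""
    def replace(m: re.Match) -> str:
        name = m.group(1)
        if name is None:  # matched the escape alternative
            return "&"
        if name not in scope:
            raise NameError(f"{name!r} is not in scope")
        return str(scope[name])
    return _REF.sub(replace, literal)

print(expand_refs("an identifier &ident is cool", {"ident": 42}))
# an identifier 42 is cool
print(expand_refs(r"Tom \& &user-name", {"user-name": "Jerry"}))
# Tom & Jerry
```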
Formatting (such as padding or precision specifiers) should, I argue, be implemented as part of the overloading, but this requires that we take just the second-to-last part of the local qualifier as the "tag" so we don't have clashing names. Alternatively, we can just require explicit function calls / conversions into strings, which I personally think is fine.
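Formatting through overloading could look roughly like this: the adjacent parts of an interpolation arrive as a sequence, literal strings pass through, and every other value is rendered by a type-dispatched function, so a precision or padding request is just a wrapper value with its own overload. This is a Python sketch; `Fixed`, `PadLeft`, `interpolate_value`, and `render` are hypothetical names, and `functools.singledispatch` stands in for Koka's overloading.

```python
from dataclasses import dataclass
from functools import singledispatch

@dataclass(frozen=True)
class Fixed:  # hypothetical precision wrapper
    value: float
    digits: int

@dataclass(frozen=True)
class PadLeft:  # hypothetical padding wrapper
    text: str
    width: int

@singledispatch
def interpolate_value(value: object) -> str:
    return str(value)  # default rendering for unwrapped values

@interpolate_value.register
def _(value: Fixed) -> str:
    return f"{value.value:.{value.digits}f}"

@interpolate_value.register
def _(value: PadLeft) -> str:
    return value.text.rjust(value.width)

def render(*parts: object) -> str:
    # Literal strings pass through; everything else goes through the overloads.
    return "".join(p if isinstance(p, str) else interpolate_value(p) for p in parts)

line = render("pi ~ ", Fixed(3.14159, 2), " |", PadLeft("ok", 5), "|")
print(line)  # pi ~ 3.14 |   ok|
```

Adding a new format is then just a new wrapper type plus one overload, with no change to the desugaring.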