Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: adds enumerations #695

Open
wants to merge 2 commits into
base: wdl-1.3
Choose a base branch
from
Open

Conversation

claymcleod
Copy link
Collaborator

@claymcleod claymcleod commented Jan 15, 2025

This commit adds enumerations to WDL.

Based on my discussions and browsing previous threads surround the idea, it sounds like the value of adding enumerations to WDL is nearly universally agreed upon as an improvement regarding the user experience for limiting assignment to a set of valid values within a particular context; it's simply the details that need to be hashed out.

In this PR, enumerations in WDL are valued, meaning that they have an assigned valued for each variant therein. These values can either be explicitly or implicitly typed. When it comes to the details, I try to take a common sense approach and a "middle of the road" position on the spectrum of conciseness versus flexibility. In particular:

  • The inner types can be elided if they are trivially computed (leans towards conciseness).
  • When you'd like to be explicit with the inner type, you can be (leans towards flexibility).
  • Enumeration variants with no assigned value are assumed to be typed as a String with an assigned value equivalent to the variant name (leans towards common sense and conciseness).
  • Enumerations that have the same type can be compared by their inner value. Enumerations that are not of the same type cannot be compared, as that is equivalent to a type mismatch (leans towards common sense approach, as comparing two enums of different types implies different contexts, even if the inner values of the variants are the same).

Other detailed notes

On the removal of the .name accessor

In a previous iteration of the enum concept, the stringified variant name could be accessed with a .name accessor. With the adoption of flexibly valued enums, I'd argue that an enforced separation of concerns between the variant name and the assigned value for the variant is more clear:

  • The variant name itself is only relevant within the context of the enum within the type system. Strictly speaking, it is not intended to be serialized and has no meaning outside the type system in and of itself.
  • When a variant is evaluated (say, via serialization), the inner value assigned to that variant is substituted into the context. This allows for flexibility of a variety of situations based on how you design the enum. For example, you could use an enum like this to evaluate a verbosity flag:
enum VerbosityFlag {
  Quiet = "",
  Info = "-v",
  Debug = "-vv",
  Trace = "-vvv"
}
  • Furthermore, the behavior of the old model is still represented by the default case in the current model: simply define the enum with no type and assigned value, and you will be able to access the stringified version of the variant via the assigned value.

Checklist

  • Pull request details were added to CHANGELOG.md
  • Valid examples WDL's were added or updated to the SPEC.md (see the guide on writing markdown tests)

@claymcleod claymcleod force-pushed the enums branch 3 times, most recently from 94cb3e3 to 53d28e9 Compare January 15, 2025 06:07
@claymcleod claymcleod self-assigned this Jan 15, 2025
@claymcleod claymcleod force-pushed the enums branch 4 times, most recently from 6bc8f2b to 92e82ac Compare January 16, 2025 05:40
@claymcleod claymcleod marked this pull request as ready for review January 16, 2025 05:44
@claymcleod
Copy link
Collaborator Author

I think this is ready for a preliminary review—I haven't added any tests yet, but that's because I want to make sure the idea, as I have codified it here, has support from a good number of people before investing the time to do that.

@claymcleod claymcleod changed the title feat: add enums feat: adds enumerations Jan 16, 2025
@claymcleod claymcleod added the spec change A suggested change to the WDL specification. label Jan 16, 2025
Copy link
Contributor

@markjschreiber markjschreiber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the proposal. I think you should add that Enum values cannot be mutated or reassigned (they are immutable) and that Enums are closed once declared (you can't extend them to add new values).

Copy link
Contributor

@markjschreiber markjschreiber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open question, can the values of an enum be scattered? In most programming languages Enums are iterable.

SPEC.md Outdated Show resolved Hide resolved
@claymcleod
Copy link
Collaborator Author

claymcleod commented Jan 16, 2025

Open question, can the values of an enum be scattered? In most programming languages Enums are iterable.

Great point. In keeping with the spirit of what was done for similar methods, like getting the keys of an object, I will add a variants standard library function.

@markjschreiber
Copy link
Contributor

Could the function just be called enumerate

@claymcleod
Copy link
Collaborator Author

claymcleod commented Jan 16, 2025

Could the function just be called enumerate

Personally, I think it makes me sense to call it variants. This is to be consistent with the choice of the function name matching the concept name for enumeration of those concepts within Objects. For example, keys defines what is being enumerated, not the literal name of the enumeration action (enumerate, enumerate_keys, enumerate_values). Since each member of an enum is called a "variant", the name variants would align with the pre-existing convention. Furthermore, you might want to enumerate more than just the variants, so it has the added benefit of being co-introduced with an overload to the values function for an enum.

@claymcleod claymcleod force-pushed the enums branch 5 times, most recently from 412fd7e to c63fe33 Compare January 18, 2025 17:00
@claymcleod
Copy link
Collaborator Author

claymcleod commented Jan 18, 2025

I've incorporated all of the feedback above, and this should be ready for a re-review. That being said, I don't like what I have on the page right now because I realized something while writing the example (well, maybe two things).

  • To my knowledge, the variants method would be a new paradigm being added to WDL, as it's a function that takes a custom type itself rather than an instance of the type. For example, you can iterate on the keys in a Map, but that's called on an instance of Map. There are, for example, no methods that iterate the fields in a Struct.
  • During enum assignment, you're really assigning the value to one of the variants within the enum, not the enum itself. So it's not strictly true to call this variable "an instance of the enum". It is probably more correct to say "an instance of one of the enum's variants".

Here's an example to make sure it's clear why this distinction is important.

enum FileType {
  FASTQ,
  BAM
}

FileType foo = FileType.FASTQ

# You wouldn't call `variants` on `foo`—why would that be necessary? And does it even really make sense? 
# `foo` represents a variant _within_ the enum rather than the `enum` itself, so enumerating the variants
# within the enum itself seems strange.
variants(foo)

# You'd more likely want to, say, scatter across the variants from the enum type directly.
# But this introduces sort of a strange-looking function call.
variants(FileType)

I will continue to chew on this but curious if others have thoughts on what should be done here. My gut reaction is that perhaps the definition custom types (enums and structs) need to introduce a namespace within the same name as the type (or alias) within which standard library functions can be constructed automatically. This would be similar to the :: operator in Rust, e.g.,

scatter(type in FileType::variants()) {
  # ...
}

@patmagee
Copy link
Member

I've incorporated all of the feedback above, and this should be ready for a re-review. That being said, I don't like what I have on the page right now because I realized something while writing the example (well, maybe two things).

  • To my knowledge, the variants method would be a new paradigm being added to WDL, as it's a function that takes a custom type itself rather than an instance of the type. For example, you can iterate on the keys in a Map, but that's called on an instance of Map. There are, for example, no methods that iterate the fields in a Struct.
  • During enum assignment, you're really assigning the value to one of the variants within the enum, not the enum itself. So it's not strictly true to call this variable "an instance of the enum". It is probably more correct to say "an instance of one of the enum's variants".

Here's an example to make sure it's clear why this distinction is important.

enum FileType {
  FASTQ,
  BAM
}

FileType foo = FileType.FASTQ

# You wouldn't call `variants` on `foo`—why would that be necessary? And does it even really make sense? 
# `foo` represents a variant _within_ the enum rather than the `enum` itself, so enumerating the variants
# within the enum itself seems strange.
variants(foo)

# You'd more likely want to, say, scatter across the variants from the enum type directly.
# But this introduces sort of a strange-looking function call.
variants(FileType)

I will continue to chew on this but curious if others have thoughts on what should be done here. My gut reaction is that perhaps the definition custom types (enums and structs) need to introduce a namespace within the same name as the type (or alias) within which standard library functions can be constructed automatically. This would be similar to the :: operator in Rust, e.g.,

scatter(type in FileType::variants()) {
  # ...
}

@claymcleod What about a new paradigm but one that I think will be easier to roll out: Adding class level methods

enum FileType {
  FASTQ,
  BAM
}

FileType.variants()

@jdidion
Copy link
Collaborator

jdidion commented Jan 22, 2025

What if we think of an enum as a special kind of Map whose keys are strings and values are of a hidden type.

So

enum Foo {
  BAR = "bar"
  BAZ = "baz"
}

would be equivalent to the following

# this is a hidden type
struct Foo {
  String value
}

# this is a singleton instance - hand-wave that the instance
# can have the same name as the type
Map[String, Foo] Foo = {
  "BAR": Foo { value = "bar" },
  "BAZ": Foo { value = "baz" }
}

If we wanted to, we could also extend existing methods to handle an argument of type enum, e.g.

  • keys would return the names of the variants.
  • Indexing ([]) would enable getting a variant by name
Foo bar1 = Foo.BAR
Foo bar2 = Foo["BAR"]

There's also the question of whether an enum variant is automatically coerced to it's value type, or if an explicit call to .value is required:

# this
String bar_string1 = bar1
# or this?
String bar_string2 = bar2.value
# or both?

@claymcleod
Copy link
Collaborator Author

claymcleod commented Jan 23, 2025

@claymcleod What about a new paradigm but one that I think will be easier to roll out: Adding class level methods

enum FileType {
  FASTQ,
  BAM
}

FileType.variants()

Based on my understanding of what you're saying, I think both suggestions are essentially equivalent. The only real question is "which operator should we chose"? I can only think of one relatively good argument, and it's in favor of :: (or, more strictly, anything other than .).

If we introduce these static methods with a . operator, the next logical question is "why can I not do something like <variable>.keys() instead of keys(<variable>)? After all, the use of . to access methods directly attached to an instance of a class is going to be something that users are used to (and perhaps even expect) from their experience using more general programming languages. In truth, I'm not sure if there is a good reason that something like that doesn't work other than simplicity. So, in this case, it might be quite discordant to find that the less obvious case of using . (accessing static methods) works but the more obvious case of using . (accessing methods on a class instance) doesn't work.

In the case of an operator like :: (or any operator other than .), I don't believe these expectations (and, thus, the perceived discordance) are introduced—because :: is a more uncommon operator, it will likely be more natural to think of this as much different than what a method accessor would do. In other words, it doesn't follow that the addition of :: to access namespaced methods will introduce frustration about the lack of a method accessor.

Just to be clear, I don't think this has to be :: specifically. If we don't like the way it looks or we don't want to lean too hard into what Rust is doing (I'm obviously biased here). I just think it should probably not be . based on the argument above. To that end, I don't have a better suggestion for what it should be, and I'd be open to hearing any.

@claymcleod
Copy link
Collaborator Author

claymcleod commented Jan 23, 2025

What if we think of an enum as a special kind of Map whose keys are strings and values are of a hidden type.

So

enum Foo {
  BAR = "bar"
  BAZ = "baz"
}

would be equivalent to the following

# this is a hidden type
struct Foo {
  String value
}

# this is a singleton instance - hand-wave that the instance
# can have the same name as the type
Map[String, Foo] Foo = {
  "BAR": Foo { value = "bar" },
  "BAZ": Foo { value = "baz" }
}

If we wanted to, we could also extend existing methods to handle an argument of type enum, e.g.

* `keys` would return the names of the variants.

* Indexing (`[]`) would enable getting a variant by name
Foo bar1 = Foo.BAR
Foo bar2 = Foo["BAR"]

There's also the question of whether an enum variant is automatically coerced to it's value type, or if an explicit call to .value is required:

# this
String bar_string1 = bar1
# or this?
String bar_string2 = bar2.value
# or both?

I think I'm not connecting what the proposed advantages of this design are over what I've got in the PR, so I'm not sure on my overall feeling on this. If you could share a bit more from your perspective, that'd be great. To address some smaller points:

Accessors

Foo bar1 = Foo.BAR
Foo bar2 = Foo["BAR"]

On this, I'm not sure I like enabled the second line here. The core value of introducing enums is that there is a way to define a closed set of values that can be checked at compile time. When you start accessing things as if they were a map, you run into the issue that the variant might not exist, causing a runtime exception. Beyond that, I just don't see a lot of value to being able to access variants this way: to me, enums seem much stronger if we keep them as a strongly typed and statically checked part of the language with limited scope (in other words, if you need a Map, then use a Map).

Coercion

# this
String bar_string1 = bar1
# or this?
String bar_string2 = bar2.value
# or both?

Leaning a bit on my philosophy I laid out above regarding the core value of enums, there's two contexts you can work with an enum in this proposal:

  • Within the WDL type system, wherein enumerations should be strictly typed and statically verifiable. So, if you are referring to an enum variant as a type, then it should be a unique, closed type that only exactly matches the enum variant itself—the value within is never even considered.
  • Within an evaluation context (e.g., expressions or the command section), there is a one-way conversion of the enum variant to its inner value. This decision is made because it is asserted that you never want to resolve the name of an enum variant type directly (and, if you do, the shorthand version where the value within is a stringified version of the variant name helps you).

@DavyCats
Copy link
Contributor

I think adding a new operator like :: would make the language look more intimidating to non programmers (and maybe even more casual programmers not familiar with Rust or R). This would go against one of the central premises behind the design of WDL. I'm wondering whether this is even a necessary feature to add (obtaining a list of variants that is; enums would be very useful). Is there an actual use case for it?

If you do want to add it, would it be an idea to simply add a reserved variants member variable to the enum type? Essentially, enums would simply have an additional implicitly added variant which equals a list of all other variants.

eg.

enum Animal {
    CAT,
    DOG
}

scatter (animal  in Animal.variants) {} # = [Animal.CAT, Animal.DOG]

Regarding coercion, one might be using a task which takes some integer as input. Perhaps some filtering threshold. The task and underlying tool support any integer value. However, in this particular workflow, only a handful of values might actually be supported:

# task.wdl (Some third party developed task)
task filter_variants {
    input {
        Int threshold
        File vcf
    }
[...]

# workflow.wdl
import "https://example.com/task.wdl" as filter

enum FilterValue[Int] {
  STRICT = 10,
  NORMAL = 5,
  WHO_CARES = 1
}

workflow somatic {
    input {
        FilterValue stictness
        File tumor_fastq
        File normal_fastq
    }
[...]
    call filter.filter_variants {
        input:
            threshold = strictness, # coercion from FilterValue to Int
            vcf = vcf
    }
[...]

Would/should this be allowed?

@claymcleod
Copy link
Collaborator Author

I think adding a new operator like :: would make the language look more intimidating to non programmers (and maybe even more casual programmers not familiar with Rust or R). This would go against one of the central premises behind the design of WDL. I'm wondering whether this is even a necessary feature to add (obtaining a list of variants that is; enums would be very useful). Is there an actual use case for it?

If you do want to add it, would it be an idea to simply add a reserved variants member variable to the enum type? Essentially, enums would simply have an additional implicitly added variant which equals a list of all other variants.

eg.

enum Animal {
    CAT,
    DOG
}

scatter (animal  in Animal.variants) {} # = [Animal.CAT, Animal.DOG]

I think this is a great point about not being approachable. So to me, that kills :: specifically. I do think there is going to be a use case for scattering over enum variants, but I'm also happy to make a case for it if we feel like we need to. In the meantime, I think there are three viable approaches to getting variants:

  • variants(FileType). I think this looks weird, but maybe it's okay.
  • FileType.variants(). We don't call functions/methods in this way historically, and perhaps it even runs afoul of the point you made above regarding being non-approachable (because it complicates the ways in which you can call a function on something—now I have to know what context I'm in and call it the right way).
  • FileType.variants. It feels strange to me to rather arbitrarily dictate that there is a field named variants that is reserved. It's also a bit magical to automatically get this field assigned to the enum.

Overall, I think I'm leaning towards coming back to the first one—it looks a bit weird, but it's consistent with the way we've called functions in the past, and I think it's the least magical. Perhaps once we put in the right syntax highlighting, this wouldn't seem as weird as I'm envisioning.

Regarding coercion, one might be using a task which takes some integer as input. Perhaps some filtering threshold. The task and underlying tool support any integer value. However, in this particular workflow, only a handful of values might actually be supported:

# task.wdl (Some third party developed task)
task filter_variants {
    input {
        Int threshold
        File vcf
    }
[...]

# workflow.wdl
import "https://example.com/task.wdl" as filter

enum FilterValue[Int] {
  STRICT = 10,
  NORMAL = 5,
  WHO_CARES = 1
}

workflow somatic {
    input {
        FilterValue stictness
        File tumor_fastq
        File normal_fastq
    }
[...]
    call filter.filter_variants {
        input:
            threshold = strictness, # coercion from FilterValue to Int
            vcf = vcf
    }
[...]

Would/should this be allowed?

In this case, as proposed, I think the automatic coercion would not work out of the box—doing the coercion implicitly within the type system is probably not a good idea. There might be a case for a method called value(<variant>) that gets the value out of a variant to satisfy the condition you're talking about here.

@peterhuene
Copy link
Contributor

peterhuene commented Jan 24, 2025

My preference is also for the first alternative (e.g. variants(FileType)), simply for consistency with the rest of WDL and to avoid adding new/overloaded syntax, but thinking about it from a programming language perspective, to treat it as a function call to variants, it would need to accept a value in that position as functions take values. I would greatly prefer not to go down the road of having "special" functions that aren't of the production form (value*) -> value.

In this instance, FileType would be a name reference expression and normally we'd try to resolve that against local names (there's nothing special about custom type names in WDL). For this to work, we'd have to also check the reference against custom type names and also prohibit a local name from conflicting with a custom type name; I'm not sure if the spec already requires that, but right now miniwdl warns about the name conflict and sprocket does not emit a diagnostic. The expression would then need to produce a "type reference" value which can be passed to the function.

Type references (i.e. the ability to reference a type as a runtime value) are not currently represented in WDL's value system.

The other syntaxes would not require modifying the value system to work as they could be accomplished by special-casing the access expression to see if we're operating on a name reference to a custom type and skip evaluation of the name reference expression (i.e. the expression FileType would therefore not need to produce a value).

It also means that the expression (FileType).variants() would not be equivalent as we'd treat that as a name reference to FileType, which wouldn't exist as a local name, assuming the conflicting name would be prohibited. This distinction can be seen in languages that use a single access operator for both static and instance member access (for example Java and C#).

@geoffjentry
Copy link
Member

One quick thought based on a skim: Given the primary domain in which WDL is used, .variants() maybe a bit too overloaded of a term

@patmagee
Copy link
Member

Given the contention that the variants function has, I wonder if we need to release the enum feature with it, or if we can release and gauge interest in that feature. If we see Workflows in the wilds that are doing something like:

scatter(type in [FileType.FASTQ,FileType.BAM]) {

}

Then we have a pretty good signal people actually want it and we can come back to it

@claymcleod
Copy link
Collaborator Author

claymcleod commented Jan 24, 2025

Yeah I think that's reasonable Patrick. So, I say we scratch the notion of enumerating variants for now and see whether people ask for it. As you show, it will be possible to enumerate over values explicitly like that, it's just annoying and not really scalable.

This also limits the impact of Geoff's observation, as the only time users will need to wrangle with them being called variants is in the documentation (I think variants is still the proper choice here, but I do agree that conflating the topics of variants in these two contexts on a day-to-day level are not helpful).

After that decision, I'm quite sure that all of the feedback has been addressed and now this is just ready for a review again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec change A suggested change to the WDL specification.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants