-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: adds enumerations #695
base: wdl-1.3
Are you sure you want to change the base?
Conversation
94cb3e3
to
53d28e9
Compare
6bc8f2b
to
92e82ac
Compare
I think this is ready for a preliminary review—I haven't added any tests yet, but that's because I want to make sure the idea, as I have codified it here, has support from a good number of people before investing the time to do that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the proposal. I think you should add that Enum values cannot be mutated or reassigned (they are immutable) and that Enums are closed once declared (you can't extend them to add new values).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open question, can the values of an enum be scattered? In most programming languages Enums are iterable.
Great point. In keeping with the spirit of what was done for similar methods, like getting the |
Could the function just be called |
Personally, I think it makes me sense to call it |
412fd7e
to
c63fe33
Compare
I've incorporated all of the feedback above, and this should be ready for a re-review. That being said, I don't like what I have on the page right now because I realized something while writing the example (well, maybe two things).
Here's an example to make sure it's clear why this distinction is important. enum FileType {
FASTQ,
BAM
}
FileType foo = FileType.FASTQ
# You wouldn't call `variants` on `foo`—why would that be necessary? And does it even really make sense?
# `foo` represents a variant _within_ the enum rather than the `enum` itself, so enumerating the variants
# within the enum itself seems strange.
variants(foo)
# You'd more likely want to, say, scatter across the variants from the enum type directly.
# But this introduces sort of a strange-looking function call.
variants(FileType) I will continue to chew on this but curious if others have thoughts on what should be done here. My gut reaction is that perhaps the definition custom types (enums and structs) need to introduce a namespace within the same name as the type (or alias) within which standard library functions can be constructed automatically. This would be similar to the scatter(type in FileType::variants()) {
# ...
} |
@claymcleod What about a new paradigm but one that I think will be easier to roll out: Adding class level methods enum FileType {
FASTQ,
BAM
}
FileType.variants() |
What if we think of an So
would be equivalent to the following
If we wanted to, we could also extend existing methods to handle an argument of type
There's also the question of whether an
|
Based on my understanding of what you're saying, I think both suggestions are essentially equivalent. The only real question is "which operator should we chose"? I can only think of one relatively good argument, and it's in favor of If we introduce these static methods with a In the case of an operator like Just to be clear, I don't think this has to be |
I think I'm not connecting what the proposed advantages of this design are over what I've got in the PR, so I'm not sure on my overall feeling on this. If you could share a bit more from your perspective, that'd be great. To address some smaller points: Accessors
On this, I'm not sure I like enabled the second line here. The core value of introducing enums is that there is a way to define a closed set of values that can be checked at compile time. When you start accessing things as if they were a map, you run into the issue that the variant might not exist, causing a runtime exception. Beyond that, I just don't see a lot of value to being able to access variants this way: to me, enums seem much stronger if we keep them as a strongly typed and statically checked part of the language with limited scope (in other words, if you need a Coercion
Leaning a bit on my philosophy I laid out above regarding the core value of enums, there's two contexts you can work with an enum in this proposal:
|
I think adding a new operator like If you do want to add it, would it be an idea to simply add a reserved eg. enum Animal {
CAT,
DOG
}
scatter (animal in Animal.variants) {} # = [Animal.CAT, Animal.DOG] Regarding coercion, one might be using a task which takes some integer as input. Perhaps some filtering threshold. The task and underlying tool support any integer value. However, in this particular workflow, only a handful of values might actually be supported: # task.wdl (Some third party developed task)
task filter_variants {
input {
Int threshold
File vcf
}
[...]
# workflow.wdl
import "https://example.com/task.wdl" as filter
enum FilterValue[Int] {
STRICT = 10,
NORMAL = 5,
WHO_CARES = 1
}
workflow somatic {
input {
FilterValue stictness
File tumor_fastq
File normal_fastq
}
[...]
call filter.filter_variants {
input:
threshold = strictness, # coercion from FilterValue to Int
vcf = vcf
}
[...] Would/should this be allowed? |
I think this is a great point about not being approachable. So to me, that kills
Overall, I think I'm leaning towards coming back to the first one—it looks a bit weird, but it's consistent with the way we've called functions in the past, and I think it's the least magical. Perhaps once we put in the right syntax highlighting, this wouldn't seem as weird as I'm envisioning.
In this case, as proposed, I think the automatic coercion would not work out of the box—doing the coercion implicitly within the type system is probably not a good idea. There might be a case for a method called |
My preference is also for the first alternative (e.g. In this instance, Type references (i.e. the ability to reference a type as a runtime value) are not currently represented in WDL's value system. The other syntaxes would not require modifying the value system to work as they could be accomplished by special-casing the access expression to see if we're operating on a name reference to a custom type and skip evaluation of the name reference expression (i.e. the expression It also means that the expression |
One quick thought based on a skim: Given the primary domain in which WDL is used, |
Given the contention that the scatter(type in [FileType.FASTQ,FileType.BAM]) {
} Then we have a pretty good signal people actually want it and we can come back to it |
Yeah I think that's reasonable Patrick. So, I say we scratch the notion of enumerating variants for now and see whether people ask for it. As you show, it will be possible to enumerate over values explicitly like that, it's just annoying and not really scalable. This also limits the impact of Geoff's observation, as the only time users will need to wrangle with them being called variants is in the documentation (I think variants is still the proper choice here, but I do agree that conflating the topics of variants in these two contexts on a day-to-day level are not helpful). After that decision, I'm quite sure that all of the feedback has been addressed and now this is just ready for a review again. |
This commit adds enumerations to WDL.
Based on my discussions and browsing previous threads surround the idea, it sounds like the value of adding enumerations to WDL is nearly universally agreed upon as an improvement regarding the user experience for limiting assignment to a set of valid values within a particular context; it's simply the details that need to be hashed out.
In this PR, enumerations in WDL are valued, meaning that they have an assigned valued for each variant therein. These values can either be explicitly or implicitly typed. When it comes to the details, I try to take a common sense approach and a "middle of the road" position on the spectrum of conciseness versus flexibility. In particular:
String
with an assigned value equivalent to the variant name (leans towards common sense and conciseness).Other detailed notes
On the removal of the
.name
accessorIn a previous iteration of the enum concept, the stringified variant name could be accessed with a
.name
accessor. With the adoption of flexibly valued enums, I'd argue that an enforced separation of concerns between the variant name and the assigned value for the variant is more clear:Checklist