Draft spec for pangeneset-based gene list translation UI #19
Comments
How important is it that the UI match what you've mocked up? It'll be easier (and stylistically consistent with our existing components) if the entire form, including the gene list, is above the results table. Also, note that the GraphQL endpoint that was previously prototyped for this only returns genes, with no notion of which output genes correspond to which input genes. This can be remedied either by introducing new types into the GraphQL API or by requesting the output genes' pan gene sets and their genes and then doing some post-processing. Neither is ideal. An alternative is to revise the GraphQL endpoint so that it gets pan genes for a single input gene. The UI element would then send multiple requests to this endpoint when the form is submitted, one for each gene in the input list. Perhaps this belongs under "future enhancements", but do you want the table to be sortable by one or more columns? |
It's not important to match the UI layout; that just seemed compact to me. Having the actual correspondences between input genes and results is critical. The intermine queries I had specified did this; I will have to double-check whether they can be adapted to work with the "ONE OF" constraint instead of "IN LIST". I'm not sure what your third paragraph refers to; it seems like your thought got truncated there? Sortable tables are always appreciated, but I think it's OK to leave that for a future enhancement. |
I second all parts of the response by @adf-ncgr. The main requirements are:
Caveat regarding table* for the output: other structures might be acceptable, but a table is what will be most familiar to users. The additional wrinkle is representing one-to-many relationships, e.g.
Also, we'll need to be able to report an error message for query genes not found |
Right. My point is that just because we can craft an intermine query that will give us all the results in one request doesn't mean it translates well into GraphQL. The most canonical way to handle this in GraphQL is via filtering. This would make our API more expressive, rather than more esoteric.
@adf-ncgr's example in the UI spec handles the one-to-many relationship by adding multiple rows for an input gene if it has multiple output genes, one row for each output gene. Your example here adds a single row and puts all of the output genes into the third column. Which way should this be handled? I lean towards what's currently in the spec, as it prevents horizontal overflow when a particular input gene has many result genes.
The generic search component that this web component will be built on already supports this, although this particular error should be mentioned in the spec as it's specific to this component.
Sorry about that; remnants of a discarded thought. I edited it out. |
I'm confused. Didn't we implement the ONE OF constraint specifically for this purpose? The GraphQL page you linked seems fairly similar if you're talking about this bit:
Regarding how to handle 1-to-many, I don't feel super-strongly about it, but my argument for doing it in the 1 row = 1 gene pair style is that if we were to augment the rows with additional info about the corresponding pair (e.g. "allelic" info like gene length) this would seem more natural. Regarding query genes not found, I'm not completely sure how to deal with it in this context. I think @StevenCannon-USDA is referring to input tokens that do not match genes in the database, regardless of whether they belong to pangenes or correspond to genes in the target annotation via these pangenes. It almost seems like this would require the list to be validated separately, similar to how intermine list builder currently works. Could this be relegated to future enhancement land? |
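The "1 row = 1 gene pair" layout discussed above can be sketched as a small post-processing step. This is only an illustration under stated assumptions: the `Matches` structure and the function name `to_rows` are hypothetical and are not part of the prototyped API.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: input gene -> list of (pangene set id, target gene ids).
Matches = Dict[str, List[Tuple[str, List[str]]]]


def to_rows(matches: Matches) -> List[Tuple[str, str, str]]:
    """Flatten matches into (input gene, pangene set, output gene) rows.

    An input gene with several output genes yields several rows, so any
    per-pair information could later be added as extra columns.
    """
    rows = []
    for input_gene, sets in matches.items():
        for set_id, target_genes in sets:
            for target in target_genes:
                rows.append((input_gene, set_id, target))
    return rows
```

Note that in this sketch an input gene with no matches produces no rows at all, so unmatched inputs would have to be reported separately.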
Sam implemented the ONE OF constraint as a prototype. It never got merged (or scrutinized) because we didn't make it this far in the discussion.
That's not the bit I was referring to, but it could be used to handle multiple input genes in a single request. What I'm really interested in is "While fetching nested linked objects, you can also apply a filter on them."
query {
  getGene(id: "...") {
    name
    panGeneSets {
      identifier,
      genes(filter: {
        genus: "...",
        species: "...",
        strain: "...",
        assembly: "...",
        annotation: "..."
      }) {
        identifier
      }
    }
  }
}
The implementation details of this would be to add
If we take the "one-request-per-input-gene" approach I'm advocating here then reporting which genes weren't found becomes trivial since the GraphQL server throws an error when a gene can't be found for the given identifier. |
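A minimal sketch of the one-request-per-input-gene idea, showing how the not-found list falls out of the per-gene errors. The function name `translate_genes` and the `fetch_targets` callable are hypothetical; `fetch_targets` stands in for a single GraphQL request and is assumed to raise `LookupError` when the server reports that no gene matches the identifier.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def translate_genes(
    identifiers: Iterable[str],
    fetch_targets: Callable[[str], List[str]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Issue one lookup per input gene and track which inputs failed.

    `fetch_targets` is a placeholder for one request to the server; it is
    assumed to raise LookupError when the gene cannot be found.
    """
    found: Dict[str, List[str]] = {}
    not_found: List[str] = []
    for identifier in identifiers:
        try:
            found[identifier] = fetch_targets(identifier)
        except LookupError:
            # Remember the failed input so the UI can report it.
            not_found.append(identifier)
    return found, not_found
```

Because each input gene has its own request, a gene that exists but has no counterpart in the target annotation (an empty list) stays distinguishable from a gene that was not found.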
OK, I think I understand the commentary so far, and I have a few observations: |
As with the other web components, linking is a post-graphql, pre-web component step, i.e. the links are site specific and inserted after the data is fetched right before it's displayed. You link things wherever you want!
This is separate functionality that should be encapsulated in its own component or utility script. We can pursue it in tandem with this UI element, but it needs its own spec.
I bet you $1 they do. |
I'm only concerned that one request per input gene won't be very performant, but could be convinced otherwise if you can prototype it (because you know you'll win that $1 bet with @maxglycine when I come up to the plate) |
What you are specifying in the proposed UI is the target genus, species, assembly, and strain, not the query. It's possible that we should do similar for the query, but the current spec would allow mixed source inputs (which might not be all that useful, admittedly). |
OK. I'll see what I can come up with! |
If we are concerned that users will try to "convert" names en masse from one assembly to another, I am OK with telling them there is a limit of say 50 per request to make it too painful to execute. |
@maxglycine I think we ought to be able to handle 100s if not 1000s of genes whatever approach we decide to adopt here since this will (I hope) lay the foundation for other list-oriented services where imposing draconian limits might defeat the purpose (e.g. gene set enrichment analysis). Just my 2c though. |
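If the per-gene approach is used for lists of hundreds or thousands of genes, one option is to cap the number of requests in flight. The sketch below uses a thread pool for that; `fetch_all`, `fetch_one`, and the default of 8 workers are illustrative assumptions, and whether this is fast enough would need to be measured against the real server.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def fetch_all(
    identifiers: Sequence[str],
    fetch_one: Callable[[str], List[str]],
    max_workers: int = 8,
) -> List[List[str]]:
    """Run one lookup per gene with a bounded number of concurrent requests.

    Results are returned in the same order as `identifiers`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even if requests finish out of order.
        return list(pool.map(fetch_one, identifiers))
```

In this form an exception raised by any single lookup propagates to the caller, so per-gene error handling would need to live inside `fetch_one`.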
I agree with @adf-ncgr in that we need to preserve the linkage between the gene model name query and all of its related "pan" genes. Whether that is multiple rows for each query gene model name ie: glyma.Wm82.gnm4.ann1.glyma.01g00100PanGene1 or |
I agree with @StevenCannon-USDA if they want to sort, download the list and do it in Excel :0 |
Good Example from MaizeGDB
Ethy talked about it and demoed her creation for MGDB yesterday at the AgBioData meeting. It is at:
Impressed because it looks so complete a tool !! |
Hi everyone, I prototyped a way of doing pangene list queries per gene by adding arguments to the
# query
query PangeneListQuery($identifier: ID!, $genus: String, $species: String, $strain: String, $assembly: String, $annotation: String) {
  gene(identifier: $identifier) {
    results {
      panGeneSets {
        genes(genus: $genus, species: $species, strain: $strain, assembly: $assembly, annotation: $annotation) {
          identifier
        }
      }
    }
  }
}
Here are the variables I used to verify the functionality. I'm sure someone here can come up with a more interesting pangene set to test this on:
{
  "identifier": "phavu.G19833.gnm1.ann1.Phvul.001G000200",
  "genus": "Phaseolus",
  "species": "vulgaris",
  "strain": "G19833",
  "assembly": "gnm1",
  "annotation": "ann1"
}
These changes have been pushed to the |
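For anyone who wants to try the prototype query from a script, here is a rough sketch of building the request body and reading the result. The helper names `build_payload` and `extract_identifiers` are hypothetical. The body follows the common GraphQL-over-HTTP convention of a JSON object with `query` and `variables`, and the response shape is assumed to mirror the query, with `panGeneSets` and `genes` returned as lists.

```python
import json
from typing import Any, Dict, List

# Query text copied from the prototype above.
PANGENE_LIST_QUERY = """
query PangeneListQuery($identifier: ID!, $genus: String, $species: String,
                       $strain: String, $assembly: String, $annotation: String) {
  gene(identifier: $identifier) {
    results {
      panGeneSets {
        genes(genus: $genus, species: $species, strain: $strain,
              assembly: $assembly, annotation: $annotation) {
          identifier
        }
      }
    }
  }
}
"""


def build_payload(identifier: str, target: Dict[str, str]) -> str:
    """Return the JSON request body for one input gene and one target filter."""
    variables = {"identifier": identifier, **target}
    return json.dumps({"query": PANGENE_LIST_QUERY, "variables": variables})


def extract_identifiers(response: Dict[str, Any]) -> List[str]:
    """Collect target gene identifiers from a response (assumed shape)."""
    sets = response["data"]["gene"]["results"]["panGeneSets"]
    return [gene["identifier"] for s in sets for gene in s["genes"]]
```

The same `target` dictionary can be reused for every gene in the input list, since only the `identifier` variable changes between requests.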
Very nice initial implementation, @alanclery! (June 18). Another feature that I think will be desired (albeit at the cost of some additional UI complexity and more development time) is to handle inputs that lack the full yuck, i.e. One way this could be accomplished is to provide two forms of this page -- one, taking unprefixed IDs, would have 10 specification fields -- five for the query and five for the target. Alternatively, provide an optional sixth input field on this page, in which the user can provide a prefix string to be added to the query elements, e.g. Of those two options, the second one looks better to me at the moment. (I'll add: the reason that |
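The optional prefix field could be applied to the input list before any requests are made. A minimal sketch follows, assuming the prefix is a plain string that the user types; the name `add_prefix` is hypothetical, and the example prefix in the usage note is only inferred from the identifier used in the test variables earlier in this thread.

```python
from typing import Iterable, List


def add_prefix(identifiers: Iterable[str], prefix: str) -> List[str]:
    """Prepend `prefix` to each ID that does not already start with it."""
    return [i if i.startswith(prefix) else prefix + i for i in identifiers]
```

For example, `add_prefix(["Phvul.001G000200"], "phavu.G19833.gnm1.ann1.")` would produce the full identifier `phavu.G19833.gnm1.ann1.Phvul.001G000200`, while IDs that already carry the prefix, or an empty prefix, leave the list unchanged.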
Initial thoughts here: https://github.com/legumeinfo/website-ui-specs/tree/main/pangeneset-based-gene-id-translation
feel free to propose changes in this issue or signify your consent (silence by the end of the week will imply consent).
Minor note, I tried using @sdash-github's plantuml for UI mockup (as described here: https://plantuml.com/salt); not sure if we'll decide to adopt this for more complex cases, but it seemed worth a try. If nothing else, it reminded me what a PITA specifying UI layout through use of nested tables can be (one of the reasons I never became a web developer, I think, though I know that's no longer the way...)