Draft spec for pangene search #9

StevenCannon-USDA · 2023-07-17T19:50:08Z

Please see draft spec for pangenes search query - to find ~paralogous/allelic genes (corresponding by homology and synteny):
https://github.com/legumeinfo/website-ui-specs/tree/main/pangenes-search

... and provide feedback. Please respond via this issue.
@sammyjava @That-Thing @maxglycine @jd-campbell @alancleary @adf-ncgr @sdash-github

The pangene sets we have in the Data Store currently are for: Arachis, Cicer, Glycine, Medicago, Phaseolus, Vigna. I've tried to make the spec suitable for use at LegumeInfo, SoyBase, and PeanutBase.

This spec may again come before the mine backend is ready ... but it sounds like it is on the way.

sammyjava · 2023-07-17T21:35:30Z

Yeah, the mine 5.1.0.3 graphql-server is ready, and we can test against the dev MiniMine, which is on 5.1.0.3. So nothing holding us back pangene set-wise. The dev MiniMine is at https://mines.dev.lis.ncgr.org/minimine/begin.do

sammyjava · 2023-07-17T21:37:35Z

FYI, here's what PanGeneSet looks like in the graphql-server branch, just a bucket o' genes and proteins.

<class name="PanGeneSet" extends="Annotatable" is-interface="true">
        <collection name="dataSets" referenced-type="DataSet"/>
        <collection name="genes" referenced-type="Gene" reverse-reference="panGeneSets"/>
        <collection name="proteins" referenced-type="Protein" reverse-reference="panGeneSets"/>
</class>

type PanGeneSet implements Annotatable {
  ## Annotatable
  id: ID!
  identifier: ID!
  ontologyAnnotations: [OntologyAnnotation!]!
  publications: [Publication!]!
  ## PanGeneSet
  dataSets: [DataSet]
  genes: [Gene]
  proteins: [Protein]
}
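
One thing client code may need to handle with this type: `dataSets`, `genes`, and `proteins` are declared without `!`, so the list itself or any item in it can come back null. A minimal Python sketch of normalizing such a record, assuming the response is plain JSON shaped like the type above and that each Gene and Protein exposes an `identifier` field (an assumption; those types are not shown here). `PanGeneSetRecord` and `parse_pan_gene_set` are hypothetical names, not part of the mine or the graphql-server:

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class PanGeneSetRecord:
    """Client-side mirror of PanGeneSet (hypothetical helper for illustration)."""
    identifier: str
    gene_ids: list[str] = field(default_factory=list)
    protein_ids: list[str] = field(default_factory=list)


def _identifiers(items: Optional[list[Optional[dict[str, Any]]]]) -> list[str]:
    # [Gene] and [Protein] are nullable at both levels: the list may be
    # null, and so may each item. Treat both cases as "no member".
    return [item["identifier"] for item in (items or []) if item is not None]


def parse_pan_gene_set(raw: dict[str, Any]) -> PanGeneSetRecord:
    # Assumes Gene/Protein objects carry an "identifier" key (not confirmed above).
    return PanGeneSetRecord(
        identifier=raw["identifier"],
        gene_ids=_identifiers(raw.get("genes")),
        protein_ids=_identifiers(raw.get("proteins")),
    )
```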

adf-ncgr · 2023-07-17T22:11:44Z

thanks @StevenCannon-USDA I have a couple of minor (maybe) comments/questions on the initial spec:

the results you show seem to be displaying transcript/protein isoform ids; is this intended or should we just focus on the gene ids in what we present (seems cleaner to me)?
might we want to provide any additional details about the member genes such as their locations or sizes (e.g. to give at least a crude sense for variability)?
is there any implied sorting in how the pangene members are listed?
should the accession dropdown support multi-selection (e.g. suppose I want to get allelic comparisons between two favorite lines)? And note that your first example seems to imply it shouldn't be a dropdown, but a text box matched as "contains"?
might we want to make explicit when a given accession is absent from a pangene set? e.g. suppose I wanted to know about genes that are missing from my favorite soybean line: would I want to get empty pangene representations for those pangene sets in which a selected accession does not occur, or simply not get them in the returned results?
would we want a linkout for the set of genes belonging to a pangene (e.g. pushing them to the GCV multi-alignment view or to an intermine list)?

some of these are probably just stuff to think about for future iterations.
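
To make the presence/absence question concrete, here is a rough Python sketch of the two behaviors. It assumes pangene sets are already available as a mapping from set ID to accession to gene IDs; that shape, the function name, and the `keep_absent` flag are all made up for illustration and are not in the spec:

```python
from typing import Optional

# Hypothetical shape: pangene set ID -> {accession -> gene IDs}.
PanGeneSets = dict[str, dict[str, list[str]]]


def members_for_accessions(
    sets: PanGeneSets, accessions: list[str], keep_absent: bool
) -> dict[str, dict[str, Optional[list[str]]]]:
    """Restrict each pangene set to the selected accessions.

    keep_absent=True: every set is returned; an accession with no member
    gene maps to None (the "empty representation" option).
    keep_absent=False: absent accessions are omitted, and a set in which
    none of the selected accessions occurs is dropped entirely.
    """
    result: dict[str, dict[str, Optional[list[str]]]] = {}
    for set_id, by_accession in sets.items():
        row: dict[str, Optional[list[str]]] = {
            acc: by_accession.get(acc) for acc in accessions
        }
        if not keep_absent:
            row = {acc: genes for acc, genes in row.items() if genes is not None}
            if not row:
                continue
        result[set_id] = row
    return result
```

With multi-selection, the first mode also gives a direct answer to "which sets lack my line": those are the rows where the accession maps to None.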

maxglycine · 2023-07-18T18:49:06Z

May want to add an output option to download query results to the user's computer. A query could return a large number of identifiers and the user may want to save them. Otherwise, the user would have to copy HTML text and paste it somewhere.
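
As a sketch of what such a download could contain, assuming results are reduced to (pangene set ID, gene ID) pairs and that tab-separated text is an acceptable format (both are assumptions, and the column names are invented):

```python
import csv
import io


def results_to_tsv(rows: list[tuple[str, str]]) -> str:
    """Serialize (pangene set ID, gene ID) pairs as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["pangene_set", "gene"])  # hypothetical header names
    writer.writerows(rows)
    return buffer.getvalue()
```

The returned string could be offered as a file download instead of making the user copy identifiers out of the page.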

sammyjava · 2023-07-18T19:55:11Z

"Genes in this pangene set" would be best implemented by adding "size" to the PanGeneSet object in the mines and populating it in a post-processor, as we do with GeneFamily. That attribute is not currently present in PanGeneSet in 5.1.0.3, nor are there any other aggregate quantities like those we have in GeneFamily 5.1.0.3:

<class name="GeneFamily" extends="Annotatable" is-interface="true" term="">
        <attribute name="description" type="java.lang.String"/>
        <attribute name="version" type="java.lang.String"/>
        <attribute name="size" type="java.lang.Integer"/>
        <reference name="phylotree" referenced-type="Phylotree" reverse-reference="geneFamily"/>
        <collection name="genes" referenced-type="Gene"/>
        <collection name="proteins" referenced-type="Protein"/>
        <collection name="proteinDomains" referenced-type="ProteinDomain" reverse-reference="geneFamilies"/>
        <collection name="dataSets" referenced-type="DataSet"/>
        <collection name="tallies" referenced-type="GeneFamilyTally" reverse-reference="geneFamily"/>
</class>

If this is a Big Deal, stop me from building 5.1.0.3 mines. GlycineMine 5.1.0.3 is almost built, took two weeks.
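
For reference, the aggregate itself is simple. This Python sketch is not the mine's post-processor code; it only illustrates the quantity a "size" attribute would hold, assuming size means the number of distinct member genes and that memberships are available as a mapping from set ID to gene IDs (a hypothetical input shape):

```python
def pan_gene_set_sizes(gene_ids_by_set: dict[str, list[str]]) -> dict[str, int]:
    """Count distinct member genes per pangene set."""
    return {set_id: len(set(genes)) for set_id, genes in gene_ids_by_set.items()}
```

Until the attribute exists in the mine, a client could compute the same count from the `genes` collection it already receives.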

sammyjava · 2023-07-18T20:13:36Z

> May want to add an output option to download query results to the user's computer. A query could return a large number of identifiers and the user may want to save them. Otherwise, the user would have to copy HTML text and paste it somewhere.

This sounds like an across-the-board option that would be implemented for all results output, like pagination. Thoughts, @alancleary? After all, we all remember that "Every page should have a download button!" :)

StevenCannon-USDA · 2023-07-18T20:34:32Z

@sammyjava - "Genes in this pangene set" - I would say "not a big deal" (not a high priority in the first implementation).

sammyjava · 2023-08-04T16:37:21Z

@StevenCannon-USDA I'm a bit confused about the scope of this search. Are you saying that we'll have a list of pangene sets, each with its corresponding genes listed below it? For example, what happens if the only search element is "Glycine", all else left blank? A gigantic list of all Glycine pangene-sets with their genes? (Which is fine, if that's what you want.)

sammyjava · 2023-08-04T16:42:26Z

And, if so, are you specifying that pagination be on a pangene-set-to-pangene-set basis? Each page displays a single pangene set? (That's just setting the page size to 1, which is easy. The list of genes within a pangene set would be part of that pangene set record's display.) Just want some detail on pagination expectations when we've got results which are a list of lists.
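
A small Python sketch of what I mean by paginating on the pangene-set level, assuming each result record is a pangene set carrying its own gene list (the record shape and function name are only illustrative):

```python
from typing import TypeVar

T = TypeVar("T")


def paginate(items: list[T], page: int, page_size: int) -> list[T]:
    """Return the 1-based `page` of `items`; a page past the end yields []."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return items[start:start + page_size]


# Each record is (pangene set ID, member gene IDs); the gene list travels
# with its set, so only the outer list is paged.
pangene_sets = [("pan1", ["g1", "g2"]), ("pan2", ["g3"]), ("pan3", [])]
second_page = paginate(pangene_sets, page=2, page_size=1)
```

Here `page_size=1` shows one pangene set per page with all of its genes; a larger page size shows several sets per page without splitting any gene list.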

jd-campbell added the "enhancement" label (New feature or request) · Aug 28, 2023

jd-campbell assigned That-Thing · Sep 7, 2023

alancleary mentioned this issue in "Versioning specs #11" (closed) · Oct 4, 2023
