-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add v3 SV dataset #1209
Add v3 SV dataset #1209
Conversation
@mattsolo1 this one will definitely be easier to review commit by commit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heya Phil,
Lookin' very slick. I love all the testing, and I selfishly I am excited to have such a diverse set of tests for me to personally reference when adding to the test suite. On the whole this PR is looking great.
I am requesting changes mainly so a few comments get addressed, and so there's record of the fact that this PR is awaiting some copy edits (and as discussed in standups, at least one further commit that replaces v3 -> v4). Further, I'd like to request that when the demo is back up, that I could get looped in to poke around and also confirm things are stable.
I always enjoy reviewing your PRs as I get to learn more design and testing patterns as I do so -- very nicely done.
// @ts-expect-error TS(7006) FIXME: Parameter 'variant' implicitly has an 'any' type. | ||
(variant) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be typed and the @ts-expect-error
be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rileyhgrant I defined this as a StructuralVariantPropType
rather than a StructuralVariant
. As a comment in this PR implies, I took a look at collapsing the two types into one, which turns out to be more finicky than it might seem. Given that we're under the gun I've decided not to touch that for now. Better two partially redundant types than none at all.
LOWQUAL_WHAM_SR_DEL: 'Wham And Spilt-Read Evidence Only', | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this meant to read "Split" read, rather than "Spilt"? My domain knowledge fails me, so wanted to double check.
key: string | ||
) => textMapping[key] || `TEXT NEEDED FOR ${entityType.toUpperCase()} "${key}"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, so is the intended use case of this is to give the scientists reviewing demos the chance to see that text is missing? And then there are several pieces of copy that get wrapped in this function?
I'm not entirely sure how I feel about the need for this in production, as opposed to being a squeaky wheel to get copy, but I suppose I can see the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, so is the intended use case of this is to give the scientists reviewing demos the chance to see that text is missing?
Basically, yes. With this helper, it's at least clear that something is supposed to go there, rather than say me leaving something out of the frontend by accident. This came up a few times in the course of development, before I had all of the copy in question, where it would be confusing for the scientists QAing a demo if they saw a blank space where one of those missing pieces of copy should go. This way it's at least clear that something is supposed to go there, and there's a strong hint as to what exactly is missing.
I'd argue that we should keep it on the same principle: worst case, we miss some copy somewhere, someone sees this "TEXT NEEDED" message in prod, and they can give us a bug report that points directly to to the problem.
"structural_variants_step_1": "import_all_svs_from_vcfs", | ||
"structural_variants_step_2": "add_histograms", | ||
"structural_variants": "add_variant_id_upper_case", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit of a nitpick - I believe pipeline outputs are primarily used when referencing the pipeline in other modules to find the location of the created hailtables (e.g. to export the final table to elasticsearch).
As such, it's really needed to set outputs for each step of the pipeline, in this cause just the final step should suffice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TY this is one detail I have a habit of forgetting
const getContextType = (context: any) => { | ||
if (context.transcript_id) { | ||
return 'transcript' | ||
} | ||
if (context.gene_id) { | ||
return 'gene' | ||
} | ||
return 'region' | ||
} | ||
|
||
export const getColumnsForContext = (context: any) => { | ||
const contextType = getContextType(context) | ||
export const getColumnsForContext = (context: Context) => { | ||
const columns = structuralVariantTableColumns | ||
.filter( | ||
(column) => | ||
// @ts-expect-error TS(2554) FIXME: Expected 1 arguments, but got 2. | ||
column.shouldShowInContext === undefined || column.shouldShowInContext(context, contextType) | ||
(column) => column.shouldShowInContext === undefined || column.shouldShowInContext(context) | ||
) | ||
.map((column) => ({ | ||
...column, | ||
description: (column as any).descriptionInContext | ||
? (column as any).descriptionInContext(context, contextType) | ||
: (column as any).description, | ||
})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did this logic go, was it unnecessary?
Apologies if I missed it, I didn't seem to see anything else in this PR that moved this logic to elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did this logic go, was it unnecessary?
It was. This file here has similar functionality to several other modules in the codebase, and was originally done by copypasta from one of those. If you look over the various column definitions in structuralVariantTableColumns
, you'll see none of them have a description
or descriptionInContext
field set--meaning these lines at the end were dead weight. Another good example of why coding by copypasta is problematic.
1. ![](https://placehold.it/15/D43925/000000?text=+) **Predicted loss-of-function (pLoF)**: SV is predicted to delete the gene or truncate the gene product. | ||
2. ![](https://placehold.it/15/7459B2/000000?text=+) **Intragenic exonic duplication (IED)**: SV is predicted to result in duplicated exons within the gene, without extending beyond the boundaries of the open reading frame. (New in gnomAD v3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these icons placeholders so you can immediately see they need review? This comment is pretty much just personal curiosity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH I'm not sure of why these icons are here or why it makes sense to use a random web service to render them, but that's how I found it in the original.
@rileyhgrant I believe I've addressed all your feedback, and I've done the "v3" to "v4" switch. Ready for re-review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Most of the key transformations in this new pipeline are taken from the v2 SV pipeline in `gnomad_sv_v2.py`.
The only difference (at least in this first iteration) between v2 and v3 SV queries is the index they use. Here we refactor so that we'll be able to re-use the existing SV queries, largely by folding two separate modules into one.
The hexidecimal suffixes for the new v3 IDs was in lowercase, which looked sloppy.
This includes the dataset ID `gnomad_sv_v4` itself as that appears in URLs.
5dcd503
to
d890f3b
Compare
This adds a data pipeline, dataset metadata, API support, and other miscellaneous things to add gnomAD v3 structural variants to the browser.