Refactor MetaproteomicsAnalysis slots and implement migrator #2204

Draft
wants to merge 23 commits into
base: main

Contributor

@kheal kheal commented Oct 8, 2024

This will address #2157.

Full discussion is here for interested parties: #2152.
Overall, this PR significantly reduces the amount of proteomics data stored as metadata by refactoring the contents of the MetaproteomicsAnalysis class.

This PR

  1. Removes (deprecates) the following class and slots:
    PeptideQuantification, all_proteins, best_protein, min_q_value, peptide_sequence, peptide_sum_masic_abundance, peptide_sequence_count.
  2. Renames ProteinQuantification to ProteinIdentification to better capture its content.
  3. Adds one slot to the MetaproteomicsAnalysis class: has_protein_identifications. This slot's range is ProteinIdentification, and that class is populated from the best_protein and peptide_sequence_count slots of the now-deprecated PeptideQuantification class.
  4. Adds valid and invalid examples for the MetaproteomicsAnalysis class (previously missing), based on real data currently in Mongo.
  5. Constrains the structured pattern on the new razor_protein slot (to address #2028, "Review id_locus pattern for best_protein and all_protein slots").
  6. Includes a migrator that migrates current data to the structure implemented in this PR.
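
For readers who want a concrete picture of the reshaping described in items 1, 3, and 6, here is a minimal sketch. It is not the migrator in this PR. The function name `migrate_metaproteomics_analysis`, the old slot name `has_peptide_quantifications`, and the rule of summing `peptide_sequence_count` per distinct `best_protein` are all assumptions made for illustration. The only names taken from the description above are `has_protein_identifications`, `razor_protein`, `best_protein`, and `peptide_sequence_count`.

```python
def migrate_metaproteomics_analysis(doc: dict) -> dict:
    """Return a copy of a MetaproteomicsAnalysis-like dict in the new shape.

    Illustrative only: the old slot name and the aggregation rule are assumptions.
    """
    # ASSUMPTION: the deprecated peptide records live under this key.
    old_key = "has_peptide_quantifications"

    # Copy every slot except the deprecated one, leaving the input untouched.
    migrated = {key: value for key, value in doc.items() if key != old_key}

    # ASSUMPTION: one protein record per distinct best_protein, with the
    # peptide_sequence_count values summed. Dicts preserve insertion order,
    # so proteins appear in the order they were first seen.
    counts: dict[str, int] = {}
    for peptide in doc.get(old_key, []):
        protein = peptide.get("best_protein")
        if protein is None:
            continue  # nothing to carry over for this record
        counts[protein] = counts.get(protein, 0) + peptide.get("peptide_sequence_count", 0)

    if counts:
        migrated["has_protein_identifications"] = [
            {"razor_protein": protein, "peptide_sequence_count": count}
            for protein, count in counts.items()
        ]
    return migrated
```

Slots such as `peptide_sequence`, `min_q_value`, and `all_proteins` are simply dropped in this sketch, which mirrors the stated goal of reducing the proteomics data stored as metadata.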

This PR was moved from microbiomedata#236.

Reviewers

To review: mslarae13, eecavanna, SamuelPurvine
To keep informed: naglepuff, aclum, mbthornton-lbl, corilo, sujaypatil96, brynnz22

PR Information

What type of PR is this? (check all applicable)

  • Refactor
  • Documentation
  • Schema change: Structure and content
    • deleted a class and slots
    • added a slot to a class

Description

See above

Related Issues / Discussions

Did you add/update any tests?

  • Yes
    Added valid and invalid examples, as well as a doctest for the migrator

Could this schema change make it so any valid data becomes invalid?

  • Yes (A migrator is required)

If you answered "Yes", does this PR branch include that migrator?

  • Yes

Does this PR have any downstream implications?

  • Yes
    This work will need to be followed by an update to the aggregator script, an update to the structure of the aggregation tables in the schema and associated migrator, and an update to the DataPortal's handling of the KEGG terms from the aggregations.

Collaborator

@eecavanna eecavanna left a comment


Migrator looks great! Thanks for including a variety of doctests.

nmdc_schema/migrators/migrator_from_X_to_PR236.py (three review threads, now outdated and resolved)

github-actions bot commented Oct 9, 2024

PR Preview Action v1.4.8
🚀 Deployed preview to https://microbiomedata.github.io/nmdc-schema/pr-preview/pr-2204/
on branch gh-pages at 2024-10-22 17:39 UTC

@kheal kheal changed the title from "[POST-berkeley-MERGER] Refactor MetaproteomicsAnalysis slots and implement migrator" to "Refactor MetaproteomicsAnalysis slots and implement migrator" on Oct 16, 2024
2 participants