WIP: Case study report automation #111

Draft
wants to merge 8 commits into base: main


awasyn
Collaborator

@awasyn awasyn commented Oct 26, 2024

Case study report automation with MolEvolvR

This commit introduces a series of scripts and supporting code for automating case study reports via the MolEvolvR analysis pipeline. The added scripts enable end-to-end processing of pathogen and/or drug data from CARD in FASTA format, automating key tasks like sequence retrieval, alignment, and protein annotation. Currently, the pipeline supports only the "full" analysis option.
Specifically, the following capabilities have been implemented:

  • BLAST and InterProScan Integration:

    • BLAST: Utilizes NCBI’s online BLAST API to retrieve sequence alignment results.
    • InterProScan: Runs locally and depends on the command-line version of InterProScan. Future work will include an InterProScan API wrapper (iprscanr) for more accessible remote processing, reducing local resource demands.
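
As a rough illustration of the two integration paths above, the sketch below builds the request parameters for NCBI's BLAST URL API (`CMD=Put` to submit a search, `CMD=Get` with the returned RID to fetch results) and the argument list for a local `interproscan.sh` run. The helper names, the default program and database, the example sequence, and the file paths are assumptions for illustration; they are not taken from the scripts in this PR, which may do this differently.

```python
from urllib.parse import urlencode

# Endpoint of NCBI's BLAST URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_submit_params(fasta_text, program="blastp", database="nr"):
    """Parameters for submitting a search (CMD=Put).

    The program and database defaults are assumptions, not values from the PR.
    """
    return {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": fasta_text,
    }


def blast_fetch_params(rid, format_type="XML"):
    """Parameters for retrieving results (CMD=Get) for a request ID."""
    return {"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type}


def interproscan_command(fasta_path, out_path, executable="interproscan.sh"):
    """Argument list for a local InterProScan run that writes TSV output."""
    return [executable, "-i", fasta_path, "-f", "tsv", "-o", out_path]


if __name__ == "__main__":
    # Hypothetical query; a real run would read a CARD FASTA file instead.
    query = ">example_protein\nMKTAYIAKQR"
    print(BLAST_URL + "?" + urlencode(blast_submit_params(query)))
    print(" ".join(interproscan_command("input.fasta", "iprscan.tsv")))
```

Keeping the parameter construction separate from the network call and the subprocess call makes it possible to check both without BLAST access or a local InterProScan installation.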

This approach is based on the MolEvolvR webapp pipeline, with portions of code directly adapted from MolEvolvR scripts. The original authorship has been retained to acknowledge the foundational work and provide proper attribution.

Current dependencies not included in this commit:

  • lineage_lookup.txt, cln_lookup_tbl.tsv data files
  • blast+ for local sequence alignments
  • InterProScan (local installation)
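
Because these dependencies are not shipped with the commit, a small pre-flight check can catch a missing tool or data file before a long run starts. The following is a minimal sketch, assuming the data files sit in a single directory and that `blastp` (from BLAST+) and `interproscan.sh` are the executables the pipeline calls; the function name and both of those assumptions are hypothetical.

```python
import shutil
from pathlib import Path


def missing_dependencies(
    data_dir=".",
    executables=("blastp", "interproscan.sh"),  # assumed executable names
    data_files=("lineage_lookup.txt", "cln_lookup_tbl.tsv"),
):
    """Return the names of executables not on PATH and data files not in data_dir."""
    missing = [exe for exe in executables if shutil.which(exe) is None]
    missing += [name for name in data_files if not (Path(data_dir) / name).is_file()]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        print("Missing dependencies: " + ", ".join(problems))
    else:
        print("All dependencies found.")
```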

To-Do:

  • Clean scripts and post-analysis report generation
  • Add a wrapper for the InterProScan API to reduce local resource constraints
  • Extend support for additional analysis options ("da", "dblasts", "phylo")
  • Extend support for other formats ("msa", "accnum", "ipr")

@jananiravi @the-mayer @falquaddoomi

What kind of change(s) are included?

  • Feature (adds or updates new capabilities)
  • Enhancement (adds functionality).

Checklist

Please ensure that all boxes are checked before indicating that this pull request is ready for review.

  • I have read and followed the CONTRIBUTING.md guidelines.
  • I have searched for existing content to ensure this is not a duplicate.
  • I have performed a self-review of these additions (including spelling, grammar, and related).
  • I have added comments to my code to help provide understanding.
  • I have added a test which covers the code changes found within this PR.
  • I have deleted all non-relevant text in this pull request template.
  • Reviewer assignment: Tag a relevant team member to review and approve the changes.

@awasyn awasyn changed the title WIP: Case study report Automation WIP: Case study report automation Oct 26, 2024
@awasyn awasyn self-assigned this Oct 27, 2024
@falquaddoomi falquaddoomi self-requested a review October 29, 2024 15:43
Signed-off-by: Awa Synthia <[email protected]>
@jananiravi jananiravi added the enhancement (New feature or request), outreachy (for outreachy interns), package (R package dev), api (Python, Plumber, R), bioinfo (Bioinformatics related), and coding (Coding experience (of any sort) would be helpful) labels Nov 18, 2024
@jananiravi jananiravi added this to the v0 | short-term fixes milestone Nov 18, 2024
2 participants