feature(dev-branch-pacbio) #3453

Merged
merged 76 commits into from
Aug 15, 2024
Changes from 10 commits
Commits
5018359
first draft
ChrOertlin Jul 22, 2024
b04281b
Merge branch 'master' into dev-pacbio-flow
ChrOertlin Jul 23, 2024
6c69db4
add new classes
ChrOertlin Jul 23, 2024
6137696
flesh out skeleton
ChrOertlin Jul 23, 2024
6c07296
iniital pacbio setup
ChrOertlin Jul 23, 2024
1d2c34e
Merge branch 'master' into dev-pacbio-flow
ChrOertlin Jul 23, 2024
e83b91c
Apply suggestions from code review
ChrOertlin Jul 23, 2024
32517f0
Merge branch 'master' into dev-pacbio-flow
diitaz93 Jul 24, 2024
166a830
add pacbio run data generator (#3458)
ChrOertlin Jul 24, 2024
8162060
Merge branch 'master' into dev-pacbio-flow
ChrOertlin Jul 24, 2024
dbeeeae
add PacBioRunFileManager (#3460)
diitaz93 Jul 24, 2024
53a3cd2
Move the modules to the correct location (#3462)
diitaz93 Jul 25, 2024
d2b5835
refactor pacbio metrics parser (#3467)
diitaz93 Jul 26, 2024
25a53b8
PacBio - implement parsing of failed reads metrics (#3469)
diitaz93 Jul 26, 2024
2b49c33
setup dtos (#3472)
ChrOertlin Jul 26, 2024
ee9ab63
add(hk service to pacbio flow) (#3466)
ChrOertlin Jul 26, 2024
ca4b6b1
add sample dto (#3475)
ChrOertlin Jul 29, 2024
cf5a0e4
Merge branch 'master' into dev-pacbio-flow
ChrOertlin Jul 29, 2024
1a21ecb
small change in dto
ChrOertlin Jul 29, 2024
b9c23ea
add sample id to dto
ChrOertlin Jul 29, 2024
0103b45
fix types of pacbio dto attributes
diitaz93 Jul 29, 2024
597871b
remove unused import
diitaz93 Jul 29, 2024
c0c4ce6
black
diitaz93 Jul 29, 2024
7a5df2b
Add PacBio Data Transfer service (#3477)(patch)
diitaz93 Jul 29, 2024
4b380b9
abstract classes cleanup
ChrOertlin Jul 29, 2024
e842068
fix self
ChrOertlin Jul 29, 2024
8b78253
add(pacbio store service) (#3478)
ChrOertlin Jul 29, 2024
9e7c594
Merge branch 'master' into dev-pacbio-flow
diitaz93 Jul 29, 2024
94efad2
add abstract method decorator and removed unused imports
diitaz93 Jul 29, 2024
705a0be
add PacBio post processing service (#3481)
diitaz93 Jul 29, 2024
8380cf2
remove unused function in HK service
diitaz93 Jul 29, 2024
325b045
Add cli post processing (#3485)
diitaz93 Jul 30, 2024
e4872ee
Remove pbi file from post processing (#3491)
diitaz93 Jul 31, 2024
eec50e6
rename test module
diitaz93 Jul 31, 2024
7a41cf6
add error handlers pacbio flow (#3487)
ChrOertlin Jul 31, 2024
28183de
add dry run functionality to PacBio post-processing (#3483)
diitaz93 Jul 31, 2024
266822e
black
diitaz93 Jul 31, 2024
77ce384
Merge branch 'master' into dev-pacbio-flow
diitaz93 Aug 7, 2024
9692bfa
Merge branch 'master' into dev-pacbio-flow
diitaz93 Aug 7, 2024
a0c1775
black
diitaz93 Aug 7, 2024
171eef6
Merge branch 'master' into dev-pacbio-flow
diitaz93 Aug 7, 2024
dba4415
Set PacBio sequencing times (#3518)
diitaz93 Aug 8, 2024
6940d27
Merge branch 'master' into dev-pacbio-flow
diitaz93 Aug 8, 2024
4a3ff25
Merge branch 'master' into dev-pacbio-flow
diitaz93 Aug 8, 2024
6720120
add CLI command to CLI
diitaz93 Aug 9, 2024
b0501e7
improve help docstrings
diitaz93 Aug 9, 2024
2606b19
add logging to post-process base
diitaz93 Aug 9, 2024
079d9c7
Improve error raising with wrong run name
diitaz93 Aug 9, 2024
53167d9
validate existence of run path
diitaz93 Aug 9, 2024
5ed5666
fix bug in PacBio service definition
diitaz93 Aug 9, 2024
fbe13d3
Merge branch 'master' into dev-pacbio-flow
ChrOertlin Aug 12, 2024
b04330e
Merge branch 'master' into dev-pacbio-flow
diitaz93 Aug 14, 2024
abccba0
fix error handling logging
diitaz93 Aug 14, 2024
e90789b
rename dir (#3562)
ChrOertlin Aug 14, 2024
65c1995
Merge branch 'master' into dev-pacbio-flow
diitaz93 Aug 14, 2024
487ccb8
Merge branch 'master' into dev-pacbio-flow
diitaz93 Aug 14, 2024
9d3bd2b
reordered dto, metric and model attributes
diitaz93 Aug 14, 2024
a823c99
add missing parameter to sample DTO
diitaz93 Aug 14, 2024
463b2d4
Merge branch 'master' into dev-pacbio-flow
diitaz93 Aug 14, 2024
17052d4
fix names in model
diitaz93 Aug 14, 2024
06482a3
Merge branch 'master' into dev-pacbio-flow
diitaz93 Aug 14, 2024
3f36046
switch pacbio model column names back
diitaz93 Aug 14, 2024
d4a6de1
apply column name change in crud
diitaz93 Aug 14, 2024
e26ef8e
change dry run logging from debug to info
diitaz93 Aug 14, 2024
848eba2
fix sample run metrics table name
diitaz93 Aug 14, 2024
c616758
Improve docsrings
diitaz93 Aug 14, 2024
3fbf44f
Merge branch 'master' into dev-pacbio-flow
diitaz93 Aug 15, 2024
bc46e44
Make hifi_mean_read_length int
diitaz93 Aug 15, 2024
445c21c
Make failed_mean_read_length int
diitaz93 Aug 15, 2024
ebc3ae3
Make polymerase reads int
diitaz93 Aug 15, 2024
f4eb5d2
Make sample metrics int
diitaz93 Aug 15, 2024
3af0845
Make control metrics int
diitaz93 Aug 15, 2024
f0a4d42
Add cell tag to bam file
diitaz93 Aug 15, 2024
f1ed67b
Add cell tag to bam file and fix tag concatenation (#3571)
diitaz93 Aug 15, 2024
758d862
Merge branch 'master' into dev-pacbio-flow
diitaz93 Aug 15, 2024
063bd25
Merge remote-tracking branch 'origin/dev-pacbio-flow' into dev-pacbio…
diitaz93 Aug 15, 2024
91 changes: 91 additions & 0 deletions cg/services/post_processing/abstract_classes.py
@@ -0,0 +1,91 @@
"""Post-processing service abstract classes."""

from abc import abstractmethod, ABC
from pathlib import Path

from pydantic import BaseModel

from cg.apps.housekeeper.hk import HousekeeperAPI
from cg.services.post_processing.abstract_models import PostProcessingDTOs, RunMetrics, RunData
from cg.store.store import Store


class RunDataGenerator(ABC):
    """Abstract class that holds functionality to create a run data model."""

    @abstractmethod
    def _validate_run_name(self, run_name: str) -> None:
        pass

    @abstractmethod
    def get_run_data(self, run_name: str, sequencing_dir: str) -> RunData:
        pass


class RunFileManager(ABC):
    """Abstract class that manages files related to an instrument run."""

    @abstractmethod
    def get_files_to_parse(self, run_data: RunData) -> list[Path]:
        """Get the files required by the PostProcessingMetricsParser."""
        pass

    @abstractmethod
    def get_files_to_store(self, run_data: RunData) -> list[Path]:
        """Get the files to store for the PostProcessingHKService."""
        pass


class PostProcessingMetricsParser(ABC):

    @abstractmethod
    def parse_metrics(self, metrics_paths: list[Path]) -> RunMetrics:
        pass


class PostProcessingDataTransferService(ABC):
    def __init__(self, metrics_service: PostProcessingMetricsParser):
        self.metrics_service = metrics_service

    def get_post_processing_dtos(self) -> PostProcessingDTOs:
        pass


class PostProcessingStoreService(ABC):
    def __init__(self, store: Store, data_transfer_service: PostProcessingDataTransferService):
        self.store: Store = store
        self.data_transfer_service: PostProcessingDataTransferService = data_transfer_service

    def _create_run_device(self, run_name):
        pass

    def _create_instrument_run(self, run_name):
        pass

    def _create_sample_run_metrics(self, run_name):
        pass

    def store_post_processing_data(self, run_name):
        pass


class PostProcessingHKService(ABC):
    def __init__(self, hk_api: HousekeeperAPI):
        self.hk_api = hk_api

    def store_files_in_housekeeper(self, file_to_store: list[Path]):
        pass


class PostProcessingService(ABC):

    def __init__(
        self, store_service: PostProcessingStoreService, hk_service: PostProcessingHKService
    ):
        self.store_service = store_service
        self.hk_service = hk_service

    @abstractmethod
    def post_process(self, run_name):
        """Store sequencing metrics in statusdb and relevant files in housekeeper"""
        pass
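To illustrate how these abstract classes are meant to compose, here is a minimal, dependency-free sketch. The class names `StoreService`, `HKService`, `RecordingStoreService`, `RecordingHKService` and `ExamplePostProcessingService` are hypothetical stand-ins and not part of this PR, and the order in which the two services are called is an assumption.

```python
from abc import ABC, abstractmethod


class StoreService(ABC):
    """Simplified stand-in for PostProcessingStoreService (hypothetical)."""

    @abstractmethod
    def store_post_processing_data(self, run_name: str) -> None:
        """Persist the run metrics for the given run."""


class HKService(ABC):
    """Simplified stand-in for PostProcessingHKService (hypothetical)."""

    @abstractmethod
    def store_files_in_housekeeper(self, run_name: str) -> None:
        """Persist the run files for the given run."""


class PostProcessingService(ABC):
    """Holds the two collaborating services; subclasses decide how to use them."""

    def __init__(self, store_service: StoreService, hk_service: HKService):
        self.store_service = store_service
        self.hk_service = hk_service

    @abstractmethod
    def post_process(self, run_name: str) -> None:
        """Store metrics and files for the given run."""


class RecordingStoreService(StoreService):
    """Fake that records calls instead of touching a database."""

    def __init__(self, calls: list[str]):
        self.calls = calls

    def store_post_processing_data(self, run_name: str) -> None:
        self.calls.append(f"store:{run_name}")


class RecordingHKService(HKService):
    """Fake that records calls instead of touching Housekeeper."""

    def __init__(self, calls: list[str]):
        self.calls = calls

    def store_files_in_housekeeper(self, run_name: str) -> None:
        self.calls.append(f"housekeeper:{run_name}")


class ExamplePostProcessingService(PostProcessingService):
    """Concrete service; the call order here is an assumption."""

    def post_process(self, run_name: str) -> None:
        self.store_service.store_post_processing_data(run_name)
        self.hk_service.store_files_in_housekeeper(run_name)


calls: list[str] = []
service = ExamplePostProcessingService(RecordingStoreService(calls), RecordingHKService(calls))
service.post_process("r84202_20240522_133539/1_A01")
```

Because `post_process` is decorated with `@abstractmethod`, instantiating `PostProcessingService` directly raises a `TypeError`; only subclasses that override it can be created.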
13 changes: 13 additions & 0 deletions cg/services/post_processing/abstract_models.py
@@ -0,0 +1,13 @@
from pydantic import BaseModel


class RunData(BaseModel):
    pass


class RunMetrics(BaseModel):
    pass


class PostProcessingDTOs(BaseModel):
    pass
5 changes: 5 additions & 0 deletions cg/services/post_processing/exc.py
@@ -0,0 +1,5 @@
from cg.exc import CgError


class PostProcessingRunValidationError(CgError):
    pass
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
from pathlib import Path
from cg.services.post_processing.abstract_classes import RunDataGenerator
from cg.services.post_processing.pacbio.run_data_generator.run_data import PacBioRunData
from cg.services.post_processing.validators import (
    validate_name_pre_fix,
    validate_has_expected_parts,
)
from cg.utils.string import get_element_from_split


class PacBioRunDataGenerator(RunDataGenerator):

    def _validate_run_name(self, run_name) -> None:
        validate_name_pre_fix(run_name)
        validate_has_expected_parts(run_name=run_name, expected_parts=2)

    def get_run_data(self, run_name: str, sequencing_dir: str) -> PacBioRunData:
        """
        Get the run data for a PacBio SMRT cell run.
        run_name should include the PacBio run including plate well, e.g. 'r84202_20240522_133539/1_A01'
        """
        self._validate_run_name(run_name)
        full_path = Path(sequencing_dir, run_name)

        return PacBioRunData(
            full_path=full_path,
            sequencing_run_name=self._get_sequencing_run_name(run_name),
            well_name=self._get_well(run_name),
            plate=self._get_plate(run_name),
        )

    @staticmethod
    def _get_sequencing_run_name(run_name: str) -> str:
        return get_element_from_split(value=run_name, element_position=0, split="/")

    @staticmethod
    def _get_plate_well(run_name: str) -> str:
        return get_element_from_split(value=run_name, element_position=-1, split="/")

    def _get_plate(self, run_name: str) -> str:
        plate_well: str = self._get_plate_well(run_name)
        return get_element_from_split(value=plate_well, element_position=0, split="_")

    def _get_well(self, run_name: str) -> str:
        plate_well: str = self._get_plate_well(run_name)
        return get_element_from_split(value=plate_well, element_position=-1, split="_")
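As a quick illustration of how a run name such as 'r84202_20240522_133539/1_A01' decomposes into run, plate and well, here is a standalone sketch that uses only the standard library. `ParsedRunName` and `parse_run_name` are hypothetical names, not part of this PR. The sketch unpacks exactly two parts at each split, which is stricter than the position-based lookups in the generator above.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ParsedRunName:
    """Hypothetical, dependency-free stand-in for PacBioRunData."""

    full_path: Path
    sequencing_run_name: str
    plate: int
    well_name: str


def parse_run_name(run_name: str, sequencing_dir: str) -> ParsedRunName:
    """Split '<run>/<plate>_<well>' into its components."""
    # Raises ValueError if the name does not have exactly two '/'-separated parts.
    sequencing_run_name, plate_well = run_name.split("/")
    # Raises ValueError if the last part is not exactly '<plate>_<well>'.
    plate, well_name = plate_well.split("_")
    return ParsedRunName(
        full_path=Path(sequencing_dir, run_name),
        sequencing_run_name=sequencing_run_name,
        plate=int(plate),
        well_name=well_name,
    )


# The sequencing directory below is an illustrative placeholder.
parsed = parse_run_name("r84202_20240522_133539/1_A01", "/data/pacbio")
```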
12 changes: 12 additions & 0 deletions cg/services/post_processing/pacbio/run_data_generator/run_data.py
@@ -0,0 +1,12 @@
from pathlib import Path

from cg.services.post_processing.abstract_models import RunData


class PacBioRunData(RunData):
    """Holds information on a single SMRTcell of a PacBio run."""

    full_path: Path
    sequencing_run_name: str
    well_name: str
    plate: int
11 changes: 11 additions & 0 deletions cg/services/post_processing/validators.py
@@ -0,0 +1,11 @@
from cg.services.post_processing.exc import PostProcessingRunValidationError


def validate_name_pre_fix(run_name: str) -> None:
    if not run_name.startswith("r"):
        raise PostProcessingRunValidationError


def validate_has_expected_parts(run_name: str, expected_parts: int) -> None:
    if len(run_name.split("/")) != expected_parts:
        raise PostProcessingRunValidationError
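The validators above raise the exception without a message. One possible variant, sketched here with a hypothetical `RunValidationError` standing in for `PostProcessingRunValidationError`, attaches a message saying which check failed. `validate_run_name` is also a hypothetical helper that chains both checks the way `_validate_run_name` does.

```python
class RunValidationError(Exception):
    """Hypothetical stand-in for PostProcessingRunValidationError."""


def validate_name_pre_fix(run_name: str) -> None:
    """Check that the run name starts with the expected 'r' prefix."""
    if not run_name.startswith("r"):
        raise RunValidationError(f"Run name '{run_name}' does not start with 'r'.")


def validate_has_expected_parts(run_name: str, expected_parts: int) -> None:
    """Check that the run name has the expected number of '/'-separated parts."""
    actual_parts: int = len(run_name.split("/"))
    if actual_parts != expected_parts:
        raise RunValidationError(
            f"Run name '{run_name}' has {actual_parts} part(s), expected {expected_parts}."
        )


def validate_run_name(run_name: str) -> None:
    """Run both checks, assuming a '<run>/<plate>_<well>' layout."""
    validate_name_pre_fix(run_name)
    validate_has_expected_parts(run_name=run_name, expected_parts=2)
```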
10 changes: 10 additions & 0 deletions cg/utils/string.py
@@ -0,0 +1,10 @@
"""Utils related to string manipulation."""

from cg.exc import CgError


def get_element_from_split(value: str, element_position: int, split: str) -> str:
    elements: list[str] = value.split(split)
    # Valid positions run from -len(elements) to len(elements) - 1, as in list indexing
    if not -len(elements) <= element_position < len(elements):
        raise CgError(message="Provided element position out of bounds.")
    return elements[element_position]
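Since the callers in this PR pass both positive and negative positions (0 and -1), the bounds check needs to cover both ends of the list. A standalone sketch of such a check follows; it raises the built-in `ValueError` instead of `CgError` so that it runs without the `cg` package, which is an assumption made for illustration only.

```python
def get_element_from_split(value: str, element_position: int, split: str) -> str:
    """Return the element at element_position after splitting value on split."""
    elements: list[str] = value.split(split)
    # Valid positions run from -len(elements) to len(elements) - 1, as in list indexing.
    if not -len(elements) <= element_position < len(elements):
        raise ValueError("Provided element position out of bounds.")
    return elements[element_position]


last_element: str = get_element_from_split("zero_one_two_three", -1, "_")
```

A position equal to the number of elements (4 for the string above) is the boundary case worth testing, since a check of the form `len(elements) < element_position` lets it through to the list lookup.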
5 changes: 5 additions & 0 deletions tests/fixture_plugins/pacbio_fixtures/name_fixtures.py
@@ -5,6 +5,11 @@
 from cg.constants.pacbio import PacBioDirsAndFiles


+@pytest.fixture
+def pac_bio_smrt_cell_name() -> str:
+    return "1_A01"
+
+
 @pytest.fixture
 def pac_bio_test_run_name() -> str:
     """Return the name of a PacBio SMRT cell."""
4 changes: 2 additions & 2 deletions tests/fixture_plugins/pacbio_fixtures/path_fixtures.py
@@ -31,9 +31,9 @@ def pac_bio_test_run_dir(pac_bio_runs_dir: Path, pac_bio_test_run_name: str) ->


 @pytest.fixture
-def pac_bio_smrt_cell_dir_1_a01(pac_bio_test_run_dir: Path) -> Path:
+def pac_bio_smrt_cell_dir_1_a01(pac_bio_test_run_dir: Path, pac_bio_smrt_cell_name: str) -> Path:
     """Return the path to a PacBio SMRT cell directory."""
-    return Path(pac_bio_test_run_dir, "1_A01")
+    return Path(pac_bio_test_run_dir, pac_bio_smrt_cell_name)


 @pytest.fixture
19 changes: 19 additions & 0 deletions tests/services/post_processing/pacbio/conftest.py
@@ -0,0 +1,19 @@
"""Fixtures for the PacBio post processing services."""

from pathlib import Path

import pytest

from cg.services.post_processing.pacbio.run_data_generator.run_data import PacBioRunData


@pytest.fixture
def expected_pac_bio_run_data(
    pac_bio_test_run_name: str, pac_bio_fixtures_dir: Path, pac_bio_smrt_cell_name: str
) -> PacBioRunData:
    return PacBioRunData(
        full_path=Path(pac_bio_fixtures_dir, pac_bio_test_run_name, pac_bio_smrt_cell_name),
        sequencing_run_name=pac_bio_test_run_name,
        well_name="A01",
        plate="1",
    )
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
"""Tests for the PacBioRunDataGenerator"""

from pathlib import Path

from cg.services.post_processing.pacbio.run_data_generator.pacbio_run_data_generator import (
    PacBioRunDataGenerator,
)
from cg.services.post_processing.pacbio.run_data_generator.run_data import PacBioRunData


def test_get_run_data(
    pac_bio_fixtures_dir: Path,
    pac_bio_test_run_name: str,
    pac_bio_smrt_cell_name: str,
    expected_pac_bio_run_data: PacBioRunData,
):
    # GIVEN a run directory, a run name and a SMRT cell name
    run_name: str = "/".join([pac_bio_test_run_name, pac_bio_smrt_cell_name])

    # WHEN Generating run data
    run_data_generator = PacBioRunDataGenerator()
    run_data: PacBioRunData = run_data_generator.get_run_data(
        run_name=run_name, sequencing_dir=pac_bio_fixtures_dir.as_posix()
    )

    # THEN the correct run data are returned
    assert run_data == expected_pac_bio_run_data
30 changes: 30 additions & 0 deletions tests/utils/test_string_utils.py
@@ -0,0 +1,30 @@
"""Tests for the string utilities."""

import pytest

from cg.exc import CgError
from cg.utils.string import get_element_from_split


def test_get_element_from_split():

    # GIVEN a string with a separator
    separated_string: str = "zero_one_two_three"

    # WHEN getting an element divided by a separator based on the position
    element: str = get_element_from_split(value=separated_string, element_position=2, split="_")

    # THEN the expected element is returned
    assert element == "two"


def test_get_element_from_split_error():

    # GIVEN a string with a separator
    separated_string: str = "zero_one_two_three"

    # WHEN getting an element divided by a separator based on the position that is out of bounds
    with pytest.raises(CgError):
        get_element_from_split(value=separated_string, element_position=12, split="_")

    # THEN an error is raised