feature(dev-branch-pacbio) #3453
# Description
Adds the implementation of the RunDataGenerator interface for the PacBio post-processing.
cg/services/post_processing/pacbio/run_data_generator/run_data.py
## Description
Closes Clinical-Genomics/add-new-tech#67
Add PacBioRunFileManager, fixture and tests

---------

Co-authored-by: Christian Oertlin <[email protected]>
## Description
Refactor PacBio metrics parser to comply with the post-processing flow
### Changed
- Condensed all individual parsers into one
- Refactored tests
## Description
Closes Clinical-Genomics/add-new-tech#64
The read metrics we parsed came from a file containing only HiFi data. It is important to parse the failed read metrics too. There is a file containing both HiFi and failed metrics (`m84202_240522_135641_s1.ccs_report.json`). It is, however, generated by **different software**, so the values are not exactly the same as the metrics parsed before. @J35P312 assured us that the difference was negligible.
### Added
- New parameters to parse from ccs file:
  - [x] <Q20 Reads
  - [x] <Q20 Yield (bp)
  - [x] <Q20 Read Length (mean, bp)
### Changed
- Renamed `HiFiMetrics` model to `ReadMetrics`
- The path to the ccs file
### Fixed
- Removed old ccs file usage and fixture
cg/services/post_processing/pacbio/run_file_manager/run_file_manager.py
# Description
Implement Housekeeper service for PacBio
# Description
Add sample DTO
Test on stage: Non-existent SMRT cell should fail with correct error message
$ cg -l DEBUG post-process run r84202_20241119_150802/1_A01
Running cg post-processing.
Instantiating post-processing services
Instantiating PacBio post-processing service
Instantiating status db
Instantiating housekeeper api
Initializing Store
Starting PacBio post-processing for run: r84202_20241119_150802/1_A01
File or directory /home/proj/stage/sequencing_data/pacbio/r84202_20241119_150802/1_A01 does not exist
Traceback (most recent call last):
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/cg/services/run_devices/error_handler.py", line 16, in wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/cg/services/run_devices/pacbio/run_file_manager/run_file_manager.py", line 21, in get_files_to_parse
validate_files_or_directories_exist([run_path])
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/cg/services/run_devices/validators.py", line 25, in validate_files_or_directories_exist
raise FileNotFoundError("Some of the provided paths do not exist")
FileNotFoundError: Some of the provided paths do not exist
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/cg/services/run_devices/error_handler.py", line 16, in wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/cg/services/run_devices/pacbio/metrics_parser/metrics_parser.py", line 41, in parse_metrics
metrics_files: list[Path] = self.file_manager.get_files_to_parse(run_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/cg/services/run_devices/error_handler.py", line 18, in wrapper
raise to_raise(error) from error
cg.services.run_devices.exc.PostProcessingRunFileManagerError: Some of the provided paths do not exist
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/cg/services/run_devices/error_handler.py", line 16, in wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/cg/services/run_devices/pacbio/data_transfer_service/data_transfer_service.py", line 35, in get_post_processing_dtos
metrics: PacBioMetrics = self.metrics_service.parse_metrics(run_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/cg/services/run_devices/error_handler.py", line 18, in wrapper
raise to_raise(error) from error
cg.services.run_devices.exc.PostProcessingParsingError: Some of the provided paths do not exist
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/cg/services/run_devices/error_handler.py", line 16, in wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/cg/services/run_devices/pacbio/data_storage_service/pacbio_store_service.py", line 55, in store_post_processing_data
dtos: PacBioDTOs = self.data_transfer_service.get_post_processing_dtos(run_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/cg/services/run_devices/error_handler.py", line 20, in wrapper
raise CgError(f"{error}") from error
cg.exc.CgError: Some of the provided paths do not exist
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/cg/services/run_devices/error_handler.py", line 16, in wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/cg/services/run_devices/pacbio/post_processing_service.py", line 53, in post_process
self.store_service.store_post_processing_data(run_data=run_data, dry_run=dry_run)
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/cg/services/run_devices/error_handler.py", line 20, in wrapper
raise CgError(f"{error}") from error
cg.exc.CgError: Some of the provided paths do not exist
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/proj/stage/bin/miniconda3/envs/S_cg/bin/cg", line 8, in <module>
sys.exit(base())
^^^^^^
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/click/decorators.py", line 45, in new_func
return f(get_current_context().obj, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/cg/cli/post_process/post_process.py", line 35, in post_process_sequencing_run
post_processing_service.post_process(run_name=run_name, dry_run=dry_run)
File "/home/proj/stage/bin/miniconda3/envs/S_cg/lib/python3.11/site-packages/cg/services/run_devices/error_handler.py", line 20, in wrapper
raise CgError(f"{error}") from error
cg.exc.CgError: Some of the provided paths do not exist
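The chained traceback above shows every service call passing through a `wrapper` in `error_handler.py` that re-raises the caught error as a service-specific exception (`raise to_raise(error) from error`). A minimal sketch of that pattern follows. The decorator name `handle_error`, its parameters, and the exception class `RunFileManagerError` are hypothetical and are not taken from the cg code base.

```python
import functools
from pathlib import Path
from typing import Callable, Type


class RunFileManagerError(Exception):
    """Hypothetical stand-in for a service-specific post-processing error."""


def handle_error(
    to_raise: Type[Exception], to_catch: tuple[Type[Exception], ...] = (Exception,)
) -> Callable:
    """Return a decorator that re-raises any error in to_catch as to_raise."""

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except to_catch as error:
                # "from error" keeps the original exception as __cause__,
                # which is what produces the chained traceback in the log.
                raise to_raise(f"{error}") from error

        return wrapper

    return decorator


@handle_error(to_raise=RunFileManagerError, to_catch=(FileNotFoundError,))
def get_files_to_parse(run_path: Path) -> list[Path]:
    """List the files in a run directory, failing if the directory is missing."""
    if not run_path.exists():
        raise FileNotFoundError("Some of the provided paths do not exist")
    return sorted(run_path.iterdir())
```

Because each layer wraps the previous error, the message stays the same while the exception type changes, which matches the repeated "Some of the provided paths do not exist" lines in the log.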
Test on stage: Dry run
$ cg -l DEBUG post-process run r84202_20240319_150802/1_A01 --dry-run
Running cg post-processing.
Instantiating post-processing services
Instantiating PacBio post-processing service
Instantiating status db
Instantiating housekeeper api
Initializing Store
Starting PacBio post-processing for run: r84202_20240319_150802/1_A01
Dry run, no entries will be added to database for SMRT cell /home/proj/stage/sequencing_data/pacbio/r84202_20240319_150802/1_A01.
Dry run: would have added /home/proj/stage/sequencing_data/pacbio/r84202_20240319_150802/1_A01/statistics/unzipped_reports/control.report.json to Housekeeper.
Dry run: would have added /home/proj/stage/sequencing_data/pacbio/r84202_20240319_150802/1_A01/statistics/unzipped_reports/loading.report.json to Housekeeper.
Dry run: would have added /home/proj/stage/sequencing_data/pacbio/r84202_20240319_150802/1_A01/statistics/unzipped_reports/raw_data.report.json to Housekeeper.
Dry run: would have added /home/proj/stage/sequencing_data/pacbio/r84202_20240319_150802/1_A01/statistics/unzipped_reports/smrtlink-datasets.json to Housekeeper.
Dry run: would have added /home/proj/stage/sequencing_data/pacbio/r84202_20240319_150802/1_A01/statistics/m84202_240319_154410_s1.ccs_report.json to Housekeeper.
Dry run: would have added /home/proj/stage/sequencing_data/pacbio/r84202_20240319_150802/1_A01/hifi_reads/m84202_240319_154410_s1.hifi_reads.bam to Housekeeper.
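The dry-run output above suggests that the store step logs each file it would add and skips the actual write. A rough sketch of that behaviour, assuming a hypothetical `store_files` function and an injected `add_file` callable in place of the real Housekeeper API:

```python
import logging
from pathlib import Path
from typing import Callable

LOG = logging.getLogger(__name__)


def store_files(
    file_paths: list[Path], add_file: Callable[[Path], None], dry_run: bool = False
) -> list[Path]:
    """Add each file through add_file unless dry_run is set.

    Returns the files that were actually added (empty for a dry run).
    add_file is a hypothetical callable standing in for the Housekeeper call.
    """
    added: list[Path] = []
    for file_path in file_paths:
        if dry_run:
            # Only report what would happen; nothing is written.
            LOG.info(f"Dry run: would have added {file_path} to Housekeeper.")
            continue
        add_file(file_path)
        added.append(file_path)
    return added
```

Passing the writer in as a callable keeps the sketch self-contained and makes the dry-run branch easy to check without a database.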
* Add cell tag to bam file
* patch to fix tag accumulation
Herculean effort 🦁 Well done ⭐
device: str = get_item_by_pattern_in_source(
    source=run_name, pattern_map=PATTERN_TO_DEVICE_MAP
)
I'm not sure we want to use the same function for mapping files to tags as we do for mapping directories to post-process classes. I feel like that function might be too general. I don't think having a helper function that only applies `PATTERN_TO_DEVICE_MAP` is problematic.
I think we can wrap the map plus the function that returns the device.
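One way the suggestion above could look is a dedicated helper that owns the map and returns the device. This is only a sketch: the regular expression, the `"pacbio"` value and the helper name `get_device_from_run_name` are assumptions for illustration, and the real `PATTERN_TO_DEVICE_MAP` may hold different patterns and values.

```python
import re

# Hypothetical contents; the pattern matches run names such as
# "r84202_20240319_150802/1_A01" seen in the stage tests.
PATTERN_TO_DEVICE_MAP: dict[str, str] = {
    r"r\d+_\d{8}_\d{6}/\d_[A-Z]\d{2}": "pacbio",
}


def get_device_from_run_name(run_name: str) -> str:
    """Return the device whose pattern fully matches the run name."""
    for pattern, device in PATTERN_TO_DEVICE_MAP.items():
        if re.fullmatch(pattern, run_name):
            return device
    raise ValueError(f"No device found for run name: {run_name}")
```

Callers then depend on a single device-specific helper, and the general pattern-matching function stays reserved for mapping files to tags.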
tests/services/run_devices/pacbio/run_data_generator/test_pacbio_run_data_generator.py
## Description
Address the last comments on #3453 regarding PacBio post-processing
## Description
This PR introduces a new structure for the run devices post-processing.
### Added
- `cg post-process run <run-name>`, which currently works with PacBio SMRT cells but will work for any run in the future
### Changed
- `store_fastq_path_in_housekeeper` in housekeeper modified into `create_bundle_and_add_file_with_tags` so that it works in a more general way (not only for fastqs)
### How to prepare for test
- `us`
- `paxa`
### How to test
See below
### Review
Thanks for filling in who performed the code review and the test!
This version is a
### Implementation Plan
Implementation Plan
INFO [alembic.runtime.migration] Context impl MySQLImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
fatal: Not a git repository (or any parent up to mount point /home)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
repository is clean
Logging deploy ...
Getting deployer... done.
Getting last commit message and SHA... done.
Getting version of deploy scripts... /home/js.diazboada done.
Log deploy... done.
cg, version 62.1.0
[js.diazboada@hasta:~] [S_base] $ up