feat: Codebase Improvements - Modularization and Maintainability
While refactoring `compute_summary_stats()`, we found it would be helpful if the parent function tracked a list of output strings, so that each helper function can return a single string (or `str | None`).

feat: add PixyArgs dataclass to hold pixy args (ksamuk#40)

This PR adds a dataclass, `PixyArgs`, to store the arguments given to `pixy` by the user at the command line. Addresses the first checkbox on ksamuk#36.

Highlights:

* Any arguments that were listed as `optional` in `__main__.py.main()` are represented by a `Union[T, None]`.
* Two string-based arguments are represented with an `Enum`: `Stats` (one or more of `pi`, `dxy`, and `fst`) and `FSTEstimator` (either `wc` or `hudson`).
* The `--bypass-invariant-check` argument, which expects a "yes" or "no" value, was converted into a `bool`.

No additional tests are added in this PR -- the dataclass is only being added here, not used. The next PR will start replacing current functionality, so tests will be included in that PR.

refactor: Extract `precompute_filtered_variant_array` (ksamuk#43)

This is the first of several planned PRs to refactor the monolithic `compute_summary_stats()` function. It extracts a helper function to precompute the filtered genotype and position arrays.

refactor: extract compute_summary_pi (ksamuk#50)

This PR extracts the computation of the summary `pi` statistics. It also introduces a new dataclass, `PiResult`, to capture the results of the `calc_pi` function. To reduce the overhead associated with refactoring, I propose using `Union[T, Literal["NA"]]` in most situations where we'd use `Union[T, None]`.

refactor: extract compute_summary_dxy (ksamuk#51)

Companion to ksamuk#50, this PR extracts `compute_summary_dxy()`.

feat: `PixyTempResult` (ksamuk#49)

Closes ksamuk#42. This PR adds `PixyTempResult`, a dataclass that stores output from `pixy` and helps write out results to a tab-delimited file. One test for the `__str__()` method is added.
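A minimal sketch of the pattern described in ksamuk#50 and ksamuk#49 above: a result dataclass whose missing values are the literal string `"NA"` rather than `None`, and whose `__str__()` renders one tab-delimited row. The class name `ExampleTempResult` and all of its fields are hypothetical, chosen only for illustration; they are not the actual `PixyTempResult` or `PiResult` definitions.

```python
from dataclasses import dataclass
from typing import Literal, Union

# Alias for the "missing value" marker; hypothetical name, not taken from pixy.
NA = Literal["NA"]


@dataclass(frozen=True)
class ExampleTempResult:
    """Hypothetical stand-in for a result row; fields are illustrative only."""

    statistic: str
    chromosome: str
    window_start: int
    window_end: int
    # "NA" instead of None: the value can be written out without a None check.
    value: Union[float, NA]

    def __str__(self) -> str:
        """Render the row as a single tab-delimited line (no trailing newline)."""
        fields = (
            self.statistic,
            self.chromosome,
            self.window_start,
            self.window_end,
            self.value,
        )
        return "\t".join(str(field) for field in fields)


row = ExampleTempResult("pi", "chr1", 1, 10000, 0.25)
missing = ExampleTempResult("pi", "chr1", 10001, 20000, "NA")
print(row)
print(missing)
```

Because a missing value is already the string that ends up in the output file, the writer can call `str()` on every row without special-casing `None`, which may be the overhead the `Union[T, Literal["NA"]]` proposal aims to avoid.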

feat: use `PixyTempResult` (ksamuk#52)

Uses the `PixyTempResult` object introduced in ksamuk#49. **Only used with reworked `pi` and `dxy`-based functions (not `fst`, which is pending additional updates).** We can hold this PR in draft form until we finalize the fst functions.

refactor: extract `validate_populations_path()` (ksamuk#59)

This PR extracts the checks related to a user-specified `populations_path`. I wrote the docstring to reflect what the function does now, with the intention of future refactoring (i.e., I don't want to raise a base Exception or use `print` forever). No underlying code is changed as a result of this PR -- it's just code movement. Existing tests cover these changes, albeit indirectly (another item that will be fixed in a future refactoring).

refactor: extract `validate_bed_path()` (ksamuk#54)

This PR moves BED file-related validation and code out of `check_and_validate_args()` in `core.py` to a new function, `validate_bed_path()`, in `args_validation.py`. No underlying code is changed, only moved. A future PR will add unit tests, additional error handling, and other code changes.

refactor: extract `validate_sites_path()` (ksamuk#56)

This PR moves sites file-related validation and code out of `check_and_validate_args()` in `core.py` to a new function, `validate_sites_path()`, in `args_validation.py`. No underlying code is changed, only moved. A future PR will add unit tests, additional error handling, and other code changes. There is technically testing coverage here, but it's a little indirect: in `test_main.py` we have a test, `test_malformed_sites_file`, that fails the assertions that were previously in `check_and_validate_args()`. In a future PR we could refactor `run_pixy_helper` to instead be `validate_sites_path`; happy to make that change now or in the future.

refactor: extract `validate_vcf_path()` (ksamuk#58)

This PR adds `validate_vcf_path()` to `args_validation.py` and moves code from `core.py` into that function.
No underlying code is changed; this is just code movement.

refactor: extract out `validate_output_path()` (ksamuk#60)

This PR extracts functionality related to the output path (`output_folder`, `output_prefix`, and `temp_file`). As with the other similar PRs, I wrote the docstring to reflect what the function does now, with the intention of future refactoring. No underlying code is changed as a result of this PR -- it's just code movement. Existing tests cover these changes, albeit indirectly (another item that will be fixed in a future refactoring).

refactor: extract window/interval validation (ksamuk#64)

Refactored during sync

refactor: Extract `compute_summary_fst()` (ksamuk#55)

Companion to ksamuk#50 and ksamuk#51

refactor: move and clean up `check_and_validate_args()` (ksamuk#69)

This PR moves `check_and_validate_args()` out of `core.py` and into `args_validation.py`. Next, it updates `check_and_validate_args()` to return an instance of `PixyArgs`. `PixyArgs` is then used throughout `__main__.py` instead of the large tuple that was previously returned.

Additionally, this PR adds some error handling on the `PixyArgs` class. While refactoring that, I updated the tests to make sure that we were catching bad values passed to `run_pixy_helper`. I added one more unit test about multiple chromosomes.

@msto this PR grew bigger than I planned for, so please let me know if you would prefer multiple, smaller PRs.

---------

Co-authored-by: Matt Stone <[email protected]>
Co-authored-by: Erin McAuley <[email protected]>