alternative to BwaMemOptions (#3)
* alternative to BwaMemOptions
nh13 authored and emmcauley committed Jan 14, 2025
1 parent edf5766 commit ea0c197
Showing 8 changed files with 658 additions and 510 deletions.
26 changes: 14 additions & 12 deletions docs/api.rst
@@ -27,6 +27,9 @@ or
from pybwa import BwaMem
mem = BwaMem(prefix="/path/to/genome.fasta")
The :class:`~pybwa.BwaIndex` object is useful when re-using the same index, such that it only needs to be loaded into
memory once. Both constructors for the :class:`~pybwa.BwaAln` and :class:`~pybwa.BwaMem` objects accept an index.
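The index re-use described above can be sketched as follows. This is a minimal sketch, not part of the commit: ``build_aligners`` is a hypothetical helper name, and it assumes that ``pybwa`` is installed, that a BWA index already exists at the given prefix, and that both constructors take the index via an ``index=`` keyword (as the ``BwaAln`` constructor in this commit does).

```python
def build_aligners(prefix: str):
    """Load the BWA index once and share it between both aligners.

    Hypothetical helper for illustration; `prefix` is the path prefix of an
    existing BWA index (typically the FASTA path).
    """
    # Imported lazily so that defining this helper does not itself require pybwa.
    from pybwa import BwaAln, BwaIndex, BwaMem

    # The index is read into memory a single time...
    index = BwaIndex(prefix=prefix)
    # ...and the same object is handed to each aligner.
    return BwaAln(index=index), BwaMem(index=index)
```

Calling ``build_aligners("/path/to/genome.fasta")`` would then return both aligners backed by one in-memory index, instead of each constructor loading the index from ``prefix`` separately.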

The :meth:`pybwa.BwaAln.align` method accepts a list of reads (as either strings or :class:`pysam.FastxRecord` s) to
align and return a *single* :class:`pysam.AlignedSegment` per input read:

@@ -66,20 +69,22 @@ It is constructed directly and options set on the object:
recs = aln.align(queries=["GATTACA"], opt=opt)
The :meth:`pybwa.BwaMem.align` method accepts custom options provided as a :class:`~pybwa.BwaMemOptions` object.
It is constructed via the :class:`~pybwa.BwaMemOptionsBuilder` class, to support scaling gap open and extend penalties
when a using custom match score, or the specification of presets (via `mode`).
Similarly, the :meth:`pybwa.BwaMem.align` method accepts custom options provided as a :class:`~pybwa.BwaMemOptions` object.
It is constructed directly and options set on the object:

.. code-block:: python
builder = BwaMemOptionsBuilder()
builder.min_seed_len = 32
opt: BwaMemOptions = builder.build()
opt = BwaMemOptions()
opt.min_seed_len = 32
recs = aln.align(queries=["GATTACA"], opt=opt)
The :class:`~pybwa.BwaIndex` object is useful when re-using the same index, such that it only needs to be loaded into memory
once.
Both constructors for the :class:`~pybwa.BwaAln` and :class:`~pybwa.BwaMem` objects accept an index.
Note: the :meth:`~pybwa.BwaMemOptions.finalize` method both applies the presets specified by the
:meth:`~pybwa.BwaMemOptions.mode` option and scales various other options (:code:`-TdBOELU`) based on the
:attr:`~pybwa.BwaMemOptions.match_score`. The presets and scaling are only applied to options that have not
been modified from their defaults. After the :meth:`~pybwa.BwaMemOptions.finalize` method is called, the options are
immutable, unless :code:`copy=True` is passed to the :meth:`~pybwa.BwaMemOptions.finalize` method, in which case a copy
of the options is returned by the method. Regardless, the :meth:`~pybwa.BwaMemOptions.finalize` method *does not*
need to be called before :meth:`pybwa.BwaMem.align` is invoked, as the latter will do so (making a local copy).
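The finalize semantics in the note above can be illustrated with a small pure-Python toy model. This is a sketch under stated assumptions, not pybwa's implementation: ``ToyMemOptions``, its ``get``/``set`` methods, and the default values are all hypothetical and chosen only for illustration. Presets are omitted, and the model assumes that ``copy=True`` finalizes the returned copy while leaving the original mutable.

```python
import copy as copy_module

# Illustrative defaults only; not taken from pybwa.
DEFAULTS = {
    "match_score": 1,
    "mismatch_penalty": 4,
    "gap_open_penalty": 6,
    "minimum_score": 30,
}
# Options this toy model scales by the match score.
SCALED_BY_MATCH_SCORE = ("mismatch_penalty", "gap_open_penalty", "minimum_score")


class ToyMemOptions:
    """Hypothetical stand-in for an options object with a finalize step."""

    def __init__(self):
        self._values = dict(DEFAULTS)
        self._modified = set()  # names the caller has explicitly set
        self._finalized = False

    @property
    def finalized(self):
        return self._finalized

    def get(self, name):
        return self._values[name]

    def set(self, name, value):
        if self._finalized:
            raise AttributeError("options are immutable once finalized")
        if name not in self._values:
            raise KeyError(name)
        self._values[name] = value
        self._modified.add(name)

    def finalize(self, copy=False):
        # Either finalize a deep copy (original stays mutable) or finalize in place.
        target = copy_module.deepcopy(self) if copy else self
        if not target._finalized:
            scale = target._values["match_score"]
            for name in SCALED_BY_MATCH_SCORE:
                # Only options still at their defaults are scaled.
                if name not in target._modified:
                    target._values[name] *= scale
            target._finalized = True
        return target


opt = ToyMemOptions()
opt.set("match_score", 2)
opt.set("gap_open_penalty", 10)  # explicitly set, so it is not scaled
final = opt.finalize(copy=True)
```

In this toy model, ``final`` holds a scaled mismatch penalty and minimum score but keeps the explicitly set gap open penalty, while ``opt`` itself remains unfinalized and can still be modified.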

API versus Command-line Differences
===================================
@@ -109,9 +114,6 @@ Bwa Aln
Bwa Mem
=======

.. autoclass:: pybwa.BwaMemOptionsBuilder
:members:

.. autoclass:: pybwa.BwaMemOptions
:members:

1 change: 1 addition & 0 deletions docs/index.rst
@@ -12,5 +12,6 @@ Documentation Contents
.. toctree::
:maxdepth: 2

install.rst
api.rst

43 changes: 43 additions & 0 deletions docs/install.rst
@@ -0,0 +1,43 @@
============
Installation
============

1. Install the environment manager :code:`mamba`
2. Install the Python build tool :code:`poetry`

3. Create an environment with Python, Cython, and Pysam:

.. code-block:: bash
mamba env create -f pybwa.yml
4. Activate the environment:

.. code-block:: bash
mamba activate pybwa
5. Clone the :code:`bwa` repo

.. code-block:: bash
git clone https://github.com/lh3/bwa
6. Configure poetry to install into pre-existing virtual environments:

.. code-block:: bash
poetry config virtualenvs.create false
7. Install :code:`pybwa` into the virtual environment:

.. code-block:: bash
poetry install
8. Check your build:

.. code-block:: bash
poetry run pytest
13 changes: 11 additions & 2 deletions pybwa/libbwaaln.pyx
@@ -33,6 +33,7 @@ cdef class BwaAlnOptions:
free(self._delegate)

cdef gap_opt_t* gap_opt(self):
"""Returns the options struct to use with the bwa C library methods"""
return self._delegate

property max_mismatches:
@@ -138,6 +139,14 @@ cdef class BwaAln:
cdef BwaIndex _index

def __init__(self, prefix: str | Path | None = None, index: BwaIndex | None = None):
"""Constructs the :code:`bwa aln` aligner.
One of `prefix` or `index` must be specified.
Args:
prefix: the path prefix for the BWA index (typically a FASTA)
index: the index to use
"""
if prefix is not None:
assert Path(prefix).exists()
self._index = BwaIndex(prefix=prefix)
@@ -148,7 +157,6 @@ cdef class BwaAln:

bwase_initialize()

# TODO: a list of records...
def align(self, queries: List[FastxRecord] | List[str], opt: BwaAlnOptions | None = None) -> List[AlignedSegment]:
"""Align one or more queries with `bwa aln`.

@@ -157,7 +165,8 @@ cdef class BwaAln:
opt: the alignment options, or None to use the default options

Returns:
one alignment per query
one alignment (:class:`~pysam.AlignedSegment`) per query
:code:`List[List[AlignedSegment]]`.
"""
if len(queries) == 0:
return []
2 changes: 1 addition & 1 deletion pybwa/libbwaindex.pyx
@@ -41,7 +41,7 @@ cdef class BwaIndex:
with :code:`samtools dict <fasta>`).
Args:
prefix (str | Path): the path prefix for teh BWA index
prefix (str | Path): the path prefix for the BWA index (typically a FASTA)
bwt (bool): load the BWT (FM-index)
bns (bool): load the BNS (reference sequence metadata)
pac (bool): load the PAC (the actual 2-bit encoded reference sequences with 'N' converted to a
164 changes: 129 additions & 35 deletions pybwa/libbwamem.pyi
@@ -18,42 +18,136 @@ class BwaMemMode(enum.Enum):

class BwaMemOptions:
def __init__(self, finalize: bool = False) -> None: ...
_finalized: bool
_ignore_alt: bool
min_seed_len: int
mode: BwaMemMode
band_width: int
match_score: int
mismatch_penalty: int
minimum_score: int
unpaired_penalty: int
n_threads: int
skip_pairing: bool
output_all_for_fragments: bool
interleaved_paired_end: bool
short_split_as_secondary: bool
skip_mate_rescue: bool
soft_clip_supplementary: bool
with_xr_tag: bool
query_coord_as_primary: bool
keep_mapq_for_supplementary: bool
with_xb_tag: bool
max_occurrences: int
off_diagonal_x_dropoff: float
ignore_alternate_contigs: bool
internal_seed_split_factor: float
drop_chain_fraction: float
max_mate_rescue_rounds: int
min_seeded_bases_in_chain: int
seed_occurrence_in_3rd_round: int
xa_max_hits: int | tuple[int, int]
xa_drop_ratio: float
gap_open_penalty: int | tuple[int, int]
gap_extension_penalty: int | tuple[int, int]
clipping_penalty: int | tuple[int, int]

class BwaMemOptionsBuilder(BwaMemOptions):
def __init__(self, options: BwaMemOptions | None = None) -> None: ...
def build(self) -> BwaMemOptions: ...
_mode: BwaMemMode | None
@property
def finalized(self) -> bool: ...
@property
def min_seed_len(self) -> int: ...
@min_seed_len.setter
def min_seed_len(self, value: int) -> None: ...
@property
def mode(self) -> BwaMemMode: ...
@mode.setter
def mode(self, value: BwaMemMode) -> None: ...
@property
def band_width(self) -> int: ...
@band_width.setter
def band_width(self, value: int) -> None: ...
@property
def match_score(self) -> int: ...
@match_score.setter
def match_score(self, value: int) -> None: ...
@property
def mismatch_penalty(self) -> int: ...
@mismatch_penalty.setter
def mismatch_penalty(self, value: int) -> None: ...
@property
def minimum_score(self) -> int: ...
@minimum_score.setter
def minimum_score(self, value: int) -> None: ...
@property
def unpaired_penalty(self) -> int: ...
@unpaired_penalty.setter
def unpaired_penalty(self, value: int) -> None: ...
@property
def n_threads(self) -> int: ...
@n_threads.setter
def n_threads(self, value: int) -> None: ...
@property
def skip_pairing(self) -> bool: ...
@skip_pairing.setter
def skip_pairing(self, value: bool) -> None: ...
@property
def output_all_for_fragments(self) -> bool: ...
@output_all_for_fragments.setter
def output_all_for_fragments(self, value: bool) -> None: ...
@property
def interleaved_paired_end(self) -> bool: ...
@interleaved_paired_end.setter
def interleaved_paired_end(self, value: bool) -> None: ...
@property
def short_split_as_secondary(self) -> bool: ...
@short_split_as_secondary.setter
def short_split_as_secondary(self, value: bool) -> None: ...
@property
def skip_mate_rescue(self) -> bool: ...
@skip_mate_rescue.setter
def skip_mate_rescue(self, value: bool) -> None: ...
@property
def soft_clip_supplementary(self) -> bool: ...
@soft_clip_supplementary.setter
def soft_clip_supplementary(self, value: bool) -> None: ...
@property
def with_xr_tag(self) -> bool: ...
@with_xr_tag.setter
def with_xr_tag(self, value: bool) -> None: ...
@property
def query_coord_as_primary(self) -> bool: ...
@query_coord_as_primary.setter
def query_coord_as_primary(self, value: bool) -> None: ...
@property
def keep_mapq_for_supplementary(self) -> bool: ...
@keep_mapq_for_supplementary.setter
def keep_mapq_for_supplementary(self, value: bool) -> None: ...
@property
def with_xb_tag(self) -> bool: ...
@with_xb_tag.setter
def with_xb_tag(self, value: bool) -> None: ...
@property
def max_occurrences(self) -> int: ...
@max_occurrences.setter
def max_occurrences(self, value: int) -> None: ...
@property
def off_diagonal_x_dropoff(self) -> float: ...
@off_diagonal_x_dropoff.setter
def off_diagonal_x_dropoff(self, value: float) -> None: ...
@property
def ignore_alternate_contigs(self) -> bool: ...
@ignore_alternate_contigs.setter
def ignore_alternate_contigs(self, value: bool) -> None: ...
@property
def internal_seed_split_factor(self) -> float: ...
@internal_seed_split_factor.setter
def internal_seed_split_factor(self, value: float) -> None: ...
@property
def drop_chain_fraction(self) -> float: ...
@drop_chain_fraction.setter
def drop_chain_fraction(self, value: float) -> None: ...
@property
def max_mate_rescue_rounds(self) -> int: ...
@max_mate_rescue_rounds.setter
def max_mate_rescue_rounds(self, value: int) -> None: ...
@property
def min_seeded_bases_in_chain(self) -> int: ...
@min_seeded_bases_in_chain.setter
def min_seeded_bases_in_chain(self, value: int) -> None: ...
@property
def seed_occurrence_in_3rd_round(self) -> int: ...
@seed_occurrence_in_3rd_round.setter
def seed_occurrence_in_3rd_round(self, value: int) -> None: ...
@property
def xa_max_hits(self) -> int | tuple[int, int]: ...
@xa_max_hits.setter
def xa_max_hits(self, value: int | tuple[int, int]) -> None: ...
@property
def xa_drop_ratio(self) -> float: ...
@xa_drop_ratio.setter
def xa_drop_ratio(self, value: float) -> None: ...
@property
def gap_open_penalty(self) -> int | tuple[int, int]: ...
@gap_open_penalty.setter
def gap_open_penalty(self, value: int | tuple[int, int]) -> None: ...
@property
def gap_extension_penalty(self) -> int | tuple[int, int]: ...
@gap_extension_penalty.setter
def gap_extension_penalty(self, value: int | tuple[int, int]) -> None: ...
@property
def clipping_penalty(self) -> int | tuple[int, int]: ...
@clipping_penalty.setter
def clipping_penalty(self, value: int | tuple[int, int]) -> None: ...
def finalize(self, copy: bool = False) -> BwaMemOptions: ...

class BwaMem:
_index: BwaIndex