release v0.10
pkufool committed Jan 10, 2024
1 parent 104f29c commit 005968b
Showing 4 changed files with 28 additions and 7 deletions.
CMakeLists.txt: 2 changes (1 addition, 1 deletion)
@@ -1,7 +1,7 @@
cmake_minimum_required(VERSION 3.12 FATAL_ERROR)
project(textsearch)

-set(TS_VERSION "0.9")
+set(TS_VERSION "0.10")

set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/lib")
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/lib")
pyproject.toml: 2 changes (1 addition, 1 deletion)
@@ -10,7 +10,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "fasttextsearch"
-version = "0.9"
+version = "0.10"
authors = [
{ name="Next-gen Kaldi development team", email="[email protected]" },
]
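This release bumps the version string in two places, `TS_VERSION` in CMakeLists.txt and `version` in pyproject.toml, and the two have to agree. A minimal sketch of a consistency check follows; the helpers `cmake_version`, `pyproject_version` and `version_key` are hypothetical (not part of the repository), and the parsing assumes the exact `set(TS_VERSION "...")` form shown above and a `version = "..."` key inside the `[project]` table. It also shows why the 0.9 to 0.10 bump should be compared numerically: as plain strings, "0.10" sorts before "0.9".

```python
import re


def cmake_version(cmake_text: str) -> str:
    """Hypothetical helper: extract X from set(TS_VERSION "X") in CMakeLists.txt text."""
    m = re.search(r'set\(\s*TS_VERSION\s+"([^"]+)"\s*\)', cmake_text)
    if m is None:
        raise ValueError("TS_VERSION not found")
    return m.group(1)


def pyproject_version(pyproject_text: str) -> str:
    """Hypothetical helper: extract version = "X" from the [project] table."""
    # Body of the [project] table: everything up to the next table header or EOF.
    table = re.search(r"^\[project\]\s*$(.*?)(?=^\[|\Z)", pyproject_text, re.M | re.S)
    if table is None:
        raise ValueError("[project] table not found")
    m = re.search(r'^version\s*=\s*"([^"]+)"', table.group(1), re.M)
    if m is None:
        raise ValueError("version not found in [project]")
    return m.group(1)


def version_key(version: str) -> tuple:
    """Turn "0.10" into (0, 10) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))
```

With these helpers, `version_key("0.10") > version_key("0.9")` holds, while the raw string comparison `"0.10" > "0.9"` is False.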
textsearch/python/textsearch/match.py: 8 changes (8 additions, 0 deletions)
@@ -994,6 +994,10 @@ def _split_into_segments(
preceding or succeeding silence length greater than this value, we will
add it as a possible breaking point.
Caution: Only be used when there are no punctuations in target_source.
+overlap_ratio:
+The ratio of overlapping part to the query or existing segments. If the
+ratio is greater than `overlap_ratio` we will drop the query or existing
+segment.
min_duration:
The minimum duration (in second) allowed for a segment.
max_duration:
@@ -1276,6 +1280,10 @@ def split_aligned_queries(
preceding or succeeding silence length greater than this value, we will
add it as a possible breaking point.
Caution: Only be used when there are no punctuations in target_source.
+overlap_ratio:
+The ratio of overlapping part to the query or existing segments. If the
+ratio is greater than `overlap_ratio` we will drop the query or existing
+segment.
min_duration:
The minimum duration (in second) allowed for a segment.
max_duration:
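The `overlap_ratio` docstring added to both functions can be read as two threshold tests on the length of the intersection. A minimal sketch, assuming segments are `(start, end)` pairs in seconds and that the query is tested before the existing segment (the docstring does not state an order); `overlap_length` and `which_to_drop` are hypothetical helpers, not functions from `textsearch`, and the default of 0.25 is borrowed from the `is_overlap` signature in utils.py.

```python
from typing import Optional, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def overlap_length(a: Segment, b: Segment) -> float:
    """Length of the intersection of two segments; 0.0 if they are disjoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def which_to_drop(
    query: Segment, existing: Segment, overlap_ratio: float = 0.25
) -> Optional[str]:
    """Return "query", "existing", or None if both segments can be kept."""
    overlap = overlap_length(query, existing)
    # Assumed order: compare the overlap against the query's length first.
    if overlap > overlap_ratio * (query[1] - query[0]):
        return "query"
    # Otherwise compare it against the existing segment's length.
    if overlap > overlap_ratio * (existing[1] - existing[0]):
        return "existing"
    return None
```

For instance, `which_to_drop((0.0, 20.0), (18.0, 22.0))` returns `"existing"`: the 2 s overlap is only 10% of the query but 50% of the existing segment.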
23 changes: 18 additions & 5 deletions textsearch/python/textsearch/utils.py
@@ -113,26 +113,39 @@ def is_overlap(
query: Tuple[float, float],
segment_index: int,
overlap_ratio: float = 0.25,
-) -> Tuple[bool, int]:
+) -> Tuple[bool, Union[int, None]]:
"""
-Return if the given range overlaps with the existing ranges.
+Return True if the given range overlaps with the existing ranges.
Caution:
-`ranges` will be modified in this function (when returning False)
+`ranges` and `indexes` will be modified in this function.
Note: overlapping here means the length of overlapping area is greater than
some threshold (currently, the threshold is `overlap_ratio` multiply the length
-of the shorter overlapping ranges).
+of the query or existing ranges).
Args:
ranges:
The existing ranges, it is sorted in ascending order on input, and we will
keep it sorted in this function.
+indexes:
+The index (into the selected segments) of each range belongs to.
query:
The given range.
segment_index:
The index (into the selected segments) of query to be inserted.
+overlap_ratio:
+The ratio of overlapping part to the query or existing segments. If the
+ratio is greater than `overlap_ratio` we will drop the query or existing
+segment.
Return:
-Return True if having overlap otherwise False.
+Return (False, None) if no overlapping between query and existing ranges.
+Return (True, None) if the ratio of overlapping part to query is greater
+than `overlap_ratio`.
+Return (True, dindex) if the ratio of overlapping part to existing range
+is greater than `overlap_ratio`, `dindex` is the index (can get from indexes)
+of the existing range.
"""
index = bisect_left(ranges, query)
if not ranges:
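The return contract documented for `is_overlap` can be illustrated with a simplified stand-in. This is a sketch, not the function in utils.py: `is_overlap_sketch` is a hypothetical name, it scans every existing range instead of only those near the `bisect_left` position, and it updates `ranges` and `indexes` only when it returns `(False, None)`. What the real function does to the two lists when it returns `(True, dindex)` is not visible in this hunk, so the sketch leaves that to the caller.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Range = Tuple[float, float]


def is_overlap_sketch(
    ranges: List[Range],
    indexes: List[int],
    query: Range,
    segment_index: int,
    overlap_ratio: float = 0.25,
) -> Tuple[bool, Optional[int]]:
    """Hypothetical stand-in for textsearch.utils.is_overlap (illustration only)."""
    query_len = query[1] - query[0]
    # Simplification: check every existing range, not just nearby ones.
    for existing, seg in zip(ranges, indexes):
        overlap = max(0.0, min(query[1], existing[1]) - max(query[0], existing[0]))
        if overlap > overlap_ratio * query_len:
            # Too much of the query is covered: the caller should drop the query.
            return True, None
        if overlap > overlap_ratio * (existing[1] - existing[0]):
            # Too much of an existing range is covered: report its segment index.
            return True, seg
    # No significant overlap: insert while keeping `ranges` sorted and
    # `indexes` aligned with it position by position.
    pos = bisect_left(ranges, query)
    ranges.insert(pos, query)
    indexes.insert(pos, segment_index)
    return False, None
```

For example, with `ranges == [(0.0, 10.0), (20.0, 30.0)]` and `indexes == [0, 1]`, a query `(25.0, 60.0)` overlaps the second range by 5 s, which is under 25% of the 35 s query but over 25% of the 10 s existing range, so the sketch returns `(True, 1)`.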