Add more documentations (#71)
zhu-han authored Aug 19, 2024
1 parent 7c452ed commit 973009a
Showing 7 changed files with 102 additions and 44 deletions.
27 changes: 27 additions & 0 deletions README.md
@@ -1,3 +1,10 @@
# Introduction

The text_search project can be used to create ASR (automatic speech recognition) datasets from long-form audio and even longer texts.

The core of text_search is a general audio alignment pipeline that aligns audio files to their corresponding text and splits them into short segments, while excluding audio segments that do not correspond exactly to the aligned text.
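
As a rough illustration of the "keep only the exactly matching segments" idea described above, here is a minimal sketch. It is not text_search's implementation: it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in aligner, and the function name `exact_segments`, the `min_words` parameter, and the sample sentences are hypothetical.

```python
from difflib import SequenceMatcher


def exact_segments(reference, transcript, min_words=3):
    """Return (ref_start, hyp_start, length) for word runs that match exactly.

    `reference` and `transcript` are lists of words. Runs shorter than
    `min_words` are dropped, which mimics excluding unreliable segments.
    This is an illustrative stand-in, not the text_search algorithm.
    """
    matcher = SequenceMatcher(a=reference, b=transcript, autojunk=False)
    return [
        (block.a, block.b, block.size)
        for block in matcher.get_matching_blocks()
        if block.size >= min_words
    ]


# Hypothetical data: a reference text and a noisy recognizer transcript.
reference = "it was the best of times it was the worst of times".split()
transcript = "uh it was the best of times um was the worst of times".split()

for ref_start, hyp_start, size in exact_segments(reference, transcript):
    words = " ".join(reference[ref_start:ref_start + size])
    print(f"ref[{ref_start}] hyp[{hyp_start}] len={size}: {words}")
```

In this toy setup the filler words and the dropped word fall outside every kept run, so only the spans where transcript and reference agree word for word survive.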


# Installation

## With pip
@@ -36,3 +43,23 @@ python3 -c "import textsearch; print(textsearch.__file__)"
We only set the environment variable `PYTHONPATH`.
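
Because the developer setup relies only on `PYTHONPATH`, it can help to check where Python would load the package from before importing it. The following is a minimal sketch using only the standard library; the helper name `find_module_origin` is hypothetical and not part of text_search.

```python
import importlib.util


def find_module_origin(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    origin = find_module_origin("textsearch")
    if origin is None:
        # Most likely PYTHONPATH does not include the build and source dirs.
        print("textsearch is not importable; check PYTHONPATH")
    else:
        print(f"textsearch would be loaded from: {origin}")
```

If the printed path points into your `text_search` checkout, the `PYTHONPATH` setting is being picked up.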



# Recipes

- [libriheavy](examples/libriheavy)
- [subtitle](examples/subtitle)


# References
More explanations are available in the following paper:

```
@misc{kang2023libriheavy,
title={Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context},
author={Wei Kang and Xiaoyu Yang and Zengwei Yao and Fangjun Kuang and Yifan Yang and Liyong Guo and Long Lin and Daniel Povey},
year={2023},
eprint={2309.08105},
archivePrefix={arXiv},
primaryClass={eess.AS}
}
```
59 changes: 59 additions & 0 deletions docs/source/getting-started/index.rst
@@ -0,0 +1,59 @@
Getting started
===============

About
-----

The text_search project can be used to create ASR (automatic speech recognition) datasets from long-form audio and even longer texts.

The core of text_search is a general audio alignment pipeline that aligns audio files to their corresponding text and splits them into short segments, while excluding audio segments that do not correspond exactly to the aligned text.

Installation
------------

With pip
********

.. code-block:: bash

   pip install fasttextsearch

For developers
**************

Please use the following commands to install `fasttextsearch`_:

.. code-block:: bash

   pip install numpy
   git clone https://github.com/k2-fsa/text_search
   cd text_search
   mkdir build
   cd build
   cmake ..
   make -j
   make test
   # set PYTHONPATH so that you can use "import textsearch"
   export PYTHONPATH=$PWD/../textsearch/python:$PWD/lib:$PYTHONPATH

To test that you have installed `fasttextsearch`_ successfully, please run:

.. code-block:: bash

   python3 -c "import textsearch; print(textsearch.__file__)"

It should print something like the following:

.. code-block:: bash

   /Users/fangjun/open-source/text_search/textsearch/python/textsearch/__init__.py

.. hint::

   We did not use either ``python3 setup.py install`` or ``pip install``.
   We only set the environment variable ``PYTHONPATH``.

2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -10,6 +10,6 @@ Welcome to fasttextsearch's documentation!
   :maxdepth: 2
   :caption: Contents:

-  ./install/index.rst
+  ./getting-started/index.rst
   ./tutorials/index.rst
   ./python-api/index.rst
35 changes: 0 additions & 35 deletions docs/source/install/developers.rst

This file was deleted.

7 changes: 0 additions & 7 deletions docs/source/install/index.rst

This file was deleted.

14 changes: 13 additions & 1 deletion docs/source/python-api/index.rst
@@ -3,7 +3,7 @@ Python API

This section lists Python APIs in `fasttextsearch`_.

.. currentmodule:: textsearch
-.. currentmodule:: textsearch
+.. currentmodule:: textsearch.python.textsearch

create_suffix_array
@@ -25,3 +25,15 @@ get_nice_alignments
-------------------

.. autofunction:: get_nice_alignments

align_queries
-------------

.. autofunction:: align_queries

get_longest_increasing_pairs
----------------------------

.. autofunction:: get_longest_increasing_pairs

split_aligned_queries
---------------------

.. autofunction:: split_aligned_queries
2 changes: 2 additions & 0 deletions docs/source/tutorials/index.rst
@@ -1,6 +1,8 @@
Tutorials
============

This section provides tutorials on the core concepts of text_search.

.. toctree::
   :maxdepth: 2
