Doc improvements, fix non-16-bit AIFF loading on Python 2, improve PocketSphinx language install procedures
Uberi committed Apr 4, 2016
1 parent 14b3b5d commit e4acf97
Showing 3 changed files with 62 additions and 17 deletions.
36 changes: 27 additions & 9 deletions README.rst
@@ -64,7 +64,7 @@ See the ``examples/`` directory for usage examples:
Installing
----------

First, make sure you have all the requirements listed in the "Requirements" section.

The easiest way to install this is using ``pip install SpeechRecognition``.

@@ -75,13 +75,20 @@ In the folder, run ``python setup.py install``.
Requirements
------------

In summary, this library requires:
To use all of the functionality of the library, you should have:

* **Python** 2.6, 2.7, or 3.3+
* **PyAudio** 0.2.9+ (required only if you need to use microphone input)
* **PocketSphinx** (required only if you need to use the Sphinx recognizer)
* **Python** 2.6, 2.7, or 3.3+ (required)
* **PyAudio** 0.2.9+ (required only if you need to use microphone input, ``Microphone``)
* **PocketSphinx** (required only if you need to use the Sphinx recognizer, ``recognizer_instance.recognize_sphinx``)
* **FLAC encoder** (required only if the system is not x86-based Windows/Linux/OS X)

The following requirements are optional, but can improve or extend functionality in some situations:

* On Python 2, and only on Python 2, some functions (like ``recognizer_instance.recognize_bing``) will run slower if you do not have **Monotonic for Python 2** installed.
* If using CMU Sphinx, you may want to `install additional language packs <https://github.com/Uberi/speech_recognition/blob/master/reference/pocketsphinx.rst#installing-other-languages>`__ to support languages like International French or Mandarin Chinese.

The following sections go over the details of each requirement.

Python
~~~~~~

@@ -90,7 +97,7 @@ The first software requirement is `Python 2.6, 2.7, or Python 3.3+ <https://www.
PyAudio (for microphone users)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you want to use audio input from microphones, `PyAudio <http://people.csail.mit.edu/hubert/pyaudio/#downloads>`__ is also necessary. Version 0.2.9+ is required in order to avoid overflow issues with recording on certain machines.
`PyAudio <http://people.csail.mit.edu/hubert/pyaudio/#downloads>`__ is required if and only if you want to use microphone input (``Microphone``). PyAudio version 0.2.9+ is required, as earlier versions have overflow issues with recording on certain machines.

If not installed, everything in the library will still work, except that attempting to instantiate a ``Microphone`` object will raise an ``AttributeError``.
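PyAudio availability and version can be checked up front rather than waiting for ``Microphone`` to fail. The sketch below uses hypothetical helpers that are not part of SpeechRecognition: ``module_available`` relies on the standard ``importlib.util.find_spec`` to test importability without importing the module, and ``meets_minimum`` compares versions as integer tuples, since plain string comparison gets ``"0.2.11"`` versus ``"0.2.9"`` wrong. Where the installed version string comes from is left as an assumption.

```python
import importlib.util


def version_tuple(version):
    """Turn a dotted version string like "0.2.11" into (0, 2, 11)."""
    parts = []
    for piece in version.split("."):
        # keep only the leading digits of each piece, e.g. "9rc1" -> 9
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(version, minimum=(0, 2, 9)):
    """True if ``version`` is at least ``minimum`` (defaults to PyAudio 0.2.9)."""
    return version_tuple(version) >= minimum


def module_available(name):
    """True if the top-level module ``name`` can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None
```

For example, ``meets_minimum("0.2.11")`` is ``True`` while ``meets_minimum("0.2.8")`` is ``False``, and ``module_available("pyaudio")`` tells you whether microphone input can work at all.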

@@ -107,7 +114,7 @@ PyAudio `wheel packages <https://pypi.python.org/pypi/wheel>`__ for 64-bit Pytho
PocketSphinx-Python (for Sphinx users)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`PocketSphinx-Python <https://github.com/bambocher/pocketsphinx-python>`__ is required if and only if you want to use the Sphinx recognizer (``recognizer_instance.recognize_sphinx``).
`PocketSphinx-Python <https://github.com/bambocher/pocketsphinx-python>`__ is **required if and only if you want to use the Sphinx recognizer** (``recognizer_instance.recognize_sphinx``).

PocketSphinx-Python `wheel packages <https://pypi.python.org/pypi/wheel>`__ for 64-bit Python 2.7, 3.4, and 3.5 on Windows are included for convenience, under the ``third-party/`` directory. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the SpeechRecognition folder.

@@ -120,7 +127,7 @@ See `Notes on using PocketSphinx <https://github.com/Uberi/speech_recognition/bl
FLAC (for some systems)
~~~~~~~~~~~~~~~~~~~~~~~

A `FLAC encoder <https://xiph.org/flac/>`__ is required to encode the audio data to send to the API. If using Windows (x86 or x86-64), OS X (Intel Macs only, OS X 10.6 or higher), or Linux (x86 or x86-64), the encoder is already bundled with this library - you do not need to install anything else.
A `FLAC encoder <https://xiph.org/flac/>`__ is required to encode the audio data to send to the API. If using Windows (x86 or x86-64), OS X (Intel Macs only, OS X 10.6 or higher), or Linux (x86 or x86-64), this is **already bundled with this library - you do not need to install anything**.

Otherwise, ensure that you have the ``flac`` command line tool, which is often available through the system package manager.
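To check ahead of time whether a ``flac`` tool is reachable, the standard ``shutil.which`` (Python 3.3+) searches ``PATH`` the same way a shell would; the library carries its own ``shutil_which`` shim for older Pythons, visible further down in this diff. This is a minimal sketch with a hypothetical ``find_flac`` helper: it only looks at ``PATH`` and does not reproduce how the library chooses between its bundled binaries and a system install.

```python
import shutil


def find_flac(name="flac"):
    """Return the full path of the ``flac`` command line tool, or None if it is not on PATH."""
    # shutil.which searches the directories listed in the PATH environment variable
    return shutil.which(name)


if __name__ == "__main__":
    path = find_flac()
    if path is None:
        print("flac not found; install it with your system package manager")
    else:
        print("using FLAC encoder at", path)
```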

@@ -141,10 +148,21 @@ The included ``flac-linux-x86`` executable is built from the `FLAC 1.3.1 source
make
exit # return to the original shell
The resulting executable can then be found at ``flac-1.3.1/src/flac`` in the build directory. A copy of the source code can also be found at ``third-party/flac-1.3.1.tar.xz``.
The resulting executable can then be found at ``./flac-1.3.1/src/flac`` relative to the working directory. A copy of the source code can also be found at ``third-party/flac-1.3.1.tar.xz``.

The included ``flac-mac`` executable is extracted from `xACT 2.37 <http://xact.scottcbrown.org/>`__, which is a frontend for FLAC that conveniently includes binaries for all of its encoders. Specifically, it is a copy of ``xACT 2.37/xACT.app/Contents/Resources/flac`` in ``xACT2.37.zip``.

Monotonic for Python 2 (for faster operations in some functions on Python 2)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On Python 2, and only on Python 2, if you do not install the `Monotonic for Python 2 <https://github.com/atdt/monotonic>`__ library, some functions will run slower than they otherwise could (though everything will still work correctly).

On Python 3, that library's functionality is built into the Python standard library (as ``time.monotonic``), which makes it unnecessary.

Monotonic time is needed to expire cached values correctly even when the system clock changes. Without it, access tokens are not cached, so functions like ``recognizer_instance.recognize_bing`` request a new token on every call.

To install, use `Pip <https://pip.readthedocs.org/>`__: execute ``pip install monotonic`` in a terminal.
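The caching rule described above can be sketched as follows. The import chain mirrors the one added to ``recognize_bing`` in this commit (standard ``time.monotonic`` first, then the ``monotonic`` backport, otherwise no caching); ``TokenCache`` itself is a hypothetical illustration, and the injectable ``clock`` parameter is an assumption added so the expiry logic can be tested without waiting.

```python
try:
    from time import monotonic  # Python 3.3+
except ImportError:
    try:
        from monotonic import monotonic  # PyPI backport for Python 2
    except (ImportError, RuntimeError):
        monotonic = None  # no monotonic clock available: disable caching


class TokenCache(object):
    """Cache a single value (e.g. an access token) until it expires."""

    def __init__(self, fetch, lifetime, clock=monotonic):
        self.fetch = fetch        # callable returning a fresh token
        self.lifetime = lifetime  # seconds a token stays valid
        self.clock = clock        # monotonic clock, or None to disable caching
        self.token = None
        self.expire_time = None

    def get(self):
        if self.clock is None:
            return self.fetch()  # no monotonic time: never cache
        now = self.clock()
        if self.expire_time is None or now > self.expire_time:
            # first request, or the previous token expired
            self.token = self.fetch()
            self.expire_time = now + self.lifetime
        return self.token
```

With ``clock=None``, every ``get()`` call goes back to ``fetch``, which is the slower behavior Python 2 users see without the backport.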

Troubleshooting
---------------

14 changes: 14 additions & 0 deletions reference/pocketsphinx.rst
@@ -11,6 +11,20 @@ By default, SpeechRecognition's Sphinx functionality supports only US English. A

To install a language pack, download the ZIP archives and extract them directly into the module install directory (you can find the module install directory by running ``python -c "import speech_recognition as sr, os.path as p; print(p.dirname(sr.__file__))"``).

Here is a simple Bash script to install all of them:

.. code:: bash

    #!/usr/bin/env bash
    SR_LIB=$(python -c "import speech_recognition as sr, os.path as p; print(p.dirname(sr.__file__))")
    sudo apt-get install --yes wget unzip
    sudo wget https://db.tt/tVNcZXao -O "$SR_LIB/fr-FR.zip"
    sudo unzip -o "$SR_LIB/fr-FR.zip" -d "$SR_LIB"
    sudo chmod --recursive a+r "$SR_LIB/fr-FR/"
    sudo wget https://db.tt/2YQVXmEk -O "$SR_LIB/zh-CN.zip"
    sudo unzip -o "$SR_LIB/zh-CN.zip" -d "$SR_LIB"
    sudo chmod --recursive a+r "$SR_LIB/zh-CN/"

Once installed, you can simply specify the language using the ``language`` parameter of ``recognizer_instance.recognize_sphinx``. For example, French would be specified with ``"fr-FR"`` and Mandarin with ``"zh-CN"``.
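For systems without Bash, ``wget`` or ``unzip`` (Windows, for instance), the same steps can be done from Python once the ZIP archives have been downloaded. A minimal sketch with hypothetical helpers: ``package_dir`` locates the installed package through ``importlib.util.find_spec`` (equivalent to the ``os.path.dirname(sr.__file__)`` one-liner above, but without importing the package), and ``install_language_pack`` extracts an archive with ``zipfile``. Writing into the package directory may still need administrator rights, as the ``sudo`` calls in the script suggest.

```python
import importlib.util
import os
import zipfile


def package_dir(name):
    """Directory of an installed package, found without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ImportError("package not found: " + name)
    return os.path.dirname(spec.origin)


def install_language_pack(zip_path, target_dir):
    """Extract a language pack ZIP into ``target_dir`` and return the names it contains."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)  # overwrites existing files, like ``unzip -o``
        return archive.namelist()
```

For example, ``install_language_pack("fr-FR.zip", package_dir("speech_recognition"))``, then ``recognizer_instance.recognize_sphinx(audio_data, language="fr-FR")``.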

Building PocketSphinx-Python from source
29 changes: 21 additions & 8 deletions speech_recognition/__init__.py
@@ -3,7 +3,7 @@
"""Library for performing speech recognition, with support for several engines and APIs, online and offline."""

__author__ = "Anthony Zhang (Uberi)"
__version__ = "3.4.1"
__version__ = "3.4.2"
__license__ = "BSD"

import io, os, subprocess, wave, aifc, base64
@@ -184,7 +184,12 @@ def __enter__(self):

# run the FLAC converter with the FLAC data to get the AIFF data
flac_converter = get_flac_converter()
process = subprocess.Popen([flac_converter, "--stdout", "--totally-silent", "--decode", "--force-aiff-format", "-"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
process = subprocess.Popen([
flac_converter,
"--stdout", "--totally-silent", # put the resulting AIFF file in stdout, and make sure it's not mixed with any program output
"--decode", "--force-aiff-format", # decode the FLAC file into an AIFF file
"-", # the input FLAC file contents will be given in stdin
], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
aiff_data, stderr = process.communicate(flac_data)
aiff_file = io.BytesIO(aiff_data)
self.audio_reader = aifc.open(aiff_file, "rb")
@@ -218,7 +223,7 @@ def read(self, size = -1):
if hasattr(audioop, "byteswap"): # ``audioop.byteswap`` was only added in Python 3.4
buffer = audioop.byteswap(buffer, sample_width)
else: # manually reverse the bytes of each sample, which is slower but works well enough as a fallback
buffer = buffer[sample_width - 1::-1] + b"".join(buffer[i + sample_width:i:-1] for i in range(1, len(buffer), sample_width))
buffer = buffer[sample_width - 1::-1] + b"".join(buffer[i + sample_width:i:-1] for i in range(sample_width - 1, len(buffer), sample_width))
if self.audio_reader.getnchannels() != 1: # stereo audio
buffer = audioop.tomono(buffer, sample_width, 1, 1) # convert stereo audio data to mono
return buffer
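The fix in this hunk and the next is the ``range`` start: each later sample is sliced as ``buffer[i + sample_width:i:-1]``, which only lines up with sample boundaries when ``i`` is the last byte index of the previous sample, i.e. ``sample_width - 1``, ``2 * sample_width - 1``, and so on. Starting at ``1`` happens to give the same indices for 16-bit (2-byte) samples, which is why only non-16-bit AIFF data was affected. The standalone sketch below (hypothetical name ``swap_sample_bytes``, no ``audioop`` needed) repeats the corrected expression so it can be checked against a straightforward per-sample reversal.

```python
def swap_sample_bytes(data, sample_width):
    """Reverse the byte order of every ``sample_width``-byte sample in ``data``.

    Assumes len(data) is a multiple of sample_width.
    """
    # first sample: bytes sample_width - 1 down to 0
    first = data[sample_width - 1::-1]
    # each later sample: from its last byte (i + sample_width) down to its first (i + 1),
    # where i is the last byte index of the previous sample
    rest = b"".join(
        data[i + sample_width:i:-1]
        for i in range(sample_width - 1, len(data), sample_width)
    )
    return first + rest
```

For example, ``swap_sample_bytes(b"abcdef", 3)`` returns ``b"cbafed"``; with the old ``range(1, ...)`` start, the second 24-bit sample would have been sliced from the wrong offset.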
@@ -310,7 +315,7 @@ def get_aiff_data(self, convert_rate = None, convert_width = None):
if hasattr(audioop, "byteswap"): # ``audioop.byteswap`` was only added in Python 3.4
raw_data = audioop.byteswap(raw_data, sample_width)
else: # manually reverse the bytes of each sample, which is slower but works well enough as a fallback
raw_data = raw_data[sample_width - 1::-1] + b"".join(raw_data[i + sample_width:i:-1] for i in range(1, len(raw_data), sample_width))
raw_data = raw_data[sample_width - 1::-1] + b"".join(raw_data[i + sample_width:i:-1] for i in range(sample_width - 1, len(raw_data), sample_width))

# generate the AIFF-C file contents
with io.BytesIO() as aiff_file:
@@ -338,7 +343,12 @@ def get_flac_data(self, convert_rate = None, convert_width = None):
# run the FLAC converter with the WAV data to get the FLAC data
wav_data = self.get_wav_data(convert_rate, convert_width)
flac_converter = get_flac_converter()
process = subprocess.Popen([flac_converter, "--stdout", "--totally-silent", "--best", "-"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
process = subprocess.Popen([
flac_converter,
"--stdout", "--totally-silent", # put the resulting FLAC file in stdout, and make sure it's not mixed with any program output
"--best", # highest level of compression available
"-", # the input WAV file contents will be given in stdin
], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
flac_data, stderr = process.communicate(wav_data)
return flac_data
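Both FLAC calls in this commit use the same pattern: write the input to the converter's stdin, then read the result from stdout with ``communicate``. In the lines shown, the exit status is not checked, so a failed conversion would simply return whatever (possibly empty) output the tool produced. Below is a minimal sketch of that pattern with an exit-status check added; ``run_filter`` is a hypothetical helper, and raising ``RuntimeError`` is an assumed choice rather than anything the library does.

```python
import subprocess


def run_filter(command, input_bytes):
    """Feed ``input_bytes`` to ``command`` on stdin and return what it writes to stdout."""
    process = subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    # communicate() writes stdin, closes it, reads stdout/stderr to EOF, and waits for exit
    output, errors = process.communicate(input_bytes)
    if process.returncode != 0:
        raise RuntimeError(
            "{0} exited with status {1}: {2}".format(
                command[0], process.returncode, errors.decode("utf-8", "replace")
            )
        )
    return output
```

For instance, ``run_filter([flac_converter, "--stdout", "--totally-silent", "--best", "-"], wav_data)`` would stand in for the ``Popen``/``communicate`` pair in ``get_flac_data``.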

@@ -688,7 +698,10 @@ def recognize_bing(self, audio_data, key, language = "en-US", show_all = False):
try:
from time import monotonic # we need monotonic time to avoid being affected by system clock changes, but this is only available in Python 3.3+
except ImportError:
expire_time = None # monotonic time not available, don't cache access tokens
try:
from monotonic import monotonic # use time.monotonic backport for Python 2 if available (from https://pypi.python.org/pypi/monotonic)
except (ImportError, RuntimeError):
expire_time = None # monotonic time not available, don't cache access tokens
if expire_time is None or monotonic() > expire_time: # first credential request, or the access token from the previous one expired
# get an access token using OAuth
credential_url = "https://oxford-speech.cloudapp.net/token/issueToken"
@@ -891,7 +904,7 @@ def shutil_which(pgm):
return p

# backwards compatibility shims
WavFile = AudioFile
WavFile = AudioFile # WavFile was renamed to AudioFile in 3.4.1
def recognize_att(self, audio_data, app_key, app_secret, language = "en-US", show_all = False):
authorization_url = "https://api.att.com/oauth/v4/token"
authorization_body = "client_id={0}&client_secret={1}&grant_type=client_credentials&scope=SPEECH".format(app_key, app_secret)
@@ -912,4 +925,4 @@ def recognize_att(self, audio_data, app_key, app_secret, language = "en-US", sho
for entry in result["Recognition"]["NBest"]:
if entry.get("Grade") == "accept" and "ResultText" in entry: return entry["ResultText"]
raise UnknownValueError() # no transcriptions available
Recognizer.recognize_att = classmethod(recognize_att)
Recognizer.recognize_att = classmethod(recognize_att) # AT&T API is deprecated and shutting down as of 3.4.0
