Using the GMM Specializer
Note: this page is a work in progress and will be updated soon.
Here we describe how to use the GMM specializer in your Python code and give examples. Please refer to our HotPar'11 and ASRU'11 papers for details on the specializer and the speaker diarization example, respectively. Our specializer uses numpy to store and manipulate arrays. Note that this code is still under development.
Contact egonina at eecs dot berkeley dot edu with questions and comments.
After installing Asp and the GMM specializer, you need to import it in your Python script like so:
from em import *
Creating a GMM object is just like creating an object of any class in Python. You can create an empty GMM object by specifying its dimensions (M = number of components, D = dimension) and whether the covariance matrices are diagonal or full (third parameter: True = diagonal, False = full):
gmm = GMM(M, D, True)
The parameters will be initialized randomly from the data when the train() function is called (see below). Alternatively, a GMM can be initialized with existing parameters, like so:
gmm = GMM(M, D, True, means, vars, weights)
Where means, vars and weights are numpy arrays. Note: these parameters will be overwritten with new values when the GMM is trained. If you are reusing parameters from a different GMM, make a copy of them first and pass the copy to the GMM constructor.
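For example, a minimal sketch along these lines (old_gmm is a hypothetical, already-trained GMM whose parameters we want to reuse; M and D are its dimensions as above):
from em import *
import numpy as np
# copy the parameters so that training the new GMM does not
# overwrite the originals stored in old_gmm
means = np.copy(old_gmm.components.means)
vars = np.copy(old_gmm.components.covars)
weights = np.copy(old_gmm.components.weights)
new_gmm = GMM(M, D, True, means, vars, weights)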
To train the GMM object using the Expectation-Maximization (EM) algorithm on a set of observations, use the train()
function:
lkld = gmm.train(data, num_iters)
Where data is an N by D numpy array of observation vectors (N vectors, each of D dimensions) and num_iters is the number of EM iterations. It returns the likelihood of the trained GMM fitting the data.
To compute the (log-)likelihood of the trained GMM on a new set of observations, use the score() function:
log_lklds = gmm.score(data)
Where data is an N by D numpy array. The function returns a numpy array of N log-likelihoods, one for each observation vector. To get cumulative statistics over the data, you can use numpy.average() or numpy.sum().
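For instance, a short sketch of getting the average and total log-likelihood over a testing set (test_data is a hypothetical N by D numpy array):
import numpy as np
log_lklds = gmm.score(test_data)      # one log-likelihood per observation
avg_log_lkld = np.average(log_lklds)  # average over all N observations
total_log_lkld = np.sum(log_lklds)    # total log-likelihood of the set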
You can access the GMM mean, covariance and weight parameters like so:
means = gmm.components.means
covariance = gmm.components.covars
weights = gmm.components.weights
means is an M by D array (number of components by number of dimensions), covariance is an M by D by D array (number of components by number of dimensions by number of dimensions) and weights is an array of size M (number of components).
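As a quick sanity check, the shapes can be inspected directly (a sketch, assuming a trained gmm as above and that the parameters are numpy arrays):
means = gmm.components.means
covars = gmm.components.covars
weights = gmm.components.weights
print means.shape   # (M, D)
print covars.shape  # (M, D, D)
print weights.shape # (M,)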
To make sure things work correctly, you can run the test scripts in the gmm/tests/ directory.
This is a simple example that takes a training dataset training_data, creates a 32-component GMM, trains it on the data, and then computes the average log-likelihood of a testing dataset:
from em import *
import numpy as np
training_data = np.array(get_training_data()) # training_data.shape = (N1, D)
testing_data = np.array(get_testing_data()) # testing_data.shape = (N2, D)
M = 32
D = training_data.shape[1] # get the D dimension from the data
gmm = GMM(M, D) # create new GMM object
gmm.train(training_data) # train the GMM on the training data
log_lklds = gmm.score(testing_data) # compute the log likelihoods of the testing data observations
print "Average log likelihood for testing data = ", np.average(log_lklds)
The gmm/examples/ directory includes two example applications: Speaker Diarization in cluster.py and a Song Recommendation Engine in song_recommendation.py.
Speaker Diarization
We have implemented a speaker diarization application using the GMM specializer. The task of the application is to determine "who spoke when?" in an audio recording. The algorithm is based on agglomerative hierarchical clustering of GMMs, using the Bayesian Information Criterion (BIC) to segment the audio feature files into speaker-homogeneous regions. Here we briefly describe the implementation in Python using the GMM specializer. For more details on the application, please see our ASRU'11 paper.
The script for diarization is in examples/cluster.py. After reading the config file (see below), the __main__ function creates a Diarizer object, which then creates an initial list of GMMs used for clustering. It then calls the cluster() function to perform the main clustering computation. The algorithm is outlined as follows:
- Initialization: Train a set of GMMs, one per initial segment, using the expectation-maximization (EM) algorithm.
- Re-segmentation: Re-segment the audio track using a majority vote over the GMMs’ likelihoods computed over 2.5s segments.
- Re-training: Retrain the GMMs on the new segmentation.
- Agglomeration: Select the most similar GMMs and merge them. At each iteration, the algorithm checks all possible pairs of GMMs, looking to obtain an improvement in BIC scores by merging the pair and retraining it on the pair’s combined audio segments. The GMM clusters of the pair with the largest improvement in BIC scores are permanently merged. The algorithm then repeats from the re-segmentation step until there are no remaining pairs whose merging would lead to an improved BIC score.
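To make the agglomeration step concrete, here is a simplified sketch of the pair-selection loop, using only the GMM API described above. Names such as gmm_list and seg_data are hypothetical, and the real implementation in cluster.py uses a proper BIC penalty term rather than the bare likelihood difference shown here:
from em import *
import numpy as np

def best_merge_candidate(gmm_list, seg_data):
    # gmm_list: list of trained GMM objects, one per current cluster (hypothetical name)
    # seg_data: list of (N_i, D) numpy arrays of the frames assigned to each cluster
    best_delta = 0.0
    best_pair = None
    for i in range(len(gmm_list)):
        for j in range(i + 1, len(gmm_list)):
            combined = np.concatenate((seg_data[i], seg_data[j]))
            # retrain a merged model on the pair's combined audio segments
            merged = GMM(32, combined.shape[1])  # component count chosen for illustration
            merged.train(combined)
            # likelihood difference as a stand-in for the BIC delta; the real
            # implementation adds a penalty for the number of model parameters
            new_score = np.sum(merged.score(combined))
            old_score = np.sum(gmm_list[i].score(seg_data[i])) + np.sum(gmm_list[j].score(seg_data[j]))
            delta = new_score - old_score
            if delta > best_delta:
                best_delta, best_pair = delta, (i, j)
    # best_pair stays None when no merge improves the score, which ends the clustering
    return best_pair, best_delta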
The script can either use the KL-divergence-based approximation to choose the GMM pairs to merge, or compare all pairs of GMMs (see paper). This setting can be specified in the config file (see below).
Finally, the script outputs two types of files, the segmentation result (in the NIST RTTM format) and the final parameters of the trained GMMs.
To run the script, use a regular Python invocation: python examples/cluster.py.
Using the config file
The script takes a config file that sets all the parameters for diarization. The default config file name the script looks for is diarizer.cfg. You can also pass it your own config file using the -c option: python examples/cluster.py -c my_config.cfg. We use the Python ConfigParser library, so the script requires the parameters in the config file to go under the [Diarizer] section tag. To display the config file settings, you can use the --help option when running the script: python examples/cluster.py --help.
Here's an example diarizer.cfg file for a sample AMI meeting:
[Diarizer]
basename = IS1000a
mfcc_feats = /AMI/featuresIS1000a_seg.feat.htk
spnsp_file = /AMI/spnsp/IS1000a_seg.spch
output_cluster = IS1000a.rttm
gmm_output = IS1000a.gmm
em_iterations = 3
initial_clusters = 16
M_mfcc = 5
KL_ntop = 3
num_seg_iters_init = 1
num_seg_iters = 1
seg_length = 250
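As a rough sketch of how such a file is read (using the Python 2 ConfigParser module; the exact option handling in cluster.py may differ):
import ConfigParser

config = ConfigParser.ConfigParser()
config.read('diarizer.cfg')
# all diarization parameters live under the [Diarizer] section tag
basename = config.get('Diarizer', 'basename')
init_clusters = config.getint('Diarizer', 'initial_clusters')
em_iters = config.getint('Diarizer', 'em_iterations')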
Some of the parameters are required and some are optional (and have some default values):
Required parameters
- basename: meeting base name
- mfcc_feats: HTK feature file for the audio recording
- output_cluster: name of the output RTTM file
- gmm_output: name of the GMMs parameters file
- initial_clusters: number of initial clusters
- M_mfcc: number of gaussians per model
Optional parameters
- em_iterations: number of EM iterations for training (3 by default)
- spnsp_file: speech/nonspeech file
- KL_ntop: number of GMM pairs to evaluate BIC on (0 to deactivate the KL-divergence approximation)
- num_seg_iters_init: number of majority vote segmentation iterations for the initial phase (2 by default)
- num_seg_iters: number of majority vote segmentation iterations for the main clustering loop (3 by default)
- seg_length: segment length for majority vote (250 by default)
Song Recommendation Engine
We have implemented a simple song recommendation engine using the Million Song Dataset (MSD). The idea is, given a tag (for example, a genre like "metal" or "jazz", or a mood like "sad" or "romantic"), to find all songs in the dataset that match that tag. We then use a GMM-UBM approach to select the top 20 most similar songs to recommend to the listener, both from the labeled set of songs and from the unlabeled set (i.e. the songs in the dataset that do not carry the tag).
The algorithm outline is as follows:
Training Phase:
- Given all songs that match the tag (labeled examples), split the set into a training set (70% of the songs) and a testing set (30%).
- Collect all songs from the dataset that do not contain the tag (we call these unlabeled examples).
- Train a (32-component) GMM on the timbre features of the songs in the training set.
- Collect features from all the songs in the dataset for the UBM (Universal Background Model).
- Train a (32-component) GMM for the UBM on 30% of all song features.
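A condensed sketch of this training phase (names like tagged_feats and all_feats are hypothetical placeholders for timbre feature arrays extracted from the MSD):
from em import *
import numpy as np

# tagged_feats: list of (N_i, D) numpy arrays, one per song with the tag
np.random.shuffle(tagged_feats)
split = int(0.7 * len(tagged_feats))
train_songs, test_songs = tagged_feats[:split], tagged_feats[split:]

D = train_songs[0].shape[1]
tag_gmm = GMM(32, D)                        # 32-component tag model
tag_gmm.train(np.concatenate(train_songs))  # train on the labeled training set

# all_feats: (N, D) numpy array of features pooled over the whole dataset
subset = all_feats[np.random.rand(all_feats.shape[0]) < 0.3]
ubm = GMM(32, D)                            # 32-component UBM
ubm.train(subset)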
Recommendation Phase:
- Compute the log likelihood for the songs in the testing set (labeled examples) for both the tag-GMM and the UBM
- Compute the log likelihood for the unlabeled example songs for both the tag-GMM and the UBM
- Display the top 20 recommended songs from the labeled example set
- Display the top 20 recommended songs from the unlabeled example set
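A sketch of the scoring used to rank the songs (again with hypothetical names; song_feats maps a song identifier to its (N, D) timbre feature array):
import numpy as np

def rank_songs(tag_gmm, ubm, song_feats, top_n=20):
    # score each song by how much better the tag model explains it
    # than the universal background model does
    scores = {}
    for song, feats in song_feats.items():
        scores[song] = np.average(tag_gmm.score(feats)) - np.average(ubm.score(feats))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]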
The example script is examples/song_recommendation.py. The __main__ function performs all the computation and prints the recommendations to the screen. We use Python pickle objects to store the features and the song dictionary for faster access. The script assumes the MSD is stored locally on the machine, and get_song_dict() takes the root directory of the MSD data. Please see the script for further details.
To run the script, use a regular Python invocation: python examples/song_recommendation.py.