Using the GMM Specializer
Note: this page is a work in progress and will be updated soon.
Here we describe how to use the GMM specializer in your Python code and give examples. Please refer to our HotPar'11 and ASRU'11 papers for details on the specializer and the speaker diarization example, respectively. Our specializer uses numpy to store and manipulate arrays. Note that this code is still under development.
Contact egonina at eecs dot berkeley dot edu with questions and comments.
After installing Asp and the GMM specializer, you need to import it in your Python script like so:
from em import *
Creating a GMM object is just like creating an object of any class in Python. You can create an empty GMM object by specifying its dimensions (M = number of components, D = dimension) and whether the covariance matrices are diagonal or full (third parameter: True = diagonal, False = full):
gmm = GMM(M, D, True)
The parameters will be initialized randomly from the data when the train() function is called (see below). Alternatively, a GMM can be initialized with existing parameters, like so:
gmm = GMM(M, D, True, means, vars, weights)
Where means, vars and weights are numpy arrays. Note: these parameters will be overwritten with new values when the GMM is trained. If you are reusing parameters from a different GMM, make a copy of them first and pass the copy to the GMM constructor.
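For example, a minimal sketch along these lines (old_gmm is a hypothetical, already-trained GMM whose parameters we want to reuse; M and D are its dimensions as above):
from em import *
import numpy as np
# copy the parameters so that training the new GMM does not
# overwrite the originals stored in old_gmm
means = np.copy(old_gmm.components.means)
vars = np.copy(old_gmm.components.covars)
weights = np.copy(old_gmm.components.weights)
new_gmm = GMM(M, D, True, means, vars, weights)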
To train the GMM object using the Expectation-Maximization (EM) algorithm on a set of observations, use the train()
function:
lkld = gmm.train(data, num_iters)
Where data is an N by D numpy array of observation vectors (N vectors, each of D dimensions) and num_iters is the number of EM iterations. It returns the likelihood of the trained GMM fitting the data.
To compute the (log-)likelihood of the trained GMM on a new set of observations, use the score() function:
log_lklds = gmm.score(data)
Where data is an N by D numpy array. The function returns a numpy array of N log-likelihoods, one for each observation vector. To get cumulative statistics over the data, you can use numpy.average() or numpy.sum().
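For instance, a short sketch of getting the average and total log-likelihood over a testing set (test_data is a hypothetical N by D numpy array):
import numpy as np
log_lklds = gmm.score(test_data)      # one log-likelihood per observation
avg_log_lkld = np.average(log_lklds)  # average over all N observations
total_log_lkld = np.sum(log_lklds)    # total log-likelihood of the set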
You can access the GMM mean, covariance and weight parameters like so:
means = gmm.components.means
covariance = gmm.components.covars
weights = gmm.components.weights
means is an M by D array (number of components by number of dimensions), covariance is an M by D by D array (number of components by number of dimensions by number of dimensions) and weights is an array of size M (number of components).
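As a quick sanity check, the shapes can be inspected directly (a sketch, assuming a trained gmm as above and that the parameters are numpy arrays):
means = gmm.components.means
covars = gmm.components.covars
weights = gmm.components.weights
print means.shape   # (M, D)
print covars.shape  # (M, D, D)
print weights.shape # (M,)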
To make sure things work correctly, you can run the test scripts in the gmm/tests/ directory.
This is a simple example that takes a training dataset training_data, creates a 32-component GMM, trains it on the data, and then computes the average log-likelihood of a testing dataset:
from em import *
import numpy as np
training_data = np.array(get_training_data()) # training_data.shape = (N1, D)
testing_data = np.array(get_testing_data()) # testing_data.shape = (N2, D)
M = 32
D = training_data.shape[1] # get the D dimension from the data
gmm = GMM(M, D) # create new GMM object
gmm.train(training_data) # train the GMM on the training data
log_lklds = gmm.score(testing_data) # compute the log likelihoods of the testing data observations
print "Average log likelihood for testing data = ", np.average(log_lklds)
The gmm/examples/ directory includes two example applications: Speaker Diarization in cluster.py and a Song Recommendation Engine in song_recommendation.py.
Speaker Diarization
We have implemented a speaker diarization application using the GMM specializer. The task of the application is to determine "who spoke when?" in an audio recording. The algorithm is based on agglomerative hierarchical clustering of GMMs, using the Bayesian Information Criterion (BIC) to segment the audio feature files into speaker-homogeneous regions. Here we briefly describe the implementation in Python using the GMM specializer. For more details on the application, please see our ASRU'11 paper.
The script for diarization is in examples/cluster.py. After reading the config file (see below), the __main__ function creates a Diarizer object, which then creates an initial list of GMMs used for clustering. It then calls the cluster() function to perform the main clustering computation. The algorithm is outlined as follows:
- Initialization: Train a set of GMMs, one per initial segment, using the expectation-maximization (EM) algorithm.
- Re-segmentation: Re-segment the audio track using a majority vote over the GMMs’ likelihoods computed over 2.5s segments.
- Re-training: Retrain the GMMs on the new segmentation.
- Agglomeration: Select the most similar GMMs and merge them. At each iteration, the algorithm checks all possible pairs of GMMs, looking to obtain an improvement in BIC scores by merging the pair and retraining it on the pair’s combined audio segments. The GMM clusters of the pair with the largest improvement in BIC scores are permanently merged. The algorithm then repeats from the re-segmentation step until there are no remaining pairs whose merging would lead to an improved BIC score.
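To make the agglomeration step concrete, here is a simplified sketch of the pair-selection loop, using only the GMM API described above. Names such as gmm_list and seg_data are hypothetical, and the real implementation in cluster.py uses a proper BIC penalty term rather than the bare likelihood difference shown here:
from em import *
import numpy as np

def best_merge_candidate(gmm_list, seg_data):
    # gmm_list: list of trained GMM objects, one per current cluster (hypothetical name)
    # seg_data: list of (N_i, D) numpy arrays of the frames assigned to each cluster
    best_delta = 0.0
    best_pair = None
    for i in range(len(gmm_list)):
        for j in range(i + 1, len(gmm_list)):
            combined = np.concatenate((seg_data[i], seg_data[j]))
            # retrain a merged model on the pair's combined audio segments
            merged = GMM(32, combined.shape[1])  # component count chosen for illustration
            merged.train(combined)
            # likelihood difference as a stand-in for the BIC delta; the real
            # implementation adds a penalty for the number of model parameters
            new_score = np.sum(merged.score(combined))
            old_score = np.sum(gmm_list[i].score(seg_data[i])) + np.sum(gmm_list[j].score(seg_data[j]))
            delta = new_score - old_score
            if delta > best_delta:
                best_delta, best_pair = delta, (i, j)
    # best_pair stays None when no merge improves the score, which ends the clustering
    return best_pair, best_delta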
The script can either use the KL-divergence-based approximation to choose the GMM pairs to merge, or compare all pairs of GMMs (see paper). This setting can be specified in the config file (see below).
Finally, the script outputs two types of files, the segmentation result (in the NIST RTTM format) and the final parameters of the trained GMMs.
To run the script, use a regular Python invocation: python examples/cluster.py.
Using the config file
The script takes a config file that sets all the parameters for diarization. The default config file name the script looks for is diarizer.cfg. You can also pass it your own config file using the -c option: python examples/cluster.py -c my_config.cfg. We use the Python ConfigParser library, so the script requires the parameters in the config file to go under the [Diarizer] section tag. To display the config file settings, you can use the --help option when running the script: python examples/cluster.py --help.
Here's an example diarizer.cfg file for a sample AMI meeting:
[Diarizer]
basename = IS1000a
mfcc_feats = /AMI/featuresIS1000a_seg.feat.htk
spnsp_file = /AMI/spnsp/IS1000a_seg.spch
output_cluster = IS1000a.rttm
gmm_output = IS1000a.gmm
em_iterations = 3
initial_clusters = 16
M_mfcc = 5
KL_ntop = 3
num_seg_iters_init = 1
num_seg_iters = 1
seg_length = 250
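As a rough sketch of how such a file is read (using the Python 2 ConfigParser module; the exact option handling in cluster.py may differ):
import ConfigParser

config = ConfigParser.ConfigParser()
config.read('diarizer.cfg')
# all diarization parameters live under the [Diarizer] section tag
basename = config.get('Diarizer', 'basename')
init_clusters = config.getint('Diarizer', 'initial_clusters')
em_iters = config.getint('Diarizer', 'em_iterations')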
Some of the parameters are required and some are optional (and have some default values):
Required parameters
- basename: meeting base name
- mfcc_feats: HTK feature file for the audio recording
- output_cluster: name of the output RTTM file
- gmm_output: name of the GMMs parameters file
- initial_clusters: number of initial clusters
- M_mfcc: number of gaussians per model
Optional parameters
- em_iterations: number of EM iterations for training (3 by default)
- spnsp_file: speech/nonspeech file
- KL_ntop: number of GMM pairs to evaluate BIC on (0 to deactivate the KL-divergence approximation)
- num_seg_iters_init: number of majority vote segmentation iterations for the initial phase (2 by default)
- num_seg_iters: number of majority vote segmentation iterations for the main clustering loop (3 by default)
- seg_length: segment length for majority vote (250 by default)
Song Recommendation Engine
We have implemented a simple song recommendation engine using the Million Song Dataset (MSD). The idea is, given a tag (for example, a genre like "metal" or "jazz", or a mood like "sad" or "romantic"), to find all songs in the dataset that match that tag. We then use a GMM-UBM approach to select the top 20 most similar songs to recommend to the listener, both from the labeled set of songs and from the unlabeled set (i.e. the songs in the dataset that do not carry the tag).
The algorithm outline is as follows:
Training Phase:
- Given all songs that match the tag (labeled examples), split the set into a training set (70% of the songs) and a testing set (30%).
- Collect all songs from the dataset that do not contain the tag (we call these unlabeled examples).
- Train a (32-component) GMM on the timbre features of the songs in the training set.
- Collect features from all the songs in the dataset for the UBM (Universal Background Model).
- Train a (32-component) GMM for the UBM on 30% of all song features.
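A condensed sketch of this training phase (names like tagged_feats and all_feats are hypothetical placeholders for timbre feature arrays extracted from the MSD):
from em import *
import numpy as np

# tagged_feats: list of (N_i, D) numpy arrays, one per song with the tag
np.random.shuffle(tagged_feats)
split = int(0.7 * len(tagged_feats))
train_songs, test_songs = tagged_feats[:split], tagged_feats[split:]

D = train_songs[0].shape[1]
tag_gmm = GMM(32, D)                        # 32-component tag model
tag_gmm.train(np.concatenate(train_songs))  # train on the labeled training set

# all_feats: (N, D) numpy array of features pooled over the whole dataset
subset = all_feats[np.random.rand(all_feats.shape[0]) < 0.3]
ubm = GMM(32, D)                            # 32-component UBM
ubm.train(subset)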
Recommendation Phase:
- Compute the log likelihood for the songs in the testing set (labeled examples) for both the tag-GMM and the UBM
- Compute the log likelihood for the unlabeled example songs for both the tag-GMM and the UBM
- Display the top 20 recommended songs from the labeled example set
- Display the top 20 recommended songs from the unlabeled example set
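A sketch of the scoring used to rank the songs (again with hypothetical names; song_feats maps a song identifier to its (N, D) timbre feature array):
import numpy as np

def rank_songs(tag_gmm, ubm, song_feats, top_n=20):
    # score each song by how much better the tag model explains it
    # than the universal background model does
    scores = {}
    for song, feats in song_feats.items():
        scores[song] = np.average(tag_gmm.score(feats)) - np.average(ubm.score(feats))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]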
The example script is examples/song_recommendation.py. The __main__ function performs all the computation and prints the recommendations to the screen. We use Python pickle objects to store the features and the song dictionary for faster access. The script assumes the MSD is stored locally on the machine, and get_song_dict() takes the root directory of the MSD data. Please see the script for further details.
To run the script, use a regular Python invocation: python examples/song_recommendation.py.