This directory provides several baseline methods for conversational response selection, to help benchmark performance on the various datasets. Baselines are intended to be relatively fast to run locally, and are not intended to be highly competitive with state-of-the-art methods. As such, they are limited to using a small portion of the training set, typically ten thousand randomly sampled examples.
Note that baselines only use the `context` feature to rank the response, and do not take into account the `extra_contexts`.
The keyword-based baselines use keyword similarity metrics to rank responses given a context. The `TF_IDF` method computes inverse document frequency statistics on the training set, and scores responses using their tf-idf cosine similarity to the context.
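For intuition, here is a minimal sketch of tf-idf ranking using scikit-learn; the texts are made up, and the baseline's actual tokenization and weighting details may differ:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# IDF statistics are computed on the training set.
train_texts = ["hi how are you", "fine thanks and you", "what is the weather"]
vectorizer = TfidfVectorizer()
vectorizer.fit(train_texts)

# Rank candidate responses by tf-idf cosine similarity to the context.
context = ["how are you doing"]
responses = ["fine thanks and you", "the weather is sunny"]
scores = cosine_similarity(
    vectorizer.transform(context), vectorizer.transform(responses))[0]
print(sorted(zip(scores, responses), reverse=True))  # best response first
```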
The `BM25` method builds on top of the tf-idf similarity, applying an adjustment to the term weights. See *Okapi BM25: a non-binary model* for further discussion of the approach.
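For orientation, the standard Okapi weighting scores a response R for a context C as (the exact parameter values used by the baseline may differ):

    score(C, R) = Σ_{t ∈ C} IDF(t) · f(t, R) · (k1 + 1)
                  / (f(t, R) + k1 · (1 − b + b · |R| / avgdl))

where f(t, R) is the frequency of term t in R, |R| is the length of R in tokens, avgdl is the average document length, and k1 and b are free parameters, typically k1 ∈ [1.2, 2.0] and b = 0.75.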
The vector-based methods use publicly available neural network embedding models, all loaded from TensorFlow Hub, to embed contexts and responses into a vector space (a minimal embedding example is given after the list). The models currently implemented are:

- `USE` - the universal sentence encoder
- `USE_LARGE` - a larger version of the universal sentence encoder
- `ELMO` - the Embeddings from Language Models approach
- `BERT_SMALL` - the Bidirectional Encoder Representations from Transformers approach
- `BERT_LARGE` - a larger version of BERT
- `USE_QA` - the dual question/answer encoder version of the universal sentence encoder. Note this encodes contexts and responses using separate subnetworks, and `USE_QA_SIM` amounts to ranking with the pre-trained dot-product score.
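As a point of reference, here is a minimal sketch of embedding sentences with the universal sentence encoder from TensorFlow Hub. This uses the public module URL and the TF2-style API; the baselines' own loading code may differ:

```python
import tensorflow_hub as hub

# Load the universal sentence encoder from TensorFlow Hub. The first call
# downloads the module; later calls reuse the cache (see TFHUB_CACHE_DIR below).
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

vectors = embed(["how are you doing?", "fine, thanks!"])
print(vectors.shape)  # (2, 512): one 512-dimensional vector per sentence
```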
There are two vector-based baseline methods that can be applied to each of the above models. The `SIM` method ranks responses according to their cosine similarity with the context vector; it does not use the training set at all.
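A minimal sketch of the `SIM` scoring rule, in plain numpy for illustration only:

```python
import numpy as np

def sim_scores(context_vec, response_vecs):
    """Cosine similarity of one context vector with each candidate response."""
    context_vec = context_vec / np.linalg.norm(context_vec)
    response_vecs = response_vecs / np.linalg.norm(
        response_vecs, axis=1, keepdims=True)
    # With unit-norm vectors, cosine similarity is just a dot product.
    return response_vecs @ context_vec
```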
The `MAP` method learns a linear mapping on top of the response vector. The final score of a response with vector y, given a context with vector x, is the cosine similarity of the context vector with the mapped response vector:

    score(x, y) = cos(x, y + α · W y)

where W and α are learned parameters. This allows for learning an arbitrary linear mapping on the response side, while making it easy for the model to interpolate with the `SIM` baseline using the residual connection gated by α. Vectors are L2-normalized before being fed to the `MAP` method, so that the method is invariant to scaling.
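Under the scoring rule above, the computation looks roughly as follows (numpy sketch; `W` and `alpha` stand in for the learned parameters):

```python
import numpy as np

def map_scores(context_vec, response_vecs, W, alpha):
    """cos(x, y + alpha * W y) for a context x and unit-norm responses y."""
    # Residual connection: the identity path preserves the SIM behaviour,
    # while the alpha-gated linear map lets the model deviate from it.
    mapped = response_vecs + alpha * response_vecs @ W.T
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    return mapped @ (context_vec / np.linalg.norm(context_vec))
```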
The parameters are learned on the training set, using the dot-product loss from Henderson et al. (2017). A sweep over learning rate and regularization parameters is performed using a held-out dev set, and the final learned parameters are used on the evaluation set.
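The dot-product loss treats the other responses in a batch as negatives: scores are computed for every (context, response) pair in the batch, and a softmax over each row is trained to pick out the true response. A numpy sketch of the objective (the baselines implement this in TensorFlow):

```python
import numpy as np

def batch_dot_product_loss(scores):
    """Mean negative log-probability of the matching response.

    scores[i, j] is the score of response j against context i; the diagonal
    entries are the true (context, response) pairs, and all other entries
    in a row act as sampled negatives.
    """
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()
```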
The combination of these embedding models with the two vector-based methods gives the baseline methods `USE_SIM`, `USE_MAP`, `USE_LARGE_SIM`, `USE_LARGE_MAP`, `ELMO_SIM`, `ELMO_MAP`, `BERT_SMALL_SIM`, `BERT_SMALL_MAP`, `BERT_LARGE_SIM` and `BERT_LARGE_MAP`.
To get the standard random sampling of the train and test sets, please get in touch with Matt. You can also generate the data yourself and copy it locally, though this may give slightly different results:
    mkdir data
    gsutil cp ${DATADIR?}/train-00001-* data/
    gsutil cp ${DATADIR?}/test-00001-* data/
For Amazon QA data, you will need to copy two shards of the test set to get enough examples.
This provides a random subset of the train and test set to use for the baselines. Recall that conversational datasets are always randomly shuffled and sharded.
We recommend using `run_baselines.ipynb` to run the baselines on Google Colab, using a free GPU.
When running vector-based methods, make use of TensorFlow Hub's caching to speed up repeated runs:

    export TFHUB_CACHE_DIR=~/.tfhub_cache
Then run an individual baseline with:

    python baselines/run_baseline.py \
      --method TF_IDF \
      --train_dataset data/train-* \
      --test_dataset data/test-*
Note that the `USE_LARGE`, `ELMO` and `BERT`-based baselines are slow, and may benefit from faster hardware. For these methods, set `--eval_num_batches 100`.