We tackle the problem of learning a shared multimodal representation space for language in both its written (text) and spoken (speech) form. Semantically similar text and speech segments are contrastively aligned in this space, which enables cross-modal retrieval of speech segments given a text query and vice versa.
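Conceptually, the training objective is a symmetric contrastive loss over paired text and speech embeddings. The sketch below illustrates this idea in PyTorch; it assumes an InfoNCE-style formulation with in-batch negatives, and the function name, embedding dimensions, and temperature value are illustrative rather than this repository's exact implementation.

import torch
import torch.nn.functional as F

def contrastive_alignment_loss(text_emb, speech_emb, temperature=0.07):
    # Symmetric InfoNCE-style loss: matched text/speech pairs are pulled
    # together, all other pairs in the batch serve as negatives.
    text_emb = F.normalize(text_emb, dim=-1)        # unit-normalize embeddings
    speech_emb = F.normalize(speech_emb, dim=-1)
    logits = text_emb @ speech_emb.t() / temperature    # pairwise cosine similarities
    targets = torch.arange(logits.size(0), device=logits.device)  # i-th text matches i-th speech
    loss_t2s = F.cross_entropy(logits, targets)         # text-to-speech retrieval direction
    loss_s2t = F.cross_entropy(logits.t(), targets)     # speech-to-text retrieval direction
    return (loss_t2s + loss_s2t) / 2

# Toy usage with random tensors standing in for encoder outputs:
text_emb = torch.randn(8, 256)
speech_emb = torch.randn(8, 256)
print(contrastive_alignment_loss(text_emb, speech_emb))

At retrieval time, speech segments are ranked by cosine similarity to the embedded text query (and vice versa).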
Install dependencies
# clone project
git clone https://github.com/marcomoldovan/cross-modal-speech-segment-retrieval
cd cross-modal-speech-segment-retrieval
# [OPTIONAL] create python virtual environment
# Requires Python 3.7-3.9 on Windows, or Python 3.7 or higher on Linux and macOS
python3 -m venv myenv # uses default python version
virtualenv --python=/usr/bin/<python3.x> myenv # to specify python version
myenv\Scripts\activate.bat # for Windows
source myenv/bin/activate # for Linux or MacOS
# [ALTERNATIVE] create conda environment
conda create -n myenv python=3.8
conda activate myenv
# install pytorch according to instructions
# https://pytorch.org/get-started/
# install requirements
pip install -r requirements.txt
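After installing, a quick sanity check (a minimal Python snippet, not part of the repository) confirms that PyTorch imports correctly and can see a GPU if one is present:

import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True if a CUDA-capable GPU is visible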
Train model with default configuration
# train on CPU
python train.py trainer.gpus=0
# train on GPU
python train.py trainer.gpus=1
Train model with chosen experiment configuration from configs/experiment/
python train.py experiment=experiment_name.yaml
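For reference, an experiment file groups such overrides in one place. The following is a hypothetical YAML sketch, assuming the Hydra-based config layout implied by the commands above; the file name, the seed key, and the exact values are assumptions, not an actual file from configs/experiment/. The trainer and datamodule keys mirror the command-line overrides shown below.

# @package _global_   # Hydra directive so these keys merge into the global config
# Hypothetical sketch of configs/experiment/experiment_name.yaml

seed: 12345           # assumed key for reproducibility

trainer:
  gpus: 1
  max_epochs: 20

datamodule:
  batch_size: 64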
You can override any parameter from the command line like this:
python train.py trainer.max_epochs=20 datamodule.batch_size=64