This is a PyTorch implementation of Sequence to Sequence - Video to Text (S2VT), with code for training and testing the model.
Clone the coco-caption repository from https://github.com/tylin/coco-caption.git and rename the folder coco-caption to coco_caption.
This project uses Python 3.6. Installing PyTorch and the other Python packages with Anaconda is recommended.
- PyTorch
- NumPy
- tqdm
- pretrainedmodels
- ffmpeg (can be installed with Anaconda)
Download the YouTubeClips dataset from https://www.cs.utexas.edu/users/ml/clamp/videoDescription/YouTubeClips.tar and extract it into the folder ./data.
Extract optical flow. Run this in a terminal (adjust the directory path to your setup):
python optical_flow.py --video_path [YouTubeClips path]
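The exact method used by optical_flow.py is defined in that script; as a rough illustration of what per-clip flow extraction can look like, here is a minimal sketch using OpenCV's Farneback dense flow (the clip name, output file names, and the specific flow algorithm are assumptions, not necessarily the repo's actual choices):

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("example_clip.avi")   # hypothetical clip from YouTubeClips
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Dense Farneback flow between consecutive frames
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Map the x/y displacement to an 8-bit image (hue = direction, value = magnitude)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros_like(frame)
    hsv[..., 0] = ang * 180 / np.pi / 2
    hsv[..., 1] = 255
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
    cv2.imwrite("flow_{:05d}.png".format(idx), cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR))
    prev_gray = gray
    idx += 1
cap.release()
```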
- Extract RGB features using VGG-16. Run this in a terminal (adjust the directory paths to your setup):
python extract_features.py --video_path [YouTubeClips path] --features_path [./data/msvd_vgg16_bn]
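For orientation, here is a minimal sketch of per-frame fc-layer feature extraction with VGG-16-BN using torchvision (which ships alongside PyTorch); the repo's extract_features.py may instead use the pretrainedmodels package and different frame sampling or output formats:

```python
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
vgg = models.vgg16_bn(pretrained=True).to(device).eval()
# Keep everything except the final classification layer -> 4096-d fc features
vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:-1])

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def frame_features(frame_paths):
    """Return an (n_frames, 4096) tensor of fc features for a list of frame images."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB"))
                         for p in frame_paths]).to(device)
    with torch.no_grad():
        return vgg(batch).cpu()
```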
- Extract flow features using AlexNet. Run this in a terminal (adjust the directory paths to your setup):
python extract_features.py --video_path [OpticalFlow path] --features_path [./data/feats/msvd_alexnet_flow]
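The flow stream follows the same pattern, only with AlexNet and the optical-flow images produced above as input. A minimal sketch, again assuming torchvision rather than the repo's exact code:

```python
import torch
import torchvision.models as models

alexnet = models.alexnet(pretrained=True).eval()
# Drop the final classification layer so each flow image maps to a 4096-d feature
alexnet.classifier = torch.nn.Sequential(*list(alexnet.classifier.children())[:-1])
with torch.no_grad():
    feats = alexnet(torch.randn(1, 3, 224, 224))   # placeholder input; output shape (1, 4096)
```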
- Preprocess the video captions by running preproces_caption.py. Note: you need to adjust the paths inside the script.
python preproces_caption.py
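preproces_caption.py handles tokenisation and vocabulary building for the MSVD captions; the sketch below only illustrates the typical steps (lowercase, tokenise, build a word-to-index vocabulary, encode captions). The toy captions and output file name are purely hypothetical, not the repo's actual data or paths:

```python
import json
from collections import Counter

def build_vocab(captions, min_count=1):
    """Map every word seen at least min_count times to an integer index."""
    counts = Counter(w for cap in captions for w in cap.lower().split())
    words = ['<pad>', '<bos>', '<eos>', '<unk>'] + \
            sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate(words)}

captions = ["a man is playing a guitar", "a dog runs in the park"]  # toy examples only
word2idx = build_vocab(captions)
encoded = [[word2idx.get(w, word2idx['<unk>']) for w in c.lower().split()]
           for c in captions]
with open('caption_vocab_example.json', 'w') as f:   # hypothetical output file
    json.dump({'word2idx': word2idx, 'captions': encoded}, f)
```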
- Edit the directory paths in training_rgb_flow.py to match your setup.
- To train the model, run this in a terminal:
python training_rgb_flow.py
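For readers new to S2VT, the model trained by training_rgb_flow.py is built around two stacked LSTMs: the first encodes the sequence of frame features, and the second reads words conditioned on the first LSTM's output. The sketch below is a minimal, illustrative version of that structure; layer sizes and details are assumptions, not the repo's exact architecture:

```python
import torch
import torch.nn as nn

class S2VT(nn.Module):
    def __init__(self, vocab_size, feat_dim=4096, hidden=500, embed=500):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, embed)                      # project frame features
        self.embed = nn.Embedding(vocab_size, embed)                     # word embeddings
        self.lstm1 = nn.LSTM(embed, hidden, batch_first=True)            # video LSTM
        self.lstm2 = nn.LSTM(embed + hidden, hidden, batch_first=True)   # language LSTM
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, feats, captions):
        B, T, _ = feats.shape          # (batch, n_frames, feat_dim)
        L = captions.size(1)           # caption length
        # Encoding stage: feed frames to the first LSTM, padding the word input
        v = self.feat_proj(feats)
        pad_feats = torch.zeros(B, L, v.size(2), device=feats.device)
        h1, _ = self.lstm1(torch.cat([v, pad_feats], dim=1))
        # Decoding stage: feed words (padded during encoding) plus the video LSTM output
        pad_words = torch.zeros(B, T, self.embed.embedding_dim, device=feats.device)
        words = torch.cat([pad_words, self.embed(captions)], dim=1)
        h2, _ = self.lstm2(torch.cat([words, h1], dim=2))
        # Only the decoding-stage time steps predict words
        return self.out(h2[:, T:, :])
```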
- Edit the directory paths in evaluation.py to match your setup.
- To test or evaluate the model, run this in a terminal:
python evaluation.py
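evaluation.py reports the metrics in the table below using the coco_caption scorers cloned earlier. As a standalone illustration, a minimal scoring sketch might look like this; it assumes the coco_caption folder sits next to the script and that Java is available for the METEOR scorer, and evaluation.py may import the scorers differently:

```python
import sys
sys.path.append('./coco_caption')   # make the pycocoevalcap package importable
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge
from pycocoevalcap.cider.cider import Cider

gts = {'vid1': ['a man is playing a guitar']}   # reference captions per video id
res = {'vid1': ['a man plays a guitar']}        # one generated caption per video id
for name, scorer in [('Bleu_4', Bleu(4)), ('METEOR', Meteor()),
                     ('ROUGE_L', Rouge()), ('CIDEr', Cider())]:
    score, _ = scorer.compute_score(gts, res)
    # Bleu returns a list of Bleu_1..Bleu_4; the others return a single float
    print(name, score[-1] if isinstance(score, list) else score)
```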
Scheme | METEOR | Bleu_4 | ROUGE_L | CIDEr
---|---|---|---|---
Original Paper | 0.298 | N/A | N/A | N/A
Baseline | 0.2966 | 0.3153 | 0.6666 | 0.5998
Sequence to Sequence - Video to Text
S. Venugopalan, M. Rohrbach, J. Donahue, T. Darrell, R. Mooney, and K. Saenko
Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015