- Upload SynthRef-YouTube-VIS dataset
- Upload pre-trained scene graph generation model that can detect object attributes
- TODO: Upload code for generating SynthRef-YouTube-VIS dataset
- TODO: Provide PyTorch dataloader for running RefVOS model with SynthRef-YouTube-VIS dataset
- TODO: Provide weights of best models trained with SynthRef-YouTube-VIS dataset
In this work we used the SynthRef method to generate synthetic referring expressions for the training set of the YouTube-VIS dataset [1] (2019 version).
The YouTube-VIS dataset can be downloaded here.
SynthRef generates synthetic referring expressions for objects by combining the ground-truth annotations (object classes and bounding boxes) of an image/video dataset with detected attributes of the target objects.
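As a minimal sketch of this idea (the template and the coarse position rule below are illustrative simplifications, not the exact generation rules of SynthRef):

```python
# Illustrative template-based generation of a referring expression from
# ground-truth annotations plus a detected attribute. The template and
# the position rule are simplified examples, not SynthRef's exact rules.

def relative_position(bbox, frame_width):
    """Coarse horizontal position of a box (x, y, w, h) in the frame."""
    x, _, w, _ = bbox
    center = x + w / 2
    if center < frame_width / 3:
        return "on the left"
    if center > 2 * frame_width / 3:
        return "on the right"
    return "in the middle"

def synth_expression(category, bbox, frame_width, attributes=None):
    """Compose a synthetic referring expression for one annotated object."""
    parts = ["the"]
    if attributes:                 # e.g. ["white"], from the attribute detector
        parts.append(attributes[0])
    parts.append(category)         # ground-truth class name, e.g. "truck"
    parts.append(relative_position(bbox, frame_width))
    return " ".join(parts)

# A white truck whose box sits in the left third of a 1280-px-wide frame:
print(synth_expression("truck", (50, 200, 300, 180), 1280, ["white"]))
# -> "the white truck on the left"
```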
The attributes of target objects are predicted with the model from Unbiased Scene Graph Generation from Biased Training [3], a scene graph generation model based on Faster R-CNN.
Just as we used this model to predict attributes for the objects in YouTube-VIS, it can be used to predict attributes on any image/video dataset. Please refer to these guidelines on how to detect scene graphs (including object attributes) for your dataset.
We provide the pre-trained scene graph generation model with an attribute detection head, which you can use to detect attributes for your dataset by following the guidelines mentioned above.
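As a purely illustrative example of the post-processing step, one could keep, for each detected object, only its highest-scoring attribute above a confidence threshold; the prediction format below is a hypothetical placeholder, not the actual output format of the model:

```python
# Hypothetical post-processing of per-object attribute scores: keep the
# top-scoring attribute if it clears a confidence threshold. The input
# format is an assumed example; adapt it to the real model output.

CONF_THRESHOLD = 0.5

def top_attribute(attr_scores):
    """attr_scores: dict mapping attribute name -> confidence score."""
    name, score = max(attr_scores.items(), key=lambda kv: kv[1])
    return name if score >= CONF_THRESHOLD else None

detections = [
    {"category": "dog", "attributes": {"brown": 0.81, "small": 0.40}},
    {"category": "car", "attributes": {"red": 0.35, "parked": 0.20}},
]
for det in detections:
    print(det["category"], "->", top_attribute(det["attributes"]))
# dog -> brown
# car -> None  (no attribute passes the threshold)
```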
In our paper we only take advantage of the detected attributes, but we encourage the community to also explore the scene graph relationships for generating referring expressions.
Our dataset of synthetic referring expressions for the YouTube-VIS training set is called SynthRef-YouTube-VIS. It can be found in the data/ folder in .csv format.
The video_id and annotation_id columns of the dataset correspond to the video and annotation IDs of the "train.json" file of the YouTube-VIS dataset, which you have to download from the link provided here.
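For example, the expressions can be matched to the YouTube-VIS annotations roughly as follows (the CSV file name below is a placeholder; check the actual file in data/):

```python
import json

import pandas as pd

# Load the synthetic expressions; the CSV file name is a placeholder.
refs = pd.read_csv("data/synthref_youtube_vis.csv")

# Load the YouTube-VIS 2019 training annotations.
with open("train.json") as f:
    ytvis = json.load(f)

# Index the annotations by ID so each expression can be matched to the
# object (category, bounding boxes, segmentations) it refers to.
ann_by_id = {ann["id"]: ann for ann in ytvis["annotations"]}

for _, row in refs.head(5).iterrows():
    ann = ann_by_id[row["annotation_id"]]
    print(row["video_id"], row["annotation_id"], ann["category_id"])
```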
The model used in our experiments is RefVOS [2].
We will provide the PyTorch DataLoader file that can be used to train RefVOS with the SynthRef-YouTube-VIS dataset.
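In the meantime, here is a minimal sketch of what such a dataset class could look like, assuming the CSV layout described above (the file names, the expression column, and the returned fields are assumptions, not the final dataloader):

```python
import json

import pandas as pd
from torch.utils.data import DataLoader, Dataset


class SynthRefYouTubeVIS(Dataset):
    """Pairs each synthetic referring expression with its YouTube-VIS
    annotation. A sketch only: the paths, the expression column name,
    and the returned fields are assumptions."""

    def __init__(self, csv_path, train_json_path):
        self.refs = pd.read_csv(csv_path)
        with open(train_json_path) as f:
            ytvis = json.load(f)
        self.ann_by_id = {a["id"]: a for a in ytvis["annotations"]}

    def __len__(self):
        return len(self.refs)

    def __getitem__(self, idx):
        row = self.refs.iloc[idx]
        ann = self.ann_by_id[row["annotation_id"]]
        # A real dataloader would also load the video frames and decode
        # the RLE segmentation masks here (e.g. with pycocotools).
        return {
            "video_id": int(row["video_id"]),
            "expression": row["expression"],  # assumed column name
            "category_id": ann["category_id"],
        }


loader = DataLoader(
    SynthRefYouTubeVIS("data/synthref_youtube_vis.csv", "train.json"),
    batch_size=8,
    shuffle=True,
)
```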
We will also upload the weights of the best RefVOS models trained with SynthRef-YouTube-VIS.
[1] Video Instance Segmentation (2019). Linjie Yang, Yuchen Fan, and Ning Xu. In Proceedings of the IEEE International Conference on Computer Vision, pages 5188–5197.
[2] RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation (2020). Miriam Bellver, Carles Ventura, Carina Silberer, Ioannis Kazakos, Jordi Torres, and Xavier Giro-i-Nieto. arXiv preprint arXiv:2010.00263.
[3] Unbiased Scene Graph Generation from Biased Training (2020). Kaihua Tang, Yulei Niu, Jianqiang Huang, Jiaxin Shi, and Hanwang Zhang. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3716–3725.