Vision-Transformer-Presentation

Presentation on the Vision Transformer given at Wrocław University of Science and Technology on 21 April 2021.

We discuss the findings presented in "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" by Dosovitskiy et al. and in some earlier works. We introduce the concept of the Vision Transformer, evaluate its advantages and disadvantages, and make a few predictions about the future of the domain.

Presentation available at: tugot17.github.io/Vision-Transformer-Presentation

Gif by: lucidrains

TL;DR

In October 2020 Dosovitskiy et al. proposed a new computer vision architecture - the Vision Transformer. Even though it is used for computer vision tasks, it has no convolutional layers. The authors show that a SOTA model can be built with a pure transformer approach - something that had not been possible before.

To avoid the quadratic cost of self-attention over individual pixels, the authors divide the input image into a sequence of small patches of neighbouring pixels. The embedding of each patch plays a role similar to an input word embedding in NLP problems. The sequence of patch embeddings is then fed into the transformer encoder, and its output is used for image classification.
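The effect of patching on sequence length can be illustrated with a short sketch. It is not code from the paper or the presentation: the helper names `patchify` and `attention_pairs` are hypothetical, and the 224x224 input with 16x16 patches is an assumed example size. The sketch only splits an image into flattened patches and counts the token pairs that self-attention would have to compare; the learned projection of each patch into an embedding is left out.

```python
def patchify(image, patch_size):
    """Split an H x W image (a list of rows) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Read the patch row by row and flatten it into one vector.
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches


def attention_pairs(num_tokens):
    """Self-attention compares every token with every token: n^2 pairs."""
    return num_tokens ** 2


# Toy 4x4 single-channel image split into 2x2 patches:
# 4 patches, each flattened to 4 values.
toy = [[r * 4 + c for c in range(4)] for r in range(4)]
toy_patches = patchify(toy, 2)
print(toy_patches[0])  # [0, 1, 4, 5]

# Assumed example sizes (illustrative only): 224x224 input, 16x16 patches.
side, patch = 224, 16
tokens_as_pixels = side * side            # one token per pixel
tokens_as_patches = (side // patch) ** 2  # one token per patch
print(tokens_as_pixels, attention_pairs(tokens_as_pixels))
print(tokens_as_patches, attention_pairs(tokens_as_patches))
```

Under these assumed sizes the sequence shrinks from 50176 pixel tokens to 196 patch tokens, so the number of attention pairs drops by a factor of 256^2 = 65536.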

The results are very impressive. Given enough data, the transformer-only architecture outperformed much bigger CNN models. However, the same effect was not observed when training on smaller datasets (e.g. ImageNet). This suggests that, as in NLP problems, transformers require much larger amounts of data to perform well, but achieve outstanding results when they get it.

The publication of a pure transformer architecture capable of surpassing CNNs may announce a paradigm shift in the design of computer vision models. A number of publications already extend the concept presented by Dosovitskiy et al. and achieve SOTA results in tasks traditionally solved with CNNs. The demand for large amounts of data may further widen the gap between large laboratories and independent researchers in terms of the results they achieve, and might shape the future of the domain.

Author

Piotr Mazurek
