In this video we revisit the Google paper that introduced the Vision Transformer (ViT). Before ViT, CNNs dominated computer vision. After transformers were introduced in the "Attention Is All You Need" paper, various attempts were made to apply them to computer vision. We explain the challenge in doing so and how the ViT architecture deals with it.
We also review how vision transformers reduce inductive bias compared to convolutional neural networks.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale - [ Link ]
Blog post - [ Link ]
-----------------------------------------------------------------------------------------------
✉️ Join the newsletter - [ Link ]
👍 Please like & subscribe if you enjoy this content
We use VideoScribe to edit our videos - [ Link ] (affiliate)
-----------------------------------------------------------------------------------------------
Chapters:
0:00 Introduction
0:55 Using Transformers as-is?
2:13 How ViT Works
3:30 Inductive Bias