In this video, I explain the paper "An Image Is Worth 16x16 Words," which introduces the Vision Transformer.
I first describe one of the biggest flaws of the attention mechanism: it is computation-hungry. Then I show how the authors of this paper got around this difficulty. We learn about the Vision Transformer, a large model that can be fine-tuned for a variety of computer vision tasks, and observe that, given enough data, it outperforms CNNs.
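The "16x16 words" in the title refers to how the Vision Transformer turns an image into a sequence: it is cut into non-overlapping 16x16 patches, and each flattened patch is treated as one token. A minimal NumPy sketch of that input step (the function name `patchify` is my own, not from the paper's code):

```python
import numpy as np

def patchify(image, patch_size=16):
    """Split an image of shape (H, W, C) into non-overlapping,
    flattened patches — the token sequence fed to the transformer.
    Illustrative sketch only; assumes H and W divide evenly."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    patches = (
        image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
             .transpose(0, 2, 1, 3, 4)          # group the two grid axes together
             .reshape(-1, patch_size * patch_size * c)
    )
    return patches  # shape: (num_patches, patch_size**2 * C)

# A 224x224 RGB image becomes 14*14 = 196 tokens of length 16*16*3 = 768
tokens = patchify(np.zeros((224, 224, 3)))
print(tokens.shape)  # (196, 768)
```

Because the sequence length grows with the number of patches rather than the number of pixels, this is what keeps the quadratic cost of self-attention manageable for images.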
📑 Chapters:
0:00 Abstract
0:19 Introduction
2:27 Related Works
2:48 Method
5:06 Results
6:17 Conclusion
📝 Link to the paper:
[ Link ]
👥 Authors:
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani et al.
🔗 Helpful Links:
- My Video on the Paper "Attention is All you Need"
[ Link ]
🙏 I'd like to express my gratitude to Dr. Nasersharif, my supervisor, for suggesting this paper to me.
🙋♂️ Find me on:
- [ Link ]
#transformer #vision_transformer #computer_vision
An Image Is Worth 16x16 Words - Paper Explained