This video explains how CLIP from OpenAI recasts image classification as a text-image similarity matching task. It does this with contrastive training and zero-shot Pattern-Exploiting Training.
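To make the similarity-matching idea concrete, here is a minimal sketch of zero-shot classification by comparing one image embedding against one text embedding per class prompt. Everything in it is an assumption for illustration: the toy 3-dimensional vectors stand in for real encoder outputs, the helper names (`zero_shot_classify`, `cosine_similarity`) are hypothetical, and the prompt template and `scale` value are illustrative choices rather than details taken from the video.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, prompt_embeddings, labels, scale=100.0):
    """Pick the label whose prompt embedding is most similar to the image.

    `scale` sharpens the softmax; its value here is an assumption.
    """
    sims = [cosine_similarity(image_embedding, p) for p in prompt_embeddings]
    probs = softmax([scale * s for s in sims])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


labels = ["dog", "cat", "car"]
# Each class name is wrapped in a text prompt (hypothetical template).
prompts = [f"a photo of a {name}" for name in labels]

# Toy vectors standing in for text-encoder and image-encoder outputs.
prompt_embeddings = [
    [0.9, 0.1, 0.0],  # embedding of "a photo of a dog"
    [0.1, 0.9, 0.0],  # embedding of "a photo of a cat"
    [0.0, 0.1, 0.9],  # embedding of "a photo of a car"
]
image_embedding = [0.2, 0.8, 0.1]

predicted, probs = zero_shot_classify(image_embedding, prompt_embeddings, labels)
print(predicted)  # cat
```

No classifier head is trained for the three classes here: changing the label set only means writing new prompts and embedding them, which is what makes the approach zero-shot.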
Paper Links:
CLIP (Blog Post): [ Link ]
VirTex: [ Link ]
ConVIRT: [ Link ]
Pattern-Exploiting Training: [ Link ]
Vision Transformer (Blog Post, Nice Animation): [ Link ]
Thanks for watching! Please Subscribe!
![](https://i.ytimg.com/vi/u0HG77RNhPE/maxresdefault.jpg)