There is a lot of emerging interest in developing multimodal foundation models, analogous to the large language models (LLMs) that serve as foundation models for language. LLaVA, which stands for Large Language and Vision Assistant, is the first paper to apply instruction tuning to visual data, thereby pushing the possibilities of Large Multimodal Models (LMMs). This video explains the first paper in the LLaVA series, which also includes LLaVA-RLHF, LLaVA-Med, and the latest, LLaVA 1.5.
RELATED LINKS
LLaVA project page: [ Link ]
LLaVA code: [ Link ]
LLaVA demo: [ Link ]
LLaVA dataset: [ Link ]
LLaVA 1 paper: [ Link ]
LLaVA 1.5 paper: [ Link ]
LLaVA-RLHF: [ Link ]
LLaVA-Med: [ Link ]
🛠 🛠 🛠 MY SOFTWARE TOOLS 🛠 🛠 🛠
✍️ Notion - [ Link ]
✍️ Notion AI - [ Link ]
📹 OBS Studio for video editing - [ Link ]
📼 Manim for some animations - [ Link ]
🎵 My music - [ Link ]
📚 📚 📚 BOOKS I HAVE READ, REFER AND RECOMMEND 📚 📚 📚
📖 Deep Learning by Ian Goodfellow - [ Link ]
📙 Pattern Recognition and Machine Learning by Christopher M. Bishop - [ Link ]
📗 Machine Learning: A Probabilistic Perspective by Kevin Murphy - [ Link ]
📘 Multiple View Geometry in Computer Vision by R Hartley and A Zisserman - [ Link ]
MY KEY LINKS
YouTube: [ Link ]
Twitter: [ Link ]
Patreon: [ Link ]
GitHub: [ Link ]
WHO AM I?
I am a Machine Learning Researcher / Practitioner who has seen the grind of academia and start-ups equally. I started my career as a software engineer 15 years ago. Because of my love for mathematics (coupled with a glimmer of luck), I graduated with a Master's in Computer Vision and Robotics in 2016, just as the current AI revolution was getting started. Life has changed for the better ever since.
#machinelearning #deeplearning #aibites