Multimodality is the ability of an AI model to work with different types (or "modalities") of data, such as text, audio, and images. Multimodality is what allows a model like GPT-4 to write code from a diagram, and a model like DALL-E 3 to generate an image from a description.
In this video, we'll learn how multimodality works in AI, and the distinction between multimodal models and multimodal interfaces.
Links:
Intro repository: [ Link ]
Introduction to Diffusion Models: [ Link ]
How DALL-E works: [ Link ]
Build your own text-to-image model: [ Link ]
How RLHF works: [ Link ]
▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬
🖥️ Website: [ Link ]
🐦 Twitter: [ Link ]
🦾 Discord: [ Link ]
▶️ Subscribe: [ Link ]
🔥 We're hiring! Check our open roles: [ Link ]
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#MachineLearning #deeplearning
0:00 Writing code with GPT-4
0:31 Generating music with MusicLM
0:48 What is multimodality?
1:15 Fundamental concepts of multimodality
2:30 Representations and meaning
4:00 A problem with multimodality
4:50 Multimodal models vs. multimodal interfaces
6:21 Outro