Text-to-video models create clips in much the same way that text-to-image models create images from simple text prompts. In this episode of Hidden Layers, we look at how these models operate under the hood, including how they use Temporal Super Resolution (TSR) and Spatial Super Resolution (SSR) models to create high-resolution videos from image frames. You'll also learn how text-to-video models like Imagen are an orchestration of various models working together to produce high-resolution videos from a single image and text prompt.
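As a rough illustration of that orchestration, here is a minimal Python sketch that tracks only how the shape of a clip changes as it passes through TSR and SSR stages. Everything in it is an assumption made for illustration: the `VideoShape` type, the function names, the base shape, and the stage order and factors are hypothetical and are not taken from Imagen or from the episode. Real TSR and SSR stages are learned models; these stand-ins only multiply dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoShape:
    """Shape of a clip: number of frames and per-frame resolution."""
    frames: int
    height: int
    width: int


def temporal_super_resolution(video: VideoShape, factor: int) -> VideoShape:
    # Stand-in for a learned TSR model: more frames, same resolution.
    return VideoShape(video.frames * factor, video.height, video.width)


def spatial_super_resolution(video: VideoShape, factor: int) -> VideoShape:
    # Stand-in for a learned SSR model: same frames, higher resolution.
    return VideoShape(video.frames, video.height * factor, video.width * factor)


def run_cascade(base: VideoShape, stages: list[tuple[str, int]]) -> list[VideoShape]:
    """Apply each (kind, factor) stage in order and record every intermediate shape."""
    shapes = [base]
    for kind, factor in stages:
        if kind == "tsr":
            shapes.append(temporal_super_resolution(shapes[-1], factor))
        elif kind == "ssr":
            shapes.append(spatial_super_resolution(shapes[-1], factor))
        else:
            raise ValueError(f"unknown stage kind: {kind!r}")
    return shapes


# Hypothetical values: a short, low-resolution base clip and an
# alternating sequence of spatial and temporal upsampling stages.
BASE = VideoShape(frames=16, height=24, width=48)
STAGES = [("ssr", 2), ("tsr", 2), ("ssr", 2), ("tsr", 2)]

if __name__ == "__main__":
    for shape in run_cascade(BASE, STAGES):
        print(shape)
```

The point of the sketch is the division of labor: a base model only has to produce a short, low-resolution clip, and each later stage adds either frames or pixels, never both at once.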
Resources:
Watch our previous episode → [ Link ]
Check out Imagen → [ Link ]
Chapters:
0:00 - Intro
0:16 - What are text-to-video models?
0:34 - How do text-to-video models create videos?
1:40 - What are the complexities of modeling video?
2:04 - How do we get high-resolution videos from text-to-video models like Imagen?
3:38 - Recap of how Imagen works
3:47 - Leave us questions in the comments!
Watch more episodes of Hidden Layers → [ Link ]
Subscribe to the Google Research Channel → [ Link ]