PaLM-E is a multimodal embodied large language model based on the transformer architecture. Its language capabilities are comparable to those of GPT-4 and similar models, and because it is multimodal it can work with text, images, and robot states.
I present some examples of this breakthrough and explain how it connects to Tesla's robot efforts like Optimus.
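As a rough illustration of the multimodal idea described above: PaLM-E-style models project continuous observations (image features, robot state) into the language model's token-embedding space and interleave them with text tokens. The sketch below is a toy with made-up shapes, random weights, and hypothetical helper names, not the actual PaLM-E implementation.

```python
import numpy as np

D_MODEL = 8  # toy embedding size of the language model
rng = np.random.default_rng(0)

def embed_text(tokens):
    # stand-in for the LM's learned token-embedding table
    table = rng.standard_normal((100, D_MODEL))
    return table[tokens]

def project(features, d_model=D_MODEL):
    # stand-in for a learned linear projection into the LM space
    W = rng.standard_normal((features.shape[-1], d_model))
    return features @ W

image_feats = rng.standard_normal((4, 16))  # e.g. 4 vision-encoder patch vectors
robot_state = rng.standard_normal((1, 6))   # e.g. joint angles / gripper pose

# Text, image, and state tokens become one multimodal prefix for the transformer.
sequence = np.concatenate([
    embed_text(np.array([1, 2, 3])),  # toy token ids for a text prompt
    project(image_feats),             # image "tokens"
    project(robot_state),             # robot-state "token"
], axis=0)

print(sequence.shape)  # 3 text + 4 image + 1 state tokens, each of size D_MODEL
```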
0:00 Intro
0:29 Model Parameters
1:33 Model Architecture
2:25 Example - Autonomous Movement
4:22 Image Analysis
References:
Blog Post:
[ Link ]
Github Post:
[ Link ]
Research Paper:
[ Link ]
#googleai #palme #gpt4