For ADVANCED fine-tuning scripts, function-calling Llama 2 and more... check out Trelis.com
Slides: [ Link ]
Binary-tree/FFF paper: [ Link ]
MoE papers: [ Link ] ; [ Link ]
Reddit thread: [ Link ]
Chapters
0:00 GPT-3, GPT-4 and Mixture of Experts
0:55 Why Mixture of Experts?
2:35 The idea behind Mixture of Experts
3:59 How to train MoE
5:41 Problems training MoE
7:54 Adding noise during training
9:06 Adjusting the loss function for router evenness
10:56 Is MoE useful for LLMs on laptops?
12:37 How might MoE help big companies like OpenAI?
14:22 Disadvantages of MoE
15:42 Binary tree MoE (fast feed forward)
18:15 Data on GPT vs MoE vs FFF
21:55 Inference speed up with binary tree MoE
23:48 Recap - Does MoE make sense?
25:05 Why might big companies use MoE?