This video explains the BERT Transformer model! BERT reframes self-supervised language modeling as a masked-token prediction task trained on massive corpora like Wikipedia. Bi-directional prediction means masking intermediate tokens and using the context on both the left and the right of the mask to predict what was masked. This video also explores BERT's input and output representations and how they facilitate fine-tuning the BERT transformer!
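As a rough illustration of the masking idea described above, here is a minimal sketch (not from the video) of how tokens might be masked for this training objective. The `mask_tokens` helper is hypothetical; real BERT additionally keeps some selected tokens unchanged or swaps them for random tokens, which this sketch omits.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace a fraction of tokens with [MASK], BERT-style.

    Returns the masked sequence plus labels: the original token at each
    masked position (what the model must predict) and None elsewhere
    (no loss is computed at unmasked positions).
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)   # hide the token from the model
            labels.append(tok)    # prediction target
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=1)
print(masked)
```

Because the mask can fall anywhere in the sequence, the model sees tokens on both sides of each masked position, which is exactly the bi-directional setup the video describes.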
Links Mentioned in Video:
The Illustrated Transformer: [Link]
Tokenizers: How Machines Read: [Link]
SQuAD: [Link]
BERT: [Link]
Thanks for watching! Please Subscribe!