Pre-training BERT-based Transformer architectures, explained: language and vision!