ATTENTION | An Image is Worth 16x16 Words | Vision Transformers (ViT) Explanation and Implementation