Let's reproduce GPT-2 (124M)