Stanford CS25: V1 I Mixture of Experts (MoE) paradigm and the Switch Transformer