[Short Review] Fully Sharded Data Parallel: faster AI training with fewer GPUs