Rethinking Zero-Shot Video Classification: End-to-End Training for Realistic Applications