Text-to-Speech Synthesis Using VALL-E