Generate speech from text using reference audio
Launch a web interface for text-to-speech and SSML processing
A Step Towards Music Generation Foundation Model