Generate audio from text with optional emotion
Generate audio from text using a reference voice
Generate speech from text
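
The three operations above differ only in which optional inputs accompany the text. As a rough sketch of that relationship, the snippet below models a single request with two optional fields. All names here (`SpeechRequest`, `build_payload`, `emotion`, `reference_voice`) are hypothetical and chosen for illustration; the source does not specify the actual parameter names, types, or transport.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechRequest:
    """Hypothetical request shape; field names are illustrative assumptions."""
    text: str
    emotion: Optional[str] = None          # assumed free-form label, e.g. "happy"
    reference_voice: Optional[str] = None  # assumed path or ID of a voice sample


def build_payload(request: SpeechRequest) -> dict:
    """Return a dict containing only the fields that were actually set."""
    if not request.text.strip():
        raise ValueError("text must not be empty")
    payload = {"text": request.text}
    if request.emotion is not None:
        payload["emotion"] = request.emotion
    if request.reference_voice is not None:
        payload["reference_voice"] = request.reference_voice
    return payload


if __name__ == "__main__":
    # Plain speech, speech with an emotion, and speech with a reference voice.
    print(build_payload(SpeechRequest("Hello")))
    print(build_payload(SpeechRequest("Hello", emotion="happy")))
    print(build_payload(SpeechRequest("Hello", reference_voice="voice.wav")))
```

Under these assumptions, plain speech generation is the base case, and the emotion and reference-voice variants each add one optional field to the same request.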