Generate mixed audio containing general audio, speech, and music.
Generate audio from omni-modal inputs with a single model.