Generate expressive speech audio from text with emotion control
Generate detailed captions for any image