Generate saliency maps from images
Detect UI elements in images
Generate object masks on images with SAM2
Generate detailed captions for any image
OmniParser: turn your LLM into a GUI agent
Generate personalized photos with your face
Generate speech in a cloned voice