Scalable and versatile 3D generation from images
Generate MIDI music from prompts
Transcribe audio to text