Voxtral TTS Demo
Generate realistic speech from text with custom or preset voices
Generate 3D models from images
Text-to-3D and Image-to-3D Generation
A unified multimodal understanding and generation model
Scalable and Versatile 3D Generation from images
Audio-Conditioned LipSync with Latent Diffusion Models