Multimodal input, text gen
Qwen/Qwen2.5-VL-72B-Instruct • Image-Text-to-Text • 73B • Updated Jun 6, 2025 • 260k • 598
Qwen/Qwen2.5-VL-7B-Instruct • Image-Text-to-Text • 8B • Updated Apr 6, 2025 • 5.07M • 1.47k

TTS
hexgrad/Kokoro-82M • Text-to-Speech • Updated Apr 10, 2025 • 8.99M • 5.82k
HKUSTAudio/Llasa-3B • Text-to-Speech • 4B • Updated May 10, 2025 • 849 • 526
coqui/XTTS-v2 • Text-to-Speech • Updated Dec 11, 2023 • 6.92M • 3.43k
fishaudio/s1-mini • Text-to-Speech • Updated Feb 6 • 7.45k • 622