Good Audio Generation space, model, dataset collection
-
Audio-to-Audio • Updated • 108k • 101 -
KittenML/kitten-tts-nano-0.1
Updated • 35.5k • 504 -
FunAudioLLM/ThinkSound
Video-to-Video • Updated • 50 -
ThinkSound
🔊 318 • Generate audio for a video from a caption or description
-
Higgs Audio Demo
🎤 398 • Higgs Audio Demo
-
bosonai/higgs-audio-v2-generation-3B-base
Text-to-Speech • Updated • 271k • 659 -
Song Generation
🎵 655 • Generate a song from custom lyrics and prompts
-
Vui
🏢 185 • NotebookLM conversational speech model
-
Hibiki Samples
🤗 52 • Translate speech in real time with high fidelity
-
kyutai/moshiko-pytorch-bf16
Updated • 156k • 231 -
kyutai/mimi
Feature Extraction • 96.2M • Updated • 567k • 291 -
maya-research/Veena
Text-to-Speech • Updated • 12.3k • 228 -
MiniMax Speech Tech Report
🎙 104 • Generate high-quality speech from text with voice cloning
-
google/magenta-realtime
Updated • 343 • 540 -
PlayDiffusion
🎨 120 • Generate modified audio from text and voice
-
Qwen2.5 Omni 7B Demo
🏆 371 • Chat with AI using text, audio, images, and video
-
Open ASR Leaderboard
🏆 1.23k • Explore ASR model performance across languages and datasets
-
Open NotebookLM
🎙 143 • Generate a podcast to discuss the topic of your choice!
-
Voila Demo
💻 43 • Chat with a voice-clone AI
-
Voice Clone
🗣 2.6k • Clone a voice and generate speech from any text
-
moonshotai/Kimi-Audio-7B-Instruct
Text-to-Speech • Updated • 1.47k • 386 -
moonshotai/Kimi-Audio-7B
Text-to-Speech • 10B • Updated • 148 • 77 -
Dia 1.6B
👯 1.75k • Generate realistic dialogue from a script, using Dia!
-
nari-labs/Dia-1.6B
Text-to-Speech • Updated • 83.1k • 2.83k -
ByteDance/MegaTTS3
Text-to-Speech • Updated • 83 • 414 -
Di♪♪Rhythm
🎶 678 • Blazingly Fast and Embarrassingly Simple Song Generation
-
Gemini Audio Video
♊ 35 • Gemini understands audio and video!
-
nvidia/diar_sortformer_4spk-v1
Automatic Speech Recognition • 0.1B • Updated • 4.51k • 132 -
ACE Step
😻 649 • A Step Towards Music Generation Foundation Model
-
ACE-Step/ACE-Step-v1-3.5B
Text-to-Audio • Updated • 717 -
stepfun-ai/Step-Audio-2-mini
Any-to-Any • Updated • 2.12k • 252 -
neuphonic/neutts-air
Text-to-Speech • 0.7B • Updated • 10.4k • 855 -
NeuTTS-Air
☁ 313 • Generate speech in a chosen voice from text
-
KaniTTS
😻 113 • Generate expressive speech from your text in seconds
-
microsoft/UserLM-8b
Text Generation • Updated • 377 • 363 -
pipecat-ai/smart-turn-v3
Voice Activity Detection • Updated • 127 -
meituan-longcat/LongCat-Audio-Codec
Updated • 41 -
Qwen3 TTS Voice Design
📈 108 • Generate custom voice audio from text and description
-
Qwen TTS Clone Demo
👀 62 • Create a custom voice clone and synthesize speech
-
ResembleAI/chatterbox-turbo
Text-to-Speech • Updated • 612 -
Chatterbox Turbo Demo
⚡ 480 • Chatterbox Turbo Demo
-
zai-org/GLM-TTS
Text-to-Speech • Updated • 210 • 323 -
Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
Text-to-Speech • 2B • Updated • 1.08M • 1.24k -
Qwen3-TTS Demo
🎙 1.62k • Generate custom speech from text, voice descriptions, or samples
-
Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice
Text-to-Speech • Updated • 269k • 118 -
FlashLabs/Chroma-4B
Any-to-Any • Updated • 4.04k • 339 -
FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning
Paper • 2601.11141 • Published • 23 -
MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization
Paper • 2601.01554 • Published • 57 -
FunAudioLLM/Fun-Audio-Chat-8B
Any-to-Any • 9B • Updated • 2.31k • 176