speech

StyleTTS2 trained on Ukrainian multispeaker dataset

hon9kon9ize/yue_emo_speech

Viewer • Updated Oct 28, 2024 • 727k • 395 • 15

walkerhyf/NCSSD

Updated Nov 12, 2024 • 39 • 25

NhutP/VietSpeech

Viewer • Updated Apr 25, 2025 • 1.03M • 1.52k • 35

FBK-MT/mosel

Viewer • Updated Oct 7, 2025 • 2.2M • 1.35k • 90

ARTPARK-IISc/Vaani

Viewer • Updated May 4 • 22.6M • 50.4k • 125

D4ve-R/bundestag-asr

Viewer • Updated Jun 9, 2024 • 250k • 168 • 3

Racoci/CORAA-v1.1

Viewer • Updated Jun 1, 2024 • 402k • 413 • 1

overflowwwww/yt-danish-public-v2

Viewer • Updated May 10, 2024 • 127k • 1.06k

parler-tts/mls_eng

Viewer • Updated Apr 9, 2024 • 10.8M • 6.64k • 37

SKNahin/open-large-bengali-asr-data

Viewer • Updated Mar 26, 2024 • 3.73M • 942 • 9

linhtran92/viet_bud500

Viewer • Updated Feb 29, 2024 • 649k • 819 • 70

facebook/multilingual_librispeech

Viewer • Updated Aug 12, 2024 • 1.49M • 20.6k • 182

sambal/lava_dataset

Viewer • Updated Nov 6, 2024 • 592k • 23

xacer/vox-pretrain

Viewer • Updated Oct 18, 2024 • 18.3M • 1.68k

nguyenvulebinh/av-vox2

Viewer • Updated Nov 11, 2024 • 1.13M • 55 • 2

oza75/bambara-tts

Viewer • Updated Jan 18, 2025 • 275k • 13 • 5

arsaporta/symile-m3

Viewer • Updated Nov 26, 2024 • 53.4M • 43.8k • 8

FILM6912/STT-v1

Viewer • Updated Feb 23, 2025 • 797k • 91

tiennguyenbnbk/en_kid_voice

Viewer • Updated Dec 23, 2024 • 10.7k • 14 • 5

BELLE-2/Belle-whisper-large-v3-turbo-zh

Automatic Speech Recognition • 0.8B • Updated Dec 16, 2024 • 375 • 77

MERaLiON/Multitask-National-Speech-Corpus-v1

Viewer • Updated Jan 21, 2025 • 15.2M • 15.6k • 21

danjacobellis/audioset_opus_24kbps

Viewer • Updated Jan 6, 2025 • 1.91M • 530 • 1

novateur/cosyvoice2_en

Viewer • Updated Jan 13, 2025 • 449k • 25

alvanlii/cantonese-radio

Viewer • Updated Jan 24, 2025 • 2.23M • 345 • 23

hhoangphuoc/switchboard

Viewer • Updated Aug 22, 2025 • 258k • 1.75k • 18

georgechang8/code_switch_yodas_zh

Viewer • Updated May 15, 2024 • 94.5k • 41 • 4

hexgrad/Kokoro-82M

Text-to-Speech • Updated Apr 10, 2025 • 15.9M • 6.39k

deepseek-ai/DeepSeek-R1

Text Generation • 685B • Updated Mar 27, 2025 • 7.2M • 13.4k

HKUSTAudio/Llasa-3B

Text-to-Speech • 4B • Updated May 10, 2025 • 297 • 528

speechbrain/LoquaciousSet

Viewer • Updated Feb 11 • 14.7M • 7.27k • 62

FireRedTeam/FireRedASR-AED-L

Automatic Speech Recognition • Updated Mar 5, 2025 • 289 • 70

krishnakalyan3/emo_parler

Viewer • Updated Jul 17, 2024 • 2.42M • 164 • 2

deepghs/arknights_voices_zh

Viewer • Updated Aug 28, 2024 • 12.4k • 93 • 5

ylacombe/accent-classifier

Audio Classification • 1.0B • Updated Feb 13, 2025 • 83 • 7

stepfun-ai/Step-Audio-TTS-3B

Text-to-Speech • 4B • Updated Feb 17, 2025 • 67 • 197

KBLab/rixvox-v2

Viewer • Updated May 1, 2025 • 3.56M • 1.38k • 11

amphion/Metis

Text-to-Speech • Updated Apr 13, 2025 • 13 • 31

KBLab/kb-whisper-large

Automatic Speech Recognition • 2B • Updated Aug 27, 2025 • 13.9k • 62

nyrahealth/CrisperWhisper

Automatic Speech Recognition • 2B • Updated Apr 7 • 57.9k • 335

speech-uk/voice-of-america

Updated Sep 6, 2025 • 9 • 1

aadel4/kid-whisper-medium-en-myst_cslu

Automatic Speech Recognition • Updated Apr 9, 2024 • 43 • 2

ajd12342/paraspeechcaps

Viewer • Updated Nov 22, 2025 • 1.07M • 293 • 21

OOPPEENN/ASMR_Dataset

Updated Mar 13, 2025 • 29 • 16

BAAI/ChildMandarin

Viewer • Updated May 19, 2025 • 40.7k • 196 • 38

sawradip/Englisg-ASR-Collection

Viewer • Updated Mar 23, 2025 • 1.13M • 719

WTForbes/ASCEND_ZH

Viewer • Updated Mar 23, 2025 • 6.14k • 8

WTForbes/ASCEND_EN

Viewer • Updated Mar 23, 2025 • 2.85k • 8

WTForbes/ASCEND_MIXED

Viewer • Updated Mar 23, 2025 • 3.33k • 15

nc33/genshin_voice_long

Viewer • Updated Mar 23, 2025 • 3.02k • 7

uyiosa/SER-WavLM-Multi-Attributes

Audio Classification • Updated Sep 28, 2025 • 2

Sh1man/silero_open_stt

Viewer • Updated Apr 13, 2025 • 28.1k • 27 • 4

Sh1man/golos_opus

Viewer • Updated Apr 13, 2025 • 1.12M • 206

DataoceanAI1/dolphin-base

Automatic Speech Recognition • Updated May 8 • 7 • 38

ByteDance/MegaTTS3

Text-to-Speech • Updated Apr 4, 2025 • 99 • 419

BAAI/SeniorTalk

Viewer • Updated Jan 18 • 60.1k • 723 • 38

SparkAudio/voxbox

Viewer • Updated Apr 15, 2025 • 23.8M • 8.64k • 74

nguyenvulebinh/MSA-ASR

0.6B • Updated Apr 10, 2025 • 27

W4ng1204/Nonspeech7k

Viewer • Updated Apr 20, 2025 • 7.01k • 30 • 1

ivrit-ai/audio-v2

Updated Feb 2 • 617 • 2

chenjoya/Live-WhisperX-526K

Preview • Updated Aug 4, 2025 • 7.41k • 10

MLCommons/unsupervised_peoples_speech

Updated Feb 27, 2025 • 30.4k • 78

ibm-granite/granite-speech-3.2-8b

Automatic Speech Recognition • 8B • Updated Apr 16, 2025 • 737 • 88

HamzaSidhu786/speech-accent-detection

Audio Classification • 94.6M • Updated Apr 7, 2025 • 29 • 2

moonshotai/Kimi-Audio-7B-Instruct

Text-to-Speech • 10B • Updated May 29, 2025 • 76.4k • 402

SparkAudio/Spark-TTS-0.5B

Text-to-Speech • Updated Mar 7, 2025 • 742 • 737

FunAudioLLM/InspireMusic-Base

Text-to-Audio • 0.5B • Updated 27 days ago • 33 • 17

baichuan-inc/Baichuan-Audio-Base

10B • Updated Feb 25, 2025 • 26 • 12

ASLP-lab/DiffRhythm-base

Updated Mar 26, 2025 • 39 • 171

m-a-p/YuE-s1-7B-anneal-jp-kr-cot

Text Generation • 6B • Updated Mar 12, 2025 • 143 • 23

IndexTeam/Index-TTS

Text-to-Speech • Updated Apr 27, 2025 • 150 • 152

fishaudio/fish-speech-1.5

Text-to-Speech • Updated Mar 25, 2025 • 3.9k • 754

zai-org/glm-4-voice-9b

10B • Updated Oct 25, 2024 • 22.4k • 119

SWivid/F5-TTS

Text-to-Speech • Updated Mar 21, 2025 • 720k • 1.18k

gpt-omni/mini-omni2

Any-to-Any • Updated Oct 24, 2024 • 108 • 285

2Noise/ChatTTS

Text-to-Audio • Updated Oct 22, 2024 • 3.74k • 1.66k

FunAudioLLM/SenseVoiceSmall

Automatic Speech Recognition • Updated 6 days ago • 9.98k • 423

mtspeech/MooER-MTL-80K

Automatic Speech Recognition • Updated Aug 27, 2024 • 1

amphion/Vevo1.5

Updated Apr 13, 2025 • 20 • 27

HKUSTAudio/AudioX

Text-to-Audio • Updated 9 days ago • 135

OrcinusOrca/YouTube-English

Viewer • Updated Aug 27, 2025 • 949k • 2.46k • 2

OrcinusOrca/YouTube-Cantonese

Viewer • Updated Aug 27, 2025 • 496k • 779 • 4

netease-youdao/Confucius-o1-14B

Text Generation • 15B • Updated Jan 23, 2025 • 29 • 43

VocalNet/VocalNet-8B

12B • Updated Apr 23, 2025 • 4

kotoba-tech/kotoba-whisper-v2.2

Automatic Speech Recognition • 0.8B • Updated Oct 23, 2024 • 220k • 108

erax-ai/EraX-WoW-Turbo-V1.1

Automatic Speech Recognition • 0.8B • Updated Mar 31, 2025 • 55 • 14

NbAiLab/nb-whisper-large

Automatic Speech Recognition • 2B • Updated Jul 13, 2024 • 5.81k • 38

litagin/anime-whisper

Automatic Speech Recognition • 0.8B • Updated Nov 24, 2024 • 37.2k • 141

mesolitica/Malaysian-whisper-large-v3-turbo-v3

0.8B • Updated Jun 4, 2025 • 2.3k • 9

MERaLiON/MERaLiON-AudioLLM-Whisper-SEA-LION

Automatic Speech Recognition • 10B • Updated Feb 2 • 59 • 29

BELLE-2/Belle-whisper-large-v3-zh-punct

Automatic Speech Recognition • 2B • Updated Apr 16, 2025 • 170 • 50

MohamedRashad/Arabic-Whisper-CodeSwitching-Edition

Automatic Speech Recognition • 2B • Updated Jul 7, 2024 • 1.02k • 32

HebArabNlpProject/WhisperLevantine

Automatic Speech Recognition • Updated May 25, 2025 • 57 • 6

ghost613/whisper-large-v3-turbo-korean

Automatic Speech Recognition • 0.8B • Updated Oct 25, 2024 • 853 • 15

mjwong/whisper-large-v3-turbo-singlish

Automatic Speech Recognition • 0.8B • Updated May 3, 2025 • 71 • 2

MU-NLPC/whisper-large-v2-audio-captioning

Updated Mar 11, 2024 • 34 • 11

mispeech/dasheng-0.6B

Audio Classification • 0.6B • Updated Mar 19 • 201 • 4

mispeech/ced-base

Audio Classification • 85.7M • Updated Mar 30 • 17.3k • 14

meetween/Llama-speechlmm-1.0-l-ST

Translation • 9B • Updated Apr 30, 2025 • 6

alkiskoudounas/voc2vec

Audio Classification • Updated Apr 14, 2025 • 2.73k • 4

Revai/reverb-asr

Automatic Speech Recognition • Updated Dec 9, 2024 • 20 • 93

junnei/gemma-3-4b-it-speech

Automatic Speech Recognition • 5B • Updated Apr 10, 2025 • 34 • 29

maitrix-org/Voila-million-voice

Updated May 6, 2025 • 241 • 2

skit-ai/speechllm-1.5B

Image Feature Extraction • 1B • Updated Jun 25, 2024 • 31 • 7

kijjjj/audio_data_russian

Viewer • Updated May 21, 2025 • 995k • 420 • 8

WhissleAI/Meta_STT_ZH_AIShell3

Viewer • Updated May 3, 2025 • 88k • 118

VocalNet/VocalNet-1B

3B • Updated Apr 23, 2025 • 18

amu-cai/CAMEO

Viewer • Updated May 20, 2025 • 41.3k • 598 • 15

agufsamudra/tts-indo

Viewer • Updated May 18, 2025 • 114k • 313 • 8

disco-eth/EuroSpeech

Viewer • Updated May 4 • 12.3M • 24.3k • 94

edwixxx/indian_ted_talks_chunked

Viewer • Updated Apr 18, 2024 • 3.98k • 1 • 1

ishands/commonvoice-indian_accent

Viewer • Updated Mar 26, 2025 • 110k • 290

skbose/indian-english-nptel-v0

Viewer • Updated Sep 29, 2024 • 544k • 719 • 3

WhaleDolphin/MIKU-EmoBench

Viewer • Updated May 22, 2025 • 89.4k • 145 • 2

slprl/TinyStress-15K

Viewer • Updated Jun 5, 2025 • 16k • 117 • 6

MBZUAI/ArVoice

Viewer • Updated Oct 31, 2025 • 46.2k • 374 • 32

facebook/wav2vec2-xlsr-53-espeak-cv-ft

Automatic Speech Recognition • Updated Dec 10, 2021 • 329k • 49

FBK-MT/fama-medium

Automatic Speech Recognition • 1B • Updated Jun 4, 2025 • 18 • 4

ResembleAI/chatterbox

Text-to-Speech • Updated 16 days ago • 2.2M • 1.65k

slprl/StresSLM

Audio-Text-to-Text • Updated Nov 11, 2025 • 3

espnet/yodas_owsmv4

Viewer • Updated Sep 1, 2025 • 4 • 733 • 17

clatter-1/XF-Denoise

Viewer • Updated Jun 3, 2025 • 115k • 5 • 1

tsinghua-ee/QualiSpeech

Viewer • Updated Aug 4, 2025 • 14.6k • 379 • 23

nvidia/hifitts-2

Viewer • Updated Nov 18, 2025 • 16.6M • 359 • 31

fishaudio/s1-mini

Text-to-Speech • Updated Feb 6 • 3.04k • 661

deepvk/NonverbalTTS

Viewer • Updated Oct 4, 2025 • 6.26k • 652 • 66

cmu-mlsp/TEARS

Viewer • Updated Jun 9, 2025 • 116k • 21 • 2

amphion/Emilia-Dataset

Viewer • Updated Feb 28, 2025 • 54.8M • 77k • 460

Itbanque/ScreenTalk_JA2ZH

Updated May 7, 2025 • 8

laion/BUD-E-Whisper

0.2B • Updated Jun 19, 2025 • 1.61k • 40

kyutai/stt-1b-en_fr

Automatic Speech Recognition • 1.0B • Updated Nov 18, 2025 • 129

jordand/whisper-d-v1a

Updated Nov 1, 2024 • 653 • 47

viewfinder-annn/ccf-aatc-track1

Updated Jun 21, 2025 • 1

webbigdata/C3TR-Adapter

Translation • Updated Aug 16, 2024 • 31 • 41

Atotti/Google-USM

Feature Extraction • 0.7B • Updated Aug 12, 2025 • 403 • 23

imprt/idol-songs-jp

Updated Apr 9 • 69 • 13

badrex/kinyarwanda-speech-1000h

Viewer • Updated Jul 3, 2025 • 199k • 14

kyutai/tts-1.6b-en_fr

Text-to-Speech • Updated Sep 11, 2025 • 45.8k • 379

kyutai/mimi

Feature Extraction • 96.2M • Updated Jul 2, 2025 • 810k • 309

laion/Emilia-with-Emotion-Annotations

Preview • Updated Jul 20, 2025 • 4.12k • 28

slprl/PAST

Updated Sep 15, 2025 • 4

mteb/vocalsound

Viewer • Updated Jul 6, 2025 • 5.45k • 41

padmalcom/wav2vec2-large-nonverbalvocalization-classification

Audio Classification • Updated Jan 11, 2023 • 1.8k • 10

xunyi/SMIIP-NV

Updated Jun 25, 2025 • 104 • 5

xunyi/SMIIP-NV_finetune_CosyVoice2

Updated Sep 1, 2025 • 4

suno/bark

Text-to-Speech • Updated Oct 4, 2023 • 17.1k • 1.53k

speech-seq2seq/ami

Updated Sep 6, 2022 • 2.24k

edinburghcstr/ami

Viewer • Updated Jan 13 • 267k • 7.42k • 90

muhtasham/whisper-non-verbal-debug

Viewer • Updated Jul 5, 2025 • 33k • 108 • 1

ICTNLP/StreamUni

Viewer • Updated Jul 14, 2025 • 9.63k • 516 • 2

ICTNLP/StreamUni-Phi4

Audio-Text-to-Text • 6B • Updated Jul 14, 2025 • 7

bond005/audioset-nonspeech

Viewer • Updated Jul 10, 2025 • 14k • 24 • 3

nvidia/audio-flamingo-3

Audio-Text-to-Text • Updated Nov 28, 2025 • 426 • 151

laion/in-the-wild-sound-events

Updated Nov 9, 2025 • 34

mistralai/Voxtral-Small-24B-2507

Audio-Text-to-Text • 24B • Updated Dec 20, 2025 • 105k • 501

BUT-FIT/DiCoW_v3_2

Automatic Speech Recognition • 1.0B • Updated Sep 2, 2025 • 1.82k • 9

nvidia/canary-qwen-2.5b

Automatic Speech Recognition • 3B • Updated Apr 21 • 111k • 441

ByteDance-Seed/Seed-X-Instruct-7B

Translation • 8B • Updated Jul 28, 2025 • 146 • 128

nairaxo/japanese-tts

Viewer • Updated Jul 17, 2025 • 326k • 3

webbigdata/VoiceCore

Text-to-Speech • 3B • Updated Mar 29 • 60 • 17

Atotti/miipher-2-HuBERT-HiFi-GAN-v0.1

Updated Jan 3 • 5 • 15

Cnam-LMSSC/vibravox

Viewer • Updated Nov 7, 2025 • 26.6k • 1.19k • 29

AISHELL/RealMAN

Preview • Updated Dec 5, 2024 • 2.28k • 3

bigdefence/bigvox

Audio-Text-to-Text • 2B • Updated Jul 19, 2025 • 4 • 1

OmniAICreator/ASMR-Archive-Processed

Preview • Updated Apr 3 • 16.9k • 92

t-tech/T-one

Automatic Speech Recognition • 71.7M • Updated Jul 30, 2025 • 1.29k • 96

SebastianBodza/Kartoffelbox-v0.1

Text-to-Speech • Updated 9 days ago • 27 • 72

sarulab-speech/MSR-UTMOS_w2v2_fold0

Updated Jul 19, 2025 • 2

OOPPEENN/56697375616C4E6F76656C5F44617461736574

Updated Jul 5, 2025 • 3.39k • 58

PapaRazi/id-tts-v2

Updated Jul 15, 2025 • 32 • 8

bosonai/higgs-tts-2-3b-base

Text-to-Speech • 6B • Updated about 22 hours ago • 181k • 684

bosonai/higgs-audio-v2-tokenizer

Feature Extraction • 0.2B • Updated 26 days ago • 39.8k • 52

BAAI/CS-Dialogue

Updated Jul 22, 2025 • 863 • 10

AaronZ345/GTSinger

Viewer • Updated Jul 24, 2025 • 28.6k • 5.8k • 16

pipecat-ai/smart-turn-v2

Voice Activity Detection • 94.8M • Updated Sep 3, 2025 • 6.02k • 80

saurabhati/DASS_medium_AudioSet_50.2

Audio Classification • 48.6M • Updated Apr 26, 2025 • 7.25k • 4

saurabhati/DASS_small_AudioSet_50.1

Audio Classification • 29.9M • Updated Oct 14, 2025 • 23

drakrig/vad_emotion_scorer

Audio Classification • Updated Jan 14 • 1

distil-whisper/librispeech_asr-noise

Viewer • Updated Sep 27, 2023 • 117k • 1.46k • 2

Myrtle/CAIMAN-ASR-BackgroundNoise

Viewer • Updated Feb 19, 2024 • 1.16k • 2.08k • 9

declare-lab/JAM-0.5

Text-to-Audio • Updated Aug 1, 2025 • 32 • 35

FunAudioLLM/CosyVoice-ttsfrd

Text-to-Speech • Updated 27 days ago • 6

zhifeixie/Audio-Reasoner

8B • Updated Mar 5, 2025 • 473 • 18

allenai/OLMoASR

Audio-Text-to-Text • Updated Mar 20 • 1 • 78

CodecSR/vocalset_synth

Viewer • Updated Mar 23, 2024 • 75.9k • 8

joujiboi/Galgame-VisualNovel-Reupload

Viewer • Updated Jun 21, 2025 • 7.05M • 2.61k • 35

EarthSpeciesProject/NatureLM-audio-training

Viewer • Updated Jun 3, 2025 • 26.4M • 13.5k • 18

nguyenvulebinh/spk-attribute

Viewer • Updated Apr 10, 2025 • 15.5M • 322 • 2

amphion/Emilia-NV

Viewer • Updated Sep 18, 2025 • 174k • 576 • 47

diabolocom/talkbank_4_stt

Viewer • Updated Jun 17, 2025 • 170k • 1.44k • 2

laion/laions_got_talent

Viewer • Updated Jan 5, 2025 • 461k • 1.76k • 41

laion/synthetic_vocal_bursts

Viewer • Updated Jan 1, 2025 • 326k • 682 • 6

cksqs/SynStard-1000

Updated Nov 25, 2025 • 276 • 2

KE-Team/Ke-Omni-R-3B

Audio-Text-to-Text • 5B • Updated Jun 16, 2025 • 6 • 22

KE-Team/KE-SemanticVAD

0.5B • Updated Mar 28, 2025 • 38 • 8

mispeech/midashenglm-7b-0804-fp32

Audio-Text-to-Text • 8B • Updated Mar 17 • 120k • 81

shunyalabs/pingala-v1-universal

Automatic Speech Recognition • 0.8B • Updated Aug 28, 2025 • 63 • 32

biodatlab/whisper-th-medium-combined

Automatic Speech Recognition • 0.8B • Updated Feb 20, 2024 • 3.36k • 20

nonverbalspeech/nonverbalspeech38k

Viewer • Updated Dec 10, 2025 • 38.7k • 1.21k • 36

naijavoices/naijavoices-dataset

Viewer • Updated Aug 7, 2025 • 1.92M • 1.4k • 25

qiangchunyu/SecoustiCodec

Updated Aug 7, 2025 • 7

bond005/whisper-podlodka-turbo

Automatic Speech Recognition • 0.8B • Updated Jan 29 • 1.74k • 41

ATH-MaaS/CSEMOTIONS

Viewer • Updated Aug 12, 2025 • 4.16k • 915 • 31

Respair/Higgs_Codec_Extended

Updated Aug 14, 2025 • 5 • 5

nvidia/parakeet-tdt-0.6b-v3

Automatic Speech Recognition • 0.6B • Updated May 20 • 163k • 949

k2-fsa/OpenDialog

Viewer • Updated Apr 18 • 996k • 400 • 22

wcy1122/MGM-Omni-TTS-4B

Text-to-Speech • 5B • Updated Aug 17, 2025 • 7 • 6

AudioLLMs/Multitask-National-Speech-Corpus-v1-extend

Viewer • Updated Mar 31, 2025 • 15.2M • 12.8k • 5

alvanlii/audio-llm-train

Viewer • Updated Nov 6, 2024 • 2.07M • 4 • 2

jungsanghyun/taiwanspeech

Viewer • Updated Aug 18, 2025 • 140k • 58

japanese-asr/en_asr.mls

Viewer • Updated Sep 4, 2024 • 10.4M • 3.13k • 2

ASLP-lab/WenetSpeech-Yue

Updated Feb 5 • 516 • 42

amphion/TaDiCodec

Audio-to-Audio • 0.5B • Updated Sep 2, 2025 • 24 • 28

microsoft/VibeVoice-1.5B

Text-to-Speech • 3B • Updated Jan 22 • 237k • 2.41k

amphion/TaDiCodec-TTS-AR-Qwen2.5-3B

Text-to-Speech • 3B • Updated Aug 26, 2025 • 8 • 6

ASLP-lab/WSYue-ASR

Updated Sep 12, 2025 • 15

ByteDance/Attention2Probability

Automatic Speech Recognition • Updated Aug 27, 2025 • 4

ESpeech/ESpeech-podcasts

Viewer • Updated Nov 25, 2025 • 2.81M • 45 • 11

ayush-shunyalabs/hindi-speech-dataset

Viewer • Updated Aug 25, 2025 • 146k • 73

allenai/OLMoASR-Pool

Viewer • Updated Mar 20 • 16.9M • 70 • 14

stepfun-ai/Step-Audio-2-mini

Any-to-Any • 8B • Updated Feb 14 • 12.4k • 259

cmots/UniSS

Audio-to-Audio • 2B • Updated 27 days ago • 34 • 3

NandemoGHS/Japanese-Eroge-Voice

Viewer • Updated Aug 31, 2025 • 221k • 655 • 34

vibevoice/VibeVoice-7B

Text-to-Speech • 9B • Updated Sep 5, 2025 • 8.73k • 189

gasmichel/LibriQuote

Updated 10 days ago • 575 • 6

fixie-ai/ultraVAD

Image Feature Extraction • Updated Jan 1 • 385k • 38

nirmoh/accent-whisper

Audio Classification • Updated Sep 7, 2025

IndexTeam/IndexTTS-2

Text-to-Speech • Updated Jan 20 • 14.8k • 738

RMSnow/Vevo2

Updated Mar 25 • 9

mcshao/EThai-ASR

Updated May 21, 2025 • 54 • 9

FireRedTeam/FireRedTTS2

Updated Sep 17, 2025 • 67

ufal/parczech4speech-segmented

Viewer • Updated Jun 16, 2025 • 751k • 89 • 1

typhoon-ai/typhoon-asr-realtime

Automatic Speech Recognition • Updated 15 days ago • 1.93k • 28

nvidia/diar_sortformer_4spk-v1

Automatic Speech Recognition • 0.1B • Updated Dec 15, 2025 • 7.13k • 142

utter-project/SpireFull

7B • Updated Sep 9, 2025 • 14 • 2

pipecat-ai/smart-turn-data-v3-train

Viewer • Updated Sep 11, 2025 • 227k • 1.57k • 8

ASLP-lab/SongFormDB

Updated May 15 • 3.47k • 9

FireRedTeam/FireRedChat-pvad

Voice Activity Detection • Updated Sep 22, 2025 • 15

opedromartins/speaker-datasets

Viewer • Updated Sep 21, 2025 • 150k • 17 • 1

YirongSun/LLaSO-Align

Updated Oct 2, 2025 • 4.18k • 3

openbmb/VoxCPM-0.5B

Text-to-Speech • Updated Sep 19, 2025 • 7.69k • 806

byan/cs-fleurs

Updated Sep 17, 2025 • 868 • 16

XiaomiMiMo/MiMo-Audio-Tokenizer

1B • Updated 9 days ago • 3.88k • 38

XiaomiMiMo/MiMo-Audio-7B-Instruct

Any-to-Any • 8B • Updated 9 days ago • 24.1k • 157

XiaomiMiMo/MiMo-Audio-7B-Base

Any-to-Any • 8B • Updated 9 days ago • 156 • 54

sarulab-speech/sidon-v0.1

Updated Dec 15, 2025 • 23

FireRedTeam/FireRedChat-punc

Updated Sep 28, 2025 • 2

FireRedTeam/FireRedChat-turn-detector

Updated Sep 22, 2025 • 5

shawnpi/SynParaSpeech

Viewer • Updated Apr 27 • 56k • 295 • 7

oreva/blab_long_audio

Updated May 12 • 57 • 1

Qwen/Qwen3-Omni-30B-A3B-Captioner

Any-to-Any • 32B • Updated Sep 22, 2025 • 5.34k • 230

Qwen/Qwen3-Omni-30B-A3B-Instruct

Any-to-Any • 35B • Updated Sep 22, 2025 • 2.01M • 946

herimor/voxtream

Text-to-Speech • 0.4B • Updated Mar 14 • 144 • 23

tencent/SongPrep-7B

Automatic Speech Recognition • 8B • Updated Oct 23, 2025 • 42 • 45

amphion/SingVERSE

Preview • Updated Sep 29, 2025 • 339 • 5

ASLP-lab/WSC-Train

Preview • Updated Apr 21 • 370 • 127

QingyuLiu1/Cross-Lingual_F5-TTS

Text-to-Speech • Updated Feb 8 • 7 • 4

maimai11/MNV_17

Viewer • Updated Oct 13, 2025 • 2.6k • 502 • 18

levicu/whisat

Automatic Speech Recognition • Updated Sep 29, 2025 • 2

LiquidAI/LFM2-Audio-1.5B

Audio-to-Audio • 1B • Updated Mar 27 • 329 • 350

inclusionAI/MingTok-Audio

1B • Updated Oct 4, 2025 • 97 • 29

inclusionAI/Ming-UniAudio-16B-A3B

Any-to-Any • 18B • Updated Nov 24, 2025 • 69 • 80

inclusionAI/Ming-UniAudio-16B-A3B-Edit

18B • Updated Oct 2, 2025 • 18 • 30

intronhealth/afrispeech-200

Updated Nov 20, 2023 • 941 • 34

adi-gov-tw/Taiwan-Tongues-ASR-CE-dataset-zhtw

Viewer • Updated Dec 22, 2025 • 120k • 244 • 1

adi-gov-tw/Taiwan-Tongues-ASR-CE-dataset-en

Viewer • Updated Dec 22, 2025 • 32k • 89

smallbraineng/smalltts

Updated Feb 15 • 9

kxxia/KALL-E

Updated Sep 24, 2025 • 2

ASLP-lab/Easy-Turn-Trainset

Viewer • Updated Oct 18, 2025 • 1.91k • 822 • 9

pklumpp/CommonPhoneDataset

Viewer • Updated Oct 29, 2025 • 76.3k • 202 • 3

Hui519/WildElder

Viewer • Updated May 2 • 23.7k • 979 • 1

Banafo/Kroko-ASR

Automatic Speech Recognition • Updated Oct 6, 2025 • 86

MohammadJRanjbar/ParsVoice

Viewer • Updated Oct 15, 2025 • 916 • 32 • 13

Higobeatz/FlexSED

Updated Nov 11, 2025

kiarashQ/farsi-asr-unified-cleaned

Viewer • Updated Nov 3, 2025 • 1.28M • 1.5k • 4

lelegu/omni-router-speechcrawl-streaming-asr-0.6b-v1

Automatic Speech Recognition • Updated Oct 15, 2025 • 1

LattifAI/Lattice-1-Alpha

Updated Dec 30, 2025 • 3 • 3

zhisheng01/VoiceCraft-X

Updated Jul 17, 2025 • 33 • 4

nvidia/omnivinci

Updated Feb 23 • 1.51k • 178

Plachta/StreamVoiceAnon

Audio-to-Audio • Updated Oct 20, 2025 • 4

haoweilou/ParaStyleTTS

Text-to-Speech • Updated Oct 23, 2025 • 8 • 3

TigreGotico/ambient_noises

Viewer • Updated Oct 22, 2025 • 3.18k • 94

FreedomIntelligence/ExpressiveSpeech

Viewer • Updated Oct 24, 2025 • 10.8k • 305 • 10

vogent/Vogent-Turn-80M

79.2M • Updated Oct 23, 2025 • 446 • 14

smulelabs/Smule-Renaissance-Small

Updated Oct 27, 2025 • 8

NeuraFusionAI/meta-translation-chinese-english-model

60.5M • Updated Aug 17, 2024 • 8

Jmica/IndexTTS-2-Japanese

Updated Nov 3, 2025 • 20 • 17

BAAI/Emotiontalk

Viewer • Updated Aug 5, 2025 • 116k • 734 • 23

lmms-lab/Aero-1-Audio

Text Generation • 2B • Updated Jun 7, 2025 • 171 • 91

treble-technologies/Treble10-Speech

Viewer • Updated Nov 3, 2025 • 9.26k • 1.15k • 21

Itbanque/ScreenTalk_JA2ZH-XS

Viewer • Updated May 23, 2025 • 10k • 264 • 3

OpenMOSS-Team/OmniAction

Updated Mar 27 • 45.1k • 282

espnet/powsm

Automatic Speech Recognition • Updated Jan 21 • 146 • 13

Soul-AILab/SoulX-Podcast-1.7B

Text-to-Speech • 2B • Updated Dec 18, 2025 • 238 • 234

Soul-AILab/SoulX-Podcast-1.7B-dialect

Text-to-Speech • 2B • Updated Dec 18, 2025 • 61 • 25

anyspeech/zipa-large-crctc-ns-800k

Updated Feb 12 • 4

ASLP-lab/MeanVC

Audio-to-Audio • Updated Dec 9, 2025 • 111 • 24

mkrausio/audiosnippets-cleaned

Viewer • Updated Jan 22, 2025 • 3M • 919 • 3

NandemoGHS/Anime-Speech-Japanese-Refiner

35B • Updated Nov 4, 2025 • 33 • 11

Appenlimited/1000h-us-english-smartphone-conversation

Viewer • Updated Jun 19, 2025 • 40 • 240 • 3

lab260/Balalaika2000H

Viewer • Updated about 21 hours ago • 554k • 74 • 1

YirongSun/LLaSO-Instruct

Preview • Updated Oct 2, 2025 • 5.58k • 6

jamescalam/youtube-transcriptions

Viewer • Updated Oct 22, 2022 • 209k • 820 • 44

alvanlii/cantonese-youtube

Viewer • Updated Nov 7, 2024 • 1.48M • 564 • 51

stcoats/YCSEP_v1

Viewer • Updated Jul 1, 2025 • 756k • 1.62k

MohammadGholizadeh/youtube-farsi

Viewer • Updated Jun 5, 2025 • 142k • 1.02k • 6

MohamedRashad/arabic-english-code-switching

Viewer • Updated Jul 4, 2024 • 12.5k • 310 • 35

MAdel121/arabic-egy-cleaned

Viewer • Updated May 4, 2025 • 104k • 619 • 15

Alex-Song/MSR-86K

Updated Jul 4, 2024 • 20 • 54

mamed0v/TurkmenSpeech

Updated Nov 1, 2025 • 433 • 5

spw2000/BEWO-1M

Updated Apr 10, 2025 • 1.84k • 5

Nicolas-BZRD/French_Transcribed_Podcast

Viewer • Updated Sep 22, 2023 • 282k • 17 • 2

bdx33/ted-talks-in-chinese-zhongwen

Viewer • Updated Sep 21, 2025 • 105 • 13 • 1

HKUSTAudio/Audio-FLAN-Dataset

Preview • Updated Oct 6, 2025 • 5.68k • 45

SeaLLMs/SeaLLMs-Audio-7B

Audio-Text-to-Text • 8B • Updated Dec 7, 2025 • 208 • 19

JHU-SmileLab/NaturalVoices_VC_870h

Updated Nov 11, 2025 • 54 • 11

JHU-SmileLab/NaturalVoices_VC_0.1

Viewer • Updated Nov 11, 2025 • 88.6k • 25 • 2

Tejasva-Maurya/English-Technical-Speech-Dataset

Viewer • Updated Oct 26, 2024 • 11.2k • 34 • 8

maya-research/maya1

Text-to-Speech • 3B • Updated Nov 12, 2025 • 3.4k • 886

facebook/omnilingual-asr-corpus

Viewer • Updated Nov 14, 2025 • 548k • 4.39k • 206

MERaLiON/MERaLiON-SER-v1

0.8B • Updated Feb 2 • 4.92k • 11

slprl/Stress-17K-raw

Viewer • Updated Nov 10, 2025 • 5.71k • 93 • 1

nvidia/parakeet_realtime_eou_120m-v1

Updated Dec 3, 2025 • 646 • 143

malaysia-ai/Multilingual-TTS

Updated about 10 hours ago • 4.63k • 21

jordand/echo-tts-no-speaker

Updated Nov 18, 2025 • 8 • 9

jordand/fish-s1-dac-min

Updated Nov 6, 2025 • 30 • 7

videosdk-live/Namo-Turn-Detector-v1-Chinese

Voice Activity Detection • Updated Oct 15, 2025 • 26 • 1

ai4bharat/Shrutilipi

Viewer • Updated Mar 7, 2025 • 2.28M • 1.44k • 14

ai4bharat/Kathbath

Viewer • Updated Mar 7, 2025 • 806k • 2.73k • 19

thusinh1969/viEar-V1.0

Audio Classification • Updated Nov 29, 2025 • 2

LattifAI/Lattice-1

Updated Jan 18 • 13 • 2

dolly-vn/dolly-audio-1000h-vietnamese

Viewer • Updated Nov 24, 2025 • 664k • 1.75k • 53

openbmb/VoxCPM1.5

Text-to-Speech • 0.8B • Updated Jan 14 • 10k • 361

zai-org/GLM-ASR-Nano-2512

Automatic Speech Recognition • 2B • Updated Apr 7 • 129k • 379

zai-org/GLM-TTS

Text-to-Speech • Updated Jan 12 • 251 • 342

FunAudioLLM/Fun-CosyVoice3-0.5B-2512

Text-to-Speech • Updated Feb 3 • 27.6k • 574

FunAudioLLM/Fun-ASR-Nano-2512

Automatic Speech Recognition • Updated 6 days ago • 3.29k • 208

FunAudioLLM/Fun-ASR-MLT-Nano-2512

Automatic Speech Recognition • Updated 7 days ago • 376 • 51

ResembleAI/chatterbox-turbo

Text-to-Speech • Updated Dec 15, 2025 • 658

facebook/sam-audio-large

Updated Dec 30, 2025 • 13k • 418

Ceva-IP/DPDFNet

Audio-to-Audio • Updated about 1 month ago • 210 • 8

nvidia/speakerverification_en_titanet_large

Updated Nov 14, 2023 • 431k • 122

laion/podcast-links

Viewer • Updated Sep 5, 2025 • 17.8M • 8 • 3

abr-ai/niagara-19m-batch.en

Automatic Speech Recognition • Updated Apr 15 • 188 • 10

Rijgersberg/YouTube-Commons

Viewer • Updated Feb 16, 2025 • 22.7M • 98 • 5

common-pile/youtube

Viewer • Updated Jun 6, 2025 • 1.13M • 70 • 12

spellbrush/AliasingFreeNeuralAudioSynthesis

Audio-to-Audio • Updated 1 day ago • 14

YatharthS/FlashSR

Audio-to-Audio • Updated Dec 26, 2025 • 64

k2-fsa/Flow2GAN

Updated Jan 21 • 3

LEMAS-Project/LEMAS-Dataset-train

Viewer • Updated Mar 31 • 125M • 2.8k • 86

pevers/whisperd-nl

Automatic Speech Recognition • 2B • Updated Jul 1, 2025 • 659 • 7

karl-wang/MuChin-v2-6066

Viewer • Updated Jun 26, 2025 • 5.73k • 4 • 1

aihpi/FrWhisper

Automatic Speech Recognition • 2B • Updated 11 days ago • 257 • 10

TalTechNLP/whisper-large-v3-turbo-et-verbatim

Automatic Speech Recognition • 0.9B • Updated Apr 29 • 138 • 3

nelfproject/ASR_verbatim_v1

Updated Jul 10, 2024

LiquidAI/LFM2-2.6B-Transcript

Text Generation • 3B • Updated Mar 31 • 815 • 166

ajibawa-2023/Audio-Children-Stories-Collection-Large

Viewer • Updated Apr 1, 2025 • 2.1k • 111 • 12

LEMAS-Project/LEMAS-TTS

Text-to-Speech • Updated Mar 31 • 39 • 18

yfish/WESR-Bench

Viewer • Updated Jan 9 • 927 • 82 • 8

MahiA/VocalSound

Preview • Updated Nov 2, 2024 • 114

MediaTek-Research/TASTE-Dump

Viewer • Updated Jun 20, 2025 • 16.1M • 16 • 3

YatharthS/NovaSR

Audio-to-Audio • Updated Jan 19 • 93 • 86

AudenAI/azeros

Audio-Text-to-Text • Updated Jan 24 • 5 • 2

nvidia/magpie_tts_multilingual_357m

Text-to-Speech • Updated May 3 • 1.21k • 139

kyutai/pocket-tts

Updated May 4 • 1.13k • 682

kyutai/tts-0.75b-en-public

Text-to-Speech • Updated Sep 11, 2025 • 25.3k • 17

kyutai/stt-2.6b-en

Automatic Speech Recognition • 3B • Updated Jun 26, 2025 • 123

ASLP-lab/VoiceSculptor-VD

Text-to-Speech • 4B • Updated Feb 26 • 23 • 18

AudenAI/auden-encoder-voice

0.2B • Updated Jan 8 • 9 • 2

bofenghuang/stt-pseudo-labeled-whisper-large-v3-multilingual

Updated Mar 19, 2025 • 101 • 4

nvidia/Granary

Viewer • Updated about 14 hours ago • 113M • 4.1k • 204

chanchungkit/IMDA-NSC-datasets

Viewer • Updated Oct 8, 2025 • 4.58M • 1.94k

distil-whisper/spgispeech-timestamped

Updated Sep 25, 2023 • 1

stepfun-ai/Step-Audio-R1.1

Audio-Text-to-Text • 33B • Updated Feb 14 • 260 • 182

verstar/MRSAudio

Viewer • Updated Oct 22, 2025 • 131k • 60k • 7

NandemoGHS/Japanese-Eroge-Voice-V2

Viewer • Updated Jan 15 • 1.03M • 2.01k • 49

Tele-AI/TeleSpeech-ASR1.0

Updated May 31, 2024 • 81

AutoArk-AI/GPA

Text-to-Speech • 0.3B • Updated Apr 9 • 26 • 29

HeartMuLa/HeartMuLa-oss-3B

Text-to-Audio • 4B • Updated Jan 19 • 996 • 257

kk014/indian-accent-final-only

Viewer • Updated Jan 20 • 110k • 18 • 1

PVP-ADD/ZH-Famous

Viewer • Updated Jan 20 • 205k • 194 • 3

ugrozamangustov/sttRU

Viewer • Updated Jan 16 • 1.08M • 3

ugrozamangustov/sttEN

Viewer • Updated Jan 15 • 849k • 3

microsoft/VibeVoice-ASR

Automatic Speech Recognition • 9B • Updated Jan 27 • 744k • 1.19k

Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign

Text-to-Speech • 2B • Updated Jan 29 • 691k • 364

Qwen/Qwen3-TTS-12Hz-1.7B-Base

2B • Updated Jan 23 • 2.08M • 427

Qwen/Qwen3-TTS-Tokenizer-12Hz

Audio-to-Audio • 0.2B • Updated Jan 29 • 78.5k • 68

Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice

Text-to-Speech • 2B • Updated Jan 29 • 2.07M • 1.64k

FlashLabs/Chroma-4B

Any-to-Any • 6B • Updated Jan 28 • 235 • 382

pkadambi/wav2textgrid

94.5M • Updated Oct 24, 2025 • 1.18k

google/WaxalNLP

Viewer • Updated 15 days ago • 1.67M • 29.7k • 235

Aynursusuz/lemas-italian-train-speech

Viewer • Updated Jan 20 • 7.21M • 157

YatharthS/LuxTTS

Text-to-Speech • Updated Jan 23 • 5.8k • 195

MLSpeech/WhisperRT-Streaming

Automatic Speech Recognition • Updated Mar 31 • 6

BUT-FIT/SE-DiCoW

Automatic Speech Recognition • 1B • Updated Jan 26 • 2.49k • 7

BUT-FIT/DiCoW_v3_3

Automatic Speech Recognition • 0.9B • Updated Jan 30 • 1.6k • 2

okestro-ai-lab/SYMPHONY-ASR

Audio-Text-to-Text • 5B • Updated Apr 27 • 16 • 2

nvidia/multitalker-parakeet-streaming-0.6b-v1

Automatic Speech Recognition • Updated Jan 28 • 595 • 112

Qwen/Qwen3-ASR-1.7B

Automatic Speech Recognition • 2B • Updated Jan 30 • 1.57M • 904

Qwen/Qwen3-ASR-0.6B

Automatic Speech Recognition • 0.9B • Updated Jan 30 • 908k • 309

Qwen/Qwen3-ForcedAligner-0.6B

Automatic Speech Recognition • 0.9B • Updated Jan 30 • 471k • 145

kugelaudio/kugelaudio-0-open

Text-to-Speech • 9B • Updated Feb 6 • 2.46k • 189

ACE-Step/Ace-Step1.5

Text-to-Audio • Updated Feb 3 • 46.9k • 781

openbmb/MiniCPM-o-4_5

Any-to-Any • 9B • Updated May 19 • 354k • 1.4k

ShandaAI/Hive

Viewer • Updated Feb 8 • 5.6M • 170 • 51

ACE-Step/acestep-transcriber

Audio-Text-to-Text • 11B • Updated Feb 3 • 9.53k • 59

talkbank/callhome

Viewer • Updated Apr 28, 2024 • 660 • 696 • 45

mistralai/Voxtral-Mini-4B-Realtime-2602

Automatic Speech Recognition • 4B • Updated Mar 11 • 1.81M • 895

microsoft/VibeVoice-AcousticTokenizer

Feature Extraction • 0.7B • Updated Feb 6 • 2.54k • 14

UsefulSensors/moonshine-streaming-tiny

Automatic Speech Recognition • 44.1M • Updated Feb 10 • 7.32k • 12

inclusionAI/Ming-flash-omni-2.0

Any-to-Any • 104B • Updated Feb 12 • 2.5k • 268

FireRedTeam/FireRedLID

Audio Classification • Updated Mar 13 • 236 • 21

FireRedTeam/FireRedPunc

Automatic Speech Recognition • Updated Mar 13 • 169 • 19

FireRedTeam/FireRedVAD

Voice Activity Detection • Updated Mar 13 • 984 • 58

FireRedTeam/FireRedASR2-AED

Automatic Speech Recognition • Updated Mar 13 • 397 • 25

OpenMOSS-Team/MOSS-Audio-Tokenizer

Image Feature Extraction • 2B • Updated 21 days ago • 242k • 46

OpenMOSS-Team/MOSS-VoiceGenerator

Text-to-Speech • 2B • Updated Feb 11 • 11.7k • 44

OpenMOSS-Team/MOSS-SoundEffect

Text-to-Audio • 8B • Updated Mar 13 • 2.36k • 52

kyutai/hibiki-zero-3b-pytorch-bf16

Audio-to-Audio • Updated Feb 12 • 4.45k • 56

facebook/EgoAVU_data

Viewer • Updated Apr 26 • 3.99M • 412 • 14

nvidia/music-flamingo-2601-hf

Audio-Text-to-Text • 8B • Updated Apr 9 • 188k • 105

FireRedTeam/FireRedASR2-LLM

Automatic Speech Recognition • Updated Mar 13 • 256 • 17

cslys1999/Eureka-Audio-Instruct

Audio-Text-to-Text • 3B • Updated Feb 26 • 191 • 6

warisqr007/GAPS

Viewer • Updated Mar 31 • 177k • 496

warisqr007/GAPS-nptel

Viewer • Updated Mar 31 • 1.42M • 438

marksverdhei/Qwen3-Voice-Embedding-12Hz-0.6B

Feature Extraction • Updated Feb 23 • 2.05k • 23

efwkjn/whisper-ja-1.5B

Automatic Speech Recognition • 2B • Updated Feb 13 • 545 • 5

ProgramComputer/avspeech-visual-audio

Viewer • Updated Feb 20 • 2.81M • 8.27k • 3

oezi13/PlayDiffusion-nonverbal

Updated Oct 17, 2025 • 8 • 4

mispeech/dashengtokenizer

Audio-to-Audio • 0.8B • Updated Apr 21 • 1.96k • 12

nccm2p2/ZH-Famous

Viewer • Updated Mar 3 • 205k • 356 • 1

AAdonis/multilingual_audio_alignments

Viewer • Updated May 1 • 21.2M • 9.34k • 24

formospeech/whisper-large-v2-taiwanese-hakka-v1

Automatic Speech Recognition • 2B • Updated May 12 • 2

ibm-granite/granite-4.0-1b-speech

Automatic Speech Recognition • 2B • Updated Apr 2 • 105k • 249

espnet/oooo

Viewer • Updated Mar 8 • 74.8M • 1.24k

AudenAI/auden-encoder-tta-m10

Automatic Speech Recognition • 0.2B • Updated Jan 24 • 6 • 3

fishaudio/s2-pro

Text-to-Speech • 5B • Updated Mar 11 • 397k • 1.05k

HumeAI/tada-3b-ml

Text-to-Speech • 4B • Updated Mar 17 • 8.71k • 159

HumeAI/tada-1b

Text-to-Speech • 2B • Updated Mar 17 • 7.97k • 238

HumeAI/tada-codec

Text-to-Speech • Updated Mar 13 • 22

Soul-AILab/SoulX-Duplug-0.6B

Updated Mar 17 • 98 • 18

tencent/Covo-Audio-Chat

Audio-to-Audio • 8B • Updated Mar 16 • 12.7k • 97

CharlesNi/Multilingual-NVASR

Automatic Speech Recognition • Updated 14 days ago • 35 • 7

frothywater/kanade-25hz-clean

Updated Feb 3 • 2.25k • 2

YatharthS/LinaCodec

Audio-to-Audio • 0.1B • Updated Jan 3 • 270 • 24

woongzip1/universr-speech

Updated Mar 18 • 197 • 2

nvidia/RE-USE

Audio-to-Audio • 9.61M • Updated May 14 • 5.58k • 77

ooshyun/fine_grained_soundscape_control

Audio-to-Audio • Updated Apr 12

aman4014/translated-german-english-asr

Viewer • Updated May 6 • 4.65M • 3.48k • 2

woongzip1/universr-audio

Updated Mar 18 • 6.46k • 9

jingyi49/llsdr

Updated Mar 6

ngocminh06/whisper-nonverbal

2B • Updated Jan 27 • 1

bosonai/higgs-audio-v3-stt

Automatic Speech Recognition • 3B • Updated 14 days ago • 2.02k • 23

CohereLabs/cohere-transcribe-03-2026

Automatic Speech Recognition • 2B • Updated 16 days ago • 743k • 1.02k

mistralai/Voxtral-4B-TTS-2603

Text-to-Speech • Updated Mar 31 • 86.7k • 862

AudenAI/UTS

Viewer • Updated Mar 30 • 361k • 68 • 1

ajd12342/paraspeechclap-combined

Audio Classification • Updated Apr 6

capacit-ai/saga

Automatic Speech Recognition • 2B • Updated Apr 1 • 17 • 6

Aratako/Irodori-TTS-500M-v2-VoiceDesign

Text-to-Speech • 0.5B • Updated Apr 11 • 78

k2-fsa/OmniVoice

Text-to-Speech • 0.6B • Updated May 7 • 1.02M • 1.08k

gijs/speech-utterances

Viewer • Updated Jan 19 • 207k • 1.05k • 1

nvidia/nemocurator-speech-bandwidth-filter

Updated Apr 2 • 21

alvanlii/cantonese-youtube-tts

Viewer • Updated Apr 5 • 4.33M • 547 • 3

OpenSpeechHub/gigaspeech-asr-clean

Viewer • Updated Mar 31 • 10.1M • 2.05k • 2

KRAFTON/Raon-OpenTTS-Pool

Viewer • Updated May 21 • 846M • 15.2k • 35

yfyeung/FCaps

Preview • Updated 7 days ago • 63 • 5

laion/voice-tagging-whisper

Automatic Speech Recognition • 0.2B • Updated Apr 6 • 186 • 1

openbmb/VoxCPM2

Text-to-Speech • 2B • Updated Apr 16 • 585k • 1.43k

nvidia/parakeet-unified-en-0.6b

Automatic Speech Recognition • Updated 17 days ago • 728 • 51

HiDolen/Mini-BS-RoFormer

Audio-to-Audio • 8.83M • Updated Oct 4, 2025 • 26 • 1

syvai/hviske-v5

Automatic Speech Recognition • 2B • Updated May 21 • 117 • 1

XRXRX/X-Voice-Dataset-Train

Viewer • Updated May 4 • 72.4M • 5.28k • 11

SynDataLab-EN/omnivoice-zh

Viewer • Updated Apr 8 • 19.9k • 307

nvidia/audio-flamingo-next-hf

Audio-Text-to-Text • 8B • Updated May 13 • 8.21k • 56

amanuelbyte/african_speech_clean

Viewer • Updated Apr 14 • 2.38M • 775 • 1

OpenMOSS-Team/MOSS-Audio-4B-Instruct

Audio-Text-to-Text • 5B • Updated Apr 14 • 12.3k • 73

tiantiaf/voxlect-english-dialect-whisper-large-v3

Audio Classification • 2B • Updated Aug 10, 2025 • 88 • 2

LocalAI-io/LocalVQE

Audio-to-Audio • 578k • Updated 4 days ago • 4.01k • 56

mispeech/dasheng-denoiser

Audio-to-Audio • 0.1B • Updated Apr 21 • 364 • 14

Trelis/Chorus-v1

Automatic Speech Recognition • 0.9B • Updated 21 days ago • 370 • 20

ASLP-lab/Speaker-Reasoner

32B • Updated Apr 24 • 43 • 2

OpenMOSS-Team/MOSS-Audio-8B-Thinking

Audio-Text-to-Text • 9B • Updated 15 days ago • 4k • 76

mcshao/LAT-Audio

35B • Updated Apr 27 • 8 • 4

kensho/SPGISpeech2.0

Viewer • Updated Apr 29 • 170k • 690 • 2

CharlesNi/NV-Bench

Viewer • Updated 14 days ago • 1.65k • 54 • 6

TTS-AGI/EN_Emilia_Yodas_ScribeEvents

Viewer • Updated Mar 28 • 16k • 155 • 1

TTS-AGI/vocal-burst-annotation-asr-tuning-dataset

Viewer • Updated Apr 11 • 498k • 218 • 1

TTS-AGI/DE_Emilia_Yodas_ScribeEvents

Viewer • Updated Mar 28 • 12.2k • 762

TTS-AGI/JA_Emilia_Yodas_ScribeEvents

Viewer • Updated Mar 28 • 3.15k • 95

OpenMOSS-Team/MOSS-Audio-8B-Instruct

Audio-Text-to-Text • 9B • Updated 15 days ago • 3.89k • 44

MrDragonFox/EN_Emilia_Yodas_616h

Viewer • Updated Apr 14, 2025 • 228k • 1.25k • 10

MrDragonFox/DE_Emilia_Yodas_680h

Viewer • Updated Apr 14, 2025 • 245k • 647 • 11

MrDragonFox/JA_Emilia_Yodas_266h

Viewer • Updated Apr 14, 2025 • 107k • 139 • 4

ibm-granite/granite-speech-4.1-2b

Automatic Speech Recognition • 2B • Updated 14 days ago • 412k • 145

ibm-granite/granite-speech-4.1-2b-plus

Automatic Speech Recognition • 2B • Updated 10 days ago • 18.3k • 82

ibm-granite/granite-speech-4.1-2b-nar

Image Feature Extraction • 2B • Updated 8 days ago • 155k • 56

YongkangZOU/evoxtral-realtime-rl

Automatic Speech Recognition • Updated May 2 • 10

YongkangZOU/evoxtral-realtime-sft

Automatic Speech Recognition • Updated May 2 • 6 • 1

YongkangZOU/evoxtral-rl

Automatic Speech Recognition • Updated Mar 1 • 17 • 2

mrfakename/Qwen3-ASR-Enhanced-v0.1

Automatic Speech Recognition • 2B • Updated May 5 • 162 • 12

apptek-com/apptek_callcenter_dialogues

Viewer • Updated 29 days ago • 1.75k • 1.72k • 32

huckiyang/DiPCo

Preview • Updated May 5 • 245 • 6

hhoangphuoc/ami-disfluency

Updated Apr 8, 2025 • 32 • 2

commotion/indic-voices-scribe-v2-transcripts

Viewer • Updated Apr 3 • 37.4k • 34

MrDragonFox/whisper-tags

Viewer • Updated Jun 21, 2025 • 10k • 329 • 8

OpenSound/CapSpeech-PT

Viewer • Updated Jul 28, 2025 • 11.4M • 19 • 2

OpenSound/CapSpeech

Viewer • Updated Jun 4, 2025 • 20.8M • 259 • 25

freds0/TAGARELA

Viewer • Updated 17 days ago • 7.11M • 7.02k • 5

gaochengxia/ChineseConversation

Viewer • Updated May 8 • 1.67M • 326

videosdk-live/Namo-Turn-Detector-v1-Multilingual

Voice Activity Detection • Updated Oct 15, 2025 • 7.97k • 21

TEN-framework/TEN_Turn_Detection

Text Generation • 8B • Updated May 27, 2025 • 411 • 70

brgroup/TurnSense

Updated May 22 • 9

pipecat-ai/smart-turn-v3

Voice Activity Detection • Updated Jan 7 • 174

disco-eth/WorldSpeech

Updated May 18 • 31k • 19

raditotev/bg-audiobooks-tts

Viewer • Updated Feb 27 • 10.6k • 143

adwumatech-ai/mghana-st

Updated 13 days ago • 191 • 2

ThanhNV1999/dolly-audio-1000h-vietnamese

Viewer • Updated May 15 • 664k • 550

TaurenMountain/WenetSpeech-Formal

Viewer • Updated about 6 hours ago • 1.01M • 1.87k

TaurenMountain/FormalASR-1.7B

Automatic Speech Recognition • 2B • Updated about 6 hours ago • 62

y-ren16/WenetSpeech-RP

Viewer • Updated May 19 • 900 • 196

y-ren16/MCLP-RPTTS

Text-to-Speech • 8B • Updated May 19 • 16

Adnan256/streaming-target-stno-wavlm-base

Voice Activity Detection • Updated May 17

zhifeixie/Voices-in-the-Wild-2M

Updated 29 days ago • 10.3k • 43

zhifeixie/Mega-ASR

Automatic Speech Recognition • Updated 17 days ago • 94

tencent/Hy-MT2-30B-A3B

Translation • 30B • Updated May 26 • 8.05k • 466

mispeech/Dasheng-AudioGen-Multilingual

Text-to-Audio • 2B • Updated 29 days ago • 92 • 5

ASLP-lab/UrduSpeech

Viewer • Updated 22 days ago • 73.5k • 10.8k • 1

opedromartins/ASR-datasets-ptbr

Viewer • Updated Sep 14, 2025 • 8.07M • 2.06k • 10

OpenMOSS-Team/MOSS-TTS-v1.5

Text-to-Speech • 8B • Updated May 26 • 193k • 176

AmapVoice/PilotTTS

Updated 25 days ago • 6

inesc-id/FalAR

Viewer • Updated May 13 • 793k • 992 • 7

yxdu/ESRT-4B

Automatic Speech Recognition • 5B • Updated 28 days ago • 177

mispeech/Dasheng-AudioGen

Text-to-Audio • 2B • Updated 10 days ago • 1.03k • 10

thu-spmi/whistle-large

Automatic Speech Recognition • Updated 29 days ago • 23

cmots/UniST

Viewer • Updated 23 days ago • 19.8M • 4.14k • 2

Centi234/WorldSpeech

Updated May 17 • 4.46k • 1

daydreamlive/DEMON

Audio-to-Audio • Updated 25 days ago • 1

pymaster/VocalParse

Automatic Speech Recognition • 2B • Updated 8 days ago • 58

Soul-AILab/SoulX-Transcriber

Automatic Speech Recognition • 35B • Updated 3 days ago • 628 • 25

CharlesZhang-USTC/Hi-Singers

Viewer • Updated 25 days ago • 20.2k • 47

MisoLabs/MisoTTS

Text-to-Speech • 8B • Updated 24 days ago • 213

zhifeixie/StreamAudio-2M

Viewer • Updated 23 days ago • 381k • 4.29k • 27

MIT-SLS/USAD2-XXLarge-Plus

Feature Extraction • 1B • Updated 22 days ago • 42 • 1

bosonai/higgs-tts-3-4b

Text-to-Speech • 5B • Updated about 23 hours ago • 86.7k • 535

nvidia/nemotron-3.5-asr-streaming-0.6b

Automatic Speech Recognition • Updated 10 days ago • 56.4k • 702

Depositair/nonspeech_sfx

Viewer • Updated 22 days ago • 2.72k • 87

laion/universal-audio-annotation-pipeline

12B • Updated 19 days ago • 15.7k • 6

rednote-hilab/dots.tts-mf

Text-to-Speech • 2B • Updated 16 days ago • 1.22k • 23

nyu-dice-lab/wavepulse-radio-raw-transcripts

Viewer • Updated Feb 18, 2025 • 565M • 5.77k • 8

binant/voice-gender-classifier

Audio Classification • 15.5M • Updated 19 days ago • 516

yujie-ovo/ChildTalk

Updated 29 days ago • 272 • 1

humanify/DNS-Noise

Viewer • Updated 16 days ago • 133k • 739

byan/cs-yodas

Updated 16 days ago • 199

ResembleAI/Chatterbox-Multilingual-zh-cmn

Text-to-Speech • Updated 16 days ago

qualialabsAI/DuplexConv

Preview • Updated 14 days ago • 19.7k • 9

qualialabsAI/SmoothConv

Updated 14 days ago • 21.9k • 14

PaddlePaddle/PP-OCRv6_medium_rec_safetensors

Image-to-Text • 19.2M • Updated 14 days ago • 1.41k • 20

ASLP-lab/FM-Speech

Audio Classification • 35B • Updated 10 days ago • 22 • 2

zenlm/zen-3-asr

Automatic Speech Recognition • 2B • Updated about 6 hours ago • 53

zenlm/zen-3-asr-aligner

Automatic Speech Recognition • 0.9B • Updated about 6 hours ago • 48

Trelis/whisper-hinglish-preview

Automatic Speech Recognition • 2B • Updated 8 days ago • 1.03k • 4

5Hyeons/StyleTTS2

Updated Mar 24, 2025 • 4

StyleTTS2: Ukrainian text to speech