Speech Models 🎧 - a MElHuseyni Collection

Speech Models 🎧

updated Aug 25, 2025

ICTNLP/Llama-3.1-8B-Omni

9B • Updated Nov 14, 2024 • 42 • 418
AudioPaLM: A Large Language Model That Can Speak and Listen

Paper • 2306.12925 • Published Jun 22, 2023 • 56
OpenMOSS-Team/SpeechGPT-7B-cm

Text Generation • Updated Sep 15, 2023 • 198 • 8
parler-tts/parler_tts_mini_v0.1

Text-to-Speech • 0.6B • Updated Apr 30, 2024 • 2.7k • 358
parler-tts/parler-tts-mini-expresso

Text-to-Speech • 0.6B • Updated May 21, 2024 • 457 • 117
ylacombe/expresso

Viewer • Updated Apr 30, 2024 • 11.6k • 792 • 93
parler-tts/parler-tts-large-v1

Text-to-Speech • 2B • Updated Nov 22, 2024 • 8.2k • 273
parler-tts/parler-tts-mini-v1

Text-to-Speech • 0.9B • Updated Nov 25, 2024 • 12k • 153
parler-tts/parler-tts-mini-jenny-30H

Text-to-Speech • 0.6B • Updated Apr 30, 2024 • 117 • 8
google/flan-t5-base

0.2B • Updated Jul 17, 2023 • 1.41M • 1.09k
parler-tts/dac_44khZ_8kbps

76.7M • Updated Apr 10, 2024 • 387 • 19
distil-whisper/distil-large-v3

Automatic Speech Recognition • 0.8B • Updated Apr 21 • 683k • 376
distil-whisper/distil-large-v3-ggml

Automatic Speech Recognition • Updated Mar 21, 2024 • 24
distil-whisper/distil-large-v3-ct2

Automatic Speech Recognition • Updated Mar 22, 2024 • 198 • 6
distil-whisper/distil-large-v3-openai

Automatic Speech Recognition • Updated Mar 27, 2024 • 4
distil-whisper/distil-large-v2

Automatic Speech Recognition • 0.8B • Updated Apr 21 • 4.77k • 516
distil-whisper/distil-medium.en

Automatic Speech Recognition • 0.4B • Updated Apr 21 • 15.6k • 127
distil-whisper/distil-small.en

Automatic Speech Recognition • 0.2B • Updated Apr 21 • 11k • 112
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

Paper • 2311.00430 • Published Nov 1, 2023 • 56
coqui/XTTS-v2

Text-to-Speech • Updated Dec 11, 2023 • 9.32M • 3.63k
suno/bark

Text-to-Speech • Updated Oct 4, 2023 • 18.6k • 1.54k
OuteAI/OuteTTS-0.1-350M

Text-to-Speech • 0.4B • Updated Apr 17, 2025 • 291 • 302
microsoft/speecht5_tts

Text-to-Speech • Updated Nov 8, 2023 • 80.8k • 836
fixie-ai/ultravox-v0_4_1-llama-3_1-8b

Audio-Text-to-Text • 50.3M • Updated May 6, 2025 • 894 • 99
fixie-ai/ultravox-v0_4_1-llama-3_1-70b

Audio-Text-to-Text • 58.7M • Updated May 6, 2025 • 19 • 24
fixie-ai/ultravox-v0_4_1-mistral-nemo

Audio-Text-to-Text • 52.4M • Updated May 6, 2025 • 666 • 27
facebook/seamless-m4t-v2-large

Automatic Speech Recognition • 2B • Updated Jan 4, 2024 • 332k • 989
nvidia/diar_sortformer_4spk-v1

Automatic Speech Recognition • 0.1B • Updated Dec 15, 2025 • 7.38k • 144
amiriparian/ExHuBERT

Audio Classification • Updated Dec 15, 2024 • 95 • 19
BUT-FIT/DiCoW_v3_2

Automatic Speech Recognition • 1.0B • Updated Sep 2, 2025 • 1.52k • 9
pyannote/segmentation-3.0

Voice Activity Detection • Updated May 10, 2024 • 6.52M • 1.25k
SWivid/F5-TTS

Text-to-Speech • Updated Mar 21, 2025 • 799k • 1.19k
SWivid/E2-TTS

Text-to-Speech • Updated Mar 12, 2025 • 109k • 57
ResembleAI/chatterbox

Text-to-Speech • Updated 23 days ago • 2.28M • 1.67k
NAMAA-Space/EgypTalk-ASR-v2

Updated Aug 9, 2025 • 475 • 13
nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0

Automatic Speech Recognition • Updated Oct 21, 2025 • 5.06k • 40
nvidia/canary-1b-v2

Automatic Speech Recognition • Updated Dec 3, 2025 • 105k • 397
nvidia/canary-1b-flash

Automatic Speech Recognition • 0.8B • Updated 4 days ago • 3.8k • 274
nvidia/parakeet-tdt-0.6b-v3

Automatic Speech Recognition • 0.6B • Updated 4 days ago • 129k • 965
Open ASR Leaderboard 🏆

Explore speech model performance benchmarks • 1.39k
microsoft/VibeVoice-1.5B

Text-to-Speech • 3B • Updated Jan 22 • 236k • 2.42k