Speech-to-Text Models - a CIMAI Collection

Updated Mar 20

https://huggingface.co/spaces/hf-audio/open_asr_leaderboard

mistralai/Voxtral-Mini-4B-Realtime-2602

Automatic Speech Recognition • 4B • Updated Mar 11 • 1.36M • 848

mistralai/Voxtral-Mini-3B-2507

5B • Updated Jul 28, 2025 • 527k • 650

Note: See benchmark scores at https://mistral.ai/news/voxtral

nvidia/canary-1b-flash

Automatic Speech Recognition • 0.8B • Updated Dec 3, 2025 • 268k • 272

Note: CC BY 4.0 license: 1) credit the creator, 2) link to the license, 3) indicate whether you made changes

nvidia/canary-qwen-2.5b

Automatic Speech Recognition • 3B • Updated 25 days ago • 87.8k • 425

Note: CC BY 4.0 license: 1) credit the creator, 2) link to the license, 3) indicate whether you made changes

nvidia/canary-180m-flash

Automatic Speech Recognition • Updated Mar 18, 2025 • 1.78k • 99

Note: CC BY 4.0 license: 1) credit the creator, 2) link to the license, 3) indicate whether you made changes