CIMAI's Collections

Speech-to-Text Models

updated Mar 20

https://huggingface.co/spaces/hf-audio/open_asr_leaderboard


  • mistralai/Voxtral-Mini-4B-Realtime-2602

    Automatic Speech Recognition • 4B • Updated Mar 11 • 1.36M • 848

  • mistralai/Voxtral-Mini-3B-2507

    5B • Updated Jul 28, 2025 • 527k • 650

    Note: See benchmark scores at https://mistral.ai/news/voxtral


  • nvidia/canary-1b-flash

    Automatic Speech Recognition • 0.8B • Updated Dec 3, 2025 • 268k • 272

    Note: CC BY 4.0 license. Requirements: 1) credit the creator, 2) link to the license, 3) indicate whether you made changes.


  • nvidia/canary-qwen-2.5b

    Automatic Speech Recognition • 3B • Updated 25 days ago • 87.8k • 425

    Note: CC BY 4.0 license. Requirements: 1) credit the creator, 2) link to the license, 3) indicate whether you made changes.


  • nvidia/canary-180m-flash

    Automatic Speech Recognition • Updated Mar 18, 2025 • 1.78k • 99

    Note: CC BY 4.0 license. Requirements: 1) credit the creator, 2) link to the license, 3) indicate whether you made changes.
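
The three CC BY 4.0 requirements noted for the NVIDIA Canary models can be sketched as a small helper that assembles an attribution line. This is only an illustration: the function name `build_attribution`, the creator string passed in, the wording of the output, and the example change description are all assumptions, not text prescribed by the license or by the model cards. The model page URL is derived from the model ID using the usual `https://huggingface.co/<model_id>` pattern.

```python
from typing import Optional

# Canonical URL of the CC BY 4.0 license deed.
CC_BY_4_URL = "https://creativecommons.org/licenses/by/4.0/"


def build_attribution(model_id: str, creator: str, changes: Optional[str] = None) -> str:
    """Assemble a one-line attribution covering the three noted requirements.

    Hypothetical helper: the exact wording is an assumption, not mandated text.
    """
    source_url = f"https://huggingface.co/{model_id}"
    parts = [
        # 1) credit the creator (and point at the source)
        f"{model_id} by {creator} ({source_url})",
        # 2) link to the license
        f"licensed under CC BY 4.0 ({CC_BY_4_URL})",
        # 3) indicate whether changes were made
        f"Changes: {changes}" if changes else "No changes were made",
    ]
    return "; ".join(parts) + "."


if __name__ == "__main__":
    # Example change description is illustrative only.
    print(build_attribution("nvidia/canary-1b-flash", "NVIDIA",
                            changes="fine-tuned on additional data"))
```

Such a line could go in a README or model card of a derived work; whether that placement and wording are sufficient for a given use is something to check against the license text itself.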
