adarshzolekar's Collections

Audio & Speech Models

updated 2 days ago

Purpose: speech recognition, text-to-speech, music generation, and audio analysis.

  • openai/whisper-large-v3

    Automatic Speech Recognition • 2B • Updated Aug 12, 2024 • 6.26M • 5.33k

  • facebook/wav2vec2-base-960h

    Automatic Speech Recognition • 94.4M • Updated Nov 14, 2022 • 1.03M • 387

  • coqui/XTTS-v2

    Text-to-Speech • Updated Dec 11, 2023 • 5.61M • 3.34k

  • microsoft/speecht5_tts

    Text-to-Speech • Updated Nov 8, 2023 • 76.3k • 822

  • facebook/musicgen-small

    Text-to-Audio • 0.6B • Updated Nov 17, 2023 • 91.6k • 470
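
The list above can be grouped by task before picking a model to load. The following is a minimal sketch, not part of the collection itself: the model IDs and task labels are copied from the entries above, while `COLLECTION`, `group_by_task`, and `load_asr` are hypothetical names introduced for illustration. `load_asr` assumes the `transformers` package is installed and is intended only for the two Automatic Speech Recognition entries; the other models may need different loading code.

```python
from collections import defaultdict

# Model IDs and task labels copied from the collection list above.
COLLECTION = [
    ("openai/whisper-large-v3", "Automatic Speech Recognition"),
    ("facebook/wav2vec2-base-960h", "Automatic Speech Recognition"),
    ("coqui/XTTS-v2", "Text-to-Speech"),
    ("microsoft/speecht5_tts", "Text-to-Speech"),
    ("facebook/musicgen-small", "Text-to-Audio"),
]


def group_by_task(entries):
    """Return a dict mapping each task label to its model IDs, in list order."""
    grouped = defaultdict(list)
    for model_id, task in entries:
        grouped[task].append(model_id)
    return dict(grouped)


def load_asr(model_id):
    """Build a speech-recognition pipeline for one of the ASR entries.

    Assumption: `transformers` is installed. The import is deferred so the
    grouping helper works without it; weights download on first use.
    """
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


if __name__ == "__main__":
    for task, model_ids in group_by_task(COLLECTION).items():
        print(f"{task}: {', '.join(model_ids)}")
```

Calling `load_asr("facebook/wav2vec2-base-960h")` would return a pipeline object that accepts an audio file path; the smaller checkpoint is the lighter of the two ASR entries to try first.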