adarshzolekar's Collections
Audio & Speech Models
Updated Jan 23
Purpose: Speech recognition, text-to-speech, music, audio analysis.
openai/whisper-large-v3 • Automatic Speech Recognition • 2B parameters • Updated Aug 12, 2024 • 5.05M downloads • 5.8k likes
facebook/wav2vec2-base-960h • Automatic Speech Recognition • 94.4M parameters • Updated Nov 14, 2022 • 1.1M downloads • 398 likes
coqui/XTTS-v2 • Text-to-Speech • Updated Dec 11, 2023 • 9.03M downloads • 3.59k likes
microsoft/speecht5_tts • Text-to-Speech • Updated Nov 8, 2023 • 105k downloads • 835 likes
facebook/musicgen-small • Text-to-Audio • 0.6B parameters • Updated Nov 17, 2023 • 186k downloads • 493 likes