Audio & Speech Models - an adarshzolekar Collection

Audio & Speech Models

Updated Jan 23

Purpose: Speech recognition, text-to-speech, music, audio analysis.

openai/whisper-large-v3

Automatic Speech Recognition • 2B • Updated Aug 12, 2024 • 5.05M • 5.8k

facebook/wav2vec2-base-960h

Automatic Speech Recognition • 94.4M • Updated Nov 14, 2022 • 1.1M • 398

coqui/XTTS-v2

Text-to-Speech • Updated Dec 11, 2023 • 9.03M • 3.59k

microsoft/speecht5_tts

Text-to-Speech • Updated Nov 8, 2023 • 105k • 835

facebook/musicgen-small

Text-to-Audio • 0.6B • Updated Nov 17, 2023 • 186k • 493
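
The listing above can be turned into a small lookup table for choosing a model by task. The following is a minimal sketch, not part of the collection itself: the helper names `models_for` and `load_pipeline` are hypothetical, and it assumes the Hugging Face `transformers` library, whose `pipeline` function accepts the task strings used here. The model ids and task labels are copied from the listing. One caveat, stated as an assumption: `coqui/XTTS-v2` is typically used through Coqui's own TTS package rather than `transformers`, so `load_pipeline` is not expected to work for that entry.

```python
# Model ids and task labels copied from the collection listing.
# Task names are lowercased and hyphenated to match the task strings
# that transformers.pipeline accepts.
COLLECTION = {
    "openai/whisper-large-v3": "automatic-speech-recognition",
    "facebook/wav2vec2-base-960h": "automatic-speech-recognition",
    "coqui/XTTS-v2": "text-to-speech",
    "microsoft/speecht5_tts": "text-to-speech",
    "facebook/musicgen-small": "text-to-audio",
}


def models_for(task):
    """Return the collection's model ids for a task, in listing order."""
    return [model_id for model_id, t in COLLECTION.items() if t == task]


def load_pipeline(model_id):
    """Build a transformers pipeline for one entry (hypothetical helper).

    Downloads the model weights on first use. Assumption: the entry is
    loadable through transformers, which may not hold for coqui/XTTS-v2.
    """
    # Imported lazily so the lookup helpers work without transformers installed.
    from transformers import pipeline

    return pipeline(COLLECTION[model_id], model=model_id)
```

As a usage example, `models_for("automatic-speech-recognition")` returns the two speech recognition entries, and `load_pipeline("facebook/wav2vec2-base-960h")` would build a recognizer for the smaller of the two. Keeping the `transformers` import inside `load_pipeline` means the table can be queried without installing the library or downloading any weights.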