Giymo11's Collections
Multimodal (Audio)
Updated Jan 8
Qwen/Qwen3-Omni-30B-A3B-Instruct • Any-to-Any • 35B • Updated Sep 22, 2025 • 1.22M downloads • 924 likes
Qwen/Qwen2-Audio-7B • Audio-Text-to-Text • 8B • Updated Nov 20, 2024 • 9.38k downloads • 171 likes
mistralai/Voxtral-Small-24B-2507 • Audio-Text-to-Text • 24B • Updated Dec 20, 2025 • 59.2k downloads • 495 likes
mistralai/Voxtral-Mini-3B-2507 • 5B • Updated Jul 28, 2025 • 523k downloads • 652 likes
moonshotai/Kimi-Audio-7B-Instruct • Text-to-Speech • 10B • Updated May 29, 2025 • 87.2k downloads • 399 likes
google/gemma-3n-E4B-it • Image-Text-to-Text • Updated Jul 14, 2025 • 30.5k downloads • 913 likes
nvidia/audio-flamingo-3-hf • Audio-Text-to-Text • 8B • Updated Apr 13 • 196k downloads • 185 likes