Collections

Discover the best community collections!

Collections trending this week

Audio for video

Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

Paper • 2309.16429 • Published Sep 28, 2023 • 11

Spaces for Text-to-Speech synthesis.

Running

en-tts

💬

Generate speech from text
stefantaubert/zho-tts

Text-to-Speech • Updated Apr 24, 2024
Running

pinyin-to-ipa

🔄

Convert Pinyin to IPA

OpenELM Instruct Models

apple/OpenELM-270M-Instruct

Text Generation • 0.3B • Updated Feb 28, 2025 • 2.43k • 144
apple/OpenELM-450M-Instruct

Text Generation • 0.5B • Updated Feb 28, 2025 • 922 • 50
apple/OpenELM-1_1B-Instruct

Text Generation • Updated Feb 28, 2025 • 1.57M • 73
apple/OpenELM-3B-Instruct

Text Generation • 3B • Updated Feb 28, 2025 • 2.54k • 339

GIT (Generative Image-to-text Transformer) is a model useful for vision-language tasks such as image/video captioning and question answering.

GIT: A Generative Image-to-text Transformer for Vision and Language

Paper • 2205.14100 • Published May 27, 2022 • 1
microsoft/git-base

Image-to-Text • 0.2B • Updated Apr 24, 2023 • 12.1k • 106
microsoft/git-large

Image-to-Text • Updated Feb 8, 2023 • 740 • 17
microsoft/git-base-vqav2

Visual Question Answering • 0.2B • Updated Mar 9, 2024 • 121 • 20

ARCH models, benchmark and paper

This collection contains pre-trained models on the AudioSet dataset, offering a diverse set of features for audio representation learning.

Running

6

ARCH

📊

Compare audio representation models
ALM/wav2vec2-large-audioset

Audio Classification • 0.3B • Updated Jan 2 • 130 • 1
ALM/hubert-base-audioset

Audio Classification • Updated Jan 2 • 1.02k • 3
ALM/wav2vec2-base-audioset

Audio Classification • Updated Jan 2 • 918 • 1

Collection of TOP Open Source LLM, Sort by Best on top

meta-llama/Llama-3.1-405B-Instruct

Text Generation • 406B • Updated Sep 25, 2024 • 161k • 593
mistralai/Mistral-Large-Instruct-2407

Updated Jul 28, 2025 • 7.69k • 857
meta-llama/Llama-3.1-70B-Instruct

Text Generation • 71B • Updated Dec 15, 2024 • 1.06M • 900
Qwen/Qwen2-72B-Instruct

Text Generation • 73B • Updated Oct 8, 2024 • 41.8k • 718

Models focused on video understanding (previously known as LLaVA-NeXT-Video).

Video Instruction Tuning With Synthetic Data

Paper • 2410.02713 • Published Oct 3, 2024 • 41
lmms-lab/LLaVA-Video-178K

Viewer • Updated Oct 11, 2024 • 1.63M • 25.3k • 187
lmms-lab/LLaVA-Video-7B-Qwen2

Video-Text-to-Text • 8B • Updated Oct 25, 2024 • 36.9k • 125
lmms-lab/LLaVA-Video-72B-Qwen2

Text Generation • 73B • Updated Oct 25, 2024 • 481 • 22

fla-hub/gla-1.3B-100B

Text Generation • 1B • Updated Sep 9, 2025 • 1.22k • 1
fla-hub/gla-2.7B-100B

Text Generation • 3B • Updated Feb 9, 2025 • 240
Gated Linear Attention Transformers with Hardware-Efficient Training

Paper • 2312.06635 • Published Dec 11, 2023 • 9

KickLang is a programming language and design system for orchestrated thought and cognitive workflows. It provides role presets and action verbs to...

vikhyatk/moondream2

Image-Text-to-Text • 2B • Updated Sep 23, 2025 • 4.94M • 1.4k
CohereLabs/c4ai-command-r-v01

Text Generation • Updated Apr 16, 2025 • 14k • 1.1k

Image Generation

Diffusion models 🧨

CompVis/stable-diffusion-v1-4

Text-to-Image • Updated Aug 23, 2023 • 472k • 6.99k
stable-diffusion-v1-5/stable-diffusion-v1-5

Text-to-Image • Updated Sep 7, 2024 • 1.67M • 1.05k
benjamin-paine/stable-diffusion-v1-5

Text-to-Image • Updated Oct 7, 2024 • 822 • 70
Comfy-Org/stable-diffusion-v1-5-archive

Updated Dec 10, 2025 • 146k • 90
