VLMs - a hg2wzh Collection

VLMs

Updated Apr 25, 2025

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Paper • 2409.12191 • Published Sep 18, 2024 • 80

Multimodal Latent Language Modeling with Next-Token Diffusion
Paper • 2412.08635 • Published Dec 11, 2024 • 50

ATH-MaaS/Ovis2-2B
Image-Text-to-Text • 2B • Updated Aug 15, 2025 • 221 • 60

DAMO-NLP-SG/VideoLLaMA3-2B
Video-Text-to-Text • 2B • Updated Sep 3, 2025 • 2.38k • 21

ATH-MaaS/Ovis2-16B
Image-Text-to-Text • 16B • Updated Aug 15, 2025 • 26 • 101

microsoft/Phi-4-multimodal-instruct
Automatic Speech Recognition • 6B • Updated Dec 10, 2025 • 541k • 1.61k

StarJiaxing/R1-Omni-0.5B
1B • Updated Mar 24, 2025 • 23 • 83

Skywork/Skywork-R1V2-38B
Image-Text-to-Text • 38B • Updated Jun 10, 2025 • 34 • 127