hg2wzh's Collections

VLMs

Updated Apr 25, 2025

  • Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

    Paper • arXiv 2409.12191 • Published Sep 18, 2024 • 79 upvotes

  • Multimodal Latent Language Modeling with Next-Token Diffusion

    Paper • arXiv 2412.08635 • Published Dec 11, 2024 • 49 upvotes

  • AIDC-AI/Ovis2-2B

    Image-Text-to-Text • Updated Aug 15, 2025 • 1.08k downloads • 60 likes

  • DAMO-NLP-SG/VideoLLaMA3-2B

    Video-Text-to-Text • 2B • Updated Sep 3, 2025 • 5.49k downloads • 21 likes

  • AIDC-AI/Ovis2-16B

    Image-Text-to-Text • 16B • Updated Aug 15, 2025 • 40 downloads • 98 likes

  • microsoft/Phi-4-multimodal-instruct

    Automatic Speech Recognition • 6B • Updated Dec 10, 2025 • 439k downloads • 1.6k likes

  • StarJiaxing/R1-Omni-0.5B

    1B • Updated Mar 24, 2025 • 52 downloads • 82 likes

  • Skywork/Skywork-R1V2-38B

    Image-Text-to-Text • 38B • Updated Jun 10, 2025 • 77 downloads • 125 likes