ingridtv's Collections

Multimodal/VLM

updated Nov 18, 2025

  • microsoft/Phi-4-multimodal-instruct

    Automatic Speech Recognition • 6B • Updated Dec 10, 2025 • 284k • 1.57k

  • microsoft/Phi-4-mini-instruct

    Text Generation • 4B • Updated Dec 10, 2025 • 166k • 675

  • SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

    Paper • 2503.11576 • Published Mar 14, 2025 • 138

  • Emerging Properties in Unified Multimodal Pretraining

    Paper • 2505.14683 • Published May 20, 2025 • 133

  • google/medgemma-4b-it

    Image-Text-to-Text • Updated Oct 28, 2025 • 376k • 877

  • kelkalot/medgemma-4b-it-GGUF

    4B • Updated May 22, 2025 • 261 • 10

  • Qwen/Qwen3-VL-8B-Instruct

    Image-Text-to-Text • 9B • Updated Oct 15, 2025 • 2.42M • 727

  • Qwen/Qwen3-VL-8B-Instruct-GGUF

    Image-Text-to-Text • 8B • Updated Nov 1, 2025 • 39.7k • 57

  • Qwen/Qwen3-VL-2B-Instruct-GGUF

    Image-Text-to-Text • 2B • Updated Nov 1, 2025 • 21.9k • 30

  • deepseek-ai/DeepSeek-OCR

    Image-Text-to-Text • 3B • Updated Nov 4, 2025 • 2.98M • 3.13k