Multimodal/VLM - an ingridtv Collection

Multimodal/VLM

updated Apr 7

microsoft/Phi-4-multimodal-instruct

Automatic Speech Recognition • 6B • Updated Dec 10, 2025 • 506k • 1.6k
microsoft/Phi-4-mini-instruct

Text Generation • Updated Dec 10, 2025 • 1.6M • 743
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 160
Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published May 20, 2025 • 134
google/medgemma-4b-it

Image-Text-to-Text • Updated Oct 28, 2025 • 516k • 965
kelkalot/medgemma-4b-it-GGUF

4B • Updated May 22, 2025 • 205 • 10
Qwen/Qwen3-VL-8B-Instruct

Image-Text-to-Text • 9B • Updated Oct 15, 2025 • 7.82M • 914
Qwen/Qwen3-VL-8B-Instruct-GGUF

Image-Text-to-Text • 8B • Updated Nov 1, 2025 • 40.3k • 95
Qwen/Qwen3-VL-2B-Instruct-GGUF

Image-Text-to-Text • 2B • Updated Nov 1, 2025 • 25.8k • 46
deepseek-ai/DeepSeek-OCR

Image-Text-to-Text • 3B • Updated Nov 4, 2025 • 3.07M • 3.24k
unsloth/gemma-4-26B-A4B-it-GGUF

Image-Text-to-Text • 25B • Updated 21 days ago • 2.97M • 781
unsloth/gemma-4-E4B-it-GGUF

Image-Text-to-Text • 8B • Updated 21 days ago • 1.13M • 444