Active filters: VLM
Video-Text-to-Text
• 2B • Updated • 15.8k
• 454
numind/NuMarkdown-8B-Thinking
Image-to-Text
• 8B • Updated • 38.5k
• 472
Video-Text-to-Text
• 2B • Updated • 1.21k
• 5
nvidia/NVIDIA-Nemotron-Parse-v1.2
Image-Text-to-Text
• 0.9B • Updated • 141k
• 37
Image-Text-to-Text
• 9B • Updated • 112
• 5
Image-Text-to-Text
• 2B • Updated • 563
• 34
Image-Text-to-Text
• 1B • Updated • 2.38k
• 30
nvidia/VILA-HD-8B-PS3-1.5K-SigLIP
Image-Text-to-Text
• Updated • 58
• 4
nvidia/VILA-HD-8B-PS3-4K-SigLIP
Image-Text-to-Text
• Updated • 62
• 2
nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1
Image-Text-to-Text
• 9B • Updated • 1.21M
• 179
nvidia/VILA-HD-8B-PS3-1.5K-SigLIP2
Image-Text-to-Text
• Updated • 487
• 1
nvidia/VILA-HD-8B-PS3-4K-SigLIP2
Image-Text-to-Text
• Updated • 55
• 3
nvidia/VILA-HD-8B-PS3-1.5K-C-RADIOv2
Image-Text-to-Text
• Updated • 58
• 1
nvidia/VILA-HD-8B-PS3-4K-C-RADIOv2
Image-Text-to-Text
• Updated • 60
• 1
nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16
Image-Text-to-Text
• 13B • Updated • 161k
• 83
nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-NVFP4-QAD
Image-Text-to-Text
• 8B • Updated • 4.17k
• 27
Image-Text-to-Text
• 0.8B • Updated • 179
• 1
mradermacher/ToolCUA-8B-GGUF
• 8B • Updated • 779
• 2
adnankhan-11/VisionNav-3B
• 4B • Updated • 123
• 1
mradermacher/VisionNav-3B-GGUF
• 3B • Updated • 547
• 1
Efficient-Large-Model/VILA-13b
Text Generation
• 13B • Updated • 24
• 20
Efficient-Large-Model/VILA-7b
Text Generation
• 7B • Updated • 602
• 27
Efficient-Large-Model/VILA-7b-4bit-awq
Text Generation
• Updated • 14
• 2
Efficient-Large-Model/VILA-13b-4bit-awq
Text Generation
• Updated • 13
• 2
Efficient-Large-Model/VILA-2.7b
Text Generation
• 3B • Updated • 139
• 15
TIGER-Lab/Mantis-bakllava-7b
Image-Text-to-Text
• 8B • Updated • 50
• 5
TIGER-Lab/Mantis-llava-7b
Image-Text-to-Text
• 7B • Updated • 23
• 16
Efficient-Large-Model/VILA1.5-3b
Text Generation
• Updated • 1.58k
• 34
Efficient-Large-Model/VILA1.5-13b
Text Generation
• Updated • 257
• 5