How to use this model without mmproj?
#1 by alvanalrakib - opened
I’m a bit confused about the multimodal setup.
The model’s chat template includes image tokens
(<|begin_of_image|>, <|image|>, <|end_of_image|>),
but there is no mmproj / vision projector in the repo,
and llama.cpp throws “image input is not supported”.
After inspecting the GGUF metadata, I don't see any vision encoder
or mmproj tensors in this file, only text model weights.
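For reference, this is roughly how I checked. A minimal sketch using the `gguf` Python package that ships with llama.cpp; the `"v."` / `"mm."` prefix heuristic and the `model.gguf` path are my assumptions, not something from this repo:

```python
def has_vision_tensors(tensor_names):
    """Heuristic: in llama.cpp mmproj GGUFs, vision-encoder tensors
    typically use a "v." name prefix and projector tensors "mm."."""
    return any(n.startswith(("v.", "mm.")) for n in tensor_names)


def list_vision_tensors(path):
    """Open a GGUF file (requires `pip install gguf`) and return any
    tensor names that look like vision/projector weights.
    The path is a placeholder, not an actual file in this repo."""
    from gguf import GGUFReader  # assumed API: .tensors items expose .name
    reader = GGUFReader(path)
    return [t.name for t in reader.tensors
            if t.name.startswith(("v.", "mm."))]


# Text-only checkpoints return an empty list here, which is what I see:
# e.g. list_vision_tensors("model.gguf") -> []
```

Running the same check against a known-good mmproj file (e.g. from another multimodal GGUF repo) does report `v.` / `mm.` tensors, so the check itself seems sound.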
Can you confirm:
• Is this model intended to be text-only despite image tokens?
• Should image tokens simply be ignored?
• Or is there a separate vision/projector repo that’s required?
Thanks for clarifying.