How to use this model without mmproj?
#1 by alvanalrakib - opened
I’m a bit confused about the multimodal setup.
The model’s chat template includes image tokens
(<|begin_of_image|>, <|image|>, <|end_of_image|>),
but there is no mmproj / vision projector in the repo,
and llama.cpp throws “image input is not supported”.
After inspecting the GGUF metadata, I don't see any vision encoder
or mmproj tensors in this file, only text model weights.
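For reference, this is roughly how I checked. A minimal sketch using the `gguf` Python package that ships with llama.cpp; the `"v."` / `"mm."` prefix heuristic and the `model.gguf` path are my assumptions, not something from this repo:

```python
def has_vision_tensors(tensor_names):
    """Heuristic: in llama.cpp mmproj GGUFs, vision-encoder tensors
    typically use a "v." name prefix and projector tensors "mm."."""
    return any(n.startswith(("v.", "mm.")) for n in tensor_names)


def list_vision_tensors(path):
    """Open a GGUF file (requires `pip install gguf`) and return any
    tensor names that look like vision/projector weights.
    The path is a placeholder, not an actual file in this repo."""
    from gguf import GGUFReader  # assumed API: .tensors items expose .name
    reader = GGUFReader(path)
    return [t.name for t in reader.tensors
            if t.name.startswith(("v.", "mm."))]


# Text-only checkpoints return an empty list here, which is what I see:
# e.g. list_vision_tensors("model.gguf") -> []
```

Running the same check against a known-good mmproj file (e.g. from another multimodal GGUF repo) does report `v.` / `mm.` tensors, so the check itself seems sound.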
Can you confirm:
• Is this model intended to be text-only despite image tokens?
• Should image tokens simply be ignored?
• Or is there a separate vision/projector repo that’s required?
Thanks for clarifying.