Image-Text-to-Text
Transformers
Safetensors
English
idefics3
multimodal
vision
conversational

Enabling video understanding and multi-image understanding capabilities for this model

#20
by xJohn - opened

Can this model support video understanding and multi-image understanding?

xJohn changed discussion status to closed
