Chat with a multimodal AI using images, video, and text
A tiny vision language model
Extract text from images in multiple languages