Annotate and describe images with text prompts
Ask questions about images
A tiny vision language model