Describe and highlight entities in images
Generate captions and answer questions about images
Segment objects in images using text prompts