Generate text from images and queries
Identify key entities in text
Describe images and extract text with Florence-2
Identify named entities in text