Describe images and extract text with Florence-2
Extract text from images in multiple languages
Transcribe audio files or YouTube videos into text