Generate responses to video or image inputs
Smart Compressors for Long Video Understanding
Answer questions about videos using text