LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token • Paper • 2501.03895 • Published Jan 7, 2025 • 52
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs • Paper • 2501.06186 • Published Jan 10, 2025 • 65
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot • Paper • 2501.09012 • Published Jan 15, 2025 • 10