Multimodal models - a s-emanuilov Collection

Multimodal models

Updated Jan 16, 2025

Papers on AI models that combine vision and language capabilities.

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

Paper • 2501.03895 • Published Jan 7, 2025 • 51

LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs

Paper • 2501.06186 • Published Jan 10, 2025 • 67

Multimodal LLMs Can Reason about Aesthetics in Zero-Shot

Paper • 2501.09012 • Published Jan 15, 2025 • 10