VLFM
Kosmos-2.5: A Multimodal Literate Model
Paper
• 2309.11419
Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
Paper
• 2311.05698
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Paper
• 2311.06242
PolyMaX: General Dense Prediction with Mask Transformer
Paper
• 2311.05770
Learning Vision from Models Rivals Learning Vision from Data
Paper
• 2312.17742
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Paper
• 2401.12168
A Survey of Resource-efficient LLM and Multimodal Foundation Models
Paper
• 2401.08092
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Paper
• 2401.15071
Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization
Paper
• 2401.15914
MouSi: Poly-Visual-Expert Vision-Language Models
Paper
• 2401.17221
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis
Paper
• 2401.17093
DataComp: In search of the next generation of multimodal datasets
Paper
• 2304.14108
Question Aware Vision Transformer for Multimodal Reasoning
Paper
• 2402.05472
Paper
• 2309.16671
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Paper
• 2402.12226
CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models
Paper
• 2402.15021
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Paper
• 2403.05525
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
Paper
• 2403.01487
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Paper
• 2404.07973
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Paper
• 2404.13013
An Introduction to Vision-Language Modeling
Paper
• 2405.17247
Dense Connector for MLLMs
Paper
• 2405.13800