OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation Paper • 2601.15369 • Published 6 days ago • 16
OpenVision 3 Collection A Family of Unified Visual Encoder with Unified Visual Representation. • 4 items • Updated about 12 hours ago • 1
SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards Paper • 2511.07403 • Published Nov 10, 2025 • 15
SpatialThinker Collection This collection consists of SpatialThinker 3B and 7B model checkpoints, and STVQA-7K, a Spatial VQA dataset used for training the models. • 4 items • Updated Nov 12, 2025 • 1
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought Paper • 2511.02779 • Published Nov 4, 2025 • 59
LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation Paper • 2510.22946 • Published Oct 27, 2025 • 18
AHELM: A Holistic Evaluation of Audio-Language Models Paper • 2508.21376 • Published Aug 29, 2025 • 9
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning Paper • 2509.01644 • Published Sep 1, 2025 • 34