Template Matters: Understanding the Role of Instruction Templates in Multimodal Language Model Evaluation and Training Paper • 2412.08307 • Published Dec 11, 2024
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective Paper • 2502.14296 • Published Feb 20, 2025 • 45
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems Paper • 2505.00212 • Published Apr 30, 2025 • 9
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory Paper • 2505.23617 • Published May 29, 2025
H2R: A Human-to-Robot Data Augmentation for Robot Pre-training from Videos Paper • 2505.11920 • Published May 17, 2025
CoAct-1: Computer-using Agents with Coding as Actions Paper • 2508.03923 • Published Aug 5, 2025 • 13
MolmoAct: Action Reasoning Models that can Reason in Space Paper • 2508.07917 • Published Aug 11, 2025 • 45
SOS: Synthetic Object Segments Improve Detection, Segmentation, and Grounding Paper • 2510.09110 • Published Oct 10, 2025
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning Paper • 2512.13874 • Published Dec 15, 2025 • 18
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding Paper • 2601.10611 • Published Jan 15, 2026 • 35
Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration? Paper • 2602.07055 • Published Feb 4, 2026 • 23
Synthetic Visual Genome 2: Extracting Large-scale Spatio-Temporal Scene Graphs from Videos Paper • 2602.23543 • Published Feb 26, 2026 • 9
URDF-Anything+: Autoregressive Articulated 3D Models Generation for Physical Simulation Paper • 2603.14010 • Published Mar 14, 2026 • 1
MolmoPoint: Better Pointing for VLMs with Grounding Tokens Paper • 2603.28069 • Published Mar 30, 2026 • 9
WildDet3D: Scaling Promptable 3D Detection in the Wild Paper • 2604.08626 • Published Apr 9, 2026 • 248
You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass Paper • 2604.10966 • Published Apr 13, 2026 • 12