From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities Paper • 2401.15071 • Published Jan 26, 2024 • 37
Assessment of Multimodal Large Language Models in Alignment with Human Values Paper • 2403.17830 • Published Mar 26, 2024
OASIS: Open Agent Social Interaction Simulations with One Million Agents Paper • 2411.11581 • Published Nov 18, 2024
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2, 2025 • 228
How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective Paper • 2509.18905 • Published Sep 23, 2025 • 29
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards Paper • 2510.08529 • Published Oct 9, 2025 • 18
Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning Paper • 2509.25300 • Published Sep 29, 2025 • 8
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding Paper • 2512.19693 • Published Dec 22, 2025 • 64
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published Dec 9, 2025 • 119
RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics Paper • 2512.13660 • Published Dec 15, 2025 • 37
MultiShotMaster: A Controllable Multi-Shot Video Generation Framework Paper • 2512.03041 • Published Dec 2, 2025 • 63
BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities Paper • 2510.08759 • Published Oct 9, 2025 • 46
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards Paper • 2510.08529 • Published Oct 9, 2025 • 18
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer Paper • 2509.24695 • Published Sep 29, 2025 • 44
How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective Paper • 2509.18905 • Published Sep 23, 2025 • 29