ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors • Paper 2603.04338 • Published 8 days ago • 21 upvotes
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? • Paper 2603.03241 • Published 9 days ago • 81 upvotes
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence • Paper 2602.08683 • Published Feb 9 • 50 upvotes
Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition • Paper 2602.08439 • Published Feb 9 • 28 upvotes
Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives • Paper 2601.20833 • Published Jan 28 • 182 upvotes
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding • Paper 2512.19693 • Published Dec 22, 2025 • 66 upvotes
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling • Paper 2511.20785 • Published Nov 25, 2025 • 187 upvotes
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe • Paper 2511.16334 • Published Nov 20, 2025 • 93 upvotes
Scaling Spatial Intelligence with Multimodal Foundation Models • Paper 2511.13719 • Published Nov 17, 2025 • 47 upvotes
Simulating the Visual World with Artificial Intelligence: A Roadmap • Paper 2511.08585 • Published Nov 11, 2025 • 30 upvotes
PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image • Paper 2511.13648 • Published Nov 17, 2025 • 52 upvotes
The Quest for Generalizable Motion Generation: Data, Model, and Evaluation • Paper 2510.26794 • Published Oct 30, 2025 • 27 upvotes
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training • Paper 2509.23661 • Published Sep 28, 2025 • 48 upvotes
LLaVA-OneVision-1.5 • Collection • https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5 • 9 items • Updated Oct 21, 2025 • 19 upvotes
Visual Representation Alignment for Multimodal Large Language Models • Paper 2509.07979 • Published Sep 9, 2025 • 84 upvotes
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning • Paper 2509.07980 • Published Sep 9, 2025 • 105 upvotes