ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors Paper • 2603.04338 • Published 10 days ago • 21
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? Paper • 2603.03241 • Published 11 days ago • 81
The Quest for Generalizable Motion Generation: Data, Model, and Evaluation Paper • 2510.26794 • Published Oct 30, 2025 • 27
ConsistCompose: Unified Multimodal Layout Control for Image Composition Paper • 2511.18333 • Published Nov 23, 2025 • 4
Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition Paper • 2602.08439 • Published Feb 9 • 28
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation Paper • 2601.22153 • Published Jan 29 • 73
sensenova/SenseNova-SI-1.1-Qwen2.5-VL-3B Image-Text-to-Text • 4B • Updated Dec 9, 2025 • 831 • 4
sensenova/SenseNova-SI-1.1-InternVL3-8B-800K Image-Text-to-Text • 8B • Updated Dec 23, 2025 • 2 • 2
sensenova/SenseNova-SI-1.1-Qwen2.5-VL-7B Image-Text-to-Text • 8B • Updated Dec 9, 2025 • 841 • 4