Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training Paper • 2603.12255 • Published about 21 hours ago • 59
Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing Paper • 2603.03143 • Published 10 days ago • 135
CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation Paper • 2603.08652 • Published 4 days ago • 31
Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence Paper • 2603.07660 • Published 5 days ago • 77
CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation Paper • 2603.08652 • Published 4 days ago • 31
MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios Paper • 2602.22638 • Published 15 days ago • 106
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? Paper • 2603.03241 • Published 10 days ago • 81
Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published 10 days ago • 88
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation Paper • 2602.03796 • Published Feb 3 • 62
On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models Paper • 2602.03392 • Published Feb 3 • 54
GEBench: Benchmarking Image Generation Models as GUI Environments Paper • 2602.09007 • Published Feb 9 • 39
GEBench: Benchmarking Image Generation Models as GUI Environments Paper • 2602.09007 • Published Feb 9 • 39
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published Jan 29 • 154
Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models Paper • 2601.20354 • Published Jan 28 • 111
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space Paper • 2512.24617 • Published Dec 31, 2025 • 65