LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation Paper • 2605.18739 • Published May 18 • 116
MinT: Managed Infrastructure for Training and Serving Millions of LLMs Paper • 2605.13779 • Published May 13 • 223
Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context Paper • 2605.13831 • Published May 13 • 89
AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward Paper • 2605.12495 • Published May 12 • 35
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression Paper • 2604.04921 • Published Apr 6 • 116
ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners? Paper • 2603.25823 • Published Mar 26 • 44
Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence Paper • 2603.07660 • Published Mar 8 • 87
DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning Paper • 2512.12799 • Published Dec 14, 2025 • 12
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Paper • 2511.08892 • Published Nov 12, 2025 • 218
VST Collection A comprehensive framework designed to cultivate VLMs with human-like visuospatial abilities. • 7 items • Updated 14 days ago • 6
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27, 2025 • 181
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction Paper • 2305.18752 • Published May 30, 2023 • 5