Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching Paper • 2311.12751 • Published Nov 21, 2023 • 2
UnityShots: Memory-Driven Multi-Shot Audio-Video Generation with Boundary-Aware Gating Paper • 2606.21661 • Published 8 days ago • 21
Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories Paper • 2606.11176 • Published 18 days ago • 126
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation Paper • 2605.18739 • Published May 18 • 116
Article NEO-unify: Building Native Multimodal Unified Models End to End sensenova • Mar 5 • 167
SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing Paper • 2604.04911 • Published Apr 6 • 36
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published Apr 24 • 231
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13, 2025 • 183
LongLive: Real-time Interactive Long Video Generation Paper • 2509.22622 • Published Sep 26, 2025 • 189
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17, 2025 • 80