VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery Paper • 2509.17191 • Published Sep 21, 2025 • 1
3D CoCa v2: Contrastive Learners with Test-Time Search for Generalizable Spatial Intelligence Paper • 2601.06496 • Published 16 days ago • 1
DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion Paper • 2510.15264 • Published Oct 17, 2025 • 4
EgoLCD: Egocentric Video Generation with Long Context Diffusion Paper • 2512.04515 • Published Dec 4, 2025 • 6 • 2
BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation Paper • 2511.22973 • Published Nov 28, 2025 • 5 • 2
MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots Paper • 2511.17889 • Published Nov 22, 2025 • 5
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation Paper • 2511.20714 • Published Nov 25, 2025 • 48
EvoVLA: Self-Evolving Vision-Language-Action Model Paper • 2511.16166 • Published Nov 20, 2025 • 6 • 2