Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding Paper • 2603.19235 • Published 1 day ago • 63 • 4
Tinted Frames: Question Framing Blinds Vision-Language Models Paper • 2603.19203 • Published 1 day ago • 13 • 2
MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction Paper • 2603.19231 • Published 1 day ago • 28 • 3
VID-AD: A Dataset for Image-Level Logical Anomaly Detection under Vision-Induced Distraction Paper • 2603.13964 • Published 7 days ago • 2
PARSA-Bench: A Comprehensive Persian Audio-Language Model Benchmark Paper • 2603.14456 • Published 6 days ago • 1 • 2
What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time? Paper • 2603.19017 • Published 1 day ago • 1 • 2
EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing Paper • 2603.19224 • Published 1 day ago • 15 • 2
OSM-based Domain Adaptation for Remote Sensing VLMs Paper • 2603.11804 • Published 9 days ago • 4 • 2
LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs Paper • 2603.19217 • Published 1 day ago • 26 • 2
Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens Paper • 2603.19232 • Published 1 day ago • 26 • 2
Loc3R-VLM: Language-based Localization and 3D Reasoning with Vision-Language Models Paper • 2603.18002 • Published 2 days ago • 4 • 2
SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing Paper • 2603.19228 • Published 1 day ago • 56 • 3
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents Paper • 2603.18815 • Published 1 day ago • 5 • 1
Reasoning over mathematical objects: on-policy reward modeling and test time aggregation Paper • 2603.18886 • Published 1 day ago • 2 • 1
VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining Paper • 2603.15030 • Published 5 days ago • 13 • 2
3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model Paper • 2603.18524 • Published 2 days ago • 41 • 3
AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents Paper • 2603.18429 • Published 2 days ago • 17 • 2