Seeing Is Not Sharing: Some Vision-Language Models Overestimate Common Ground in Asymmetric Dialogue • Paper • 2606.31719 • Published 4 days ago • 4
Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models • Paper • 2606.11324 • Published 25 days ago • 170
SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control • Paper • 2605.27891 • Published May 27 • 8
LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training • Paper • 2605.29888 • Published May 28 • 34
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information • Paper • 2605.11609 • Published May 12 • 196
Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion • Paper • 2605.12825 • Published May 12 • 12