Seeing Is Not Sharing: Some Vision-Language Models Overestimate Common Ground in Asymmetric Dialogue Paper • 2606.31719 • Published 4 days ago • 4
Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models Paper • 2606.11324 • Published 25 days ago • 170
SmartDirector: Keyframe-Conditioned Cinematic Video Generation with Narrative Pacing Control Paper • 2605.27891 • Published May 27 • 8
LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training Paper • 2605.29888 • Published May 28 • 34
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published May 12 • 196
Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion Paper • 2605.12825 • Published May 12 • 12
Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models Paper • 2604.01618 • Published Apr 2 • 15
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 509
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 353
InCoder-32B: Code Foundation Model for Industrial Scenarios Paper • 2603.16790 • Published Mar 17 • 312
From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models Paper • 2602.22859 • Published Feb 26 • 150
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training Paper • 2602.10693 • Published Feb 11 • 221
TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents Paper • 2602.07274 • Published Feb 6 • 211
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs Paper • 2602.10388 • Published Feb 11 • 245