See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding Paper • 2605.18018 • Published 8 days ago • 28
Rethinking Cross-Layer Information Routing in Diffusion Transformers Paper • 2605.20708 • Published 6 days ago • 81
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models Paper • 2605.21573 • Published 6 days ago • 87
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 6 days ago • 198
Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos Paper • 2605.18233 • Published 8 days ago • 89
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning Paper • 2605.22012 • Published 5 days ago • 43
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows Paper • 2605.14678 • Published 7 days ago • 94
FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching Paper • 2605.20910 • Published 6 days ago • 24
Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning Paper • 2605.22642 • Published 5 days ago • 33
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining Paper • 2605.14747 • Published 12 days ago • 143
OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond Paper • 2605.19660 • Published 7 days ago • 39
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published 14 days ago • 190
GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment Paper • 2605.19577 • Published 7 days ago • 57
OpenComputer: Verifiable Software Worlds for Computer-Use Agents Paper • 2605.19769 • Published 7 days ago • 57
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation Paper • 2605.18739 • Published 8 days ago • 109