LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published 2 days ago • 99
OSP-Next: Efficient High-Quality Video Generation with Sparse Sequence Parallelism, HiF8 Quantization, and Reinforcement Learning Paper • 2605.28691 • Published 1 day ago • 13
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published 1 day ago • 32
ResearchMath-14K: Scaling Research-Level Mathematics via Agents Paper • 2605.28003 • Published 1 day ago • 25
DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning Paper • 2605.25604 • Published 3 days ago • 125
SkillOpt: Executive Strategy for Self-Evolving Agent Skills Paper • 2605.23904 • Published 6 days ago • 184
See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding Paper • 2605.18018 • Published 10 days ago • 31
Rethinking Cross-Layer Information Routing in Diffusion Transformers Paper • 2605.20708 • Published 8 days ago • 101
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models Paper • 2605.21573 • Published 8 days ago • 100
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 8 days ago • 202
Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos Paper • 2605.18233 • Published 10 days ago • 90
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning Paper • 2605.22012 • Published 7 days ago • 45
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows Paper • 2605.14678 • Published 9 days ago • 100
FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching Paper • 2605.20910 • Published 8 days ago • 29
Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning Paper • 2605.22642 • Published 7 days ago • 35
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining Paper • 2605.14747 • Published 14 days ago • 143
OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond Paper • 2605.19660 • Published 9 days ago • 39