WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation Paper • 2605.25874 • Published 6 days ago • 98
SAM 3D Animal: Promptable Animal 3D Reconstruction from Images in the Wild Paper • 2605.07604 • Published 23 days ago • 4
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 11 days ago • 204
Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning Paper • 2605.14386 • Published 17 days ago • 60
SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution Paper • 2605.18401 • Published 13 days ago • 126
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published 18 days ago • 269
Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation Paper • 2605.04128 • Published 26 days ago • 17
DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off Paper • 2604.13902 • Published Apr 15 • 62
SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution Paper • 2604.18982 • Published Apr 21 • 5
Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces Paper • 2604.08362 • Published Apr 9 • 16
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 503
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 630
SEAR: Schema-Based Evaluation and Routing for LLM Gateways Paper • 2603.26728 • Published Mar 20 • 12
6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models Paper • 2603.18742 • Published Mar 19 • 11
FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization Paper • 2603.19835 • Published Mar 20 • 352