Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2606.15007 • Published 13 days ago • 15
Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models Paper • 2606.11025 • Published 16 days ago • 41
Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory Paper • 2606.06523 • Published 23 days ago • 6
AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents Paper • 2606.05597 • Published 21 days ago • 4
Ψ-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues Paper • 2606.02754 • Published 24 days ago • 13
On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters Paper • 2606.02437 • Published 24 days ago • 232
MinT: Managed Infrastructure for Training and Serving Millions of LLMs Paper • 2605.13779 • Published May 13 • 223
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling Paper • 2604.28185 • Published Apr 30 • 92
Seedance 2.0: Advancing Video Generation for World Complexity Paper • 2604.14148 • Published Apr 15 • 166
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds Paper • 2604.14268 • Published Apr 15 • 127
HippoCamp: Benchmarking Contextual Agents on Personal Computers Paper • 2604.01221 • Published Apr 1 • 30
PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning Paper • 2603.26653 • Published Mar 27 • 18
Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models Paper • 2603.13985 • Published Mar 14 • 11
Future-KL Regularized GRPO: Process-Level Credit Assignment from f-Divergence Regularization Paper • 2601.10201 • Published May 23 • 10