Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding Paper • 2604.26779 • Published 9 days ago • 13
Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization Paper • 2604.24952 • Published 11 days ago • 6
Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models Paper • 2604.27251 • Published 9 days ago • 8
Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence Paper • 2604.24954 • Published 11 days ago • 21
Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling Paper • 2604.27039 • Published 9 days ago • 24
Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists Paper • 2604.28158 • Published 8 days ago • 45
Efficient Training on Multiple Consumer GPUs with RoundPipe Paper • 2604.27085 • Published 9 days ago • 38
Heterogeneous Scientific Foundation Model Collaboration Paper • 2604.27351 • Published 8 days ago • 206
From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills Paper • 2604.24026 • Published 11 days ago • 18
Convergent Evolution: How Different Language Models Learn Similar Number Representations Paper • 2604.20817 • Published 16 days ago • 7
Abstain-R1: Calibrated Abstention and Post-Refusal Clarification via Verifiable RL Paper • 2604.17073 • Published 20 days ago • 9
Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts Paper • 2604.19835 • Published 17 days ago • 19
Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges Paper • 2604.13602 • Published 23 days ago • 32
DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off Paper • 2604.13902 • Published 23 days ago • 62
Where does output diversity collapse in post-training? Paper • 2604.16027 • Published 21 days ago • 22