Why Fine-Tuning Encourages Hallucinations and How to Fix It Paper • 2604.15574 • Published Apr 16 • 25
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published Apr 27 • 71
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora Paper • 2604.24819 • Published Apr 27 • 91
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents Paper • 2604.26752 • Published Apr 29 • 112
Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding Paper • 2604.26779 • Published Apr 29 • 14
Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention Paper • 2605.22791 • Published May 21 • 33
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps Paper • 2605.16928 • Published May 16 • 97
From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models Paper • 2605.20177 • Published May 19 • 10
Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR Paper • 2605.19282 • Published May 19 • 9
Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models Paper • 2605.26895 • Published May 26 • 20
LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards Paper • 2605.31584 • Published 29 days ago • 43
Not All Disagreement Is Learnable: Token Teachability in On-Policy Distillation Paper • 2605.26844 • Published May 26 • 26
ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning Paper • 2606.03503 • Published 24 days ago • 25
Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings Paper • 2606.07502 • Published 22 days ago • 97
How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs Paper • 2606.10646 • Published 18 days ago • 6
Redesign Mixture-of-Experts Routers with Manifold Power Iteration Paper • 2606.12397 • Published 17 days ago • 87
Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2606.15007 • Published 15 days ago • 16
Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories Paper • 2606.11176 • Published 18 days ago • 126
Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling Paper • 2606.12370 • Published 17 days ago • 21
Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale Paper • 2606.15079 • Published 14 days ago • 84
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients Paper • 2606.18216 • Published 11 days ago • 61
Rethinking the Role of Efficient Attention in Hybrid Architectures Paper • 2606.15378 • Published 14 days ago • 17
Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models Paper • 2606.19750 • Published 9 days ago • 3
Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning Paper • 2606.18831 • Published 10 days ago • 6
The Hitchhiker's Guide to Agentic AI: From Foundations to Systems Paper • 2606.24937 • Published 5 days ago • 13
Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning Paper • 2606.24133 • Published 4 days ago • 8