SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks Paper • 2604.08865 • Published 6 days ago • 25
From Word to World: Can Large Language Models be Implicit Text-based World Models? Paper • 2512.18832 • Published Dec 21, 2025 • 15