KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance Paper • 2604.12627 • Published 2 days ago • 80
From Word to World: Can Large Language Models be Implicit Text-based World Models? Paper • 2512.18832 • Published Dec 21, 2025 • 15
SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks Paper • 2604.08865 • Published 6 days ago • 24