SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning • Paper • 2512.13874 • Published 10 days ago • 16
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models • Paper • 2512.13607 • Published 10 days ago • 26
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning • Paper • 2512.15687 • Published 8 days ago • 17
PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing • Paper • 2512.02589 • Published 24 days ago • 63
Guided Self-Evolving LLMs with Minimal Human Supervision • Paper • 2512.02472 • Published 24 days ago • 50
Agentic Learner with Grow-and-Refine Multimodal Semantic Memory • Paper • 2511.21678 • Published 29 days ago • 11
GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms • Paper • 2511.17592 • Published Nov 17 • 118
Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall • Paper • 2510.19304 • Published Oct 22 • 23
Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents • Paper • 2510.14967 • Published Oct 16 • 33
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning • Paper • 2509.22576 • Published Sep 26 • 134
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward • Paper • 2510.03222 • Published Oct 3 • 75