TL-GRPO: Turn-Level RL for Reasoning-Guided Iterative Optimization Paper • 2601.16480 • Published 4 days ago • 49
Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models Paper • 2601.14004 • Published 6 days ago • 44
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development Paper • 2601.11077 • Published 11 days ago • 63
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment Paper • 2601.01576 • Published 22 days ago • 18
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction Paper • 2512.04987 • Published Dec 4, 2025 • 80
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 213
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping Paper • 2510.18927 • Published Oct 21, 2025 • 84
Intern-S1: A Scientific Multimodal Foundation Model Paper • 2508.15763 • Published Aug 21, 2025 • 262
Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models Paper • 2404.00884 • Published Apr 1, 2024 • 1
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning Paper • 2402.05808 • Published Feb 8, 2024
A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models Paper • 2303.10420 • Published Mar 18, 2023 • 1
Pre-Trained Policy Discriminators are General Reward Models Paper • 2507.05197 • Published Jul 7, 2025 • 39