- Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning
  Paper • 2603.04597 • Published • 210
- SII-Enigma/Llama3.2-8B-Ins-AMPO
  Text Generation • 8B • Updated • 76
- Understanding R1-Zero-Like Training: A Critical Perspective
  Paper • 2503.20783 • Published • 59
- Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs
  Paper • 2509.25779 • Published • 19
ming
elonming
AI & ML interests
None yet
Recent Activity
updated a collection about 8 hours ago
DailyPapers
updated a collection about 14 hours ago
DailyPapers
updated a collection about 14 hours ago
DailyPapers
Organizations
None yet