Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published 15 days ago • 141
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 20 days ago • 210
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems Paper • 2505.00212 • Published Apr 30, 2025 • 9
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Dec 6, 2024 • 888