DailyPapers - an elonming Collection

DailyPapers

updated about 6 hours ago

Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning

Paper • 2603.04597 • Published 29 days ago • 210
SII-Enigma/Llama3.2-8B-Ins-AMPO

Text Generation • 8B • Updated 12 days ago • 76
Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26, 2025 • 59
Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs

Paper • 2509.25779 • Published Sep 30, 2025 • 19
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes

Paper • 2306.13649 • Published Jun 23, 2023 • 31
Unfamiliar Finetuning Examples Control How Language Models Hallucinate

Paper • 2403.05612 • Published Mar 8, 2024 • 3
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models

Paper • 2505.11711 • Published May 16, 2025 • 11
Reinforcement Learning via Self-Distillation

Paper • 2601.20802 • Published Jan 28, 2026 • 43
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2, 2025 • 237
AI Can Learn Scientific Taste

Paper • 2603.14473 • Published 18 days ago • 413
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4, 2025 • 104
GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks

Paper • 2510.04374 • Published Oct 5, 2025
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?

Paper • 2502.12115 • Published Feb 17, 2025 • 46
Persona Features Control Emergent Misalignment

Paper • 2506.19823 • Published Jun 24, 2025
π_{0.5}: a Vision-Language-Action Model with Open-World Generalization

Paper • 2504.16054 • Published Apr 22, 2025 • 4