TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration Paper • 2604.14116 • Published 22 days ago • 13
From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space Paper • 2604.14142 • Published 22 days ago • 29
Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents Paper • 2604.14004 • Published 22 days ago • 30
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published 29 days ago • 119
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time Paper • 2604.11626 • Published 24 days ago • 101
Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop Paper • 2506.10968 • Published Jun 12, 2025 • 1
GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics Paper • 2602.12617 • Published Feb 13 • 20
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published Jan 13 • 149
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning Paper • 2505.24298 • Published May 30, 2025 • 34
Article 🐺🐦‍⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs • Dec 4, 2024 • 80
Skywork-o1-Open Collection Skywork o1 open model collections • 3 items • Updated Jun 12, 2025 • 22
Llama-3.1-Nemotron-70B Collection SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated 17 days ago • 156