ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence Paper • 2605.26340 • Published May 25 • 36
RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards Paper • 2605.10899 • Published May 11 • 79
LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints Paper • 2410.06458 • Published Oct 9, 2024 • 8
Model Extrapolation Expedites Alignment Collection Better aligned models obtained by model extrapolation (ExPO) • 23 items • Updated Mar 2 • 17