Self-Rewarding Rubric-Based Reinforcement Learning for Open-Ended Reasoning Paper • 2509.25534 • Published Sep 19, 2025 • 3
Learning to Align, Aligning to Learn: A Unified Approach for Self-Optimized Alignment Paper • 2508.07750 • Published Aug 11, 2025 • 21
Article 🐺🐦‍⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs Dec 4, 2024 • 80