MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration • Paper • 2602.01734 • Published 9 days ago • 30
QueST: Incentivizing LLMs to Generate Difficult Problems • Paper • 2510.17715 • Published Oct 20, 2025 • 35
Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration • Paper • 2508.13755 • Published Aug 19, 2025 • 14
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency • Paper • 2508.18265 • Published Aug 25, 2025 • 213
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR • Paper • 2508.14029 • Published Aug 19, 2025 • 118
shawnxzhu/CHARM-calibrated-Skywork-Reward-Llama-3.1-8B-v0.2 • Text Classification • 8B • Updated Apr 14, 2025 • 1
CHARM_datasets • Collection • Datasets used in CHARM: Calibrating Reward Models With Chatbot Arena Scores. • 16 items • Updated Apr 14, 2025
CHARM_models • Collection • Models used in CHARM: Calibrating Reward Models With Chatbot Arena Scores. • 1 item • Updated Apr 14, 2025