CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs Paper • 2602.03048 • Published 2 days ago • 33
Running 1 Generalist Value Model V0 💻 1 Predict how well language models will perform on new tasks
V_0: A Generalist Value Model for Any Policy at State Zero Paper • 2602.03584 • Published 1 day ago • 20
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start Paper • 2505.22334 • Published May 28, 2025 • 36
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO Paper • 2505.22453 • Published May 28, 2025 • 46
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities Paper • 2505.02567 • Published May 5, 2025 • 80