Dr. Seg: Revisiting GRPO Training for Visual Large Language Models through Perception-Oriented Design Paper • 2603.00152 • Published Feb 25 • 2
V-Zero: Answer-Label-Free On-Policy Distillation with Contrastive Evidence Gating for Fine-Grained Visual Reasoning Paper • 2606.25319 • Published 1 day ago • 17
On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models Paper • 2602.03392 • Published Feb 3 • 59