Instruction-Guided Lesion Segmentation for Chest X-rays with Automatically Generated Large-Scale Dataset Paper • 2511.15186 • Published Nov 19
When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling Paper • 2510.15346 • Published Oct 17
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping Paper • 2509.21880 • Published Sep 26
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning Paper • 2510.03259 • Published Sep 26
Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models Paper • 2505.17225 • Published May 22