ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning Paper • 2603.05863 • Published 7 days ago
Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards Paper • 2603.09117 • Published 3 days ago
The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness Paper • 2603.09200 • Published 3 days ago
Test-Driven AI Agent Definition (TDAD): Compiling Tool-Using Agents from Behavioral Specifications Paper • 2603.08806 • Published 3 days ago
Do What I Say: A Spoken Prompt Dataset for Instruction-Following Paper • 2603.09881 • Published 2 days ago
Are Audio-Language Models Listening? Audio-Specialist Heads for Adaptive Audio Steering Paper • 2603.06854 • Published 6 days ago
Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs Paper • 2603.09095 • Published 3 days ago
Stepping VLMs onto the Court: Benchmarking Spatial Intelligence in Sports Paper • 2603.09896 • Published 2 days ago
Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion Paper • 2603.06577 • Published 6 days ago
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data Paper • 2603.09206 • Published 3 days ago
Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs Paper • 2603.09906 • Published 2 days ago
CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR Paper • 2603.10101 • Published 2 days ago
Lost in Backpropagation: The LM Head is a Gradient Bottleneck Paper • 2603.10145 • Published 2 days ago
UniCom: Unified Multimodal Modeling via Compressed Continuous Semantic Representations Paper • 2603.10702 • Published 1 day ago
Causal Concept Graphs in LLM Latent Space for Stepwise Reasoning Paper • 2603.10377 • Published 2 days ago
Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning Paper • 2603.04597 • Published 8 days ago
RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation Paper • 2603.09723 • Published 3 days ago
Prism-Δ: Differential Subspace Steering for Prompt Highlighting in Large Language Models Paper • 2603.10705 • Published 1 day ago
RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback Paper • 2603.08561 • Published 3 days ago