Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It Paper • 2606.26027 • Published 6 days ago • 16
Look Light, Think Heavy: What Multimodal Chain-of-Thought Reasoning Can and Cannot Do Paper • 2606.22565 • Published 9 days ago • 9
Qwen-AgentWorld: Language World Models for General Agents Paper • 2606.24597 • Published 7 days ago • 140
Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion Paper • 2606.15236 • Published 14 days ago • 21
LectūraAgents: A Multi-Agent Framework for Adaptive Personalized AI-Assisted Learning and Embodied Teaching Paper • 2606.16428 • Published 15 days ago • 39
TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs Paper • 2606.09030 • Published 22 days ago • 30
GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine? Paper • 2606.17861 • Published 14 days ago • 58
ACE-Ego-0: Unifying Egocentric Human and Robotic Data for VLA Pretraining Paper • 2606.17200 • Published 15 days ago • 53
FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the Financial Domain Paper • 2505.17471 • Published May 23, 2025
Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models Paper • 2402.19103 • Published Feb 29, 2024
MIRAGE: Evaluating and Explaining Inductive Reasoning Process in Language Models Paper • 2410.09542 • Published Oct 12, 2024
Focus on Your Question! Interpreting and Mitigating Toxic CoT Problems in Commonsense Reasoning Paper • 2402.18344 • Published Feb 28, 2024 • 1
Fixing the Broken Compass: Diagnosing and Improving Inference-Time Reward Modeling Paper • 2503.05188 • Published Mar 7, 2025
MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning Paper • 2603.02024 • Published Mar 2 • 47
Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models Paper • 2603.11896 • Published Mar 12 • 10
Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models Paper • 2408.10682 • Published Aug 20, 2024