Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens Paper • 2511.19418 • Published Nov 24, 2025 • 29
ReasonFLux-Coder Collection Coding LLMs excel at both writing code and generating unit tests. • 9 items • Updated May 26, 2025 • 11
ReasonFlux Series Collection A series of released reasoning models based on ReasonFlux • 9 items • Updated Oct 31, 2025 • 3
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16, 2025 • 40
Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents Paper • 2510.14967 • Published Oct 16, 2025 • 34
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale Paper • 2510.14979 • Published Oct 16, 2025 • 67
Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization Paper • 2510.13554 • Published Oct 15, 2025 • 58
Revisiting Model Interpolation for Efficient Reasoning Paper • 2510.10977 • Published Oct 13, 2025 • 10
Generative Universal Verifier as Multimodal Meta-Reasoner Paper • 2510.13804 • Published Oct 15, 2025 • 27
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13, 2025 • 179