GD^2PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization Paper • 2606.16771 • Published 6 days ago • 11
MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling Paper • 2606.13473 • Published 10 days ago • 89
Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill Paper • 2606.03980 • Published 19 days ago • 13
SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects Paper • 2605.19587 • Published May 19 • 10
SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents Paper • 2606.05761 • Published 17 days ago • 19
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling Paper • 2605.13301 • Published May 13 • 163
Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs Paper • 2605.00814 • Published May 1 • 21
GEMS: Agent-Native Multimodal Generation with Memory and Skills Paper • 2603.28088 • Published Mar 30 • 86
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning Paper • 2601.18631 • Published Jan 26 • 48
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models Paper • 2512.24165 • Published Dec 30, 2025 • 52
Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning Paper • 2511.20549 • Published Nov 25, 2025 • 27
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models Paper • 2511.13704 • Published Nov 17, 2025 • 44
P1: Mastering Physics Olympiads with Reinforcement Learning Paper • 2511.13612 • Published Nov 17, 2025 • 135
VideoSSR: Video Self-Supervised Reinforcement Learning Paper • 2511.06281 • Published Nov 9, 2025 • 25
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning Paper • 2510.27492 • Published Oct 30, 2025 • 88
VPPO Data Collection Official training and evaluation datasets for the VPPO project. • 4 items • Updated Oct 13, 2025 • 3
VPPO Model Collection SOTA models for multimodal reasoning, fine-tuned with VPPO. Achieves superior performance by focusing on critical visual tokens. • 4 items • Updated Nov 7, 2025 • 4
Spotlight on Token Perception for Multimodal Reinforcement Learning Paper • 2510.09285 • Published Oct 10, 2025 • 37
Native Hybrid Attention for Efficient Sequence Modeling Paper • 2510.07019 • Published Oct 8, 2025 • 17