EndoCoT: Scaling Endogenous Chain-of-Thought Reasoning in Diffusion Models Paper • 2603.12252 • Published 1 day ago • 9
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data Paper • 2603.09206 • Published 4 days ago • 41
Reasoning Models Struggle to Control their Chains of Thought Paper • 2603.05706 • Published 8 days ago • 26
Towards Multimodal Lifelong Understanding: A Dataset and Agentic Baseline Paper • 2603.05484 • Published 8 days ago • 4
Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published 10 days ago • 88
AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games Paper • 2602.17594 • Published 22 days ago • 9
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference Paper • 2602.21548 • Published 17 days ago • 43
Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception Paper • 2602.11858 • Published 30 days ago • 59
Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning Paper • 2602.11748 • Published 30 days ago • 30
P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads Paper • 2602.09443 • Published Feb 10 • 57
Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations Paper • 2602.05885 • Published Feb 5 • 28
LatentMem: Customizing Latent Memory for Multi-Agent Systems Paper • 2602.03036 • Published Feb 3 • 14
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation Paper • 2602.03796 • Published Feb 3 • 62
MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning Paper • 2601.21468 • Published Jan 29 • 25
PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing Paper • 2601.21957 • Published Jan 29 • 19
ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought Paper • 2601.23184 • Published Jan 30 • 36