JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence Paper • 2606.14777 • Published 22 days ago • 208
Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis Paper • 2605.18451 • Published May 18 • 41
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development Paper • 2603.27460 • Published Mar 29 • 72
CiQi-Agent: Aligning Vision, Tools and Aesthetics in Multimodal Agent for Cultural Reasoning on Chinese Porcelains Paper • 2603.28474 • Published Mar 30 • 9
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference Paper • 2603.25730 • Published Mar 26 • 53
EVA: Efficient Reinforcement Learning for End-to-End Video Agent Paper • 2603.22918 • Published Mar 24 • 44
RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback Paper • 2603.08561 • Published Mar 9 • 12
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models Paper • 2602.07026 • Published Feb 2 • 140
Yume-1.5: A Text-Controlled Interactive World Generation Model Paper • 2512.22096 • Published Dec 26, 2025 • 61
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models Paper • 2505.12504 • Published May 18, 2025 • 24
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision Paper • 2505.13427 • Published May 19, 2025 • 26