Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games Paper • 2606.19338 • Published 14 days ago • 48
JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence Paper • 2606.14777 • Published 21 days ago • 206
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing Paper • 2602.12205 • Published Feb 13 • 83
PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing Paper • 2601.21957 • Published Jan 29 • 23
UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing Paper • 2602.02437 • Published Feb 2 • 80
Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning Paper • 2510.27606 • Published Oct 31, 2025 • 31
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Paper • 2510.24411 • Published Oct 28, 2025 • 73
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence Paper • 2510.24693 • Published Oct 28, 2025 • 19
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27, 2025 • 181
UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation Paper • 2510.18701 • Published Oct 21, 2025 • 68
G^2RPO: Granular GRPO for Precise Reward in Flow Models Paper • 2510.01982 • Published Oct 2, 2025 • 8
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26, 2025 • 174
SPARK: Synergistic Policy And Reward Co-Evolving Framework Paper • 2509.22624 • Published Sep 26, 2025 • 19
CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning Paper • 2509.22647 • Published Sep 26, 2025 • 37