Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games Paper • 2606.19338 • Published 12 days ago • 48
WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation Paper • 2605.10912 • Published May 11 • 46
SetCon: Towards Open-Ended Referring Segmentation via Set-Level Concept Prediction Paper • 2605.20110 • Published May 19 • 4
DecQ: Detail-Condensing Queries for Enhanced Reconstruction and Generation in Representation Autoencoders Paper • 2605.22777 • Published May 21 • 5
LoMo: Local Modality Substitution for Deeper Vision-Language Fusion Paper • 2605.30265 • Published May 28 • 23
CapRL++: Unified Reinforcement Learning with Verifiable Rewards for Dense Image and Video Captioning Paper • 2606.09393 • Published 21 days ago
JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence Paper • 2606.14777 • Published 19 days ago • 204