One Model, Many Latencies: Universal Speech Enhancement for Diverse Real-Time Applications Paper • 2606.25621 • Published 3 days ago • 13
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients Paper • 2606.18216 • Published 11 days ago • 61
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning Paper • 2606.13673 • Published 16 days ago • 106
Why Far Looks Up: Probing Spatial Representation in Vision-Language Models Paper • 2605.30161 • Published 30 days ago • 60
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning Paper • 2605.28774 • Published about 1 month ago • 93
VIOLA: Towards Video In-Context Learning with Minimal Annotations Paper • 2601.15549 • Published Jan 22 • 5
Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in Paper • 2512.14273 • Published Dec 16, 2025 • 10
Unified Reinforcement and Imitation Learning for Vision-Language Models Paper • 2510.19307 • Published Oct 22, 2025 • 32
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models Paper • 2506.15681 • Published Jun 18, 2025 • 43
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks Paper • 2501.08326 • Published Jan 14, 2025 • 34
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models Paper • 2412.01822 • Published Dec 2, 2024 • 16