Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding • Paper • 2605.00642 • Published 3 days ago • 4
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding • Paper • 2511.13026 • Published Nov 17, 2025 • 26
Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs • Paper • 2506.22139 • Published Jun 27, 2025 • 2
HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration • Paper • 2510.27266 • Published Oct 31, 2025 • 21
VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators • Paper • 2510.00406 • Published Oct 1, 2025 • 68
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent • Paper • 2509.15566 • Published Sep 19, 2025 • 14