Agent Skills Should Go Beyond Text: The Case for Visual Skills Paper • 2606.01414 • Published 27 days ago • 10
MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents Paper • 2605.18652 • Published May 18 • 8
MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents Paper • 2605.18652 • Published May 18 • 8
MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models Paper • 2603.25744 • Published Mar 26 • 13
MuRF: Unlocking the Multi-Scale Potential of Vision Foundation Models Paper • 2603.25744 • Published Mar 26 • 13
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark Paper • 2510.26802 • Published Oct 30, 2025 • 34
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models Paper • 2410.10818 • Published Oct 14, 2024 • 16
VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation Paper • 2407.10972 • Published Jul 15, 2024 • 1