MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models Paper • 2410.09733 • Published Oct 13, 2024 • 9
AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks? Paper • 2606.05080 • Published 22 days ago • 30
Agent Skills Should Go Beyond Text: The Case for Visual Skills Paper • 2606.01414 • Published 25 days ago • 10
MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents Paper • 2605.18652 • Published May 18 • 8
Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents Article • ibm-granite • Mar 31 • 34
MIRA: Multimodal Iterative Reasoning Agent for Image Editing Paper • 2511.21087 • Published Nov 26, 2025 • 10
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models Paper • 2510.05034 • Published Oct 6, 2025 • 51