OddGridBench: Exposing the Lack of Fine-Grained Visual Discrepancy Sensitivity in Multimodal Large Language Models Paper • 2603.09326 • Published 26 days ago • 1
LlamaSeg: Image Segmentation via Autoregressive Mask Generation Paper • 2505.19422 • Published May 26, 2025 • 3
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy Paper • 2503.19757 • Published Mar 25, 2025 • 51
VisNumBench: Evaluating Number Sense of Multimodal Large Language Models Paper • 2503.14939 • Published Mar 19, 2025 • 5