DashboardQA: Benchmarking Multimodal Agents for Question Answering on Interactive Dashboards Paper β’ 2508.17398 β’ Published Aug 24, 2025 β’ 2
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering Paper β’ 2504.05506 β’ Published Apr 7, 2025 β’ 25
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper β’ 2502.14786 β’ Published Feb 20, 2025 β’ 157
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding Paper β’ 2502.01341 β’ Published Feb 3, 2025 β’ 39
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper β’ 2412.04626 β’ Published Dec 5, 2024 β’ 13
ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild Paper β’ 2407.04172 β’ Published Jul 4, 2024 β’ 25