Ahmed Masry PRO

ahmed-masry

·

https://ahmedmasryku.github.io/

Ahmed_Masry97

AI & ML interests

Multimodal Chart Understanding, Multimodal Document AI, Multimodal Vision - Language Models,

Organizations

upvoted an article 4 months ago

Article

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

+7

aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, nouamanetazi, lvwerra, sergiopaniego

•

Mar 10

• 168

upvoted a paper 4 months ago

SkillOrchestra: Learning to Route Agents via Skill Transfer

Paper • 2602.19672 • Published Feb 23 • 58

upvoted a paper 6 months ago

DashboardQA: Benchmarking Multimodal Agents for Question Answering on Interactive Dashboards

Paper • 2508.17398 • Published Aug 24, 2025 • 2

upvoted a paper about 1 year ago

ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering

Paper • 2504.05506 • Published Apr 7, 2025 • 25

upvoted 3 papers over 1 year ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20, 2025 • 167

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding

Paper • 2502.01341 • Published Feb 3, 2025 • 39

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Paper • 2412.04626 • Published Dec 5, 2024 • 15

upvoted a paper almost 2 years ago

ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild

Paper • 2407.04172 • Published Jul 4, 2024 • 25