mlfoundations/Gelato-30B-A3B

Image-Text-to-Text

anas-awadalla committed on Nov 15, 2025

Commit 161e928 · verified · 1 parent: fd7109e

Update README.md

Files changed (1)

README.md +1 -1

@@ -14,7 +14,7 @@ base_model: Qwen/Qwen3-VL-30B-A3B-Instruct
 <img src="gelato-fig1.png" alt="Figure 1: Gelato achieves SOTA performance on grounding benchmarks." width="750"/>
-We are releasing **🍨 Gelato-30B-A3B**, a state-of-the-art grounding model for GUI computer-use tasks! Gelato is trained on our open-sourced [**Click-100k**](https://huggingface.co/datasets/mlfoundations/Click-100k) dataset and achieves **63.88% accuracy on ScreenSpot-Pro**<sup>[[3](#ref-screenspot-pro)]</sup> and **69.15% / 74.65% on OS-World-G / OS-World-G (Refined)**<sup>[[4](#ref-jedi)]</sup>, surpassing prior specialized computer grounding models like GTA1-32B <sup>[[5](#ref-gta1)]</sup> and much larger VLMs including Qwen3-VL-235B-A22B-Instruct <sup>[[10](#ref-qwen3vl)]</sup>.
+We are releasing **🍨 Gelato-30B-A3B**, a state-of-the-art grounding model for GUI computer-use tasks! Gelato is trained on our open-sourced [**Click-100k**](https://huggingface.co/datasets/mlfoundations/Click-100k) dataset and achieves **63.88% accuracy on ScreenSpot-Pro** and **69.15% / 74.65% on OS-World-G / OS-World-G (Refined)**, surpassing prior specialized computer grounding models like GTA1-32B and much larger VLMs including Qwen3-VL-235B-A22B-Instruct.
 # Performance