UGround-V1-2B bf16

This is an MLX conversion of osunlp/UGround-V1-2B, optimized for Apple Silicon.

UGround is a GUI visual grounding model built on Qwen2-VL; upstream frames it around point-based grounding of screen elements. This refreshed MLX artifact is intended to replace the stale mlx-community/UGround-V1-2B conversion as the structurally trustworthy Track E reference row.

This MLX artifact was converted with mlx-vlm, structurally triaged locally, and checked for basic runtime viability with direct mlx_vlm probes.

Conversion Details

Field               Value
Upstream model      osunlp/UGround-V1-2B
Artifact type       bf16 MLX conversion
Conversion tool     mlx_vlm.convert via mlx-vlm 0.3.12
Python              3.11.14
MLX                 0.31.0
Validation backend  vllm-mlx (phase/p1 @ 48b51ed)
Quantization        bf16
Group size          n/a
Quantization mode   n/a
Artifact size       4.5 GB
Template repair     tokenizer_config.json["chat_template"] re-injected from chat_template.jinja after conversion

Additional notes:

  • Root-level processor_config.json is present. This is the key structural fix relative to the stale mlx-community/UGround-V1-2B artifact.
  • chat_template.jinja, chat_template.json["chat_template"], and tokenizer_config.json["chat_template"] were verified to match exactly after repair.
  • Local multimodal detection now passes on this refreshed artifact.
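The three-way template check above can be sketched as follows. This is a minimal illustration, not the triage tooling actually used; the file names are the ones listed in this card, and the repo directory layout is assumed to be a standard local snapshot:

```python
import json
from pathlib import Path


def chat_templates_match(repo: Path) -> bool:
    """Return True if all three tokenizer-visible chat template copies agree exactly."""
    jinja = (repo / "chat_template.jinja").read_text()
    from_json = json.loads((repo / "chat_template.json").read_text())["chat_template"]
    from_tok = json.loads((repo / "tokenizer_config.json").read_text())["chat_template"]
    return jinja == from_json == from_tok
```

Running this against the artifact root should return True after the template repair described above.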

Validation

This artifact passed local structural triage in this workspace:

  • root packaging / multimodal structure: PASS
  • tokenizer-visible template repair: PASS
  • safetensors and index structure: PASS
  • minimum runtime viability: PASS with caution
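The structural checks above can be approximated with a short script. This is a hedged sketch, not the actual triage harness: the REQUIRED file list and the safetensors index name are assumptions based on the typical MLX artifact layout described in this card:

```python
import json
from pathlib import Path

# Root-level files this card treats as structurally required (assumed list).
REQUIRED = [
    "config.json",
    "processor_config.json",   # the key fix vs. the stale public artifact
    "tokenizer_config.json",
    "chat_template.jinja",
]


def triage(repo: Path) -> list[str]:
    """Return a list of structural problems; an empty list means triage-pass."""
    problems = [f"missing {name}" for name in REQUIRED if not (repo / name).exists()]
    index = repo / "model.safetensors.index.json"
    if index.exists():
        # Every shard named in the index must exist on disk.
        weight_map = json.loads(index.read_text())["weight_map"]
        for shard in sorted(set(weight_map.values())):
            if not (repo / shard).exists():
                problems.append(f"index references missing shard {shard}")
    return problems
```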

Local notes:

  • source posture: fresh-source conversion from osunlp/UGround-V1-2B
  • root structure: corrected relative to the stale public MLX artifact
  • structural triage verdict: triage-pass
  • quantization status: still blocked pending stronger semantic evidence

Important limitation:

  • the refreshed artifact is structurally sound, but the first local grounding/classification probes were weak
  • this repo should be treated as the authoritative fresh MLX reference artifact for Track E, not yet as a recommended winning row for GUI grounding

Usage

Install

pip install -U mlx-vlm

CLI

python -m mlx_vlm.generate \
  --model mlx-community/UGround-V1-2B-bf16 \
  --image path/to/image.png \
  --prompt "Your task is to help the user identify the precise coordinates (x, y) of a specific area or element on the screen based on a description. Your response should be a single string (x, y) corresponding to the point of interest on a 0-1000 grid. Description: API Host input field. Answer:" \
  --max-tokens 64 \
  --temperature 0.0

Python

from mlx_vlm import load, generate

model, processor = load("mlx-community/UGround-V1-2B-bf16")
result = generate(
    model,
    processor,
    prompt=(
        "Your task is to help the user identify the precise coordinates (x, y) "
        "of a specific area or element on the screen based on a description. "
        "Your response should be a single string (x, y) corresponding to the "
        "point of interest on a 0-1000 grid. Description: API Host input field. "
        "Answer:"
    ),
    image="path/to/image.png",
    max_tokens=64,
    temp=0.0,
)
print(result.text)
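UGround answers on a 0-1000 grid, so the generated string still needs to be mapped back to image pixels before it can drive a click. A minimal post-processing sketch (the regex and the rounding policy are assumptions, not part of the upstream spec):

```python
import re


def grid_to_pixels(answer: str, width: int, height: int) -> tuple[int, int]:
    """Parse a '(x, y)' answer on the 0-1000 grid into pixel coordinates."""
    match = re.search(r"\(\s*(\d+)\s*,\s*(\d+)\s*\)", answer)
    if match is None:
        raise ValueError(f"no (x, y) point found in: {answer!r}")
    gx, gy = int(match.group(1)), int(match.group(2))
    return round(gx / 1000 * width), round(gy / 1000 * height)
```

For example, an answer of "(500, 250)" on a 1920x1080 screenshot maps to pixel (960, 270).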


Other Quantizations

Not published from this Track E lane:

  • 6-bit: not authorized
  • 4-bit: not authorized

If a later operator decision reopens quantization, those rows should be derived from this refreshed bf16 artifact rather than from the stale public MLX repo.

Notes and Limitations

  • This card reports local MLX conversion and structural-triage results only.
  • Upstream benchmark tables belong to the original UGround family and were not re-run here.
  • The older mlx-community/UGround-V1-2B artifact should be treated as stale comparison context, not as the authoritative root for later quantization work.
  • First local semantic probes were weak enough that quantization remained blocked after triage.

Citation

If you use this MLX conversion, please also cite the original UGround work:

@article{gou2024uground,
  title={Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents},
  author={Boyu Gou and Ruohan Wang and Boyuan Zheng and Yanan Xie and Cheng Chang and Yiheng Shu and Huan Sun and Yu Su},
  journal={arXiv preprint arXiv:2410.05243},
  year={2024},
  url={https://arxiv.org/abs/2410.05243},
}

License

This repo follows the upstream model license: Apache 2.0. See the upstream model card for the authoritative license details: osunlp/UGround-V1-2B.
