# ShowUI-2B bf16
This is an MLX conversion of showlab/ShowUI-2B, optimized for Apple Silicon.
ShowUI is a lightweight 2B vision-language-action model designed for GUI agents. Upstream, it is framed around GUI grounding and UI navigation, with point-style localization and atomic action dictionaries over screenshots.
This MLX artifact was converted with `mlx-vlm` and validated locally with both `mlx_vlm` prompt-packet checks and `vllm-mlx` OpenAI-compatible serve checks.
## Conversion Details
| Field | Value |
|---|---|
| Upstream model | showlab/ShowUI-2B |
| Artifact type | bf16 MLX conversion |
| Source posture | fresh-source conversion from an upstream-faithful local safetensors mirror |
| Repo action | update existing mlx-community repo |
| Conversion tool | mlx_vlm.convert via mlx-vlm 0.3.12 |
| Python | 3.11.14 |
| MLX | 0.31.0 |
| Transformers | 5.2.0 |
| Validation backend | vllm-mlx (phase/p1 @ 8a5d41b) |
| Quantization | bf16 |
| Group size | n/a |
| Quantization mode | n/a |
| Artifact size | 4.56 GB |
| Template repair | `tokenizer_config.json["chat_template"]` was re-injected after conversion |
| Source-mirror repair | `lm_head.weight` and upstream metadata were carried into the local safetensors mirror before conversion |
Additional notes:

- This is a fresh-source conversion from `showlab/ShowUI-2B`, not a re-quantized derivative of the older community bf16 artifact.
- The published upstream repo did not provide a clean direct safetensors path for the current `mlx-vlm` flow, so a source-faithful local mirror was used before MLX conversion.
- `chat_template.json`, `chat_template.jinja`, and `tokenizer_config.json["chat_template"]` were kept aligned for downstream compatibility checks.
- Root-level `preprocessor_config.json` and `processor_config.json` are present intentionally for multimodal detection compatibility.
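The alignment of the three chat-template sources can be spot-checked locally. A minimal sketch, assuming the files sit in a downloaded model directory; the `load_templates` and `templates_aligned` helpers are illustrative, not part of mlx-vlm:

```python
import json
from pathlib import Path


def load_templates(model_dir: str) -> dict[str, str]:
    """Collect the chat template from each of the three places it can live."""
    root = Path(model_dir)
    templates = {}
    cfg = json.loads((root / "tokenizer_config.json").read_text())
    if "chat_template" in cfg:
        templates["tokenizer_config.json"] = cfg["chat_template"]
    jinja = root / "chat_template.jinja"
    if jinja.exists():
        templates["chat_template.jinja"] = jinja.read_text()
    legacy = root / "chat_template.json"
    if legacy.exists():
        templates["chat_template.json"] = json.loads(legacy.read_text())["chat_template"]
    return templates


def templates_aligned(model_dir: str) -> bool:
    """True when every discovered template carries the same Jinja source."""
    return len(set(load_templates(model_dir).values())) <= 1
```

Running `templates_aligned` against the local artifact directory before publishing catches the case where a conversion tool rewrites one template file but not the others.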
## Validation
This artifact passed local validation in this workspace:
- `mlx_vlm` prompt-packet validation: **PASS**
- `vllm-mlx` OpenAI-compatible serve validation: **PASS**
Local validation notes:
- This family was validated on a Track B packet revision aligned to ShowUI's native point/action response contract, not the Track A bounding-box packet.
- Point grounding and atomic action outputs were stable on the fixed packet.
- The main caution is coordinate drift between non-stream and streamed serve responses, not artifact breakage.
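Because ShowUI's point contract returns coordinates on a 0-1 scale, downstream click automation has to rescale them to the screenshot size; this is also where the stream vs non-stream coordinate drift noted above would surface. A minimal sketch, assuming the reply contains a single `[x, y]` pair; the helper names are illustrative:

```python
import re


def parse_point(text: str) -> tuple[float, float]:
    """Extract the first [x, y] pair (0-1 scale) from a model reply."""
    m = re.search(r"\[\s*([0-9]*\.?[0-9]+)\s*,\s*([0-9]*\.?[0-9]+)\s*\]", text)
    if m is None:
        raise ValueError(f"no [x, y] point found in: {text!r}")
    return float(m.group(1)), float(m.group(2))


def to_pixels(point: tuple[float, float], width: int, height: int) -> tuple[int, int]:
    """Map a normalized point onto a screenshot of the given size."""
    x, y = point
    return round(x * width), round(y * height)


# Illustrative reply in the point-style format, not a recorded model output:
reply = "[0.42, 0.17]"
print(to_pixels(parse_point(reply), width=1920, height=1080))  # (806, 184)
```

Comparing `parse_point` results from the streamed and non-stream serve paths on the same fixed prompt is a cheap way to quantify the drift mentioned above.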
## Performance

- Artifact size on disk: 4.56 GB
- Local fixed-packet `mlx_vlm` validation used about 6.45 GB peak memory
- Local `vllm-mlx` serve validation completed in about 18.61 s non-stream and 19.62 s streamed
These are local validation measurements, not a full benchmark suite.
## Usage

### Install

```bash
pip install -U mlx-vlm
```
### CLI

```bash
python -m mlx_vlm.generate \
  --model mlx-community/ShowUI-2B-bf16-v2 \
  --image path/to/image.png \
  --prompt "Based on the screenshot, return the clickable location for the API Host field as [x, y] on a 0-1 scale." \
  --max-tokens 128 \
  --temperature 0.0
```
### Python

```python
from mlx_vlm import load, generate

model, processor = load("mlx-community/ShowUI-2B-bf16-v2")

result = generate(
    model,
    processor,
    prompt="Based on the screenshot, return the clickable location for the API Host field as [x, y] on a 0-1 scale.",
    image="path/to/image.png",
    max_tokens=128,
    temp=0.0,
)
print(result.text)
```
### vllm-mlx Serve

```bash
python -m vllm_mlx.cli serve mlx-community/ShowUI-2B-bf16-v2 --mllm --localhost --port 8000
```
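Once the server is up, any OpenAI-compatible client can query it. A minimal stdlib-only sketch, assuming the serve command above is running on localhost:8000 and that the endpoint accepts images as base64 data URLs in OpenAI-style content parts; the `build_payload` and `ask` helpers are illustrative, not part of vllm-mlx:

```python
import base64
import json
from urllib import request


def build_payload(image_bytes: bytes, element: str) -> dict:
    """Build an OpenAI-style chat payload carrying the screenshot inline."""
    data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode()
    return {
        "model": "mlx-community/ShowUI-2B-bf16-v2",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_url}},
                {"type": "text",
                 "text": f"Based on the screenshot, return the clickable "
                         f"location for the {element} as [x, y] on a 0-1 scale."},
            ],
        }],
        "max_tokens": 128,
        "temperature": 0.0,
    }


def ask(payload: dict, base_url: str = "http://localhost:8000/v1") -> str:
    """POST the payload to the server's chat completions endpoint."""
    req = request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


# With the server running:
# payload = build_payload(open("path/to/image.png", "rb").read(), "API Host field")
# print(ask(payload))
```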
## Links
- Upstream model: showlab/ShowUI-2B
- Paper: ShowUI: One Vision-Language-Action Model for GUI Visual Agent
- GitHub: showlab/ShowUI
- Demo Space: showlab/ShowUI Space
- Dataset: showlab/ShowUI-desktop-8K
- Base model lineage: Qwen/Qwen2-VL-2B-Instruct
- MLX framework: ml-explore/mlx
- mlx-vlm: Blaizzy/mlx-vlm
## Other Quantizations
Planned sibling repos in this wave:
## Notes and Limitations
- This card reports local MLX conversion and validation results only.
- Upstream benchmark claims belong to the original ShowUI model family and were not re-run here unless explicitly stated.
- This family is useful for point-grounding and atomic action generation, but it is not directly compatible with the Track A bounding-box evaluation contract.
- The original `mlx-community/ShowUI-2B-bf16` repo already existed, so this refreshed artifact is published under the `-v2` repo id.
## Citation
If you use this MLX conversion, please also cite the original ShowUI work:
```bibtex
@misc{lin2024showui,
  title={ShowUI: One Vision-Language-Action Model for GUI Visual Agent},
  author={Kevin Qinghong Lin and Linjie Li and Difei Gao and Zhengyuan Yang and Shiwei Wu and Zechen Bai and Weixian Lei and Lijuan Wang and Mike Zheng Shou},
  year={2024},
  eprint={2411.17465},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2411.17465},
}
```
## License
This repo follows the upstream model license: MIT. See the upstream model card for the authoritative license details: showlab/ShowUI-2B.