# ShowUI-2B bf16
This is an MLX conversion of showlab/ShowUI-2B, optimized for Apple Silicon.
ShowUI is a lightweight 2B vision-language-action model designed for GUI agents. Upstream, it is framed around GUI grounding and UI navigation, with point-style localization and atomic action dictionaries over screenshots.
This MLX artifact was converted with `mlx-vlm` and validated locally with both `mlx_vlm` prompt-packet checks and `vllm-mlx` OpenAI-compatible serve checks.
## Conversion Details
| Field | Value |
|---|---|
| Upstream model | showlab/ShowUI-2B |
| Artifact type | bf16 MLX conversion |
| Source posture | fresh-source conversion from an upstream-faithful local safetensors mirror |
| Repo action | update existing mlx-community repo |
| Conversion tool | mlx_vlm.convert via mlx-vlm 0.3.12 |
| Python | 3.11.14 |
| MLX | 0.31.0 |
| Transformers | 5.2.0 |
| Validation backend | vllm-mlx (phase/p1 @ 8a5d41b) |
| Quantization | bf16 |
| Group size | n/a |
| Quantization mode | n/a |
| Artifact size | 4.56 GB |
| Template repair | `tokenizer_config.json["chat_template"]` was re-injected after conversion |
| Source-mirror repair | `lm_head.weight` and upstream metadata were carried into the local safetensors mirror before conversion |
Additional notes:

- This is a fresh-source conversion from `showlab/ShowUI-2B`, not a re-quantized derivative of the older community bf16 artifact.
- The published upstream repo did not provide a clean direct safetensors path for the current `mlx-vlm` flow, so a source-faithful local mirror was used before MLX conversion.
- `chat_template.json`, `chat_template.jinja`, and `tokenizer_config.json["chat_template"]` were kept aligned for downstream compatibility checks.
- Root-level `preprocessor_config.json` and `processor_config.json` are present intentionally for multimodal detection compatibility.
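The alignment of the three chat-template sources can be spot-checked locally. A minimal sketch, assuming the files sit in a downloaded model directory; the `load_templates` and `templates_aligned` helpers are illustrative, not part of mlx-vlm:

```python
import json
from pathlib import Path


def load_templates(model_dir: str) -> dict[str, str]:
    """Collect the chat template from each of the three places it can live."""
    root = Path(model_dir)
    templates = {}
    cfg = json.loads((root / "tokenizer_config.json").read_text())
    if "chat_template" in cfg:
        templates["tokenizer_config.json"] = cfg["chat_template"]
    jinja = root / "chat_template.jinja"
    if jinja.exists():
        templates["chat_template.jinja"] = jinja.read_text()
    legacy = root / "chat_template.json"
    if legacy.exists():
        templates["chat_template.json"] = json.loads(legacy.read_text())["chat_template"]
    return templates


def templates_aligned(model_dir: str) -> bool:
    """True when every discovered template carries the same Jinja source."""
    return len(set(load_templates(model_dir).values())) <= 1
```

Running `templates_aligned` against the local artifact directory before publishing catches the case where a conversion tool rewrites one template file but not the others.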
## Validation
This artifact passed local validation in this workspace:
- `mlx_vlm` prompt-packet validation: **PASS**
- `vllm-mlx` OpenAI-compatible serve validation: **PASS**
Local validation notes:
- This family was validated on a Track B packet revision aligned to ShowUI's native point/action response contract, not the Track A bounding-box packet.
- Point grounding and atomic action outputs were stable on the fixed packet.
- The main caution is coordinate drift between non-stream and streamed serve responses, not artifact breakage.
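Because ShowUI's point contract returns coordinates on a 0-1 scale, downstream click automation has to rescale them to the screenshot size; this is also where the stream vs non-stream coordinate drift noted above would surface. A minimal sketch, assuming the reply contains a single `[x, y]` pair; the helper names are illustrative:

```python
import re


def parse_point(text: str) -> tuple[float, float]:
    """Extract the first [x, y] pair (0-1 scale) from a model reply."""
    m = re.search(r"\[\s*([0-9]*\.?[0-9]+)\s*,\s*([0-9]*\.?[0-9]+)\s*\]", text)
    if m is None:
        raise ValueError(f"no [x, y] point found in: {text!r}")
    return float(m.group(1)), float(m.group(2))


def to_pixels(point: tuple[float, float], width: int, height: int) -> tuple[int, int]:
    """Map a normalized point onto a screenshot of the given size."""
    x, y = point
    return round(x * width), round(y * height)


# Illustrative reply in the point-style format, not a recorded model output:
reply = "[0.42, 0.17]"
print(to_pixels(parse_point(reply), width=1920, height=1080))  # (806, 184)
```

Comparing `parse_point` results from the streamed and non-stream serve paths on the same fixed prompt is a cheap way to quantify the drift mentioned above.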
## Performance

- Artifact size on disk: 4.56 GB
- Local fixed-packet `mlx_vlm` validation used about 6.45 GB peak memory
- Local `vllm-mlx` serve validation completed in about 18.61 s non-stream and 19.62 s streamed
These are local validation measurements, not a full benchmark suite.
## Usage

### Install

```bash
pip install -U mlx-vlm
```
### CLI

```bash
python -m mlx_vlm.generate \
  --model mlx-community/ShowUI-2B-bf16-v2 \
  --image path/to/image.png \
  --prompt "Based on the screenshot, return the clickable location for the API Host field as [x, y] on a 0-1 scale." \
  --max-tokens 128 \
  --temperature 0.0
```
### Python

```python
from mlx_vlm import load, generate

model, processor = load("mlx-community/ShowUI-2B-bf16-v2")

result = generate(
    model,
    processor,
    prompt="Based on the screenshot, return the clickable location for the API Host field as [x, y] on a 0-1 scale.",
    image="path/to/image.png",
    max_tokens=128,
    temp=0.0,
)
print(result.text)
```
### vllm-mlx Serve

```bash
python -m vllm_mlx.cli serve mlx-community/ShowUI-2B-bf16-v2 --mllm --localhost --port 8000
```
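Once the server is up, any OpenAI-compatible client can query it. A minimal stdlib-only sketch, assuming the serve command above is running on localhost:8000 and that the endpoint accepts images as base64 data URLs in OpenAI-style content parts; the `build_payload` and `ask` helpers are illustrative, not part of vllm-mlx:

```python
import base64
import json
from urllib import request


def build_payload(image_bytes: bytes, element: str) -> dict:
    """Build an OpenAI-style chat payload carrying the screenshot inline."""
    data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode()
    return {
        "model": "mlx-community/ShowUI-2B-bf16-v2",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_url}},
                {"type": "text",
                 "text": f"Based on the screenshot, return the clickable "
                         f"location for the {element} as [x, y] on a 0-1 scale."},
            ],
        }],
        "max_tokens": 128,
        "temperature": 0.0,
    }


def ask(payload: dict, base_url: str = "http://localhost:8000/v1") -> str:
    """POST the payload to the server's chat completions endpoint."""
    req = request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


# With the server running:
# payload = build_payload(open("path/to/image.png", "rb").read(), "API Host field")
# print(ask(payload))
```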
## Links
- Upstream model: showlab/ShowUI-2B
- Paper: ShowUI: One Vision-Language-Action Model for GUI Visual Agent
- GitHub: showlab/ShowUI
- Demo Space: showlab/ShowUI Space
- Dataset: showlab/ShowUI-desktop-8K
- Base model lineage: Qwen/Qwen2-VL-2B-Instruct
- MLX framework: ml-explore/mlx
- mlx-vlm: Blaizzy/mlx-vlm
## Other Quantizations
Planned sibling repos in this wave:
## Notes and Limitations
- This card reports local MLX conversion and validation results only.
- Upstream benchmark claims belong to the original ShowUI model family and were not re-run here unless explicitly stated.
- This family is useful for point-grounding and atomic action generation, but it is not directly compatible with the Track A bounding-box evaluation contract.
- The original `mlx-community/ShowUI-2B-bf16` repo already existed, so this refreshed artifact is published under the `-v2` repo id.
## Citation
If you use this MLX conversion, please also cite the original ShowUI work:
```bibtex
@misc{lin2024showui,
  title={ShowUI: One Vision-Language-Action Model for GUI Visual Agent},
  author={Kevin Qinghong Lin and Linjie Li and Difei Gao and Zhengyuan Yang and Shiwei Wu and Zechen Bai and Weixian Lei and Lijuan Wang and Mike Zheng Shou},
  year={2024},
  eprint={2411.17465},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2411.17465},
}
```
## License
This repo follows the upstream model license: MIT. See the upstream model card for the authoritative license details: showlab/ShowUI-2B.