GUI-Owl-7B-6bit / README.md
swaylenhayes's picture
Add files using upload-large-folder tool
717d02c verified
metadata
language:
  - en
license: mit
library_name: mlx
pipeline_tag: image-text-to-text
base_model: mPLUG/GUI-Owl-7B
tags:
  - mlx
  - mlx-vlm
  - safetensors
  - apple-silicon
  - conversational
  - gui
  - vision-language-model
  - qwen2_5_vl
  - gui-owl
  - mobile-agent-v3
  - computer-use
  - grounding
  - osworld
  - screenspot
  - 6-bit
  - quantized

GUI-Owl-7B 6bit

This is a 6-bit quantized MLX conversion of mPLUG/GUI-Owl-7B, optimized for Apple Silicon.

GUI-Owl is a GUI automation model family developed as part of the Mobile-Agent-V3 project. Upstream, it is positioned for screen understanding, GUI grounding, and agentic action planning across benchmark suites such as ScreenSpot and OSWorld-style tasks.

This artifact was derived from the validated local MLX bf16 reference conversion and then quantized with mlx-vlm. It was validated locally with both mlx_vlm prompt-packet checks and vllm-mlx OpenAI-compatible serve checks.

Conversion Details

Field Value
Upstream model mPLUG/GUI-Owl-7B
Artifact type 6bit quantized MLX conversion
Source artifact local validated bf16 MLX artifact
Conversion tool mlx_vlm.convert via mlx-vlm 0.3.12
Python 3.11.14
MLX 0.31.0
Transformers 5.2.0
Validation backend vllm-mlx (phase/p1 @ 8a5d41b)
Quantization 6bit
Group size 64
Quantization mode affine
Reported effective bits per weight 7.280
Artifact size 7.04G
Template repair tokenizer_config.json["chat_template"] was re-injected after quantization

Additional notes:

  • This quantized artifact inherits the direct-upstream posture of the validated local bf16 base artifact.
  • chat_template.json, chat_template.jinja, and tokenizer_config.json["chat_template"] were kept aligned after quantization.
  • This family remained on the original Track A packet through the full local validation pass.

Validation

This artifact passed local validation in this workspace:

  • mlx_vlm prompt-packet validation: PASS
  • vllm-mlx OpenAI-compatible serve validation: PASS

Local validation notes:

  • Track A packet reuse remained valid after quantization; this did not turn into a ShowUI-style contract split.
  • Grounding stayed effectively unchanged relative to the local bf16 artifact, including the same normalization failure.
  • Streamed output improved by dropping the bf16 Chinese drift, but it became more verbose than the non-stream answer.

Performance

  • Artifact size on disk: 7.04G
  • Local fixed-packet mlx_vlm validation used about 9.08 GB peak memory
  • Local vllm-mlx serve validation completed in about 21.80s non-stream and 24.60s streamed

These are local validation measurements, not a full benchmark suite.

Usage

Install

pip install -U mlx-vlm

CLI

python -m mlx_vlm.generate \
  --model mlx-community/GUI-Owl-7B-6bit \
  --image path/to/image.png \
  --prompt "Describe the visible controls on this screen in five short bullet points." \
  --max-tokens 256 \
  --temperature 0.0

Python

from mlx_vlm import load, generate

model, processor = load("mlx-community/GUI-Owl-7B-6bit")
result = generate(
    model,
    processor,
    prompt="Describe the visible controls on this screen in five short bullet points.",
    image="path/to/image.png",
    max_tokens=256,
    temp=0.0,
)
print(result.text)

vllm-mlx Serve

python -m vllm_mlx.cli serve mlx-community/GUI-Owl-7B-6bit --mllm --localhost --port 8000

Links

Other Quantizations

Planned sibling repos in this wave:

Notes and Limitations

  • This card reports local MLX conversion and validation results only.
  • Upstream benchmark claims belong to the original GUI-Owl family and were not re-run here unless explicitly stated.
  • Quantization changed output style slightly relative to the local bf16 reference artifact, especially on streamed responses.
  • The main local weaknesses remained grounding normalization and weak structured-action target choice, not serve-path instability.

Citation

If you use this MLX conversion, please also cite the original GUI-Owl work:

@misc{ye2025mobileagentv3foundamentalagentsgui,
      title={Mobile-Agent-v3: Foundamental Agents for GUI Automation},
      author={Jiabo Ye and Xi Zhang and Haiyang Xu and Haowei Liu and Junyang Wang and Zhaoqing Zhu and Ziwei Zheng and Feiyu Gao and Junjie Cao and Zhengxi Lu and Jitong Liao and Qi Zheng and Fei Huang and Jingren Zhou and Ming Yan},
      year={2025},
      eprint={2508.15144},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2508.15144},
}

License

This repo follows the upstream model license: MIT. See the upstream model card for the authoritative license details: mPLUG/GUI-Owl-7B.