---
language:
- en
license: mit
library_name: mlx
pipeline_tag: image-text-to-text
base_model: mPLUG/GUI-Owl-7B
tags:
- mlx
- mlx-vlm
- safetensors
- apple-silicon
- conversational
- gui
- vision-language-model
- qwen2_5_vl
- gui-owl
- mobile-agent-v3
- computer-use
- grounding
- osworld
- screenspot
- 6-bit
- quantized
---
# GUI-Owl-7B 6bit
This is a 6-bit quantized MLX conversion of [mPLUG/GUI-Owl-7B](https://huggingface.co/mPLUG/GUI-Owl-7B), optimized for Apple Silicon.
GUI-Owl is a GUI automation model family developed as part of the Mobile-Agent-V3 project. Upstream, it is positioned for screen understanding, GUI grounding, and agentic action planning across benchmark suites such as ScreenSpot and OSWorld-style tasks.
This artifact was derived from the validated local MLX `bf16` reference conversion and then quantized with `mlx-vlm`. It was validated locally with both `mlx_vlm` prompt-packet checks and `vllm-mlx` OpenAI-compatible serve checks.
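Given the tool versions and quantization settings reported in this card, the quantization step would look roughly like the following. This is a sketch, not the exact command used: the flag names follow `mlx_vlm.convert` conventions, and the input/output paths are placeholders.

```shell
# Sketch of the quantization step (paths are placeholders, not the
# actual local artifact locations used for this conversion).
python -m mlx_vlm.convert \
  --hf-path path/to/local-GUI-Owl-7B-bf16 \
  --mlx-path GUI-Owl-7B-6bit \
  -q --q-bits 6 --q-group-size 64
```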
## Conversion Details
| Field | Value |
|---|---|
| Upstream model | `mPLUG/GUI-Owl-7B` |
| Artifact type | `6bit quantized MLX conversion` |
| Source artifact | local validated `bf16` MLX artifact |
| Conversion tool | `mlx_vlm.convert` via `mlx-vlm 0.3.12` |
| Python | `3.11.14` |
| MLX | `0.31.0` |
| Transformers | `5.2.0` |
| Validation backend | `vllm-mlx (phase/p1 @ 8a5d41b)` |
| Quantization | `6bit` |
| Group size | `64` |
| Quantization mode | `affine` |
| Reported effective bits per weight | `7.280` |
| Artifact size | `7.04 GB` |
| Template repair | `tokenizer_config.json["chat_template"]` was re-injected after quantization |
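The effective-bits figure in the table can be sanity-checked with simple arithmetic. Under MLX affine quantization, each group of `group_size` weights shares one scale and one bias (assumed float16 here, i.e. 16 bits each), so a 6-bit / group-64 scheme carries roughly 0.5 extra bits per weight; the reported `7.280` is higher, presumably because some tensors (e.g. embeddings or norms) are kept at higher precision.

```python
# Back-of-envelope estimate for affine quantization: each group of
# `group_size` weights shares one scale and one bias (assumed to be
# stored as float16, i.e. 16 bits each), adding 32 / group_size bits
# of overhead per weight on top of the quantized bits themselves.
def effective_bits(bits: int, group_size: int,
                   scale_bits: int = 16, bias_bits: int = 16) -> float:
    return bits + (scale_bits + bias_bits) / group_size

print(effective_bits(6, 64))  # 6.5 bits per quantized weight
```

The gap between 6.5 and the reported 7.280 is consistent with a subset of tensors remaining unquantized, though this card does not enumerate which ones.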
Additional notes:
- This quantized artifact inherits the direct-upstream posture of the validated local `bf16` base artifact.
- `chat_template.json`, `chat_template.jinja`, and `tokenizer_config.json["chat_template"]` were kept aligned after quantization.
- This family remained on the original Track A packet through the full local validation pass.
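The chat-template repair described above can be sketched as a small script. The file names match the artifact files listed in this card; the helper itself is hypothetical, not a canonical tool.

```python
import json
from pathlib import Path

def reinject_chat_template(artifact_dir: str) -> dict:
    """Copy chat_template.jinja back into
    tokenizer_config.json["chat_template"], keeping the two aligned
    after quantization (hypothetical repair helper)."""
    artifact = Path(artifact_dir)
    template = (artifact / "chat_template.jinja").read_text()
    cfg_path = artifact / "tokenizer_config.json"
    cfg = json.loads(cfg_path.read_text())
    cfg["chat_template"] = template
    cfg_path.write_text(json.dumps(cfg, indent=2, ensure_ascii=False))
    return cfg
```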
## Validation
This artifact passed local validation in this workspace:
- `mlx_vlm` prompt-packet validation: `PASS`
- `vllm-mlx` OpenAI-compatible serve validation: `PASS`
Local validation notes:
- Track A packet reuse remained valid after quantization; this did not turn into a ShowUI-style contract split.
- Grounding stayed effectively unchanged relative to the local `bf16` artifact, including the same normalization failure.
- Streamed output improved on the `bf16` reference by dropping its Chinese-language drift, though it was more verbose than the non-streamed answer.
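For context, "grounding normalization" refers to mapping predicted pixel coordinates into a normalized range. A minimal sketch of one common convention follows; the exact convention this model family expects is an assumption here, not documented on this card.

```python
# One common grounding convention: relative coordinates in [0, 1].
# Whether GUI-Owl emits pixel or relative coordinates is an assumption.
def normalize_point(x: float, y: float, width: int, height: int) -> tuple:
    """Map a pixel coordinate to [0, 1] relative coordinates."""
    return (x / width, y / height)

def denormalize_point(nx: float, ny: float, width: int, height: int) -> tuple:
    """Map [0, 1] relative coordinates back to integer pixels."""
    return (round(nx * width), round(ny * height))
```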
## Performance
- Artifact size on disk: `7.04 GB`
- Local fixed-packet `mlx_vlm` validation peaked at about `9.08 GB` of memory
- Local `vllm-mlx` serve validation completed in about `21.80 s` (non-streamed) and `24.60 s` (streamed)
These are local validation measurements, not a full benchmark suite.
## Usage
### Install
```bash
pip install -U mlx-vlm
```
### CLI
```bash
python -m mlx_vlm.generate \
  --model mlx-community/GUI-Owl-7B-6bit \
  --image path/to/image.png \
  --prompt "Describe the visible controls on this screen in five short bullet points." \
  --max-tokens 256 \
  --temperature 0.0
```
### Python
```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/GUI-Owl-7B-6bit"
model, processor = load(model_path)
config = load_config(model_path)

prompt = "Describe the visible controls on this screen in five short bullet points."
# Apply the model's chat template before generation.
formatted_prompt = apply_chat_template(processor, config, prompt, num_images=1)

result = generate(
    model,
    processor,
    formatted_prompt,
    image=["path/to/image.png"],
    max_tokens=256,
    temperature=0.0,
)
print(result.text)
```
### vllm-mlx Serve
```bash
python -m vllm_mlx.cli serve mlx-community/GUI-Owl-7B-6bit --mllm --localhost --port 8000
```
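Once the server is up, it can be exercised with an OpenAI-style chat completion request. A minimal sketch of the request payload follows; the field names come from the OpenAI chat API, and the assumption is that the serve endpoint above accepts inline base64 images in this shape.

```python
import base64

def build_chat_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Assemble an OpenAI-style chat completion payload with an inline
    base64 PNG (hypothetical helper for checking the serve endpoint)."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
        "max_tokens": 256,
        "temperature": 0.0,
    }
```

The resulting dict can be POSTed to `http://localhost:8000/v1/chat/completions` with any HTTP client.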
## Links
- Upstream model: [mPLUG/GUI-Owl-7B](https://huggingface.co/mPLUG/GUI-Owl-7B)
- Paper: [Mobile-Agent-v3: Foundamental Agents for GUI Automation](https://arxiv.org/abs/2508.15144)
- Technical PDF: [Mobile-Agent-V3 Technical Report](https://github.com/X-PLUG/MobileAgent/blob/main/Mobile-Agent-v3/assets/MobileAgentV3_Tech.pdf)
- GitHub: [X-PLUG/MobileAgent](https://github.com/X-PLUG/MobileAgent)
- Base model lineage: [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)
- MLX framework: [ml-explore/mlx](https://github.com/ml-explore/mlx)
- mlx-vlm: [Blaizzy/mlx-vlm](https://github.com/Blaizzy/mlx-vlm)
## Other Quantizations
Planned sibling repos in this wave:
- [`mlx-community/GUI-Owl-7B-bf16`](https://huggingface.co/mlx-community/GUI-Owl-7B-bf16)
- [`mlx-community/GUI-Owl-7B-6bit`](https://huggingface.co/mlx-community/GUI-Owl-7B-6bit) - this model
## Notes and Limitations
- This card reports local MLX conversion and validation results only.
- Upstream benchmark claims belong to the original GUI-Owl family and were not re-run here unless explicitly stated.
- Quantization changed output style slightly relative to the local `bf16` reference artifact, especially on streamed responses.
- The main local weaknesses remained grounding normalization and weak structured-action target choice, not serve-path instability.
## Citation
If you use this MLX conversion, please also cite the original GUI-Owl work:
```bibtex
@misc{ye2025mobileagentv3foundamentalagentsgui,
title={Mobile-Agent-v3: Foundamental Agents for GUI Automation},
author={Jiabo Ye and Xi Zhang and Haiyang Xu and Haowei Liu and Junyang Wang and Zhaoqing Zhu and Ziwei Zheng and Feiyu Gao and Junjie Cao and Zhengxi Lu and Jitong Liao and Qi Zheng and Fei Huang and Jingren Zhou and Ming Yan},
year={2025},
eprint={2508.15144},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2508.15144},
}
```
## License
This repo follows the upstream model license: MIT.
See the upstream model card for the authoritative license details:
[mPLUG/GUI-Owl-7B](https://huggingface.co/mPLUG/GUI-Owl-7B).