# UI-Venus-1.5-8B 4bit
This is a 4-bit quantized MLX conversion of inclusionAI/UI-Venus-1.5-8B, optimized for Apple Silicon.
UI-Venus-1.5 is a unified end-to-end GUI agent family built for grounding, web navigation, and mobile navigation. The 1.5 family spans dense 2B and 8B variants plus a 30B-A3B MoE variant, and is framed upstream around a shared GUI semantics stage, online RL for long-horizon navigation, and model merging across grounding, web, and mobile domains.
This artifact was derived from the validated local MLX bf16 reference conversion and then quantized with `mlx-vlm`. It was validated locally with both `mlx_vlm` prompt-packet checks and `vllm-mlx` OpenAI-compatible serve checks.
## Conversion Details
| Field | Value |
|---|---|
| Upstream model | inclusionAI/UI-Venus-1.5-8B |
| Artifact type | 4bit quantized MLX conversion |
| Source artifact | local validated bf16 MLX artifact |
| Conversion tool | mlx_vlm.convert via mlx-vlm 0.3.12 |
| Python | 3.11.14 |
| MLX | 0.31.0 |
| Transformers | 5.2.0 |
| Validation backend | vllm-mlx (phase/p1 @ 8a5d41b) |
| Quantization | 4bit |
| Group size | 64 |
| Quantization mode | affine |
| Converter dtype note | float16 |
| Reported effective bits per weight | 5.256 |
| Artifact size | 5.38G |
| Template repair | tokenizer_config.json["chat_template"] was re-injected after quantization |
Additional notes:

- This MLX artifact preserves the dual-template contract across `chat_template.json`, `chat_template.jinja`, and `tokenizer_config.json["chat_template"]`. `chat_template.jinja` is present as an additive compatibility shim.
- No manual dtype edit was applied after conversion.
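The settings in the table above correspond to a conversion invocation along these lines. This is a sketch, not the exact command used here: the flag names assume the `mlx_vlm.convert` CLI (`--hf-path`, `--mlx-path`, `-q`, `--q-bits`, `--q-group-size`) as shipped in recent `mlx-vlm` releases, and the input path is a placeholder for the local bf16 reference artifact.

```bash
# Quantize the local bf16 MLX reference to 4-bit with group size 64.
# Verify flag names against your installed mlx-vlm version.
python -m mlx_vlm.convert \
  --hf-path path/to/UI-Venus-1.5-8B-bf16-mlx \
  --mlx-path UI-Venus-1.5-8B-4bit \
  -q --q-bits 4 --q-group-size 64
```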
## Validation

This artifact passed local validation in this workspace:

- `mlx_vlm` prompt-packet validation: **PASS**
- `vllm-mlx` OpenAI-compatible serve validation: **PASS**
Local validation notes:

- overall behavior stayed aligned with the local `bf16` reference artifact
- structured action output remained valid and retained the requested `reason` field
- the main degradation was grounding stability: localization drift increased, and the grounding output changed from the requested single-object schema to a wrapped list with `bbox_2d`
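The schema drift noted above can be absorbed on the client side with a small normalizer. This is a hedged sketch: the wrapped-list shape (`[{"bbox_2d": [...], ...}]`) and the target field name `bbox` are assumptions based on the local validation notes, not an upstream specification.

```python
def normalize_grounding(output):
    """Unwrap a single-element list of grounding dicts and rename
    the assumed drifted key `bbox_2d` back to `bbox`."""
    if isinstance(output, list):
        if len(output) != 1:
            raise ValueError(f"expected one grounding object, got {len(output)}")
        output = output[0]
    if "bbox_2d" in output:
        out = dict(output)               # copy; do not mutate the input
        out["bbox"] = out.pop("bbox_2d")  # rename assumed drifted key
        return out
    return output

# Wrapped-list form, as observed after 4-bit quantization (values illustrative)
drifted = [{"bbox_2d": [10, 20, 110, 220], "label": "Submit button"}]
print(normalize_grounding(drifted))
```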
## Performance

- Artifact size on disk: `5.38G`
- Local fixed-packet `mlx_vlm` validation used about `18.77 GB` peak memory
- Observed local fixed-packet throughput was about `195-200` prompt tok/s and `52.1-67.6` generation tok/s across the four validation prompts
- Local `vllm-mlx` serve validation completed in about `24.48s` non-stream and `26.34s` streamed

These are local validation measurements, not a full benchmark suite.
## Usage

### Install

```bash
pip install -U mlx-vlm
```
### CLI

```bash
python -m mlx_vlm.generate \
  --model mlx-community/UI-Venus-1.5-8B-4bit \
  --image path/to/image.png \
  --prompt "Describe the visible controls on this screen." \
  --max-tokens 256 \
  --temperature 0.0
```
### Python

```python
from mlx_vlm import load, generate

model, processor = load("mlx-community/UI-Venus-1.5-8B-4bit")

result = generate(
    model,
    processor,
    prompt="Describe the visible controls on this screen.",
    image="path/to/image.png",
    max_tokens=256,
    temp=0.0,
)
print(result.text)
```
### vllm-mlx Serve

```bash
python -m vllm_mlx.cli serve mlx-community/UI-Venus-1.5-8B-4bit --mllm --localhost --port 8000
```
## Links
- Upstream model: inclusionAI/UI-Venus-1.5-8B
- Paper: UI-Venus-1.5 Technical Report
- Paper: UI-Venus Technical Report: Building High-performance UI Agents with RFT
- GitHub: inclusionAI/UI-Venus
- MLX framework: ml-explore/mlx
- mlx-vlm: Blaizzy/mlx-vlm
## Other Quantizations

Planned sibling repos in this wave:

- `mlx-community/UI-Venus-1.5-8B-bf16`
- `mlx-community/UI-Venus-1.5-8B-6bit`
- `mlx-community/UI-Venus-1.5-8B-4bit` (this model)
## Notes and Limitations
- This card reports local MLX conversion and validation results only.
- Upstream benchmark claims belong to the original UI-Venus model family and were not re-run here unless explicitly stated.
- Quantization changes numerical behavior relative to the local `bf16` reference artifact.
- The main degradation relative to `bf16` and `6bit` was grounding stability, including both localization drift and schema drift on the fixed grounding prompt.
## Citation
If you use this MLX conversion, please cite the original UI-Venus papers:
- UI-Venus-1.5 Technical Report
- UI-Venus Technical Report: Building High-performance UI Agents with RFT
## License
This repo follows the upstream model license: Apache 2.0. See the upstream model card for the authoritative license details: inclusionAI/UI-Venus-1.5-8B.