UI-Venus-1.5-8B 4bit

This is a 4-bit quantized MLX conversion of inclusionAI/UI-Venus-1.5-8B, optimized for Apple Silicon.

UI-Venus-1.5 is a unified end-to-end GUI agent family built for grounding, web navigation, and mobile navigation. The 1.5 family spans dense 2B and 8B variants plus a 30B-A3B MoE variant; upstream, it is described in terms of a shared GUI-semantics training stage, online RL for long-horizon navigation, and model merging across the grounding, web, and mobile domains.

This artifact was derived from the locally validated bf16 MLX reference conversion and then quantized with mlx-vlm. It was validated locally with both mlx_vlm prompt-packet checks and vllm-mlx OpenAI-compatible serve checks.

Conversion Details

  • Upstream model: inclusionAI/UI-Venus-1.5-8B
  • Artifact type: 4-bit quantized MLX conversion
  • Source artifact: locally validated bf16 MLX artifact
  • Conversion tool: mlx_vlm.convert via mlx-vlm 0.3.12
  • Python: 3.11.14
  • MLX: 0.31.0
  • Transformers: 5.2.0
  • Validation backend: vllm-mlx (phase/p1 @ 8a5d41b)
  • Quantization: 4-bit
  • Group size: 64
  • Quantization mode: affine
  • Converter dtype note: float16
  • Reported effective bits per weight: 5.256
  • Artifact size: 5.38 GB
  • Template repair: tokenizer_config.json["chat_template"] was re-injected after quantization
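For reference, a conversion with these settings would look roughly like the command below. This is a hypothetical invocation, not the exact command used here: the flag names mirror the mlx_vlm.convert CLI, so verify them against `python -m mlx_vlm.convert --help` for your installed mlx-vlm version.

```shell
# Hypothetical sketch: quantize a local bf16 MLX artifact to 4-bit,
# group size 64. Flag names are assumptions based on mlx_vlm.convert.
python -m mlx_vlm.convert \
  --hf-path path/to/local-bf16-mlx-artifact \
  --mlx-path UI-Venus-1.5-8B-4bit \
  -q \
  --q-bits 4 \
  --q-group-size 64
```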

Additional notes:

  • This MLX artifact preserves the dual-template contract across chat_template.json, chat_template.jinja, and tokenizer_config.json["chat_template"].
  • chat_template.jinja is present as an additive compatibility shim.
  • No manual dtype edit was applied after conversion.

Validation

This artifact passed local validation in this workspace:

  • mlx_vlm prompt-packet validation: PASS
  • vllm-mlx OpenAI-compatible serve validation: PASS

Local validation notes:

  • overall behavior stayed aligned with the local bf16 reference artifact
  • structured action remained valid and retained the requested reason field
  • the main degradation was grounding stability: localization drift increased and the grounding output changed from the requested single-object schema to a wrapped list with bbox_2d
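If you hit the schema drift described above, a defensive post-processing step can coerce the wrapped-list form back toward a single-object schema. The key names (`bbox_2d`, `bbox`) reflect the drift observed locally; treat this as an illustrative sketch, not part of the model's contract.

```python
def normalize_grounding(output):
    """Coerce a wrapped-list grounding response back to a single-object schema.

    The 4-bit artifact was observed to emit [{"bbox_2d": [...], ...}] where a
    single object was requested; the field names here are illustrative.
    """
    if isinstance(output, list):
        if not output:
            return None
        output = output[0]
    if isinstance(output, dict) and "bbox_2d" in output:
        # Rename the drifted key back to the requested one.
        output = {("bbox" if k == "bbox_2d" else k): v for k, v in output.items()}
    return output
```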

Performance

  • Artifact size on disk: 5.38 GB
  • Local fixed-packet mlx_vlm validation used about 18.77 GB peak memory
  • Observed local fixed-packet throughput was about 195-200 prompt tok/s and 52.1-67.6 generation tok/s across the four validation prompts
  • Local vllm-mlx serve validation completed in about 24.48s non-stream and 26.34s streamed

These are local validation measurements, not a full benchmark suite.
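As a rough sanity check, the reported 5.256 effective bits per weight is consistent with the artifact size on disk, assuming a round 8B parameter count:

```python
def approx_size_gb(params: float, bits_per_weight: float) -> float:
    """Back-of-envelope artifact size: params * bits / 8 bits-per-byte / 1e9."""
    return params * bits_per_weight / 8 / 1e9

# ~8B params at 5.256 effective bits/weight gives roughly 5.3 GB,
# in line with the reported on-disk size.
print(round(approx_size_gb(8e9, 5.256), 2))
```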

Usage

Install

pip install -U mlx-vlm

CLI

python -m mlx_vlm.generate \
  --model mlx-community/UI-Venus-1.5-8B-4bit \
  --image path/to/image.png \
  --prompt "Describe the visible controls on this screen." \
  --max-tokens 256 \
  --temperature 0.0

Python

from mlx_vlm import load, generate

model, processor = load("mlx-community/UI-Venus-1.5-8B-4bit")
result = generate(
    model,
    processor,
    prompt="Describe the visible controls on this screen.",
    image="path/to/image.png",
    max_tokens=256,
    temp=0.0,
)
print(result.text)

vllm-mlx Serve

python -m vllm_mlx.cli serve mlx-community/UI-Venus-1.5-8B-4bit --mllm --localhost --port 8000
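Once the server is up, any OpenAI-compatible client can talk to it. The sketch below builds a standard chat-completions request with Python's stdlib; the endpoint path follows the OpenAI API convention, and the image-content payload shape is an assumption about what vllm-mlx accepts.

```python
import json
import urllib.request

def build_chat_request(prompt: str, image_url: str,
                       base_url: str = "http://localhost:8000/v1"):
    """Build (but do not send) an OpenAI-style chat-completions request."""
    payload = {
        "model": "mlx-community/UI-Venus-1.5-8B-4bit",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }],
        "max_tokens": 256,
        "temperature": 0.0,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Send with: urllib.request.urlopen(build_chat_request(...))
```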

Links

Other Quantizations

Planned sibling repos in this wave:

Notes and Limitations

  • This card reports local MLX conversion and validation results only.
  • Upstream benchmark claims belong to the original UI-Venus model family and were not re-run here unless explicitly stated.
  • Quantization changes numerical behavior relative to the local bf16 reference artifact.
  • The main degradation relative to bf16 and 6bit was grounding stability, including both localization drift and schema drift on the fixed grounding prompt.

Citation

If you use this MLX conversion, please cite the original UI-Venus papers.

License

This repo follows the upstream model license: Apache 2.0. See the upstream model card for the authoritative license details: inclusionAI/UI-Venus-1.5-8B.
