UI-Venus-1.5-2B 6bit

This is a 6-bit quantized MLX conversion of inclusionAI/UI-Venus-1.5-2B, optimized for Apple Silicon.

UI-Venus-1.5 is a unified end-to-end GUI agent family built for grounding, web navigation, and mobile navigation. The 1.5 family spans dense 2B and 8B variants plus a 30B-A3B MoE variant and, per the upstream description, is built around a shared GUI semantics stage, online RL for long-horizon navigation, and model merging across the grounding, web, and mobile domains.

This artifact was derived from the validated local MLX bf16 reference conversion and then quantized with mlx-vlm. It was validated locally with both mlx_vlm prompt-packet checks and vllm-mlx OpenAI-compatible serve checks.

Conversion Details

Field                                  Value
-----                                  -----
Upstream model                         inclusionAI/UI-Venus-1.5-2B
Artifact type                          6-bit quantized MLX conversion
Source artifact                        local validated bf16 MLX artifact
Conversion tool                        mlx_vlm.convert via mlx-vlm 0.3.12
Python                                 3.11.14
MLX                                    0.31.0
Transformers                           5.2.0
Validation backend                     vllm-mlx (phase/p1 @ 8a5d41b)
Quantization                           6-bit
Group size                             64
Quantization mode                      affine
Converter dtype note                   float16
Reported effective bits per weight     8.086
Artifact size                          2.31 GB
Template repair                        tokenizer_config.json["chat_template"] was re-injected after quantization
Inherited config workaround            local bf16 base was created from an upstream mirror with tie_word_embeddings = false
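As a rough sketch of how a quantized artifact like this can be produced with mlx-vlm's converter (the paths and exact flag spelling here are illustrative, not the recorded invocation):

```shell
# Quantize a local bf16 MLX artifact to 6-bit with group size 64.
# Input/output paths are placeholders; mlx-vlm 0.3.12 was the recorded tool version.
python -m mlx_vlm.convert \
  --hf-path ./UI-Venus-1.5-2B-bf16-mlx \
  --mlx-path ./UI-Venus-1.5-2B-6bit \
  -q --q-bits 6 --q-group-size 64
```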

Additional notes:

  • This MLX artifact preserves the dual-template contract across chat_template.json, chat_template.jinja, and tokenizer_config.json["chat_template"].
  • chat_template.jinja is present as an additive compatibility shim.
  • This quantized artifact inherits the untied-embedding config from the validated local bf16 base artifact.
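The template repair noted above can be sketched as a small post-processing step. The helper name and the assumption that chat_template.json holds a top-level "chat_template" key are illustrative:

```python
import json
from pathlib import Path

def reinject_chat_template(artifact_dir: str) -> None:
    """Copy the chat template from chat_template.json back into
    tokenizer_config.json["chat_template"], which quantization can drop."""
    root = Path(artifact_dir)
    template = json.loads((root / "chat_template.json").read_text())["chat_template"]
    cfg_path = root / "tokenizer_config.json"
    cfg = json.loads(cfg_path.read_text())
    cfg["chat_template"] = template  # re-inject after quantization
    cfg_path.write_text(json.dumps(cfg, indent=2))
```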

Validation

This artifact passed local validation in this workspace:

  • mlx_vlm prompt-packet validation: PASS
  • vllm-mlx OpenAI-compatible serve validation: PASS

Local validation notes:

  • The model stayed in the same weak-but-stable instruction-following envelope as the local bf16 reference artifact.
  • The same action choice and the same weaker "Default Models" answer were preserved.
  • No new regression appeared in the serve path; the main limitation remains model quality rather than packaging or runtime stability.

Performance

  • Artifact size on disk: 2.31 GB
  • Local fixed-packet mlx_vlm validation peaked at about 3.99 GB of memory
  • Observed local fixed-packet throughput was about 536-565 prompt tok/s and 114.6-180.2 generation tok/s across the four validation prompts
  • Local vllm-mlx serve validation completed in about 8.94 s (non-streaming) and 9.68 s (streaming)

These are local validation measurements, not a full benchmark suite.

Usage

Install

pip install -U mlx-vlm

CLI

python -m mlx_vlm.generate \
  --model mlx-community/UI-Venus-1.5-2B-6bit \
  --image path/to/image.png \
  --prompt "Describe the visible controls on this screen." \
  --max-tokens 256 \
  --temperature 0.0

Python

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/UI-Venus-1.5-2B-6bit"
model, processor = load(model_path)
config = load_config(model_path)

# Format the request with the model's chat template before generating.
prompt = apply_chat_template(
    processor, config, "Describe the visible controls on this screen.", num_images=1
)
result = generate(
    model,
    processor,
    prompt,
    ["path/to/image.png"],
    max_tokens=256,
    temperature=0.0,
)
print(result.text)

vllm-mlx Serve

python -m vllm_mlx.cli serve mlx-community/UI-Venus-1.5-2B-6bit --mllm --localhost --port 8000
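Once the server is up, it can be exercised through the standard OpenAI-compatible chat completions route. The base64 image payload below is a placeholder:

```shell
# Query the local OpenAI-compatible endpoint started above.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/UI-Venus-1.5-2B-6bit",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe the visible controls on this screen."},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,<BASE64_PNG>"}}
      ]
    }],
    "max_tokens": 256
  }'
```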

Links

Other Quantizations

Planned sibling repos in this wave:

Notes and Limitations

  • This card reports local MLX conversion and validation results only.
  • Upstream benchmark claims belong to the original UI-Venus model family and were not re-run here unless explicitly stated.
  • Quantization changes numerical behavior relative to the local bf16 reference artifact.
  • This artifact remains materially weaker than the UI-Venus 8B family on precise UI instruction-following, even though the serve path stayed stable.

Citation

If you use this MLX conversion, please cite the original UI-Venus papers:

License

This repo follows the upstream model license: Apache 2.0. See the upstream model card for the authoritative license details: inclusionAI/UI-Venus-1.5-2B.
