# UI-Venus-1.5-2B 6bit
This is a 6-bit quantized MLX conversion of inclusionAI/UI-Venus-1.5-2B, optimized for Apple Silicon.
UI-Venus-1.5 is a unified end-to-end GUI agent family built for grounding, web navigation, and mobile navigation. The 1.5 family spans dense 2B and 8B variants plus a 30B-A3B MoE variant; the upstream report frames it around a shared GUI-semantics stage, online RL for long-horizon navigation, and model merging across the grounding, web, and mobile domains.
This artifact was derived from the validated local MLX bf16 reference conversion and then quantized with mlx-vlm. It was validated locally with both `mlx_vlm` prompt-packet checks and `vllm-mlx` OpenAI-compatible serve checks.
## Conversion Details
| Field | Value |
|---|---|
| Upstream model | inclusionAI/UI-Venus-1.5-2B |
| Artifact type | 6bit quantized MLX conversion |
| Source artifact | local validated bf16 MLX artifact |
| Conversion tool | mlx_vlm.convert via mlx-vlm 0.3.12 |
| Python | 3.11.14 |
| MLX | 0.31.0 |
| Transformers | 5.2.0 |
| Validation backend | vllm-mlx (phase/p1 @ 8a5d41b) |
| Quantization | 6bit |
| Group size | 64 |
| Quantization mode | affine |
| Converter dtype note | float16 |
| Reported effective bits per weight | 8.086 |
| Artifact size | 2.31 GB |
| Template repair | tokenizer_config.json["chat_template"] was re-injected after quantization |
| Inherited config workaround | local bf16 base was created from an upstream mirror with tie_word_embeddings = false |
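The parameters in the table above correspond to an `mlx_vlm.convert` invocation along these lines. This is an illustrative sketch, not the exact command used here: the real source path (the local bf16 MLX artifact) is not published, and flag names should be checked against your installed mlx-vlm version.

```shell
# Illustrative: quantize a local bf16 MLX artifact to 6-bit, group size 64.
# The input path below is a placeholder, not the actual local artifact path.
python -m mlx_vlm.convert \
  --hf-path path/to/local-bf16-mlx-artifact \
  --mlx-path UI-Venus-1.5-2B-6bit \
  -q --q-bits 6 --q-group-size 64
```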
Additional notes:

- This MLX artifact preserves the dual-template contract across `chat_template.json`, `chat_template.jinja`, and `tokenizer_config.json["chat_template"]`. `chat_template.jinja` is present as an additive compatibility shim.
- This quantized artifact inherits the untied-embedding config from the validated local `bf16` base artifact.
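The dual-template contract above can be spot-checked with a short script. This is a minimal sketch assuming a standard Hugging Face / MLX repo layout; the file names come from this card, and `templates_agree` is a hypothetical helper, not part of any library.

```python
import json
from pathlib import Path


def load_templates(repo_dir):
    """Collect the chat template from each location the card mentions."""
    repo = Path(repo_dir)
    templates = {}
    cfg = json.loads((repo / "tokenizer_config.json").read_text())
    templates["tokenizer_config.json"] = cfg.get("chat_template")
    tmpl_json = repo / "chat_template.json"
    if tmpl_json.exists():
        templates["chat_template.json"] = json.loads(tmpl_json.read_text()).get("chat_template")
    tmpl_jinja = repo / "chat_template.jinja"
    if tmpl_jinja.exists():
        templates["chat_template.jinja"] = tmpl_jinja.read_text()
    return templates


def templates_agree(repo_dir):
    """True when every template copy that exists is byte-identical."""
    values = [v for v in load_templates(repo_dir).values() if v is not None]
    return len(set(values)) <= 1
```

Running `templates_agree("path/to/snapshot")` against a downloaded snapshot should return `True` if the post-quantization template re-injection worked.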
## Validation
This artifact passed local validation in this workspace:

- `mlx_vlm` prompt-packet validation: **PASS**
- `vllm-mlx` OpenAI-compatible serve validation: **PASS**

Local validation notes:

- the model stayed in the same weak-but-stable instruction-following envelope as the local `bf16` reference artifact
- the same action choice and the same weaker "Default Models" answer were preserved
- there was no new regression in the serve path; the main limitation remains model quality rather than packaging or runtime stability
## Performance
- Artifact size on disk: `2.31 GB`
- Local fixed-packet `mlx_vlm` validation used about `3.99 GB` peak memory
- Observed local fixed-packet throughput was about `536-565` prompt tok/s and `114.6-180.2` generation tok/s across the four validation prompts
- Local `vllm-mlx` serve validation completed in about `8.94 s` non-stream and `9.68 s` streamed

These are local validation measurements, not a full benchmark suite.
## Usage
### Install

```shell
pip install -U mlx-vlm
```
### CLI

```shell
python -m mlx_vlm.generate \
  --model mlx-community/UI-Venus-1.5-2B-6bit \
  --image path/to/image.png \
  --prompt "Describe the visible controls on this screen." \
  --max-tokens 256 \
  --temperature 0.0
```
### Python

```python
from mlx_vlm import load, generate

model, processor = load("mlx-community/UI-Venus-1.5-2B-6bit")
result = generate(
    model,
    processor,
    prompt="Describe the visible controls on this screen.",
    image="path/to/image.png",
    max_tokens=256,
    temp=0.0,
)
print(result.text)
```
### vllm-mlx Serve

```shell
python -m vllm_mlx.cli serve mlx-community/UI-Venus-1.5-2B-6bit --mllm --localhost --port 8000
```
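Once the server is up, it can be queried over the standard OpenAI-compatible chat completions API. The sketch below builds a request with Python's standard library only; it assumes the server is listening on `localhost:8000` as started above, and `build_chat_payload` is a hypothetical helper written for this example.

```python
import base64
import json
import urllib.request


def build_chat_payload(model, prompt, image_path=None, max_tokens=256):
    """Build an OpenAI-style chat completion payload, optionally with an inline image."""
    content = [{"type": "text", "text": prompt}]
    if image_path:
        with open(image_path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode()
        content.append(
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}}
        )
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
    }


payload = build_chat_payload(
    "mlx-community/UI-Venus-1.5-2B-6bit",
    "Describe the visible controls on this screen.",
)
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment with the server running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```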
## Links
- Upstream model: inclusionAI/UI-Venus-1.5-2B
- Paper: UI-Venus-1.5 Technical Report
- Paper: UI-Venus Technical Report: Building High-performance UI Agents with RFT
- GitHub: inclusionAI/UI-Venus
- MLX framework: ml-explore/mlx
- mlx-vlm: Blaizzy/mlx-vlm
## Other Quantizations
Planned sibling repos in this wave:
## Notes and Limitations
- This card reports local MLX conversion and validation results only.
- Upstream benchmark claims belong to the original UI-Venus model family and were not re-run here unless explicitly stated.
- Quantization changes numerical behavior relative to the local `bf16` reference artifact.
- This artifact remains materially weaker than the UI-Venus `8B` family on precise UI instruction-following, even though the serve path stayed stable.
## Citation
If you use this MLX conversion, please cite the original UI-Venus papers:
- UI-Venus-1.5 Technical Report
- UI-Venus Technical Report: Building High-performance UI Agents with RFT
## License
This repo follows the upstream model license: Apache 2.0. See the upstream model card for the authoritative license details: inclusionAI/UI-Venus-1.5-2B.