Qwen3.5-0.8B-MLX-4bit

This is a 4-bit quantized MLX version of Qwen/Qwen3.5-0.8B for Apple Silicon.

Model Details

  • Original Model: Qwen/Qwen3.5-0.8B
  • Quantization: 4-bit (5.863 bits per weight)
  • Group Size: 64
  • Format: MLX SafeTensors
  • Framework: mlx-vlm
  • Disk Size: ~622M
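The effective bits per weight (5.863) is higher than the nominal 4 bits because group-wise quantization also stores per-group scales and biases, and some tensors may remain in higher precision. As a rough back-of-envelope sanity check of the on-disk footprint (assuming ~0.8B parameters from the model name; the exact count may differ):

```python
# Back-of-envelope disk-size estimate from the effective bits per weight.
# The 0.8e9 parameter count is assumed from the model name, not exact.
params = 0.8e9
bits_per_weight = 5.863  # effective bpw reported above

size_bytes = params * bits_per_weight / 8
size_mib = size_bytes / (1024 ** 2)
print(f"~{size_mib:.0f} MiB")  # → ~559 MiB
```

This lands in the same ballpark as the listed ~622M; the remainder would come from the true parameter count, file metadata, and any tensors stored unquantized.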

Conversion Details

This model was converted using mlx-vlm from the pc/fix-qwen35-predicate branch, which includes fixes for Qwen3.5 model support (proper handling of MoE gate layers, shared_expert_gate, and A_log casting).

Conversion command:

python3 -m mlx_vlm convert \
  --hf-path "Qwen/Qwen3.5-0.8B" \
  --mlx-path "./Qwen3.5-0.8B-MLX-4bit" \
  -q --q-bits 4 --q-group-size 64
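After conversion, you can sanity-check that the requested quantization settings were actually written out. This is a minimal sketch assuming the usual MLX convention of a top-level "quantization" entry in the converted model's config.json (verify against your own output directory):

```python
import json
from pathlib import Path

def check_quantization(model_dir, bits=4, group_size=64):
    """Return True if the converted model's config.json records the
    expected quantization settings.

    Assumes the standard MLX layout with a top-level "quantization"
    entry containing "bits" and "group_size".
    """
    cfg = json.loads(Path(model_dir, "config.json").read_text())
    quant = cfg.get("quantization", {})
    return quant.get("bits") == bits and quant.get("group_size") == group_size
```

For this conversion, `check_quantization("./Qwen3.5-0.8B-MLX-4bit")` should return True.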

Important Note

An updated, more optimized conversion may be available from Prince Canuma (@Blaizzy) in the MLX VLM community. Check the mlx-community organization for newer versions once official Qwen3.5 support is merged into the main mlx-vlm branch.

Usage

from mlx_vlm import load, generate

# Load the quantized model and its processor from the Hub
model, processor = load("mlx-community/Qwen3.5-0.8B-MLX-4bit")

# Run vision-language generation on a local image
output = generate(
    model,
    processor,
    prompt="Describe this image.",
    image="path/to/image.jpg",
    max_tokens=512,
)
print(output)

CLI:

python3 -m mlx_vlm.generate \
  --model mlx-community/Qwen3.5-0.8B-MLX-4bit \
  --image path/to/image.jpg \
  --prompt "Describe this image."

License

This model inherits the Apache 2.0 license from the original Qwen model.
