Qwen3.5-9B-oQ4

Unofficial 4-bit quantized derivative of Qwen/Qwen3.5-9B for Apple Silicon / oMLX.

Quantization: oMLX oQ (Outlier-aware Quantization) at level 4, group size 64, bfloat16.

from omlx.oq import quantize_oq_streaming
quantize_oq_streaming(
    model_path="Qwen/Qwen3.5-9B",
    output_path="Qwen3.5-9B-oQ4",
    oq_level=4, group_size=64, dtype="bfloat16",
)

Size

Version Size
Original (BF16) ~18 GB
oQ4 (this) 5.7 GB

Files

  • model-00001-of-00002.safetensors (4.7 GB)
  • model-00002-of-00002.safetensors (1.0 GB)
  • config.json, tokenizer.json, tokenizer_config.json
  • chat_template.jinja, preprocessor_config.json
  • vocab.json, merges.txt

Usage

Run with oMLX:

omlx serve --model-dir /path/to/models

Or with mlx-lm:

python -m mlx_lm generate --model Qwen3.5-9B-oQ4

Notes

  • This is a reasoning/thinking model — responses include <think> tags by default
  • Use /no_think in prompts for direct output
  • trust_remote_code: true recommended
  • Intended for Apple Silicon (M-series) with 8GB+ unified memory
Downloads last month
68
Safetensors
Model size
2B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for vystartasv/Qwen3.5-9B-oQ4

Finetuned
Qwen/Qwen3.5-9B
Quantized
(260)
this model