Qwen3.5-9B-oQ4

Unofficial 4-bit quantized derivative of Qwen/Qwen3.5-9B for Apple Silicon / oMLX.

Quantization: oMLX oQ (Outlier-aware Quantization) at level 4, group size 64, bfloat16.

from omlx.oq import quantize_oq_streaming
quantize_oq_streaming(
    model_path="Qwen/Qwen3.5-9B",
    output_path="Qwen3.5-9B-oQ4",
    oq_level=4, group_size=64, dtype="bfloat16",
)

Size

Version	Size
Original (BF16)	~18 GB
oQ4 (this)	5.7 GB

Files

model-00001-of-00002.safetensors (4.7 GB)
model-00002-of-00002.safetensors (1.0 GB)
config.json, tokenizer.json, tokenizer_config.json
chat_template.jinja, preprocessor_config.json
vocab.json, merges.txt

Usage

Run with oMLX:

omlx serve --model-dir /path/to/models

Or with mlx-lm:

python -m mlx_lm generate --model Qwen3.5-9B-oQ4

Notes

This is a reasoning/thinking model — responses include <think> tags by default
Use /no_think in prompts for direct output
trust_remote_code: true recommended
Intended for Apple Silicon (M-series) with 8GB+ unified memory

Downloads last month: 68

Safetensors

Model size

2B params

Tensor type

BF16

U32

MLX

Hardware compatibility

4-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for vystartasv/Qwen3.5-9B-oQ4

Base model

Qwen/Qwen3.5-9B-Base

Finetuned

Qwen/Qwen3.5-9B

Quantized

(260)

this model