Upload README.md with huggingface_hub
README.md CHANGED
@@ -13,7 +13,7 @@ tags:
 - optiq
 ---
 
-# Qwen3.5-2B-
+# Qwen3.5-2B-OptiQ-4bit
 
 > Mixed-precision quantized with OptiQ — sensitivity-driven quantization for Apple Silicon
 
@@ -65,7 +65,7 @@ At the 2B scale, there is a **sharp quality cliff between 4-bit and 3-bit** —
 - **At 3.0 BPW:** The floor bit-width matters enormously. With a 3-bit floor [3,4,8], OptiQ matches uniform 3-bit (6.0%). With a 2-bit floor [2,4], quality collapses to 2.0% — even mixed-precision can't save layers quantized to 2-bit at this scale.
 - **Uniform 2-bit** is essentially random (0.5%).
 
-**Recommendation:** Use this 4.5 BPW model for the best quality-size tradeoff at 2B scale. For smaller models where mixed-precision shows dramatic benefits, see [Qwen3.5-0.8B-
+**Recommendation:** Use this 4.5 BPW model for the best quality-size tradeoff at 2B scale. For smaller models where mixed-precision shows dramatic benefits, see [Qwen3.5-0.8B-OptiQ-4bit](https://huggingface.co/mlx-community/Qwen3.5-0.8B-OptiQ-4bit) where OptiQ more than doubles uniform 4-bit accuracy (27% vs 11.5%).
 
 ## Usage
 
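For readers skimming the BPW numbers in the hunk above: the effective bits-per-weight of a mixed-precision model is just the parameter-weighted average of the per-layer bit-widths. A minimal sketch of that arithmetic (the layer split and parameter counts below are invented for illustration; OptiQ derives the real assignment from its sensitivity analysis):

```python
def average_bpw(assignment):
    """Effective bits per weight for a mixed-precision assignment.

    `assignment` maps a group name to (parameter_count, bit_width).
    """
    total_bits = sum(n * bits for n, bits in assignment.values())
    total_params = sum(n for n, _ in assignment.values())
    return total_bits / total_params

# Hypothetical split: keep the most sensitive 12.5% of weights at 8-bit,
# quantize the rest to 4-bit. These numbers are illustrative only.
mix = {
    "sensitive": (0.25e9, 8),  # 0.25B params at 8-bit
    "rest": (1.75e9, 4),       # 1.75B params at 4-bit
}
print(average_bpw(mix))  # 4.5
```

The same arithmetic shows why a 3.0 BPW target with a [2,4] palette pushes half of the weights down to 2-bit, which is exactly where the README says quality collapses at this scale.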
@@ -74,7 +74,7 @@ This model works with standard `mlx-lm` — no special code needed:
 ```python
 from mlx_lm import load, generate
 
-model, tokenizer = load("mlx-community/Qwen3.5-2B-
+model, tokenizer = load("mlx-community/Qwen3.5-2B-OptiQ-4bit")
 
 prompt = "Explain quantum computing in simple terms:"
 response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
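One note beyond the diff: the snippet above sends a raw completion prompt. For chat-style use, the usual pattern is to run the request through the tokenizer's chat template first. A minimal sketch with the same `mlx-lm` API (repo ID taken from the diff; `apply_chat_template` is the standard Hugging Face tokenizer method, which `mlx-lm`'s tokenizer wrapper exposes):

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3.5-2B-OptiQ-4bit")

# Format the request with the model's chat template instead of a raw prompt.
messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
print(response)
```

`generate` returns the completed text as a string, so it can be printed or post-processed directly.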