codelion committed
Commit 70257fd · verified · 1 Parent(s): d4eb985

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +3 -3
README.md CHANGED
@@ -13,7 +13,7 @@ tags:
 - optiq
 ---
 
- # Qwen3.5-2B-4bit-OptiQ
+ # Qwen3.5-2B-OptiQ-4bit
 
 > Mixed-precision quantized with OptiQ — sensitivity-driven quantization for Apple Silicon
 
@@ -65,7 +65,7 @@ At the 2B scale, there is a **sharp quality cliff between 4-bit and 3-bit** —
 - **At 3.0 BPW:** The floor bit-width matters enormously. With a 3-bit floor [3,4,8], OptiQ matches uniform 3-bit (6.0%). With a 2-bit floor [2,4], quality collapses to 2.0% — even mixed-precision can't save layers quantized to 2-bit at this scale.
 - **Uniform 2-bit** is essentially random (0.5%).
 
- **Recommendation:** Use this 4.5 BPW model for the best quality-size tradeoff at 2B scale. For smaller models where mixed-precision shows dramatic benefits, see [Qwen3.5-0.8B-4bit-OptiQ](https://huggingface.co/mlx-community/Qwen3.5-0.8B-4bit-OptiQ) where OptiQ more than doubles uniform 4-bit accuracy (27% vs 11.5%).
+ **Recommendation:** Use this 4.5 BPW model for the best quality-size tradeoff at 2B scale. For smaller models where mixed-precision shows dramatic benefits, see [Qwen3.5-0.8B-OptiQ-4bit](https://huggingface.co/mlx-community/Qwen3.5-0.8B-OptiQ-4bit) where OptiQ more than doubles uniform 4-bit accuracy (27% vs 11.5%).
 
 ## Usage
 
@@ -74,7 +74,7 @@ This model works with standard `mlx-lm` — no special code needed:
 ```python
 from mlx_lm import load, generate
 
- model, tokenizer = load("mlx-community/Qwen3.5-2B-4bit-OptiQ")
+ model, tokenizer = load("mlx-community/Qwen3.5-2B-OptiQ-4bit")
 
 prompt = "Explain quantum computing in simple terms:"
 response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
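
For readers of the diff: the effective bits-per-weight figures it quotes (4.5 BPW, 3.0 BPW) are parameter-weighted averages of each layer's assigned bit-width, drawn from a palette such as [3,4,8] or [2,4]. A minimal sketch of that average, with made-up layer names, parameter counts, and assignments (not OptiQ's actual allocation for this model):

```python
# Hypothetical sketch: effective bits-per-weight (BPW) of a mixed-precision
# assignment. All layer names, parameter counts, and bit choices below are
# illustrative, not OptiQ's real allocation.

def effective_bpw(layers):
    """layers: dict mapping layer name -> (num_params, assigned_bits)."""
    total_bits = sum(n * b for n, b in layers.values())
    total_params = sum(n for n, _ in layers.values())
    return total_bits / total_params

# A sensitivity-driven mix from a [3, 4, 8] palette: sensitive layers keep
# 8 bits, robust layers drop to the 3-bit floor.
mix = {
    "embed":            (100_000_000, 8),
    "robust_blocks":    (1_400_000_000, 3),
    "sensitive_blocks": (400_000_000, 8),
    "lm_head":          (100_000_000, 8),
}

print(f"{effective_bpw(mix):.2f} BPW")  # 4.50 BPW for this made-up mix
```

The same arithmetic shows why the floor matters at 3.0 BPW: with only [2,4] to choose from, a 3.0 average forces roughly half the parameters onto the 2-bit floor, which the diff notes is unrecoverable at the 2B scale.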
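And the usage snippet from the diff, assembled as a complete script against the renamed repo. The chat-template branch is the stock mlx-lm idiom rather than anything stated in this README, so treat it as an assumption; the final print is likewise added here so the script shows its output:

```python
from mlx_lm import load, generate

# Repo id as renamed by this commit.
model, tokenizer = load("mlx-community/Qwen3.5-2B-OptiQ-4bit")

prompt = "Explain quantum computing in simple terms:"

# Assumption (standard mlx-lm idiom, not from the README): wrap the prompt
# in the model's chat template when one is defined.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
print(response)  # not in the diff; added so the output is visible
```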