Tags: MLX, Safetensors, qwen3_5, mlx-lm, rotorquant, kv-cache-quantization, qwen3.5, thinking-model, 8bit, quantized, 8-bit precision
How to use majentik/Qwen3.5-27B-RotorQuant-MLX-8bit with MLX:

    # Download the model from the Hub.
    # Quote the package extra so zsh (the macOS default shell) does not expand the brackets as a glob.
    pip install 'huggingface_hub[hf_xet]'
    huggingface-cli download --local-dir Qwen3.5-27B-RotorQuant-MLX-8bit majentik/Qwen3.5-27B-RotorQuant-MLX-8bit
docs: Tier 2 polish — variant matrix + quant trade-off
README.md
CHANGED
@@ -2,18 +2,15 @@
 library_name: mlx
 base_model: Qwen/Qwen3.5-27B
 tags:
-- mlx
-- mlx-lm
-- rotorquant
-- kv-cache-quantization
-- qwen3.5
-- thinking-model
-- 8bit
-- quantized
+- mlx
+- mlx-lm
+- rotorquant
+- kv-cache-quantization
+- qwen3.5
+- thinking-model
+- 8bit
+- quantized
 license: apache-2.0
-pipeline_tag: text-generation
-language:
-- en
 ---
 
 # Qwen3.5-27B-RotorQuant-MLX-8bit
@@ -81,3 +78,39 @@ Qwen3.5-27B generates extended reasoning before responses by default. The combin
 - [Base model](https://huggingface.co/Qwen/Qwen3.5-27B)
 - [KV-cache-only variant](https://huggingface.co/majentik/Qwen3.5-27B-RotorQuant)
 - [RotorQuant GitHub](https://github.com/scrya-com/rotorquant)
+
+## Quant trade-off (MLX lane)
+
+| Bits | Approx size | Use case | Recommendation |
+|---|---|---|---|
+| 2-bit | ~7.3 GB | Aggressive quantization | Very low-RAM Macs |
+| 3-bit | ~10 GB | Lossy but small | Low-RAM Macs |
+| 4-bit | ~12 GB | Balanced default | Recommended for most Macs |
+| 5-bit | ~14 GB | Higher fidelity | Quality-sensitive |
+| 6-bit | ~17 GB | Approaching FP16 quality | High-fidelity |
+| **8-bit** | ~21 GB | Near-lossless reference | **Fidelity-critical work** |
+
+(Current variant — **8bit** — is bolded.)
+
+## Variants in this family
+
+(Showing 16 sibling variants under `majentik/qwen3.5-27b-*`. The current variant — `RotorQuant-MLX-8bit` — is **bolded**.)
+
+| Variant | Runtime | Approx size | Use case |
+|---|---|---|---|
+| [RotorQuant](https://huggingface.co/majentik/qwen3.5-27b-rotorquant) | runtime modifier | n/a | KV-cache root (weight-agnostic) |
+| [RotorQuant-2bit](https://huggingface.co/majentik/qwen3.5-27b-rotorquant-2bit) | transformers | n/a | Standalone 2-bit weights |
+| [RotorQuant-GGUF-IQ4_XS](https://huggingface.co/majentik/qwen3.5-27b-rotorquant-gguf-IQ4_XS) | llama.cpp | ~23 GB | Lossy 4-bit, low-RAM CPU/edge |
+| [RotorQuant-GGUF-Q2_K](https://huggingface.co/majentik/qwen3.5-27b-rotorquant-gguf-Q2_K) | llama.cpp | ~16 GB | Lossy, low-RAM CPU/edge |
+| [RotorQuant-GGUF-Q3_K_M](https://huggingface.co/majentik/qwen3.5-27b-rotorquant-gguf-Q3_K_M) | llama.cpp | ~21 GB | Smaller 3-bit, CPU-friendly |
+| [RotorQuant-GGUF-Q4_K_M](https://huggingface.co/majentik/qwen3.5-27b-rotorquant-gguf-Q4_K_M) | llama.cpp | ~30 GB | Balanced default |
+| [RotorQuant-GGUF-Q5_K_M](https://huggingface.co/majentik/qwen3.5-27b-rotorquant-gguf-Q5_K_M) | llama.cpp | ~36 GB | Higher fidelity, more RAM |
+| [RotorQuant-GGUF-Q8_0](https://huggingface.co/majentik/qwen3.5-27b-rotorquant-gguf-Q8_0) | llama.cpp | ~57 GB | Near-lossless reference |
+| [RotorQuant-MLX-2bit](https://huggingface.co/majentik/qwen3.5-27b-rotorquant-mlx-2bit) | mlx-lm | ~8.6 GB | Apple Silicon, smallest |
+| [RotorQuant-MLX-4bit](https://huggingface.co/majentik/qwen3.5-27b-rotorquant-mlx-4bit) | mlx-lm | ~17 GB | Apple Silicon balanced |
+| **RotorQuant-MLX-8bit** | mlx-lm | ~32 GB | Apple Silicon reference |
+| [TurboQuant](https://huggingface.co/majentik/qwen3.5-27b-turboquant) | runtime modifier | n/a | KV-cache root (weight-agnostic) |
+| [TurboQuant-2bit](https://huggingface.co/majentik/qwen3.5-27b-turboquant-2bit) | transformers | n/a | Standalone 2-bit weights |
+| [TurboQuant-MLX-2bit](https://huggingface.co/majentik/qwen3.5-27b-turboquant-mlx-2bit) | mlx-lm | ~8.6 GB | Apple Silicon, smallest |
+| [TurboQuant-MLX-4bit](https://huggingface.co/majentik/qwen3.5-27b-turboquant-mlx-4bit) | mlx-lm | ~17 GB | Apple Silicon balanced |
+| [TurboQuant-MLX-8bit](https://huggingface.co/majentik/qwen3.5-27b-turboquant-mlx-8bit) | mlx-lm | ~32 GB | Apple Silicon reference |
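The "Approx size" columns in the README tables can be cross-checked with simple arithmetic: quantized weight storage is roughly the parameter count times the effective bits per weight, divided by 8. The sketch below is a back-of-envelope estimate, not the method used to produce those figures. It assumes (none of this is stated in the diff) a round 27 billion parameters, every weight quantized with mlx-lm's affine scheme at group size 64, and one 16-bit scale plus one 16-bit bias stored per group, which adds 32/64 = 0.5 bits per weight. KV cache and activations, the memory RotorQuant itself targets, are excluded.

```python
"""Back-of-envelope weight-memory estimate for MLX-style affine quantization.

Assumptions (hypothetical, not taken from the model card): a round 27e9
parameters, every weight quantized with group size 64, and one 16-bit scale
plus one 16-bit bias stored per group. KV cache and activations are excluded.
"""

PARAMS = 27e9          # assumed round parameter count for Qwen3.5-27B
GROUP_SIZE = 64        # assumed quantization group size (mlx-lm default)
SCALE_BIAS_BITS = 32   # one 16-bit scale + one 16-bit bias per group


def bits_per_weight(bits: int, group_size: int = GROUP_SIZE) -> float:
    """Effective storage bits per weight, including per-group scale and bias."""
    return bits + SCALE_BIAS_BITS / group_size


def weight_gb(bits: int, params: float = PARAMS, group_size: int = GROUP_SIZE) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight(bits, group_size) / 8 / 1e9


if __name__ == "__main__":
    # Print one estimate per bit width listed in the MLX-lane trade-off table.
    for b in (2, 3, 4, 5, 6, 8):
        print(f"{b}-bit: ~{weight_gb(b):.1f} GB ({bits_per_weight(b):.2f} bits/weight)")
```

Under these assumptions the script prints about 15.2 GB for 4-bit and 28.7 GB for 8-bit weights. Passing a different `group_size` shows the overhead trade-off: smaller groups store more scales and biases per weight, so the same bit width takes more space.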