Upload folder using huggingface_hub

Files changed:
- .gitattributes (+2 −0)
- README.md (+71 −1)
- img/Intel.png (+3 −0)
- img/RTX5090.png (+3 −0)
.gitattributes
CHANGED

@@ -51,3 +51,5 @@ Qwen3-4B-Instruct-2507-Q4_K_S-3.66bpw.gguf filter=lfs diff=lfs merge=lfs -text
 Qwen3-4B-Instruct-2507-Q4_K_S-3.87bpw.gguf filter=lfs diff=lfs merge=lfs -text
 Qwen3-4B-Instruct-2507-Q4_K_S-4.31bpw.gguf filter=lfs diff=lfs merge=lfs -text
 Qwen3-4B-Instruct-2507-Q5_K_S-4.74bpw.gguf filter=lfs diff=lfs merge=lfs -text
+img/Intel.png filter=lfs diff=lfs merge=lfs -text
+img/RTX5090.png filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED

@@ -9,4 +9,74 @@ tags:
 - qwen
 - qwen3
 - byteshape
 ---

# Qwen3-4B-Instruct-2507 GGUF (ShapeLearn Quantized)

This is a GGUF-quantized version of Qwen3-4B-Instruct-2507, produced with **ByteShape's ShapeLearn**, which learns the optimal datatype per tensor to maintain high quality even at very low bit widths (the exclusive focus of this release).
<div align="center">

### [**Read Our Blog for Detailed Benchmarks**](https://byteshape.com/blogs/Qwen3-4B-I-2507/)

*Comprehensive benchmarks across multiple GPUs, CPUs, and even a Raspberry Pi*

### [**Join the Discussion on Reddit**](https://www.reddit.com/r/ByteShape/)

*Questions? Feedback? Connect with the ByteShape community*

</div>

---

# How to Pick a Model

We provide **CPU- and GPU-optimized variants** for llama.cpp:

- **CPUs**: KQ (K-quant) models are the preferred choice due to GGML kernel performance
- **NVIDIA GPUs**: IQ (I-quant) models deliver faster inference

Each hardware target includes a range of models covering different size and quality tradeoffs.
### Understanding the Charts

The charts below show **quality vs. tokens per second** for each device, including ShapeLearn models alongside Unsloth baselines for direct comparison.

**Selection Strategy:** Choose the model with the best quality at your target throughput, or the fastest model that meets your quality requirements.

---
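The selection rule above can be sketched as a small helper. Note that the throughput and quality numbers in this sketch are illustrative placeholders, not measured benchmark values; only the model IDs and quality scores echo the tables below.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    name: str
    quality: float          # average quality score, percent
    tokens_per_sec: float   # measured throughput on your hardware

def pick_model(variants, min_tokens_per_sec):
    """Return the highest-quality variant that meets the throughput target,
    or None if nothing is fast enough."""
    fast_enough = [v for v in variants if v.tokens_per_sec >= min_tokens_per_sec]
    if not fast_enough:
        return None
    return max(fast_enough, key=lambda v: v.quality)

# Placeholder throughput numbers -- substitute your own measurements.
variants = [
    Variant("KQ-1", 70.33, 95.0),   # smallest model, fastest decode
    Variant("KQ-6", 94.46, 70.0),
    Variant("KQ-9", 98.95, 55.0),
]
best = pick_model(variants, min_tokens_per_sec=60.0)
print(best.name)  # KQ-6: best quality among variants hitting 60 tok/s
```

The same rule works in the other direction: filter by a quality floor first, then take the fastest survivor.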
## GGUF-KQ Models: Best for CPU

![Quality vs. tokens per second for KQ models on an Intel CPU](img/Intel.png)

**Table sorted by inference speed** (match the numbers in the chart to the model IDs below):

| Model ID | Bits Per Weight | Model Size (HF) | Average Quality Score |
| --- | --- | --- | --- |
| [KQ-1](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q3_K_S-2.77bpw.gguf) | 2.77 | 1.40 GB | 70.33% |
| [KQ-2](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q3_K_S-2.95bpw.gguf) | 2.95 | 1.49 GB | 79.81% |
| [KQ-3](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q3_K_S-3.19bpw.gguf) | 3.19 | 1.61 GB | 87.31% |
| [KQ-4](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q3_K_S-3.34bpw.gguf) | 3.34 | 1.69 GB | 92.04% |
| [KQ-5](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q3_K_S-3.45bpw.gguf) | 3.45 | 1.74 GB | 93.01% |
| [KQ-6](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q4_K_S-3.66bpw.gguf) | 3.66 | 1.84 GB | 94.46% |
| [KQ-7](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q4_K_S-3.87bpw.gguf) | 3.87 | 1.95 GB | 95.89% |
| [KQ-8](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q4_K_S-4.31bpw.gguf) | 4.31 | 2.17 GB | 98.44% |
| [KQ-9](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q5_K_S-4.74bpw.gguf) | 4.74 | 2.39 GB | 98.95% |
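As a rough sanity check on the sizes in the table, a GGUF file's size tracks bits per weight almost linearly: size ≈ parameter count × bpw / 8 bytes, plus metadata overhead. A minimal sketch, assuming roughly 4.0B parameters for Qwen3-4B (an assumption from the model name, not a figure stated on this card):

```python
def estimate_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate GGUF file size in GB (1 GB = 1e9 bytes).

    Ignores tokenizer and metadata overhead, so it slightly undershoots
    the sizes listed on the Hugging Face repo page."""
    return n_params * bits_per_weight / 8 / 1e9

# Assumed ~4.0B parameters; compare with KQ-8 (4.31 bpw) listed at 2.17 GB.
print(f"{estimate_gguf_size_gb(4.0e9, 4.31):.2f} GB")
```

This is also why each bpw step in the table buys a near-proportional change in file size.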
## GGUF-IQ Models: Best for GPU

![Quality vs. tokens per second for IQ models on an NVIDIA RTX 5090](img/RTX5090.png)

**Table sorted by inference speed** (match the numbers in the chart to the model IDs below):

| Model ID | Bits Per Weight | Model Size (HF) | Average Quality Score |
| --- | --- | --- | --- |
| [IQ-1](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ3_S-2.55bpw.gguf) | 2.55 | 1.29 GB | 69.87% |
| [IQ-2](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ3_S-2.76bpw.gguf) | 2.76 | 1.39 GB | 83.32% |
| [IQ-3](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ3_S-2.94bpw.gguf) | 2.94 | 1.49 GB | 89.04% |
| [IQ-4](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ3_S-3.07bpw.gguf) | 3.07 | 1.55 GB | 92.32% |
| [IQ-5](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ3_S-3.31bpw.gguf) | 3.31 | 1.67 GB | 92.45% |
| [IQ-6](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ4_XS-3.55bpw.gguf) | 3.55 | 1.79 GB | 95.16% |
| [IQ-7](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ4_XS-4.04bpw.gguf) | 4.04 | 2.04 GB | 99.25% |
| [IQ-8](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ4_XS-4.54bpw.gguf) | 4.54 | 2.29 GB | 99.80% |
img/Intel.png
ADDED (Git LFS)

img/RTX5090.png
ADDED (Git LFS)