Upload folder using huggingface_hub

Files changed (4)

.gitattributes +2 -0
README.md +71 -1
img/Intel.png +3 -0
img/RTX5090.png +3 -0

.gitattributes CHANGED

@@ -51,3 +51,5 @@ Qwen3-4B-Instruct-2507-Q4_K_S-3.66bpw.gguf filter=lfs diff=lfs merge=lfs -text
 Qwen3-4B-Instruct-2507-Q4_K_S-3.87bpw.gguf filter=lfs diff=lfs merge=lfs -text
 Qwen3-4B-Instruct-2507-Q4_K_S-4.31bpw.gguf filter=lfs diff=lfs merge=lfs -text
 Qwen3-4B-Instruct-2507-Q5_K_S-4.74bpw.gguf filter=lfs diff=lfs merge=lfs -text
+img/Intel.png filter=lfs diff=lfs merge=lfs -text
+img/RTX5090.png filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -9,4 +9,74 @@ tags:
 - qwen
 - qwen3
 - byteshape
----
+---
+# Qwen3-4B-Instruct-2507 GGUF (ShapeLearn Quantized)
+This is a GGUF-quantized version of Qwen3-4B-Instruct-2507 produced with **ByteShape's ShapeLearn**, which learns the optimal datatype per tensor to maintain high quality even at very low bit lengths (the exclusive focus of this release).
+<div align="center">
+### [**Read Our Blog for Detailed Benchmarks**](https://byteshape.com/blogs/Qwen3-4B-I-2507/)
+*Comprehensive benchmarks across multiple GPUs, CPUs, and even Raspberry Pi*
+### [**Join the Discussion on Reddit**](https://www.reddit.com/r/ByteShape/)
+*Questions? Feedback? Connect with the ByteShape community*
+</div>
+---
+# How to Pick a Model
+We provide **CPU and GPU optimized variants** for llama.cpp:
+- **CPUs**: KQ quantization is the preferred choice due to GGML kernel performance
+- **NVIDIA GPUs**: IQ quantization delivers faster performance
+Each hardware target includes a range of models covering different size and quality tradeoffs.
+### Understanding the Charts
+The charts below show **quality vs. tokens per second** for each device, including ShapeLearn models alongside Unsloth baselines for direct comparison.
+**Selection Strategy:** Choose the model with the best quality at your target throughput, or the fastest model that meets your quality requirements.
+---
+## GGUF-KQ Models: Best for CPU
+![CPU Benchmark - Intel](img/Intel.png)
+**Table sorted by inference speed:** (match the chart’s numbers with the model IDs below)
+| Model ID | Bits Per Weight | Model Size HF | Average Quality Score |
+| --- | --- | --- | --- |
+| [KQ-1](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q3_K_S-2.77bpw.gguf) | 2.77 | 1.4 GB | 70.33% |
+| [KQ-2](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q3_K_S-2.95bpw.gguf) | 2.95 | 1.49 GB | 79.81% |
+| [KQ-3](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q3_K_S-3.19bpw.gguf) | 3.19 | 1.61 GB | 87.31% |
+| [KQ-4](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q3_K_S-3.34bpw.gguf) | 3.34 | 1.69 GB | 92.04% |
+| [KQ-5](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q3_K_S-3.45bpw.gguf) | 3.45 | 1.74 GB | 93.01% |
+| [KQ-6](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q4_K_S-3.66bpw.gguf) | 3.66 | 1.84 GB | 94.46% |
+| [KQ-7](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q4_K_S-3.87bpw.gguf) | 3.87 | 1.95 GB | 95.89% |
+| [KQ-8](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q4_K_S-4.31bpw.gguf) | 4.31 | 2.17 GB | 98.44% |
+| [KQ-9](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q5_K_S-4.74bpw.gguf) | 4.74 | 2.39 GB | 98.95% |
+## GGUF-IQ Models: Best for GPU
+![GPU Benchmark - RTX 5090](img/RTX5090.png)
+**Table sorted by inference speed:** (match the chart’s numbers with the model IDs below)
+| Model ID | Bits Per Weight | Model Size HF | Average Quality Score |
+| --- | --- | --- | --- |
+| [IQ-1](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ3_S-2.55bpw.gguf) | 2.55 | 1.29 GB | 69.87% |
+| [IQ-2](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ3_S-2.76bpw.gguf) | 2.76 | 1.39 GB | 83.32% |
+| [IQ-3](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ3_S-2.94bpw.gguf) | 2.94 | 1.49 GB | 89.04% |
+| [IQ-4](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ3_S-3.07bpw.gguf) | 3.07 | 1.55 GB | 92.32% |
+| [IQ-5](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ3_S-3.31bpw.gguf) | 3.31 | 1.67 GB | 92.45% |
+| [IQ-6](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ4_XS-3.55bpw.gguf) | 3.55 | 1.79 GB | 95.16% |
+| [IQ-7](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ4_XS-4.04bpw.gguf) | 4.04 | 2.04 GB | 99.25% |
+| [IQ-8](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ4_XS-4.54bpw.gguf) | 4.54 | 2.29 GB | 99.80% |
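
Note (not part of the commit): the selection strategy described in the README can be sketched in a few lines of Python. The rows below are copied from the two tables in this diff. The tables do not list tokens per second (those appear only in the charts), so this sketch uses file size as a rough stand-in for speed, which is an assumption. The names `Variant`, `smallest_meeting`, and `best_within` are hypothetical helpers, not part of any ByteShape or Hugging Face tooling.

```python
from typing import NamedTuple, Optional, Sequence


class Variant(NamedTuple):
    model_id: str
    bpw: float       # bits per weight
    size_gb: float   # model size as listed in the README table
    quality: float   # average quality score, in percent


# Rows copied from the README tables (CPU-oriented KQ, GPU-oriented IQ).
KQ = [
    Variant("KQ-1", 2.77, 1.40, 70.33), Variant("KQ-2", 2.95, 1.49, 79.81),
    Variant("KQ-3", 3.19, 1.61, 87.31), Variant("KQ-4", 3.34, 1.69, 92.04),
    Variant("KQ-5", 3.45, 1.74, 93.01), Variant("KQ-6", 3.66, 1.84, 94.46),
    Variant("KQ-7", 3.87, 1.95, 95.89), Variant("KQ-8", 4.31, 2.17, 98.44),
    Variant("KQ-9", 4.74, 2.39, 98.95),
]
IQ = [
    Variant("IQ-1", 2.55, 1.29, 69.87), Variant("IQ-2", 2.76, 1.39, 83.32),
    Variant("IQ-3", 2.94, 1.49, 89.04), Variant("IQ-4", 3.07, 1.55, 92.32),
    Variant("IQ-5", 3.31, 1.67, 92.45), Variant("IQ-6", 3.55, 1.79, 95.16),
    Variant("IQ-7", 4.04, 2.04, 99.25), Variant("IQ-8", 4.54, 2.29, 99.80),
]


def smallest_meeting(variants: Sequence[Variant], min_quality: float) -> Optional[Variant]:
    """Smallest variant whose quality score is at least min_quality.

    Size is used as a proxy for speed (assumption; check the charts).
    """
    candidates = [v for v in variants if v.quality >= min_quality]
    return min(candidates, key=lambda v: v.size_gb) if candidates else None


def best_within(variants: Sequence[Variant], max_size_gb: float) -> Optional[Variant]:
    """Highest-quality variant that fits within max_size_gb."""
    candidates = [v for v in variants if v.size_gb <= max_size_gb]
    return max(candidates, key=lambda v: v.quality) if candidates else None


if __name__ == "__main__":
    print(smallest_meeting(KQ, 95.0))  # CPU target, at least 95% quality
    print(best_within(IQ, 1.5))        # GPU target, at most 1.5 GB
```

Both helpers return `None` when no variant satisfies the constraint, so a caller can fall back to a looser threshold. For real throughput targets, the per-device charts remain the reference.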

img/Intel.png ADDED

Git LFS Details

SHA256: 49a83f8ec63bf19082bca0e0582b833d40ffc1008ce2a2570248377d8960e61f
Pointer size: 131 Bytes
Size of remote file: 709 kB

img/RTX5090.png ADDED

Git LFS Details

SHA256: cf2ec8c3bc0c9be6701f4d4c5816a658b43194d62e2108cfab2fab15c82ee0a8
Pointer size: 131 Bytes
Size of remote file: 722 kB