Ali93H committed · Commit 9756bcb · verified · 1 Parent(s): bddf1b1

Upload folder using huggingface_hub

Files changed (4)
  1. .gitattributes +2 -0
  2. README.md +71 -1
  3. img/Intel.png +3 -0
  4. img/RTX5090.png +3 -0
.gitattributes CHANGED
@@ -51,3 +51,5 @@ Qwen3-4B-Instruct-2507-Q4_K_S-3.66bpw.gguf filter=lfs diff=lfs merge=lfs -text
  Qwen3-4B-Instruct-2507-Q4_K_S-3.87bpw.gguf filter=lfs diff=lfs merge=lfs -text
  Qwen3-4B-Instruct-2507-Q4_K_S-4.31bpw.gguf filter=lfs diff=lfs merge=lfs -text
  Qwen3-4B-Instruct-2507-Q5_K_S-4.74bpw.gguf filter=lfs diff=lfs merge=lfs -text
+ img/Intel.png filter=lfs diff=lfs merge=lfs -text
+ img/RTX5090.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -9,4 +9,74 @@ tags:
  - qwen
  - qwen3
  - byteshape
- ---
+ ---
+
+ # Qwen3-4B-Instruct-2507 GGUF (ShapeLearn Quantized)
+
+ This is a GGUF-quantized version of Qwen3-4B-Instruct-2507 produced with **ByteShape's ShapeLearn**, which learns the optimal datatype per tensor to maintain high quality even at very low bit lengths (the exclusive focus of this release).
+
+ <div align="center">
+
+ ### [**Read Our Blog for Detailed Benchmarks**](https://byteshape.com/blogs/Qwen3-4B-I-2507/)
+
+ *Comprehensive benchmarks across multiple GPUs, CPUs, and even Raspberry Pi*
+
+ ### [**Join the Discussion on Reddit**](https://www.reddit.com/r/ByteShape/)
+
+ *Questions? Feedback? Connect with the ByteShape community*
+
+ </div>
+
+ ---
+
+ # How to Pick a Model
+
+ We provide **CPU- and GPU-optimized variants** for llama.cpp:
+
+ - **CPUs**: KQ quantization is the preferred choice due to GGML kernel performance
+ - **Nvidia GPUs**: IQ quantization delivers faster performance
+
+ Each hardware target includes a range of models covering different size and quality tradeoffs.
+
+ ### Understanding the Charts
+
+ The charts below show **quality vs. tokens per second** for each device, including ShapeLearn models alongside Unsloth baselines for direct comparison.
+
+ **Selection Strategy:** Choose the model with the best quality at your target throughput, or the fastest model that meets your quality requirements.
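
The second half of the selection strategy (the fastest model that meets a quality requirement) can be sketched in a few lines of Python. This is only an illustration: the rows are copied from the KQ table in this README, the names `Variant`, `KQ_VARIANTS`, and `fastest_meeting_quality` are hypothetical, and the code assumes that "sorted by inference speed" means fastest first, which the README does not state explicitly. Picking by target throughput would need the tokens-per-second values from the charts, which the tables do not list.

```python
from typing import NamedTuple, Optional, Sequence


class Variant(NamedTuple):
    model_id: str
    bits_per_weight: float
    quality_pct: float  # "Average Quality Score" column, in percent


# Rows copied from the KQ (CPU) table, kept in table order.
# ASSUMPTION: table order is fastest first; the README only says
# "sorted by inference speed" without giving the direction.
KQ_VARIANTS = [
    Variant("KQ-1", 2.77, 70.33),
    Variant("KQ-2", 2.95, 79.81),
    Variant("KQ-3", 3.19, 87.31),
    Variant("KQ-4", 3.34, 92.04),
    Variant("KQ-5", 3.45, 93.01),
    Variant("KQ-6", 3.66, 94.46),
    Variant("KQ-7", 3.87, 95.89),
    Variant("KQ-8", 4.31, 98.44),
    Variant("KQ-9", 4.74, 98.95),
]


def fastest_meeting_quality(
    variants: Sequence[Variant], min_quality_pct: float
) -> Optional[Variant]:
    """Return the first (assumed fastest) variant at or above the quality floor."""
    for variant in variants:
        if variant.quality_pct >= min_quality_pct:
            return variant
    return None  # no variant reaches the requested quality
```

For example, `fastest_meeting_quality(KQ_VARIANTS, 95.0)` selects KQ-7 under the ordering assumption above.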
+
+ ---
+
+ ## GGUF-KQ Models: Best for CPU
+
+ ![CPU Benchmark - Intel](img/Intel.png)
+
+ **Table sorted by inference speed:** (match the chart’s numbers with the model IDs below)
+
+ | Model ID | Bits Per Weight | Model Size HF | Average Quality Score |
+ | --- | --- | --- | --- |
+ | [KQ-1](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q3_K_S-2.77bpw.gguf) | 2.77 | 1.4 GB | 70.33% |
+ | [KQ-2](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q3_K_S-2.95bpw.gguf) | 2.95 | 1.49 GB | 79.81% |
+ | [KQ-3](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q3_K_S-3.19bpw.gguf) | 3.19 | 1.61 GB | 87.31% |
+ | [KQ-4](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q3_K_S-3.34bpw.gguf) | 3.34 | 1.69 GB | 92.04% |
+ | [KQ-5](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q3_K_S-3.45bpw.gguf) | 3.45 | 1.74 GB | 93.01% |
+ | [KQ-6](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q4_K_S-3.66bpw.gguf) | 3.66 | 1.84 GB | 94.46% |
+ | [KQ-7](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q4_K_S-3.87bpw.gguf) | 3.87 | 1.95 GB | 95.89% |
+ | [KQ-8](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q4_K_S-4.31bpw.gguf) | 4.31 | 2.17 GB | 98.44% |
+ | [KQ-9](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q5_K_S-4.74bpw.gguf) | 4.74 | 2.39 GB | 98.95% |
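
The "Bits Per Weight" and "Model Size HF" columns in the table above are related by simple arithmetic: size in bytes is roughly bits per weight times the number of weights, divided by 8. The sketch below checks that relation against the KQ rows. The weight count of about 4.03 billion is an assumption inferred here from the table itself (size × 8 / bpw), not a figure stated in the README, and sizes are assumed to be decimal gigabytes (10^9 bytes). The name `estimate_size_gb` is hypothetical.

```python
# ASSUMPTION: weight count inferred from the table rows (size * 8 / bpw),
# not an official parameter count for the model.
ASSUMED_WEIGHT_COUNT = 4.03e9


def estimate_size_gb(
    bits_per_weight: float, weight_count: float = ASSUMED_WEIGHT_COUNT
) -> float:
    """Approximate file size in decimal gigabytes (10**9 bytes)."""
    total_bits = bits_per_weight * weight_count
    return total_bits / 8 / 1e9


# (bits per weight, listed size in GB) pairs copied from the KQ table.
KQ_SIZES = [
    (2.77, 1.40), (2.95, 1.49), (3.19, 1.61),
    (3.34, 1.69), (3.45, 1.74), (3.66, 1.84),
    (3.87, 1.95), (4.31, 2.17), (4.74, 2.39),
]

# Largest gap between the estimate and the listed size, in GB.
max_error_gb = max(abs(estimate_size_gb(bpw) - size) for bpw, size in KQ_SIZES)
```

Under these assumptions the estimate stays within about 0.02 GB of every listed size, so the bits-per-weight value is a reasonable proxy for download size and memory footprint when comparing variants.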
+
+ ## GGUF-IQ Models: Best for GPU
+
+ ![GPU Benchmark - RTX 5090](img/RTX5090.png)
+
+ **Table sorted by inference speed:** (match the chart’s numbers with the model IDs below)
+
+ | Model ID | Bits Per Weight | Model Size HF | Average Quality Score |
+ | --- | --- | --- | --- |
+ | [IQ-1](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ3_S-2.55bpw.gguf) | 2.55 | 1.29 GB | 69.87% |
+ | [IQ-2](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ3_S-2.76bpw.gguf) | 2.76 | 1.39 GB | 83.32% |
+ | [IQ-3](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ3_S-2.94bpw.gguf) | 2.94 | 1.49 GB | 89.04% |
+ | [IQ-4](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ3_S-3.07bpw.gguf) | 3.07 | 1.55 GB | 92.32% |
+ | [IQ-5](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ3_S-3.31bpw.gguf) | 3.31 | 1.67 GB | 92.45% |
+ | [IQ-6](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ4_XS-3.55bpw.gguf) | 3.55 | 1.79 GB | 95.16% |
+ | [IQ-7](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ4_XS-4.04bpw.gguf) | 4.04 | 2.04 GB | 99.25% |
+ | [IQ-8](https://huggingface.co/byteshape/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-IQ4_XS-4.54bpw.gguf) | 4.54 | 2.29 GB | 99.80% |
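
Once a variant is chosen, fetching a single file rather than the whole repository might look like the sketch below. The repository id and the filename pattern are taken from the table links above. The helper names `gguf_filename` and `download_variant` are hypothetical, and the code assumes the third-party `huggingface_hub` package is installed. Its `hf_hub_download` function downloads one file into the local cache and returns the path.

```python
from pathlib import Path

# Repository id as it appears in the table links.
REPO_ID = "byteshape/Qwen3-4B-Instruct-2507-GGUF"


def gguf_filename(quant_label: str, bits_per_weight: float) -> str:
    """Build a filename following the pattern used in the table links."""
    return f"Qwen3-4B-Instruct-2507-{quant_label}-{bits_per_weight:.2f}bpw.gguf"


def download_variant(quant_label: str, bits_per_weight: float) -> Path:
    """Download one GGUF file from the Hub and return its local cache path."""
    # Imported lazily so the filename helper works without the package.
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    local_path = hf_hub_download(
        repo_id=REPO_ID,
        filename=gguf_filename(quant_label, bits_per_weight),
    )
    return Path(local_path)
```

For instance, `download_variant("IQ4_XS", 4.04)` would fetch the IQ-7 file. The returned path can then be passed to llama.cpp, for example with `llama-cli -m <path>`.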
img/Intel.png ADDED

Git LFS Details

  • SHA256: 49a83f8ec63bf19082bca0e0582b833d40ffc1008ce2a2570248377d8960e61f
  • Pointer size: 131 Bytes
  • Size of remote file: 709 kB
img/RTX5090.png ADDED

Git LFS Details

  • SHA256: cf2ec8c3bc0c9be6701f4d4c5816a658b43194d62e2108cfab2fab15c82ee0a8
  • Pointer size: 131 Bytes
  • Size of remote file: 722 kB