add perplexity graph
Files changed:
- README.md (+5, -6)
- images/perplexity.png (+3, -0)

README.md (CHANGED)
````diff
@@ -24,7 +24,6 @@ Currently cooking this now!
 - [x] adjust MTP nextn tensors to full q8_0 (won't affect RAM+VRAM usage otherwise)
 - [x] cook IQ5_K with full q8_0 attn/shexp/first 3 dense layers and test
 - [x] upload IQ5_K if all looking good
-- [ ] upload smol-IQ4_KSS if all looking good
 - [ ] continue with smaller quants
 - [ ] check if any folks open discussions with desired RAM/VRAM breakpoints

````
````diff
@@ -113,8 +112,8 @@ numactl -N ${SOCKET} -m ${SOCKET} \

 </details>

-##
-Final estimate: PPL over 565 chunks for n_ctx=512 =
+## IQ3_KS 155.219 GiB (3.721 BPW)
+Final estimate: PPL over 565 chunks for n_ctx=512 = 4.1330 +/- 0.02573

 <details>

````
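The filled-in heading records the quantized file size and average bits per weight (BPW), and the perplexity line reports mean PPL plus or minus its standard error over 565 chunks of 512 tokens. As a quick sanity check, the two heading numbers imply the model's total parameter count; the sketch below is plain arithmetic, and reading the result as the model's total size is an inference, not something stated in this diff.

```python
# Consistency check for "IQ3_KS 155.219 GiB (3.721 BPW)": file size
# divided by average bits per weight yields the implied parameter count.
size_gib = 155.219
bpw = 3.721

total_bits = size_gib * (2**30) * 8                 # GiB -> bits
params = total_bits / bpw                           # bits / (bits per weight)
print(f"implied parameters: {params / 1e9:.1f}B")   # ~358.3B
```

Note that BPW is an average over all tensors, so the q8_0 attention/shexp/dense layers in this recipe pull it above the nominal IQ3_KS rate.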
````diff
@@ -142,7 +141,7 @@ blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0

 # Routed Experts Layers [3-92]
 blk\..*\.ffn_down_exps\.weight=iq4_kss
-blk\..*\.ffn_(gate|up)_exps\.weight=
+blk\..*\.ffn_(gate|up)_exps\.weight=iq3_ks

 # NextN MTP Layer [92]
 blk\..*\.nextn\.embed_tokens\.weight=q8_0
````
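Each line of the recipe is a `regex=quant-type` rule matched against GGUF tensor names, so `ffn_down_exps` tensors stay at iq4_kss while the gate/up expert projections drop to iq3_ks. Below is a minimal sketch of how such a mapping could resolve; top-down, first-match-wins ordering and the fallback to the command-line base type are assumptions about `--custom-q`, not behavior confirmed by this diff.

```python
import re

# Rules copied from the recipe above; ordering and first-match-wins
# semantics are assumptions for illustration.
RULES = [
    (r"blk\..*\.ffn_down_exps\.weight", "iq4_kss"),
    (r"blk\..*\.ffn_(gate|up)_exps\.weight", "iq3_ks"),
    (r"blk\..*\.nextn\.embed_tokens\.weight", "q8_0"),
]

def quant_for(tensor_name: str, fallback: str = "IQ3_KS") -> str:
    """Return the quant type for a tensor name, falling back to the
    base type passed positionally on the llama-quantize command line."""
    for pattern, qtype in RULES:
        if re.fullmatch(pattern, tensor_name):
            return qtype
    return fallback

print(quant_for("blk.42.ffn_up_exps.weight"))         # iq3_ks
print(quant_for("blk.92.nextn.embed_tokens.weight"))  # q8_0
```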
````diff
@@ -164,8 +163,8 @@ numactl -N ${SOCKET} -m ${SOCKET} \
 --custom-q "$custom" \
 --imatrix /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat \
 /mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-160x21B-4.7-BF16-00001-of-00015.gguf \
-/mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-4.7-
-
+/mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-4.7-IQ3_KS.gguf \
+IQ3_KS \
 128
 ```

````
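The `$custom` variable in this command carries the recipe shown above, flattened so the whole rule set rides in a single `--custom-q` argument. A hedged sketch of that assembly step: the comma-joining convention and the `./llama-quantize` binary path are assumptions, while the flags and positional arguments mirror the invocation in the hunk.

```python
import subprocess

# Flatten the recipe (drop comment lines, join with commas), then invoke
# llama-quantize as in the command above. The binary path and the
# comma-separated --custom-q format are assumptions.
recipe = """\
# Routed Experts Layers [3-92]
blk\\..*\\.ffn_down_exps\\.weight=iq4_kss
blk\\..*\\.ffn_(gate|up)_exps\\.weight=iq3_ks
# NextN MTP Layer [92]
blk\\..*\\.nextn\\.embed_tokens\\.weight=q8_0
"""
custom = ",".join(
    line for line in recipe.splitlines()
    if line and not line.startswith("#")
)

subprocess.run([
    "./llama-quantize",
    "--custom-q", custom,
    "--imatrix", "/mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat",
    "/mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-160x21B-4.7-BF16-00001-of-00015.gguf",
    "/mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-4.7-IQ3_KS.gguf",
    "IQ3_KS",  # base type for tensors not matched by a custom rule
    "128",     # thread count
], check=True)
```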
images/perplexity.png (ADDED, stored via Git LFS)
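Finally, a minimal matplotlib sketch of how a chart like `images/perplexity.png` could be produced. Only the IQ3_KS point is confirmed by this commit; the commented-out entry and the axis labels are placeholders and assumptions about what the graph plots.

```python
import matplotlib.pyplot as plt

# PPL-vs-BPW scatter with error bars. Only the IQ3_KS point comes from
# this commit; fill in other quants from their README sections.
points = {
    "IQ3_KS": (3.721, 4.1330, 0.02573),  # (BPW, PPL, +/- std error)
    # "IQ5_K": (bpw, ppl, err),          # placeholder: measured values
}

fig, ax = plt.subplots()
for name, (bpw, ppl, err) in points.items():
    ax.errorbar(bpw, ppl, yerr=err, fmt="o", capsize=3)
    ax.annotate(name, (bpw, ppl), textcoords="offset points", xytext=(6, 4))

ax.set_xlabel("bits per weight (BPW)")
ax.set_ylabel("perplexity (n_ctx=512)")
ax.set_title("GLM-4.7-GGUF quants: PPL vs BPW")
fig.savefig("images/perplexity.png", dpi=150)
```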