add perplexity graph
Files changed:
- README.md (+5, -6)
- images/perplexity.png (+3, -0)

README.md (CHANGED)
````diff
@@ -24,7 +24,6 @@ Currently cooking this now!
 - [x] adjust MTP nextn tensors to full q8_0 (won't affect RAM+VRAM usage otherwise)
 - [x] cook IQ5_K with full q8_0 attn/shexp/first 3 dense layers and test
 - [x] upload IQ5_K if all looking good
-- [ ] upload smol-IQ4_KSS if all looking good
 - [ ] continue with smaller quants
 - [ ] check if any folks open discussions with desired RAM/VRAM breakpoints

````
````diff
@@ -113,8 +112,8 @@ numactl -N ${SOCKET} -m ${SOCKET} \

 </details>

-##
-Final estimate: PPL over 565 chunks for n_ctx=512 =
+## IQ3_KS 155.219 GiB (3.721 BPW)
+Final estimate: PPL over 565 chunks for n_ctx=512 = 4.1330 +/- 0.02573

 <details>

````
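The filled-in heading records the quantized file size and average bits per weight (BPW), and the perplexity line reports mean PPL plus or minus its standard error over 565 chunks of 512 tokens. As a quick sanity check, the two heading numbers imply the model's total parameter count; the sketch below is plain arithmetic, and reading the result as the model's total size is an inference, not something stated in this diff.

```python
# Consistency check for "IQ3_KS 155.219 GiB (3.721 BPW)": file size
# divided by average bits per weight yields the implied parameter count.
size_gib = 155.219
bpw = 3.721

total_bits = size_gib * (2**30) * 8                 # GiB -> bits
params = total_bits / bpw                           # bits / (bits per weight)
print(f"implied parameters: {params / 1e9:.1f}B")   # ~358.3B
```

Note that BPW is an average over all tensors, so the q8_0 attention/shexp/dense layers in this recipe pull it above the nominal IQ3_KS rate.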
````diff
@@ -142,7 +141,7 @@ blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0

 # Routed Experts Layers [3-92]
 blk\..*\.ffn_down_exps\.weight=iq4_kss
-blk\..*\.ffn_(gate|up)_exps\.weight=
+blk\..*\.ffn_(gate|up)_exps\.weight=iq3_ks

 # NextN MTP Layer [92]
 blk\..*\.nextn\.embed_tokens\.weight=q8_0
````
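Each line of the recipe is a `regex=quant-type` rule matched against GGUF tensor names, so `ffn_down_exps` tensors stay at iq4_kss while the gate/up expert projections drop to iq3_ks. Below is a minimal sketch of how such a mapping could resolve; top-down, first-match-wins ordering and the fallback to the command-line base type are assumptions about `--custom-q`, not behavior confirmed by this diff.

```python
import re

# Rules copied from the recipe above; ordering and first-match-wins
# semantics are assumptions for illustration.
RULES = [
    (r"blk\..*\.ffn_down_exps\.weight", "iq4_kss"),
    (r"blk\..*\.ffn_(gate|up)_exps\.weight", "iq3_ks"),
    (r"blk\..*\.nextn\.embed_tokens\.weight", "q8_0"),
]

def quant_for(tensor_name: str, fallback: str = "IQ3_KS") -> str:
    """Return the quant type for a tensor name, falling back to the
    base type passed positionally on the llama-quantize command line."""
    for pattern, qtype in RULES:
        if re.fullmatch(pattern, tensor_name):
            return qtype
    return fallback

print(quant_for("blk.42.ffn_up_exps.weight"))         # iq3_ks
print(quant_for("blk.92.nextn.embed_tokens.weight"))  # q8_0
```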
````diff
@@ -164,8 +163,8 @@ numactl -N ${SOCKET} -m ${SOCKET} \
 --custom-q "$custom" \
 --imatrix /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat \
 /mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-160x21B-4.7-BF16-00001-of-00015.gguf \
-/mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-4.7-
-
+/mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-4.7-IQ3_KS.gguf \
+IQ3_KS \
 128
 ```

````
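The `$custom` variable in this command carries the recipe shown above, flattened so the whole rule set rides in a single `--custom-q` argument. A hedged sketch of that assembly step: the comma-joining convention and the `./llama-quantize` binary path are assumptions, while the flags and positional arguments mirror the invocation in the hunk.

```python
import subprocess

# Flatten the recipe (drop comment lines, join with commas), then invoke
# llama-quantize as in the command above. The binary path and the
# comma-separated --custom-q format are assumptions.
recipe = """\
# Routed Experts Layers [3-92]
blk\\..*\\.ffn_down_exps\\.weight=iq4_kss
blk\\..*\\.ffn_(gate|up)_exps\\.weight=iq3_ks
# NextN MTP Layer [92]
blk\\..*\\.nextn\\.embed_tokens\\.weight=q8_0
"""
custom = ",".join(
    line for line in recipe.splitlines()
    if line and not line.startswith("#")
)

subprocess.run([
    "./llama-quantize",
    "--custom-q", custom,
    "--imatrix", "/mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat",
    "/mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-160x21B-4.7-BF16-00001-of-00015.gguf",
    "/mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-4.7-IQ3_KS.gguf",
    "IQ3_KS",  # base type for tensors not matched by a custom rule
    "128",     # thread count
], check=True)
```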
images/perplexity.png (ADDED, stored via Git LFS)
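Finally, a minimal matplotlib sketch of how a chart like `images/perplexity.png` could be produced. Only the IQ3_KS point is confirmed by this commit; the commented-out entry and the axis labels are placeholders and assumptions about what the graph plots.

```python
import matplotlib.pyplot as plt

# PPL-vs-BPW scatter with error bars. Only the IQ3_KS point comes from
# this commit; fill in other quants from their README sections.
points = {
    "IQ3_KS": (3.721, 4.1330, 0.02573),  # (BPW, PPL, +/- std error)
    # "IQ5_K": (bpw, ppl, err),          # placeholder: measured values
}

fig, ax = plt.subplots()
for name, (bpw, ppl, err) in points.items():
    ax.errorbar(bpw, ppl, yerr=err, fmt="o", capsize=3)
    ax.annotate(name, (bpw, ppl), textcoords="offset points", xytext=(6, 4))

ax.set_xlabel("bits per weight (BPW)")
ax.set_ylabel("perplexity (n_ctx=512)")
ax.set_title("GLM-4.7-GGUF quants: PPL vs BPW")
fig.savefig("images/perplexity.png", dpi=150)
```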