ubergarm committed
Commit aeb1741 · 1 Parent(s): c8ac87b

add perplexity graph

Files changed (2):
  1. README.md +5 -6
  2. images/perplexity.png +3 -0
README.md CHANGED
@@ -24,7 +24,6 @@ Currently cooking this now!
 - [x] adjust MTP nextn tensors to full q8_0 (won't affect RAM+VRAM usage otherwise)
 - [x] cook IQ5_K with full q8_0 attn/shexp/first 3 dense layers and test
 - [x] upload IQ5_K if all looking good
-- [ ] upload smol-IQ4_KSS if all looking good
 - [ ] continue with smaller quants
 - [ ] check if any folks open discussions with desired RAM/VRAM breakpoints
 
@@ -113,8 +112,8 @@ numactl -N ${SOCKET} -m ${SOCKET} \
 
 </details>
 
-## smol-IQ4_KSS TODO
-Final estimate: PPL over 565 chunks for n_ctx=512 = TODO
+## IQ3_KS 155.219 GiB (3.721 BPW)
+Final estimate: PPL over 565 chunks for n_ctx=512 = 4.1330 +/- 0.02573
 
 <details>
 
@@ -142,7 +141,7 @@ blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0
 
 # Routed Experts Layers [3-92]
 blk\..*\.ffn_down_exps\.weight=iq4_kss
-blk\..*\.ffn_(gate|up)_exps\.weight=iq4_kss
+blk\..*\.ffn_(gate|up)_exps\.weight=iq3_ks
 
 # NextN MTP Layer [92]
 blk\..*\.nextn\.embed_tokens\.weight=q8_0
@@ -164,8 +163,8 @@ numactl -N ${SOCKET} -m ${SOCKET} \
 --custom-q "$custom" \
 --imatrix /mnt/data/models/ubergarm/GLM-4.7-GGUF/imatrix-GLM-4.7-BF16.dat \
 /mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-160x21B-4.7-BF16-00001-of-00015.gguf \
-/mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-4.7-smol-IQ4_KSS.gguf \
-IQ4_KSS \
+/mnt/data/models/ubergarm/GLM-4.7-GGUF/GLM-4.7-IQ3_KS.gguf \
+IQ3_KS \
 128
 ```
 
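The `--custom-q` recipe in the diff above assigns a quant type to each tensor by matching tensor names against regular expressions (e.g. `blk\..*\.ffn_down_exps\.weight=iq4_kss`). A minimal sketch of that matching logic in Python — the `quant_for` helper, the default type, and the full-match semantics are illustrative assumptions here, not ik_llama.cpp's actual implementation:

```python
import re

# (pattern, quant type) pairs taken from the recipe lines in the diff.
RECIPE = [
    (r"blk\..*\.ffn_down_exps\.weight", "iq4_kss"),
    (r"blk\..*\.ffn_(gate|up)_exps\.weight", "iq3_ks"),
    (r"blk\..*\.nextn\.embed_tokens\.weight", "q8_0"),
]

def quant_for(tensor_name: str, default: str = "iq3_ks") -> str:
    """Return the quant type of the first pattern that matches the
    whole tensor name, falling back to a default type."""
    for pattern, qtype in RECIPE:
        if re.fullmatch(pattern, tensor_name):
            return qtype
    return default

print(quant_for("blk.12.ffn_down_exps.weight"))  # iq4_kss
print(quant_for("blk.12.ffn_up_exps.weight"))    # iq3_ks
```

Note the patterns use escaped dots (`\.`) so that a literal `.` in the tensor name is required, while the unescaped `.*` spans the layer index.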
images/perplexity.png ADDED

Git LFS Details

  • SHA256: 832022c68f44ed12a28d8f4c9da7504c89e460731d4c2ffa06ce6b99181fc72d
  • Pointer size: 131 Bytes
  • Size of remote file: 128 kB
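As a quick arithmetic check on the `155.219 GiB (3.721 BPW)` figure reported for the IQ3_KS quant: bits-per-weight is total bits divided by total weights, so the implied weight count is size × 8 / BPW. This sketch only rearranges that definition; the resulting ~358B count is an inference, not a figure from the source:

```python
# Figures from the README heading above.
size_gib = 155.219
bpw = 3.721

# GiB -> bytes -> bits, then divide by bits-per-weight.
size_bits = size_gib * 1024**3 * 8
implied_weights = size_bits / bpw  # ~3.58e11 weights

print(f"{implied_weights / 1e9:.1f}B implied weights")
```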