ddh0 committed on
Commit 0a930e1 · verified · 1 Parent(s): 92ca806

Delete tensor_type_testing.txt

---

# Tensor Type Testing

---

## Quantization naming scheme:

```
Model-Name-E{TYPE_EMBD}-F{TYPE_FFN}-A{TYPE_ATTN}-O{TYPE_OUTPUT}.gguf
```

For example, `Llama-3.1-8B-Instruct-EQ4_K-FQ4_K-AQ8_0-OQ8_0.gguf`:
- Model is Llama 3.1 8B Instruct
- TYPE_EMBD (token embeddings) are in Q4_K
- TYPE_FFN (MLP / feed-forward tensors) are in Q4_K
- TYPE_ATTN (K, Q, V attention and attention output tensors) are in Q8_0
- TYPE_OUTPUT (output tensor) is in Q8_0

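The scheme is mechanical enough to parse with a regular expression. A minimal sketch (the `parse_quant_name` helper and its regex are my own, not part of llama.cpp or any tool):

```python
import re

# Pattern for the naming scheme:
# Model-Name-E{TYPE_EMBD}-F{TYPE_FFN}-A{TYPE_ATTN}-O{TYPE_OUTPUT}.gguf
_SCHEME = re.compile(
    r"^(?P<model>.+)-E(?P<embd>[A-Z0-9_]+)-F(?P<ffn>[A-Z0-9_]+)"
    r"-A(?P<attn>[A-Z0-9_]+)-O(?P<output>[A-Z0-9_]+)\.gguf$"
)

def parse_quant_name(filename: str) -> dict:
    """Split a filename following the scheme into model name and four types."""
    m = _SCHEME.match(filename)
    if m is None:
        raise ValueError(f"not in E/F/A/O naming scheme: {filename}")
    return m.groupdict()
```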
---

## Command template:

```bash
TYPE_EMBD=GGML_TYPE
TYPE_FFN=GGML_TYPE
TYPE_ATTN=GGML_TYPE
TYPE_OUTPUT=GGML_TYPE
SRC_GGUF=/my/model/orig.gguf
DST_GGUF=/my/model/quant.gguf
N_THREADS=4

# Positional arguments: source GGUF, destination GGUF, base quant type
# (TYPE_FFN here; it applies to any tensor without an explicit override),
# and thread count.
./llama.cpp/build/bin/llama-quantize --token-embedding-type $TYPE_EMBD --tensor-type ffn_down=$TYPE_FFN --tensor-type ffn_gate=$TYPE_FFN --tensor-type ffn_up=$TYPE_FFN --tensor-type attn_k=$TYPE_ATTN --tensor-type attn_q=$TYPE_ATTN --tensor-type attn_v=$TYPE_ATTN --tensor-type attn_out=$TYPE_ATTN --output-tensor-type $TYPE_OUTPUT $SRC_GGUF $DST_GGUF $TYPE_FFN $N_THREADS
```

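When scripting many variants, the same flag layout can be generated programmatically. A hedged sketch (the `quantize_args` helper is mine; the flag names mirror the template above):

```python
def quantize_args(embd, ffn, attn, output, src, dst, n_threads=4):
    """Assemble the llama-quantize argument list used by the template above.

    Each --tensor-type NAME=TYPE flag overrides one tensor group; the
    positional type (here the FFN type) applies to anything not overridden.
    """
    args = ["./llama.cpp/build/bin/llama-quantize",
            "--token-embedding-type", embd]
    for name in ("ffn_down", "ffn_gate", "ffn_up"):
        args += ["--tensor-type", f"{name}={ffn}"]
    for name in ("attn_k", "attn_q", "attn_v", "attn_out"):
        args += ["--tensor-type", f"{name}={attn}"]
    args += ["--output-tensor-type", output, src, dst, ffn, str(n_threads)]
    return args
```

The resulting list can be handed to `subprocess.run` directly, avoiding shell quoting issues with long commands like the ones below.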
---

## Commands used for Llama 3.2

---

### Llama 3.2 3B - Crush token embeddings to Q2_K, otherwise Q8_0

```bash
TYPE_EMBD=Q2_K
TYPE_FFN=Q8_0
TYPE_ATTN=Q8_0
TYPE_OUTPUT=Q8_0
SRC_GGUF=/opt/workspace/gguf/Llama-3.2-3B-BF16.gguf
DST_GGUF=/opt/workspace/gguf/Llama-3.2-3B-EQ2_K-FQ8_0-AQ8_0-OQ8_0.gguf
N_THREADS=16

./llama.cpp/build/bin/llama-quantize --token-embedding-type $TYPE_EMBD --tensor-type ffn_down=$TYPE_FFN --tensor-type ffn_gate=$TYPE_FFN --tensor-type ffn_up=$TYPE_FFN --tensor-type attn_k=$TYPE_ATTN --tensor-type attn_q=$TYPE_ATTN --tensor-type attn_v=$TYPE_ATTN --tensor-type attn_out=$TYPE_ATTN --output-tensor-type $TYPE_OUTPUT $SRC_GGUF $DST_GGUF $TYPE_FFN $N_THREADS
```

---

### Llama 3.2 3B - Crush FFN to Q2_K, otherwise Q8_0

```bash
TYPE_EMBD=Q8_0
TYPE_FFN=Q2_K
TYPE_ATTN=Q8_0
TYPE_OUTPUT=Q8_0
SRC_GGUF=/opt/workspace/gguf/Llama-3.2-3B-BF16.gguf
DST_GGUF=/opt/workspace/gguf/Llama-3.2-3B-EQ8_0-FQ2_K-AQ8_0-OQ8_0.gguf
N_THREADS=16

./llama.cpp/build/bin/llama-quantize --token-embedding-type $TYPE_EMBD --tensor-type ffn_down=$TYPE_FFN --tensor-type ffn_gate=$TYPE_FFN --tensor-type ffn_up=$TYPE_FFN --tensor-type attn_k=$TYPE_ATTN --tensor-type attn_q=$TYPE_ATTN --tensor-type attn_v=$TYPE_ATTN --tensor-type attn_out=$TYPE_ATTN --output-tensor-type $TYPE_OUTPUT $SRC_GGUF $DST_GGUF $TYPE_FFN $N_THREADS
```

---

### Llama 3.2 3B - Crush attention to Q2_K, otherwise Q8_0

```bash
TYPE_EMBD=Q8_0
TYPE_FFN=Q8_0
TYPE_ATTN=Q2_K
TYPE_OUTPUT=Q8_0
SRC_GGUF=/opt/workspace/gguf/Llama-3.2-3B-BF16.gguf
DST_GGUF=/opt/workspace/gguf/Llama-3.2-3B-EQ8_0-FQ8_0-AQ2_K-OQ8_0.gguf
N_THREADS=16

./llama.cpp/build/bin/llama-quantize --token-embedding-type $TYPE_EMBD --tensor-type ffn_down=$TYPE_FFN --tensor-type ffn_gate=$TYPE_FFN --tensor-type ffn_up=$TYPE_FFN --tensor-type attn_k=$TYPE_ATTN --tensor-type attn_q=$TYPE_ATTN --tensor-type attn_v=$TYPE_ATTN --tensor-type attn_out=$TYPE_ATTN --output-tensor-type $TYPE_OUTPUT $SRC_GGUF $DST_GGUF $TYPE_FFN $N_THREADS
```

---

### Llama 3.2 3B - Crush output tensor to Q2_K, otherwise Q8_0

> **This result was not included because Llama 3.2 3B has no separate output tensor (its output projection is tied to the token embeddings)! The resulting file is the same as a normal Q8_0.**

```bash
TYPE_EMBD=Q8_0
TYPE_FFN=Q8_0
TYPE_ATTN=Q8_0
TYPE_OUTPUT=Q2_K
SRC_GGUF=/opt/workspace/gguf/Llama-3.2-3B-BF16.gguf
DST_GGUF=/opt/workspace/gguf/Llama-3.2-3B-EQ8_0-FQ8_0-AQ8_0-OQ2_K.gguf
N_THREADS=16

./llama.cpp/build/bin/llama-quantize --token-embedding-type $TYPE_EMBD --tensor-type ffn_down=$TYPE_FFN --tensor-type ffn_gate=$TYPE_FFN --tensor-type ffn_up=$TYPE_FFN --tensor-type attn_k=$TYPE_ATTN --tensor-type attn_q=$TYPE_ATTN --tensor-type attn_v=$TYPE_ATTN --tensor-type attn_out=$TYPE_ATTN --output-tensor-type $TYPE_OUTPUT $SRC_GGUF $DST_GGUF $TYPE_FFN $N_THREADS
```

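Whether `--output-tensor-type` will have any effect can be checked up front by looking for a separate output tensor in the model's tensor list (obtainable e.g. with the `gguf` Python package's `GGUFReader`). A sketch using the conventional GGUF tensor names, with illustrative name lists rather than real dumps:

```python
def has_separate_output_tensor(tensor_names):
    """Return True if the model has its own output projection tensor.

    Models with tied embeddings (like Llama 3.2 3B) carry no "output.weight";
    the "token_embd.weight" tensor is reused for the output projection, so
    --output-tensor-type has nothing to act on.
    """
    return "output.weight" in tensor_names

# Illustrative tensor-name lists, not dumped from real files.
tied = ["token_embd.weight", "blk.0.attn_q.weight", "output_norm.weight"]
untied = tied + ["output.weight"]
```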
---

## Raw results for Llama 3.2 3B

```
Number of input texts: 10
Shortest input length in tokens: 55
Longest input length in tokens: 4678
Average input length in tokens: 1605.5
Total number of input tokens: 16055
--------------------------------------------------------------------------------
Evaluating baseline model Llama-3.2-3B-BF16.gguf...
Load model...
Evaluate prompts...
Unload model...
--------------------------------------------------------------------------------
Now processing: Llama-3.2-3B-Q2_K.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Llama-3.2-3B-BF16.gguf vs. Llama-3.2-3B-Q2_K.gguf:
-- Prompt 0: 1.2261667251586914
-- Prompt 1: 1.1347604990005493
-- Prompt 2: 1.388033390045166
-- Prompt 3: 1.1053369045257568
-- Prompt 4: 1.7510676383972168
-- Prompt 5: 4.586221218109131
-- Prompt 6: 1.3651360273361206
-- Prompt 7: 0.8970077037811279
-- Prompt 8: 0.3409916162490845
-- Prompt 9: 1.2506738901138306
Average MSD: 1.5045396089553833
--------------------------------------------------------------------------------
Now processing: Llama-3.2-3B-EQ2_K-FQ8_0-AQ8_0-OQ8_0.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Llama-3.2-3B-BF16.gguf vs. Llama-3.2-3B-EQ2_K-FQ8_0-AQ8_0-OQ8_0.gguf:
-- Prompt 0: 0.3589555025100708
-- Prompt 1: 0.1420530527830124
-- Prompt 2: 0.3871675133705139
-- Prompt 3: 0.38336610794067383
-- Prompt 4: 0.4630553722381592
-- Prompt 5: 0.3928600549697876
-- Prompt 6: 0.46294596791267395
-- Prompt 7: 0.41983363032341003
-- Prompt 8: 0.0822080597281456
-- Prompt 9: 0.3548887372016907
Average MSD: 0.34473341703414917
--------------------------------------------------------------------------------
Now processing: Llama-3.2-3B-EQ8_0-FQ2_K-AQ8_0-OQ8_0.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Llama-3.2-3B-BF16.gguf vs. Llama-3.2-3B-EQ8_0-FQ2_K-AQ8_0-OQ8_0.gguf:
-- Prompt 0: 4.409396648406982
-- Prompt 1: 2.431891679763794
-- Prompt 2: 5.892056941986084
-- Prompt 3: 4.688146591186523
-- Prompt 4: 6.351741313934326
-- Prompt 5: 8.826679229736328
-- Prompt 6: 4.506043434143066
-- Prompt 7: 4.613113880157471
-- Prompt 8: 1.0596126317977905
-- Prompt 9: 4.1558661460876465
Average MSD: 4.693454742431641
--------------------------------------------------------------------------------
Now processing: Llama-3.2-3B-EQ8_0-FQ8_0-AQ2_K-OQ8_0.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Llama-3.2-3B-BF16.gguf vs. Llama-3.2-3B-EQ8_0-FQ8_0-AQ2_K-OQ8_0.gguf:
-- Prompt 0: 1.0618470907211304
-- Prompt 1: 1.1212399005889893
-- Prompt 2: 1.3122810125350952
-- Prompt 3: 0.9195016026496887
-- Prompt 4: 1.201547622680664
-- Prompt 5: 5.760651111602783
-- Prompt 6: 1.0914928913116455
-- Prompt 7: 0.9646959900856018
-- Prompt 8: 0.41648873686790466
-- Prompt 9: 1.4317259788513184
Average MSD: 1.5281471014022827
--------------------------------------------------------------------------------
Now processing: Llama-3.2-3B-Q8_0.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Llama-3.2-3B-BF16.gguf vs. Llama-3.2-3B-Q8_0.gguf:
-- Prompt 0: 0.0023212190717458725
-- Prompt 1: 0.0014450754970312119
-- Prompt 2: 0.003914575092494488
-- Prompt 3: 0.002514646854251623
-- Prompt 4: 0.003313937224447727
-- Prompt 5: 0.004224818665534258
-- Prompt 6: 0.0026909655425697565
-- Prompt 7: 0.0033839084208011627
-- Prompt 8: 0.0015104531776160002
-- Prompt 9: 0.002354747150093317
Average MSD: 0.0027674345765262842
--------------------------------------------------------------------------------
Average Mean-Squared Deviation compared to Llama-3.2-3B-BF16.gguf:
--------------------------------------------------------------------------------
Llama-3.2-3B-Q2_K.gguf -- 1.5045396089553833
Llama-3.2-3B-EQ2_K-FQ8_0-AQ8_0-OQ8_0.gguf -- 0.34473341703414917
Llama-3.2-3B-EQ8_0-FQ2_K-AQ8_0-OQ8_0.gguf -- 4.693454742431641
Llama-3.2-3B-EQ8_0-FQ8_0-AQ2_K-OQ8_0.gguf -- 1.5281471014022827
Llama-3.2-3B-Q8_0.gguf -- 0.0027674345765262842
--------------------------------------------------------------------------------
```

---

## Summarized results for Llama 3.2 3B

Approximate Mean-Squared Deviation compared to BF16, averaged over 10 inputs (lower is better):
- Standard Q8_0 quant: **0.002**
- Crush token embeddings to Q2_K, otherwise Q8_0: **0.344**
- Standard Q2_K quant: **1.504**
- Crush attention to Q2_K, otherwise Q8_0: **1.528**
- Crush FFN to Q2_K, otherwise Q8_0: **4.693**

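The MSD reported above is, presumably, the mean squared difference between the baseline and quantized models' logits, averaged over each prompt's tokens and then across prompts. A dependency-free sketch of that computation (the actual harness may differ in shapes and precision):

```python
def mean_squared_deviation(logits_a, logits_b):
    """MSD between two models' logits for one prompt.

    logits_* : nested lists of floats, one row of vocab-sized logits
    per token position, from the same prompt.
    """
    total, count = 0.0, 0
    for row_a, row_b in zip(logits_a, logits_b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count

def average_msd(pairs):
    """Average the per-prompt MSDs, as in the "Average MSD" lines above."""
    msds = [mean_squared_deviation(a, b) for a, b in pairs]
    return sum(msds) / len(msds)
```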
---

## Commands used for Qwen2.5-14B

---

### Qwen2.5-14B - Crush token embeddings to Q2_K, otherwise Q8_0

```bash
TYPE_EMBD=Q2_K
TYPE_FFN=Q8_0
TYPE_ATTN=Q8_0
TYPE_OUTPUT=Q8_0
SRC_GGUF=/opt/workspace/gguf/Qwen2.5-14B-BF16.gguf
DST_GGUF=/opt/workspace/gguf/Qwen2.5-14B-EQ2_K-FQ8_0-AQ8_0-OQ8_0.gguf
N_THREADS=16

./llama.cpp/build/bin/llama-quantize --token-embedding-type $TYPE_EMBD --tensor-type ffn_down=$TYPE_FFN --tensor-type ffn_gate=$TYPE_FFN --tensor-type ffn_up=$TYPE_FFN --tensor-type attn_k=$TYPE_ATTN --tensor-type attn_q=$TYPE_ATTN --tensor-type attn_v=$TYPE_ATTN --tensor-type attn_out=$TYPE_ATTN --output-tensor-type $TYPE_OUTPUT $SRC_GGUF $DST_GGUF $TYPE_FFN $N_THREADS
```

---

### Qwen2.5-14B - Crush FFN to Q2_K, otherwise Q8_0

```bash
TYPE_EMBD=Q8_0
TYPE_FFN=Q2_K
TYPE_ATTN=Q8_0
TYPE_OUTPUT=Q8_0
SRC_GGUF=/opt/workspace/gguf/Qwen2.5-14B-BF16.gguf
DST_GGUF=/opt/workspace/gguf/Qwen2.5-14B-EQ8_0-FQ2_K-AQ8_0-OQ8_0.gguf
N_THREADS=16

./llama.cpp/build/bin/llama-quantize --token-embedding-type $TYPE_EMBD --tensor-type ffn_down=$TYPE_FFN --tensor-type ffn_gate=$TYPE_FFN --tensor-type ffn_up=$TYPE_FFN --tensor-type attn_k=$TYPE_ATTN --tensor-type attn_q=$TYPE_ATTN --tensor-type attn_v=$TYPE_ATTN --tensor-type attn_out=$TYPE_ATTN --output-tensor-type $TYPE_OUTPUT $SRC_GGUF $DST_GGUF $TYPE_FFN $N_THREADS
```

---

### Qwen2.5-14B - Crush attention to Q2_K, otherwise Q8_0

```bash
TYPE_EMBD=Q8_0
TYPE_FFN=Q8_0
TYPE_ATTN=Q2_K
TYPE_OUTPUT=Q8_0
SRC_GGUF=/opt/workspace/gguf/Qwen2.5-14B-BF16.gguf
DST_GGUF=/opt/workspace/gguf/Qwen2.5-14B-EQ8_0-FQ8_0-AQ2_K-OQ8_0.gguf
N_THREADS=16

./llama.cpp/build/bin/llama-quantize --token-embedding-type $TYPE_EMBD --tensor-type ffn_down=$TYPE_FFN --tensor-type ffn_gate=$TYPE_FFN --tensor-type ffn_up=$TYPE_FFN --tensor-type attn_k=$TYPE_ATTN --tensor-type attn_q=$TYPE_ATTN --tensor-type attn_v=$TYPE_ATTN --tensor-type attn_out=$TYPE_ATTN --output-tensor-type $TYPE_OUTPUT $SRC_GGUF $DST_GGUF $TYPE_FFN $N_THREADS
```

---

### Qwen2.5-14B - Crush output tensor to Q2_K, otherwise Q8_0

```bash
TYPE_EMBD=Q8_0
TYPE_FFN=Q8_0
TYPE_ATTN=Q8_0
TYPE_OUTPUT=Q2_K
SRC_GGUF=/opt/workspace/gguf/Qwen2.5-14B-BF16.gguf
DST_GGUF=/opt/workspace/gguf/Qwen2.5-14B-EQ8_0-FQ8_0-AQ8_0-OQ2_K.gguf
N_THREADS=16

./llama.cpp/build/bin/llama-quantize --token-embedding-type $TYPE_EMBD --tensor-type ffn_down=$TYPE_FFN --tensor-type ffn_gate=$TYPE_FFN --tensor-type ffn_up=$TYPE_FFN --tensor-type attn_k=$TYPE_ATTN --tensor-type attn_q=$TYPE_ATTN --tensor-type attn_v=$TYPE_ATTN --tensor-type attn_out=$TYPE_ATTN --output-tensor-type $TYPE_OUTPUT $SRC_GGUF $DST_GGUF $TYPE_FFN $N_THREADS
```

---

## Raw results for Qwen2.5-14B

```
Number of input texts: 10
Shortest input length in tokens: 60
Longest input length in tokens: 4801
Average input length in tokens: 1589.3
Total number of input tokens: 15893
--------------------------------------------------------------------------------
Evaluating baseline model Qwen2.5-14B-BF16.gguf...
Load model...
Evaluate prompts...
Unload model...
--------------------------------------------------------------------------------
Now processing: Qwen2.5-14B-Q2_K.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-BF16.gguf vs. Qwen2.5-14B-Q2_K.gguf:
-- Prompt 0: 1.568434476852417
-- Prompt 1: 1.8605916500091553
-- Prompt 2: 1.2912431955337524
-- Prompt 3: 1.3367090225219727
-- Prompt 4: 1.1364308595657349
-- Prompt 5: 2.3384993076324463
-- Prompt 6: 1.2926896810531616
-- Prompt 7: 1.4084643125534058
-- Prompt 8: 0.32443684339523315
-- Prompt 9: 1.3756331205368042
Average MSD: 1.3933132886886597
--------------------------------------------------------------------------------
Now processing: Qwen2.5-14B-EQ2_K-FQ8_0-AQ8_0-OQ8_0.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-BF16.gguf vs. Qwen2.5-14B-EQ2_K-FQ8_0-AQ8_0-OQ8_0.gguf:
-- Prompt 0: 0.012962134554982185
-- Prompt 1: 0.019185630604624748
-- Prompt 2: 0.05430002510547638
-- Prompt 3: 0.008174948394298553
-- Prompt 4: 0.011592703871428967
-- Prompt 5: 0.012105505913496017
-- Prompt 6: 0.007557644974440336
-- Prompt 7: 0.01957087405025959
-- Prompt 8: 0.013395288027822971
-- Prompt 9: 0.007488884497433901
Average MSD: 0.01663336530327797
--------------------------------------------------------------------------------
Now processing: Qwen2.5-14B-EQ8_0-FQ2_K-AQ8_0-OQ8_0.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-BF16.gguf vs. Qwen2.5-14B-EQ8_0-FQ2_K-AQ8_0-OQ8_0.gguf:
-- Prompt 0: 2.483222246170044
-- Prompt 1: 2.20788836479187
-- Prompt 2: 2.2648935317993164
-- Prompt 3: 2.175588607788086
-- Prompt 4: 1.624481439590454
-- Prompt 5: 4.104475498199463
-- Prompt 6: 2.0161893367767334
-- Prompt 7: 2.0660784244537354
-- Prompt 8: 0.46407243609428406
-- Prompt 9: 2.1939690113067627
Average MSD: 2.160086154937744
--------------------------------------------------------------------------------
Now processing: Qwen2.5-14B-EQ8_0-FQ8_0-AQ2_K-OQ8_0.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-BF16.gguf vs. Qwen2.5-14B-EQ8_0-FQ8_0-AQ2_K-OQ8_0.gguf:
-- Prompt 0: 0.7283403277397156
-- Prompt 1: 1.0912593603134155
-- Prompt 2: 0.9022651314735413
-- Prompt 3: 0.4880850911140442
-- Prompt 4: 0.29713207483291626
-- Prompt 5: 0.6994995474815369
-- Prompt 6: 0.45846545696258545
-- Prompt 7: 0.5286242365837097
-- Prompt 8: 0.2947601079940796
-- Prompt 9: 0.5722559690475464
Average MSD: 0.6060687303543091
--------------------------------------------------------------------------------
Now processing: Qwen2.5-14B-EQ8_0-FQ8_0-AQ8_0-OQ2_K.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-BF16.gguf vs. Qwen2.5-14B-EQ8_0-FQ8_0-AQ8_0-OQ2_K.gguf:
-- Prompt 0: 1.2783535718917847
-- Prompt 1: 0.4481557607650757
-- Prompt 2: 1.1880418062210083
-- Prompt 3: 1.0997036695480347
-- Prompt 4: 0.8093082308769226
-- Prompt 5: 0.6486296057701111
-- Prompt 6: 1.1238276958465576
-- Prompt 7: 1.1459368467330933
-- Prompt 8: 0.23579858243465424
-- Prompt 9: 1.238993525505066
Average MSD: 0.9216748476028442
--------------------------------------------------------------------------------
Now processing: Qwen2.5-14B-Q8_0.gguf
Load model...
Evaluate prompts...
Unload model...
Compute MSD...
Mean-Squared Deviation - Qwen2.5-14B-BF16.gguf vs. Qwen2.5-14B-Q8_0.gguf:
-- Prompt 0: 0.0059487177059054375
-- Prompt 1: 0.004823403432965279
-- Prompt 2: 0.011750683188438416
-- Prompt 3: 0.004459250718355179
-- Prompt 4: 0.004037810489535332
-- Prompt 5: 0.0039064036682248116
-- Prompt 6: 0.004684466868638992
-- Prompt 7: 0.004520604852586985
-- Prompt 8: 0.004727284424006939
-- Prompt 9: 0.004541514907032251
Average MSD: 0.0053400141187012196
--------------------------------------------------------------------------------
Average Mean-Squared Deviation compared to Qwen2.5-14B-BF16.gguf:
--------------------------------------------------------------------------------
Qwen2.5-14B-Q2_K.gguf -- 1.3933132886886597
Qwen2.5-14B-EQ2_K-FQ8_0-AQ8_0-OQ8_0.gguf -- 0.01663336530327797
Qwen2.5-14B-EQ8_0-FQ2_K-AQ8_0-OQ8_0.gguf -- 2.160086154937744
Qwen2.5-14B-EQ8_0-FQ8_0-AQ2_K-OQ8_0.gguf -- 0.6060687303543091
Qwen2.5-14B-EQ8_0-FQ8_0-AQ8_0-OQ2_K.gguf -- 0.9216748476028442
Qwen2.5-14B-Q8_0.gguf -- 0.0053400141187012196
--------------------------------------------------------------------------------
```

---

## Summarized results for Qwen2.5-14B

Approximate Mean-Squared Deviation compared to BF16, averaged over 10 inputs (lower is better):
- Standard Q8_0 quant: **0.005**
- Crush token embeddings to Q2_K, otherwise Q8_0: **0.016**
- Crush attention to Q2_K, otherwise Q8_0: **0.606**
- Crush output tensor to Q2_K, otherwise Q8_0: **0.921**
- Standard Q2_K quant: **1.393**
- Crush FFN to Q2_K, otherwise Q8_0: **2.160**

---
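Using the reported averages from both summaries, the tensor groups can be ranked by how much damage crushing them to Q2_K causes. A small sketch (the dict layout and `ranked` helper are mine; the numbers are copied from the summaries above):

```python
# Average MSD vs. BF16, copied from the two summaries above.
results = {
    "Llama-3.2-3B": {"embeddings": 0.344, "standard Q2_K": 1.504,
                     "attention": 1.528, "ffn": 4.693},
    "Qwen2.5-14B": {"embeddings": 0.016, "attention": 0.606,
                    "output": 0.921, "standard Q2_K": 1.393, "ffn": 2.160},
}

def ranked(model):
    """Crush targets ordered from least to most damaging for one model."""
    return sorted(results[model], key=results[model].get)

for model in results:
    print(model, "->", ", ".join(ranked(model)))
```

For both models the ordering is the same at the extremes: token embeddings tolerate Q2_K best, FFN tensors worst.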