notSnix committed on
Commit c7bc852 · verified · 1 Parent(s): 852d29d

Add Q6_K MTP draft

Files changed (4)
  1. .gitattributes +1 -0
  2. README.md +4 -0
  3. SHA256SUMS +1 -0
  4. Step-3.7-Flash-MTP-Q6_K.gguf +3 -0
.gitattributes CHANGED
@@ -36,3 +36,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  Step-3.7-Flash-MTP-BF16.gguf filter=lfs diff=lfs merge=lfs -text
  Step-3.7-Flash-MTP-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
  Step-3.7-Flash-MTP-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ Step-3.7-Flash-MTP-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -12,6 +12,7 @@ tags:
  - draft-model
  - speculative-decoding
  - q4_k_m
+ - q6_k
  - q8_0
  - bf16
  - stepfun
@@ -32,6 +33,7 @@ The draft GGUF is passed with `--model-draft`; the full model is passed with `--
  | File | Size | SHA256 | Purpose |
  |---|---:|---|---|
  | `Step-3.7-Flash-MTP-Q8_0.gguf` | 3.5 GB | `017de8990140621b5b4af431448f20873fbf0b052f6c50d2afac15f45802a98d` | Recommended MTP draft |
+ | `Step-3.7-Flash-MTP-Q6_K.gguf` | 2.7 GB | `f41736e0dcce133d0dd0b81e14bd2965091e27dff306a28cec11ceb19fadbf46` | Smaller Q6_K MTP draft |
  | `Step-3.7-Flash-MTP-Q4_K_M.gguf` | 2.0 GB | `44118cfe64f45b38127ad6fb626e16bd94ee5a827cb34aa83d9e6df3450aebaf` | Smaller MTP draft |
  | `Step-3.7-Flash-MTP-BF16.gguf` | 6.5 GB | `fd811c81d14c786d314d8006655bba61971059abcfdfb6109ce83fd768f8b289` | Experimental BF16 MTP draft |

@@ -69,6 +71,8 @@ llama-server \

  Use `Step-3.7-Flash-MTP-Q8_0.gguf` first. It was the best local default in testing.

+ Use `Step-3.7-Flash-MTP-Q6_K.gguf` if you want a smaller draft file while staying above Q4.
+
  Use `Step-3.7-Flash-MTP-Q4_K_M.gguf` if you want the smaller draft file.

  Use `Step-3.7-Flash-MTP-BF16.gguf` for experimentation.
SHA256SUMS CHANGED
@@ -1,3 +1,4 @@
  017de8990140621b5b4af431448f20873fbf0b052f6c50d2afac15f45802a98d Step-3.7-Flash-MTP-Q8_0.gguf
+ f41736e0dcce133d0dd0b81e14bd2965091e27dff306a28cec11ceb19fadbf46 Step-3.7-Flash-MTP-Q6_K.gguf
  44118cfe64f45b38127ad6fb626e16bd94ee5a827cb34aa83d9e6df3450aebaf Step-3.7-Flash-MTP-Q4_K_M.gguf
  fd811c81d14c786d314d8006655bba61971059abcfdfb6109ce83fd768f8b289 Step-3.7-Flash-MTP-BF16.gguf
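Reviewer note: the `SHA256SUMS` entries can be checked against downloaded files with a few lines of Python. This is a minimal sketch, assuming the GGUF files sit in the same directory as `SHA256SUMS`; the helper names `sha256_of` and `verify_sums` are illustrative and not part of this repository.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB GGUF files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sums(sums_file: Path) -> dict[str, bool]:
    """Return {filename: matches} for each '<hex digest> <filename>' line.

    A file that is missing next to SHA256SUMS is reported as False.
    """
    results = {}
    for line in sums_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        # sha256sum marks binary-mode entries with a leading '*'; drop it.
        name = name.strip().lstrip("*")
        target = sums_file.parent / name
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results
```

Calling `verify_sums(Path("SHA256SUMS"))` from the download directory returns one boolean per listed file, so a partially downloaded set (for example only the Q6_K draft) shows the remaining entries as `False` rather than raising.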
Step-3.7-Flash-MTP-Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f41736e0dcce133d0dd0b81e14bd2965091e27dff306a28cec11ceb19fadbf46
+ size 2863478976
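Reviewer note: the Git LFS pointer added here carries the same digest as the new `SHA256SUMS` entry, plus the exact byte size. The sketch below parses the pointer text and converts the size to both decimal and binary units; `parse_lfs_pointer` is a hypothetical helper written for this note, not an existing tool.

```python
def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line:
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


# Pointer content copied from this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f41736e0dcce133d0dd0b81e14bd2965091e27dff306a28cec11ceb19fadbf46
size 2863478976
"""

fields = parse_lfs_pointer(POINTER)
algo, _, digest = fields["oid"].partition(":")
size_bytes = int(fields["size"])

print(algo, digest)
print(f"{size_bytes / 10**9:.2f} GB (decimal), {size_bytes / 2**30:.2f} GiB (binary)")
```

The size works out to about 2.86 GB in decimal units and about 2.67 GiB in binary units, which suggests the "2.7 GB" in the README table is a binary (GiB) figure.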