Update v1
.gitattributes CHANGED

```diff
@@ -34,3 +34,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-f16.gguf filter=lfs diff=lfs merge=lfs -text
+DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
```
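Each line in the hunk above maps a glob pattern to the Git LFS clean/smudge filter, so matching files are stored as LFS objects rather than regular blobs. A minimal sketch of how such patterns select files, using Python's `fnmatch` as a stand-in for Git's wildmatch rules (the real matching has extra cases for `**` and directory-anchored patterns):

```python
from fnmatch import fnmatch

# Patterns taken from the .gitattributes hunk above; each one routes
# matching paths through the LFS filter driver.
lfs_patterns = [
    "*.zst",
    "*tfevents*",
    "DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-Q4_K_M.gguf",
    "DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-f16.gguf",
    "DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-Q8_0.gguf",
]

def tracked_by_lfs(path: str) -> bool:
    """Return True if any LFS pattern matches the path."""
    return any(fnmatch(path, pat) for pat in lfs_patterns)

print(tracked_by_lfs("DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-Q8_0.gguf"))  # True
print(tracked_by_lfs("README.md"))  # False
```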
DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-Q4_K_M.gguf CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:a7ee5e0496b9f16e791f31529cad8398ebff61f8f034d7b6f634885a72fe0c83
+size 397932352
```
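What Git stores for these `.gguf` files is not the weights themselves but a three-line LFS pointer in the `version`/`oid`/`size` format shown above. A small parser sketch for that format (field names follow the git-lfs v1 pointer layout visible in the diff):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs v1 pointer file into its fields.

    Each line is 'key value'; 'oid' carries an algorithm prefix
    like 'sha256:' and 'size' is the object size in bytes.
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }

# The new Q4_K_M pointer from the hunk above.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:a7ee5e0496b9f16e791f31529cad8398ebff61f8f034d7b6f634885a72fe0c83
size 397932352
"""
info = parse_lfs_pointer(pointer)
print(info["oid_algo"], info["size"])  # sha256 397932352
```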
DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-Q8_0.gguf ADDED

```diff
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6cd8384ce8592bda1da205ed5d316f281e2b6aa4c48c8d7f9542dbb0a92f20c3
+size 531192640
```
DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-f16.gguf ADDED

```diff
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2cbfa3658c7bb59872cdbdf9fd8f28814d823615a1d9d1ec0d41cdca5874d33c
+size 994388800
```
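The three pointer sizes allow a rough sanity check on the quantization levels. Assuming the f16 file stores about 2 bytes per weight (ignoring GGUF metadata, so the ~0.5B parameter count below is an estimate, not a number from the repository), the ratios approximate the average storage per weight of each quant:

```python
# File sizes in bytes, taken from the LFS pointers above.
size_f16 = 994_388_800
size_q8 = 531_192_640
size_q4 = 397_932_352

# Rough parameter count, assuming ~2 bytes per weight at f16.
# This ignores GGUF metadata, so it is only an estimate.
approx_params = size_f16 / 2

for name, size in [("Q8_0", size_q8), ("Q4_K_M", size_q4)]:
    bytes_per_weight = size / approx_params
    print(f"{name}: ~{bytes_per_weight:.2f} bytes/weight "
          f"(~{bytes_per_weight * 8:.1f} bits)")
```

For a model this small the averages come out higher than the nominal per-block rates, since the embedding tables (kept at higher precision) are a large fraction of the total weights.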
README.md CHANGED

```diff
@@ -17,12 +17,14 @@ tags:
 
 # DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B-GGUF
 
+**Updated**
+
 This model is trained on CODE outputs of <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B">deepseek-ai/DeepSeek-R1-Distill-Qwen-32B</a> and is meant to be used only as draft model for speculative decoding.
 
 It's specifically intended for users of 3090/4090, allowing you to run the DeepSeek-R1-Distill-Qwen-32B-Q4_K_M GGUF version with 16k context and speeding up generation without sacrificing more context length or model quality.
 
 # Data info
 
-The data consists of code tasks collected from various datasets. It has been trained for
+The data consists of code tasks collected from various datasets. It has been trained for 2 epochs on 2.5k unique examples, for a total of 7.6 million tokens per epoch.
 
 Since data generation was done using spare GPU time, I may publish a further trained version later.
```
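The README describes the model's role in speculative decoding: the small draft model proposes a run of tokens and the large target model verifies them in a single pass, keeping the longest agreeing prefix. A toy sketch of the greedy accept/reject loop, with hypothetical stand-in "models" (real runtimes such as llama.cpp, where the draft is passed via `--model-draft`, verify against the target's probability distribution rather than exact token match):

```python
from typing import Callable, List

def speculative_decode(
    target: Callable[[List[int]], int],  # greedy next-token function
    draft: Callable[[List[int]], int],
    prompt: List[int],
    n_new: int,
    k: int = 4,  # draft tokens proposed per verification step
) -> List[int]:
    """Greedy speculative decoding: the draft proposes k tokens, the
    target checks them position by position, and the first mismatch is
    replaced by the target's own token. Because only the target's
    tokens are ever appended, the output is identical to plain greedy
    decoding with the target alone; the draft only decides how many
    positions each verification step can advance."""
    tokens = list(prompt)
    while len(tokens) < len(prompt) + n_new:
        # 1. Draft proposes k tokens autoregressively.
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. Target verifies each proposed position.
        for t in proposal:
            expected = target(tokens)
            tokens.append(expected)  # always the target's token
            if expected != t or len(tokens) == len(prompt) + n_new:
                break                # stop at first mismatch or length
    return tokens[len(prompt):]

# Hypothetical toy models: the target repeats a fixed cycle and the
# draft agrees with it most of the time.
cycle = [1, 2, 3, 4]
target = lambda ctx: cycle[len(ctx) % 4]
draft = lambda ctx: cycle[len(ctx) % 4] if len(ctx) % 7 else 0

out = speculative_decode(target, draft, prompt=[0], n_new=8)
print(out)  # [2, 3, 4, 1, 2, 3, 4, 1]
```

The speedup comes from step 2: verifying k proposed tokens needs one forward pass of the target over the batch, instead of k sequential passes, while the cheap draft supplies the sequential work.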