lbourdois committed on
Commit 60462d4 · verified · 1 Parent(s): c994f81

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.

Files changed (1)
  1. README.md +40 -28
README.md CHANGED
@@ -1,29 +1,41 @@
- ---
- license: apache-2.0
- language:
- - en
- base_model:
- - Qwen/Qwen2.5-0.5B
- datasets:
- - alamios/DeepSeek-R1-Distill-Qwen-32B-Conversations
- pipeline_tag: text-generation
- library_name: transformers
- tags:
- - qwen
- - qwen2.5
- - deepseek
- ---
-
- # DeepSeek-R1-DRAFT-Qwen2.5-0.5B
-
- **Updated to v1**
-
- This model is trained on outputs of <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B">deepseek-ai/DeepSeek-R1-Distill-Qwen-32B</a> and is meant to be used only as a draft model for speculative decoding.
-
- It's specifically intended for users of 3090/4090 GPUs, allowing you to run the DeepSeek-R1-Distill-Qwen-32B-Q4_K_M GGUF version with 16k context and speed up generation without sacrificing more context length or model quality.
-
- # Data info
-
- The data consists of code, math, reasoning and general knowledge tasks collected from various datasets. The model has been trained for 2 epochs on 7k unique examples, for a total of 26 million tokens per epoch.
-
+ ---
+ license: apache-2.0
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ base_model:
+ - Qwen/Qwen2.5-0.5B
+ datasets:
+ - alamios/DeepSeek-R1-Distill-Qwen-32B-Conversations
+ pipeline_tag: text-generation
+ library_name: transformers
+ tags:
+ - qwen
+ - qwen2.5
+ - deepseek
+ ---
+
+ # DeepSeek-R1-DRAFT-Qwen2.5-0.5B
+
+ **Updated to v1**
+
+ This model is trained on outputs of <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B">deepseek-ai/DeepSeek-R1-Distill-Qwen-32B</a> and is meant to be used only as a draft model for speculative decoding.
+
+ It's specifically intended for users of 3090/4090 GPUs, allowing you to run the DeepSeek-R1-Distill-Qwen-32B-Q4_K_M GGUF version with 16k context and speed up generation without sacrificing more context length or model quality.
+
+ # Data info
+
+ The data consists of code, math, reasoning and general knowledge tasks collected from various datasets. The model has been trained for 2 epochs on 7k unique examples, for a total of 26 million tokens per epoch.
+
  Since data generation was done using spare GPU time, I may publish a further trained version later.
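
For reference, a minimal sketch (not part of this PR's diff) of how a draft model like this is typically paired with its 32B target via transformers' assisted generation. The draft repo id, dtype settings, and prompt below are assumptions for illustration:

```python
# Minimal sketch of speculative (assisted) decoding in transformers:
# the 0.5B draft proposes tokens that the 32B target verifies in parallel.
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
draft_id = "alamios/DeepSeek-R1-DRAFT-Qwen2.5-0.5B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype="auto", device_map="auto"
)
# The draft (assistant) model must share the target's tokenizer/vocabulary,
# which holds here since both are Qwen2.5-based.
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype="auto", device_map="auto"
)

inputs = tokenizer(
    "Explain speculative decoding in one paragraph.", return_tensors="pt"
).to(target.device)

# Passing assistant_model switches generate() to assisted generation.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```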