lbourdois committed
Commit 4d5dcbb · verified · 1 Parent(s): de6ac7b

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the `language` tag to improve discoverability. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.
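For context, Hub tooling reads these codes from the model card's YAML front matter. Below is a minimal, dependency-free sketch of how the `language:` list can be pulled out of a card; the helper name is hypothetical, and hand-rolled parsing is used only to avoid a YAML dependency (real tooling would use a proper YAML parser):

```python
def front_matter_languages(card_text: str) -> list[str]:
    """Extract the `language:` list from a model card's YAML front matter.

    Hypothetical helper for illustration; not Hub API code.
    """
    lines = card_text.splitlines()
    # Front matter is delimited by a leading `---` line and the next `---`.
    if not lines or lines[0].strip() != "---":
        return []
    langs: list[str] = []
    in_langs = False
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter
        if line.strip() == "language:":
            in_langs = True
        elif in_langs and line.lstrip().startswith("- "):
            langs.append(line.lstrip()[2:].strip())
        elif in_langs:
            in_langs = False  # another key ends the language list
    return langs


card = """---
language:
- zho
- eng
- fra
---
# Model
"""
print(front_matter_languages(card))  # → ['zho', 'eng', 'fra']
```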

Files changed (1): README.md (+61 −48)
README.md CHANGED
@@ -1,48 +1,61 @@
 ---
 base_model:
 - Qwen/QwQ-32B-Preview
 - Qwen/Qwen2.5-32B-Instruct
 - deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
 library_name: transformers
 tags:
 - mergekit
 - merge
-
+language:
+- zho
+- eng
+- fra
+- spa
+- por
+- deu
+- ita
+- rus
+- jpn
+- kor
+- vie
+- tha
+- ara
 ---
 # **Qwen2.5-32B-DeepSeek-R1-Instruct**
 
 This model is a merged pre-trained language model created using MergeKit with the TIES merge method. It uses **Qwen/Qwen2.5-32B-Instruct** as the base and combines **Qwen/QwQ-32B-Preview** and **deepseek-ai/DeepSeek-R1-Distill-Qwen-32B** with equal weight and density. The merge configuration includes normalization, int8 masking, and `bfloat16` precision for optimized performance.
 # **Merge**
 
 This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
 
 # **Merge Method**
 
 This model was merged using the [TIES](https://arxiv.org/abs/2306.01708) merge method using [Qwen/Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) as a base.
 
 # **Models Merged**
 
 The following models were included in the merge:
 * [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview)
 * [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B)
 
 # **Configuration**
 
 The following YAML configuration was used to produce this model:
 
 ```yaml
 models:
   - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
   - model: Qwen/QwQ-32B-Preview
     parameters:
       weight: 1
       density: 1
 merge_method: ties
 base_model: Qwen/Qwen2.5-32B-Instruct
 parameters:
   weight: 1
   density: 1
   normalize: true
   int8_mask: true
 dtype: bfloat16
 ```
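For readers unfamiliar with the method named in the config: TIES (TrIm, Elect Sign, and merge) trims low-magnitude entries of each task vector (the delta between a fine-tune and the base), elects a per-parameter majority sign, and averages only the entries that agree with it. The following is a simplified NumPy sketch of that idea on toy tensors; it is not mergekit's actual implementation, and the function name and values are illustrative:

```python
import numpy as np

def ties_merge(base, finetuned, density=1.0):
    """Simplified TIES merge of several fine-tunes onto one base (toy sketch)."""
    # Task vectors: parameter deltas of each fine-tune relative to the base.
    deltas = [ft - base for ft in finetuned]
    trimmed = []
    for d in deltas:
        # Trim: keep only the top-`density` fraction of entries by magnitude.
        k = int(round(d.size * density))
        thresh = np.sort(np.abs(d).ravel())[-k] if k > 0 else np.inf
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))
    stacked = np.stack(trimmed)
    # Elect sign: majority sign of the summed trimmed deltas, per parameter.
    elected = np.sign(stacked.sum(axis=0))
    # Disjoint merge: average only entries whose sign matches the elected one.
    agree = (np.sign(stacked) == elected) & (stacked != 0)
    counts = np.maximum(agree.sum(axis=0), 1)  # avoid division by zero
    merged_delta = (stacked * agree).sum(axis=0) / counts
    return base + merged_delta

# Toy check: agreeing entries are averaged, conflicting ones cancel out.
base = np.zeros(4)
ft1 = np.array([1.0, -1.0, 2.0, 0.0])
ft2 = np.array([1.0, 1.0, -2.0, 0.0])
print(ties_merge(base, [ft1, ft2]))  # → [1. 0. 0. 0.]
```

In practice the merged weights are produced by mergekit from the YAML configuration above rather than by hand; with `weight: 1` and `density: 1` as in this config, trimming keeps every entry and only the sign election is in play.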