lbourdois committed
Commit 39d4732 · verified · 1 Parent(s): a42acdf

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve discoverability. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13.
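For reference, the 13 codes added in this PR are ISO 639-3 identifiers. The mapping below restates them with their English names (a quick sanity check, not part of the PR itself):

```python
# The 13 language codes added to the model card's `language:` tag,
# with their standard ISO 639-3 English names.
ISO_639_3 = {
    "zho": "Chinese", "eng": "English", "fra": "French", "spa": "Spanish",
    "por": "Portuguese", "deu": "German", "ita": "Italian", "rus": "Russian",
    "jpn": "Japanese", "kor": "Korean", "vie": "Vietnamese", "tha": "Thai",
    "ara": "Arabic",
}

# All entries are well-formed three-letter ISO 639-3 codes.
assert len(ISO_639_3) == 13
assert all(len(code) == 3 and code.isalpha() for code in ISO_639_3)
```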

Files changed (1)
  1. README.md +63 -50
README.md CHANGED
@@ -1,50 +1,63 @@
- ---
- base_model:
- - deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- - Qwen/Qwen2.5-7B-Instruct
- - Qwen/Qwen2.5-7B-Instruct-1M
- library_name: transformers
- tags:
- - mergekit
- - merge
-
- ---
-
-
- # **Qwen2.5-7B-DeepSeek-R1-1M**
-
- This model is a merged pre-trained language model created using MergeKit with the TIES merge method. It uses **Qwen/Qwen2.5-7B-Instruct-1M** as the base and combines **deepseek-ai/DeepSeek-R1-Distill-Qwen-7B** and **Qwen/Qwen2.5-7B-Instruct** with equal weight and density. The merge configuration includes normalization, int8 masking, and `bfloat16` precision for optimized performance.
- # **Merge**
-
- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-
- # **Merge Method**
-
- This model was merged using the [TIES](https://arxiv.org/abs/2306.01708) merge method using [Qwen/Qwen2.5-7B-Instruct-1M](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-1M) as a base.
-
- # **Models Merged**
-
- The following models were included in the merge:
- * [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)
- * [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
-
- # **Configuration**
-
- The following YAML configuration was used to produce this model:
-
- ```yaml
- models:
- - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- - model: Qwen/Qwen2.5-7B-Instruct
- parameters:
- weight: 1
- density: 1
- merge_method: ties
- base_model: Qwen/Qwen2.5-7B-Instruct-1M
- parameters:
- weight: 1
- density: 1
- normalize: true
- int8_mask: true
- dtype: bfloat16
- ```
+ ---
+ base_model:
+ - deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+ - Qwen/Qwen2.5-7B-Instruct
+ - Qwen/Qwen2.5-7B-Instruct-1M
+ library_name: transformers
+ tags:
+ - mergekit
+ - merge
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ ---
+
+
+ # **Qwen2.5-7B-DeepSeek-R1-1M**
+
+ This model is a merged pre-trained language model created using MergeKit with the TIES merge method. It uses **Qwen/Qwen2.5-7B-Instruct-1M** as the base and combines **deepseek-ai/DeepSeek-R1-Distill-Qwen-7B** and **Qwen/Qwen2.5-7B-Instruct** with equal weight and density. The merge configuration includes normalization, int8 masking, and `bfloat16` precision for optimized performance.
+ # **Merge**
+
+ This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
+
+ # **Merge Method**
+
+ This model was merged using the [TIES](https://arxiv.org/abs/2306.01708) merge method using [Qwen/Qwen2.5-7B-Instruct-1M](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct-1M) as a base.
+
+ # **Models Merged**
+
+ The following models were included in the merge:
+ * [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)
+ * [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
+
+ # **Configuration**
+
+ The following YAML configuration was used to produce this model:
+
+ ```yaml
+ models:
+ - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+ - model: Qwen/Qwen2.5-7B-Instruct
+ parameters:
+ weight: 1
+ density: 1
+ merge_method: ties
+ base_model: Qwen/Qwen2.5-7B-Instruct-1M
+ parameters:
+ weight: 1
+ density: 1
+ normalize: true
+ int8_mask: true
+ dtype: bfloat16
+ ```
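For readers curious what the TIES method referenced in the README actually does, here is a minimal plain-Python sketch (not mergekit's implementation, which operates on full bfloat16 tensors): each fine-tuned model's delta from the base is trimmed by magnitude according to `density`, a sign is elected per parameter, and only the values agreeing with that sign are averaged back onto the base.

```python
def ties_merge(base, finetuned_models, density=1.0, weights=None):
    """Sketch of TIES (trim, elect sign, disjoint merge) on flat parameter lists."""
    if weights is None:
        weights = [1.0] * len(finetuned_models)

    # 1. Task vectors: each model's delta from the base parameters.
    deltas = [[m[i] - base[i] for i in range(len(base))] for m in finetuned_models]

    # 2. Trim: keep only the top `density` fraction of entries by magnitude
    #    (density: 1 in the YAML above means nothing is trimmed).
    k = max(1, int(round(density * len(base))))
    trimmed = []
    for d in deltas:
        threshold = sorted((abs(v) for v in d), reverse=True)[k - 1]
        trimmed.append([v if abs(v) >= threshold else 0.0 for v in d])

    # 3. Elect sign per entry, then 4. average only the agreeing values.
    merged = []
    for i in range(len(base)):
        total = sum(w * d[i] for w, d in zip(weights, trimmed))
        sign = 1.0 if total >= 0 else -1.0
        agreeing = [w * d[i] for w, d in zip(weights, trimmed) if d[i] * sign > 0]
        step = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(base[i] + step)
    return merged
```

With `weight: 1` and `density: 1` as in the config, the merge reduces to sign-resolved averaging of the two models' deltas on top of the base: where the deltas disagree in sign, only the dominant side contributes, e.g. `ties_merge([0.0, 0.0, 0.0], [[1.0, 2.0, -1.0], [1.0, -2.0, 3.0]])` returns `[1.0, 2.0, 3.0]`.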