Improve language tag

#1
by lbourdois - opened
Files changed (1)
  1. README.md +90 -76
README.md CHANGED
@@ -1,77 +1,91 @@
 ---
 base_model:
 - Qwen/Qwen2.5-Coder-1.5B-Instruct
 - Qwen/Qwen2.5-Math-1.5B-Instruct
 - Qwen/Qwen2.5-1.5B
 - agentica-org/DeepScaleR-1.5B-Preview
 tags:
 - merge
 - mergekit
 - lazymergekit
 - Qwen/Qwen2.5-Coder-1.5B-Instruct
 - Qwen/Qwen2.5-Math-1.5B-Instruct
 - Qwen/Qwen2.5-1.5B
 - agentica-org/DeepScaleR-1.5B-Preview
+language:
+- zho
+- eng
+- fra
+- spa
+- por
+- deu
+- ita
+- rus
+- jpn
+- kor
+- vie
+- tha
+- ara
 ---
 
 # DeepQwenScalerPlus
 
 DeepQwenScalerPlus is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
 * [Qwen/Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct)
 * [Qwen/Qwen2.5-Math-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B-Instruct)
 * [Qwen/Qwen2.5-1.5B](https://huggingface.co/Qwen/Qwen2.5-1.5B)
 * [agentica-org/DeepScaleR-1.5B-Preview](https://huggingface.co/agentica-org/DeepScaleR-1.5B-Preview)
 
 ## 🧩 Configuration
 
 ```yaml
 models:
   - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
     # no parameters necessary for base model
   - model: Qwen/Qwen2.5-Coder-1.5B-Instruct
     parameters:
       density: 0.5
       weight: 0.5
   - model: Qwen/Qwen2.5-Math-1.5B-Instruct
     parameters:
       density: 0.6
       weight: 0.5
   - model: Qwen/Qwen2.5-1.5B
     parameters:
       density: 0.6
       weight: 0.5
   - model: agentica-org/DeepScaleR-1.5B-Preview
     parameters:
       density: 0.4
       weight: 0.6
 merge_method: ties
 base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
 parameters:
   normalize: true
 dtype: float16
 ```
 
 ## 💻 Usage
 
 ```python
 !pip install -qU transformers accelerate
 
 from transformers import AutoTokenizer
 import transformers
 import torch
 
 model = "K00B404/DeepQwenScalerPlus"
 messages = [{"role": "user", "content": "What is a large language model?"}]
 
 tokenizer = AutoTokenizer.from_pretrained(model)
 prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 pipeline = transformers.pipeline(
     "text-generation",
     model=model,
     torch_dtype=torch.float16,
     device_map="auto",
 )
 
 outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
 print(outputs[0]["generated_text"])
 ```
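The entire change in this PR is the `language:` list added to the README's YAML front matter. As a quick sanity check that the new metadata parses as intended, the sketch below pulls that list back out using only the standard library; the `readme` string is an abridged stand-in for the real file, and the hand-rolled parser only handles this flat `key:` / `- item` shape, not general YAML.

```python
# Illustrative front-matter check: extract the block between the leading
# "---" markers and collect the codes listed under "language:".
readme = """---
base_model:
- Qwen/Qwen2.5-Coder-1.5B-Instruct
language:
- zho
- eng
- fra
---
# DeepQwenScalerPlus
"""

def front_matter_languages(text):
    lines = text.splitlines()
    assert lines[0] == "---", "front matter must open the file"
    end = lines.index("---", 1)          # closing marker of the front matter
    langs, in_lang = [], False
    for line in lines[1:end]:
        if line == "language:":
            in_lang = True               # entering the language list
        elif line.startswith("- ") and in_lang:
            langs.append(line[2:])       # a list item under language:
        elif not line.startswith("- "):
            in_lang = False              # a new top-level key ends the list
    return langs

print(front_matter_languages(readme))  # ['zho', 'eng', 'fra']
```

In a real model card you would read the file from disk and feed its text to the same function; the Hub itself validates this metadata when the README is pushed.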