Improve language tag

#3
by lbourdois - opened
Files changed (1)
  1. README.md +229 -223
README.md CHANGED
@@ -1,224 +1,230 @@
- ---
- license: apache-2.0
- language:
- - en
- base_model:
- - Qwen/Qwen2.5-14B-Instruct
- pipeline_tag: text-generation
- library_name: transformers
- tags:
- - opus
- - elite
- - calcium
- - trl
- - qwen
- model-index:
- - name: Calcium-Opus-14B-Elite2
-   results:
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: IFEval (0-Shot)
-       type: wis-k/instruction-following-eval
-       split: train
-       args:
-         num_few_shot: 0
-     metrics:
-     - type: inst_level_strict_acc and prompt_level_strict_acc
-       value: 61.76
-       name: averaged accuracy
-     source:
-       url: >-
-         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite2
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: BBH (3-Shot)
-       type: SaylorTwift/bbh
-       split: test
-       args:
-         num_few_shot: 3
-     metrics:
-     - type: acc_norm
-       value: 46.81
-       name: normalized accuracy
-     source:
-       url: >-
-         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite2
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: MATH Lvl 5 (4-Shot)
-       type: lighteval/MATH-Hard
-       split: test
-       args:
-         num_few_shot: 4
-     metrics:
-     - type: exact_match
-       value: 36.1
-       name: exact match
-     source:
-       url: >-
-         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite2
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: GPQA (0-shot)
-       type: Idavidrein/gpqa
-       split: train
-       args:
-         num_few_shot: 0
-     metrics:
-     - type: acc_norm
-       value: 16
-       name: acc_norm
-     source:
-       url: >-
-         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite2
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: MuSR (0-shot)
-       type: TAUR-Lab/MuSR
-       args:
-         num_few_shot: 0
-     metrics:
-     - type: acc_norm
-       value: 22.24
-       name: acc_norm
-     source:
-       url: >-
-         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite2
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: MMLU-PRO (5-shot)
-       type: TIGER-Lab/MMLU-Pro
-       config: main
-       split: test
-       args:
-         num_few_shot: 5
-     metrics:
-     - type: acc
-       value: 47.79
-       name: accuracy
-     source:
-       url: >-
-         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite2
-       name: Open LLM Leaderboard
- ---
-
- ![e2.gif](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/CbSQZghlvWbMo2EdMacXp.gif)
-
- # **Calcium-Opus-14B-Elite2**
-
- Calcium-Opus-14B-Elite2 is based on the Qwen 2.5 14B modality architecture, designed to enhance the reasoning capabilities of 14B-parameter models. These models have proven effective in context understanding, reasoning, and mathematical problem-solving. It has been fine-tuned using a long chain-of-thought reasoning model and specialized datasets, with a focus on chain-of-thought (CoT) reasoning for problem-solving. This model is optimized for tasks requiring logical reasoning, detailed explanations, and multi-step problem-solving, making it ideal for applications such as instruction-following, text generation, and complex reasoning tasks.
-
- Key improvements include:
- 1. **Enhanced Knowledge and Expertise**: The model demonstrates significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
- 2. **Improved Instruction Following**: It shows significant advancements in following instructions, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and producing structured outputs, especially in JSON format.
- 3. **Better Adaptability**: The model is more resilient to diverse system prompts, enabling enhanced role-playing implementations and condition-setting for chatbots.
- 4. **Long-Context Support**: It offers long-context support of up to 128K tokens and can generate up to 8K tokens in a single output.
- 5. **Multilingual Proficiency**: The model supports over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
-
- # **Quickstart with transformers**
-
- Here is a code snippet with `apply_chat_template` to show you how to load the tokenizer and model and how to generate content:
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_name = "prithivMLmods/Calcium-Opus-14B-Elite2"
-
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     torch_dtype="auto",
-     device_map="auto"
- )
- tokenizer = AutoTokenizer.from_pretrained(model_name)
-
- prompt = "Give me a short introduction to large language model."
- messages = [
-     {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
-     {"role": "user", "content": prompt}
- ]
- text = tokenizer.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True
- )
- model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
-
- generated_ids = model.generate(
-     **model_inputs,
-     max_new_tokens=512
- )
- generated_ids = [
-     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
- ]
-
- response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
- ```
-
- # **Intended Use**
- 1. **Reasoning and Context Understanding**:
-    Designed to assist with complex reasoning tasks, contextual understanding, and solving problems requiring logical deduction and critical thinking.
-
- 2. **Mathematical Problem-Solving**:
-    Specialized for performing advanced mathematical reasoning and calculations, making it suitable for educational, scientific, and research-oriented applications.
-
- 3. **Code Generation and Debugging**:
-    Offers robust support for coding tasks, including writing, debugging, and optimizing code in various programming languages, ideal for developers and software engineers.
-
- 4. **Structured Data Analysis**:
-    Excels in processing and analyzing structured data, such as tables and JSON, and generating structured outputs, which is useful for data analysts and automation workflows.
-
- 5. **Multilingual Applications**:
-    Supports over 29 languages, making it versatile for global applications like multilingual chatbots, content generation, and translations.
-
- 6. **Extended Content Generation**:
-    Capable of generating long-form content (over 8K tokens), useful for writing reports, articles, and creating detailed instructional guides.
-
- # **Limitations**
- 1. **Hardware Requirements**:
-    Due to its 20B parameter size and support for long-context inputs, running the model requires significant computational resources, including high-memory GPUs or TPUs.
-
- 2. **Potential Bias in Multilingual Outputs**:
-    While it supports 29 languages, the quality and accuracy of outputs may vary depending on the language, especially for less-resourced languages.
-
- 3. **Inconsistent Outputs for Creative Tasks**:
-    The model may occasionally produce inconsistent or repetitive results in creative writing, storytelling, or highly subjective tasks.
-
- 4. **Limited Real-World Awareness**:
-    It lacks real-time knowledge of current events beyond its training cutoff, which may limit its ability to respond accurately to the latest information.
-
- 5. **Error Propagation in Long-Text Outputs**:
-    In generating long texts, minor errors in early outputs can sometimes propagate, reducing the overall coherence and accuracy of the response.
-
- 6. **Dependency on High-Quality Prompts**:
-    Performance may depend on the quality and specificity of the input prompt, requiring users to carefully design queries for optimal results.
-
- # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
- Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/prithivMLmods__Calcium-Opus-14B-Elite2-details)!
- Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=prithivMLmods%2FCalcium-Opus-14B-Elite2&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
-
- | Metric |Value (%)|
- |-------------------|--------:|
- |**Average** | 40.25|
- |IFEval (0-Shot) | 61.76|
- |BBH (3-Shot) | 46.81|
- |MATH Lvl 5 (4-Shot)| 46.90|
- |GPQA (0-shot) | 16.00|
- |MuSR (0-shot) | 22.24|
- |MMLU-PRO (5-shot) | 47.79|
 
+ ---
+ license: apache-2.0
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ base_model:
+ - Qwen/Qwen2.5-14B-Instruct
+ pipeline_tag: text-generation
+ library_name: transformers
+ tags:
+ - opus
+ - elite
+ - calcium
+ - trl
+ - qwen
+ model-index:
+ - name: Calcium-Opus-14B-Elite2
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: wis-k/instruction-following-eval
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 61.76
+       name: averaged accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: SaylorTwift/bbh
+       split: test
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 46.81
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: lighteval/MATH-Hard
+       split: test
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 36.1
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 16
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 22.24
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 47.79
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite2
+       name: Open LLM Leaderboard
+ ---
+
+ ![e2.gif](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/CbSQZghlvWbMo2EdMacXp.gif)
+
+ # **Calcium-Opus-14B-Elite2**
+
+ Calcium-Opus-14B-Elite2 is based on the Qwen 2.5 14B modality architecture, designed to enhance the reasoning capabilities of 14B-parameter models. These models have proven effective in context understanding, reasoning, and mathematical problem-solving. It has been fine-tuned using a long chain-of-thought reasoning model and specialized datasets, with a focus on chain-of-thought (CoT) reasoning for problem-solving. This model is optimized for tasks requiring logical reasoning, detailed explanations, and multi-step problem-solving, making it ideal for applications such as instruction-following, text generation, and complex reasoning tasks.
+
+ Key improvements include:
+ 1. **Enhanced Knowledge and Expertise**: The model demonstrates significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
+ 2. **Improved Instruction Following**: It shows significant advancements in following instructions, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and producing structured outputs, especially in JSON format.
+ 3. **Better Adaptability**: The model is more resilient to diverse system prompts, enabling enhanced role-playing implementations and condition-setting for chatbots.
+ 4. **Long-Context Support**: It offers long-context support of up to 128K tokens and can generate up to 8K tokens in a single output.
+ 5. **Multilingual Proficiency**: The model supports over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
+
+ # **Quickstart with transformers**
+
+ Here is a code snippet with `apply_chat_template` to show you how to load the tokenizer and model and how to generate content:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "prithivMLmods/Calcium-Opus-14B-Elite2"
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ prompt = "Give me a short introduction to large language model."
+ messages = [
+     {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
+     {"role": "user", "content": prompt}
+ ]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=512
+ )
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ ```
+
+ # **Intended Use**
+ 1. **Reasoning and Context Understanding**:
+    Designed to assist with complex reasoning tasks, contextual understanding, and solving problems requiring logical deduction and critical thinking.
+
+ 2. **Mathematical Problem-Solving**:
+    Specialized for performing advanced mathematical reasoning and calculations, making it suitable for educational, scientific, and research-oriented applications.
+
+ 3. **Code Generation and Debugging**:
+    Offers robust support for coding tasks, including writing, debugging, and optimizing code in various programming languages, ideal for developers and software engineers.
+
+ 4. **Structured Data Analysis**:
+    Excels in processing and analyzing structured data, such as tables and JSON, and generating structured outputs, which is useful for data analysts and automation workflows.
+
+ 5. **Multilingual Applications**:
+    Supports over 29 languages, making it versatile for global applications like multilingual chatbots, content generation, and translations.
+
+ 6. **Extended Content Generation**:
+    Capable of generating long-form content (over 8K tokens), useful for writing reports, articles, and creating detailed instructional guides.
+
+ # **Limitations**
+ 1. **Hardware Requirements**:
+    Due to its 20B parameter size and support for long-context inputs, running the model requires significant computational resources, including high-memory GPUs or TPUs.
+
+ 2. **Potential Bias in Multilingual Outputs**:
+    While it supports 29 languages, the quality and accuracy of outputs may vary depending on the language, especially for less-resourced languages.
+
+ 3. **Inconsistent Outputs for Creative Tasks**:
+    The model may occasionally produce inconsistent or repetitive results in creative writing, storytelling, or highly subjective tasks.
+
+ 4. **Limited Real-World Awareness**:
+    It lacks real-time knowledge of current events beyond its training cutoff, which may limit its ability to respond accurately to the latest information.
+
+ 5. **Error Propagation in Long-Text Outputs**:
+    In generating long texts, minor errors in early outputs can sometimes propagate, reducing the overall coherence and accuracy of the response.
+
+ 6. **Dependency on High-Quality Prompts**:
+    Performance may depend on the quality and specificity of the input prompt, requiring users to carefully design queries for optimal results.
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/prithivMLmods__Calcium-Opus-14B-Elite2-details)!
+ Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=prithivMLmods%2FCalcium-Opus-14B-Elite2&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+ | Metric |Value (%)|
+ |-------------------|--------:|
+ |**Average** | 40.25|
+ |IFEval (0-Shot) | 61.76|
+ |BBH (3-Shot) | 46.81|
+ |MATH Lvl 5 (4-Shot)| 46.90|
+ |GPQA (0-shot) | 16.00|
+ |MuSR (0-shot) | 22.24|
+ |MMLU-PRO (5-shot) | 47.79|
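The substantive change in this diff is confined to the YAML front matter: the `language:` list grows from a single `en` entry to explicit ISO 639-3 codes, and the `url: >-` folded block scalars are inlined (a `>-` scalar with a single continuation line folds to the same string, so the two forms are equivalent). A minimal sketch checking both points, using a hand-rolled line parser rather than a YAML library (the helper names `parse_language_tags` and `fold_scalar` are illustrative, not part of any API):

```python
import re

# The new `language:` block from the README front matter.
NEW_FRONT_MATTER = """\
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
"""

def parse_language_tags(front_matter: str) -> list[str]:
    """Collect '- xxx' items under a top-level 'language:' key."""
    tags, in_block = [], False
    for line in front_matter.splitlines():
        if line.strip() == "language:":
            in_block = True
            continue
        if in_block:
            m = re.fullmatch(r"-\s*([a-z]+)", line.strip())
            if m:
                tags.append(m.group(1))
            else:
                in_block = False  # left the list block
    return tags

tags = parse_language_tags(NEW_FRONT_MATTER)
assert len(tags) == 13
# ISO 639-3 codes are exactly three lowercase letters.
assert all(re.fullmatch(r"[a-z]{3}", t) for t in tags)

def fold_scalar(block_lines: list[str]) -> str:
    """Model of YAML '>-' folding: join lines with spaces, strip newline."""
    return " ".join(line.strip() for line in block_lines)

url = ("https://huggingface.co/spaces/open-llm-leaderboard/"
       "open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite2")
# A folded scalar with one continuation line equals the inline form,
# which is why the PR can inline every `url:` without changing meaning.
assert fold_scalar([url]) == url

print(tags)
```

This is only a sanity check of the metadata edit; the Hub itself validates the front matter when the card is pushed.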