Improve language tag

#2
by lbourdois - opened
Files changed (1)
  1. README.md +227 -221
README.md CHANGED
@@ -1,222 +1,228 @@
- ---
- license: apache-2.0
- language:
- - en
- base_model:
- - Qwen/Qwen2.5-14B-Instruct
- pipeline_tag: text-generation
- library_name: transformers
- tags:
- - qwen
- - Calcium
- - Opus
- - 14B
- - qwq
- model-index:
- - name: Calcium-Opus-14B-Elite4
- results:
- - task:
- type: text-generation
- name: Text Generation
- dataset:
- name: IFEval (0-Shot)
- type: wis-k/instruction-following-eval
- split: train
- args:
- num_few_shot: 0
- metrics:
- - type: inst_level_strict_acc and prompt_level_strict_acc
- value: 61.12
- name: averaged accuracy
- source:
- url: >-
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite4
- name: Open LLM Leaderboard
- - task:
- type: text-generation
- name: Text Generation
- dataset:
- name: BBH (3-Shot)
- type: SaylorTwift/bbh
- split: test
- args:
- num_few_shot: 3
- metrics:
- - type: acc_norm
- value: 45.21
- name: normalized accuracy
- source:
- url: >-
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite4
- name: Open LLM Leaderboard
- - task:
- type: text-generation
- name: Text Generation
- dataset:
- name: MATH Lvl 5 (4-Shot)
- type: lighteval/MATH-Hard
- split: test
- args:
- num_few_shot: 4
- metrics:
- - type: exact_match
- value: 23.04
- name: exact match
- source:
- url: >-
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite4
- name: Open LLM Leaderboard
- - task:
- type: text-generation
- name: Text Generation
- dataset:
- name: GPQA (0-shot)
- type: Idavidrein/gpqa
- split: train
- args:
- num_few_shot: 0
- metrics:
- - type: acc_norm
- value: 14.09
- name: acc_norm
- source:
- url: >-
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite4
- name: Open LLM Leaderboard
- - task:
- type: text-generation
- name: Text Generation
- dataset:
- name: MuSR (0-shot)
- type: TAUR-Lab/MuSR
- args:
- num_few_shot: 0
- metrics:
- - type: acc_norm
- value: 17.69
- name: acc_norm
- source:
- url: >-
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite4
- name: Open LLM Leaderboard
- - task:
- type: text-generation
- name: Text Generation
- dataset:
- name: MMLU-PRO (5-shot)
- type: TIGER-Lab/MMLU-Pro
- config: main
- split: test
- args:
- num_few_shot: 5
- metrics:
- - type: acc
- value: 46.1
- name: accuracy
- source:
- url: >-
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite4
- name: Open LLM Leaderboard
- ---
-
- ![e4.gif](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/XZ2mwzGAOdtV3gQ-DvEU_.gif)
-
- # **Calcium-Opus-14B-Elite4**
-
- Calcium-Opus-14B-Elite4 is based on the Qwen 2.5 14B modality architecture, designed to enhance the reasoning capabilities of 14B-parameter models. These models have proven effective in context understanding, reasoning, and mathematical problem-solving. It has been fine-tuned using a long chain-of-thought reasoning model and specialized datasets, with a focus on chain-of-thought (CoT) reasoning for problem-solving. This model is optimized for tasks requiring logical reasoning, detailed explanations, and multi-step problem-solving, making it ideal for applications such as instruction-following, text generation, and complex reasoning tasks.
-
- Key improvements include:
- 1. **Enhanced Knowledge and Expertise**: The model demonstrates significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
- 2. **Improved Instruction Following**: It shows significant advancements in following instructions, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and producing structured outputs, especially in JSON format.
- 3. **Better Adaptability**: The model is more resilient to diverse system prompts, enabling enhanced role-playing implementations and condition-setting for chatbots.
- 4. **Long-Context Support**: It offers long-context support of up to 128K tokens and can generate up to 8K tokens in a single output.
- 5. **Multilingual Proficiency**: The model supports over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
-
- # **Quickstart with transformers**
-
- Here provides a code snippet with `apply_chat_template` to show you how to load the tokenizer and model and how to generate contents.
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_name = "prithivMLmods/Calcium-Opus-14B-Elite4"
-
- model = AutoModelForCausalLM.from_pretrained(
- model_name,
- torch_dtype="auto",
- device_map="auto"
- )
- tokenizer = AutoTokenizer.from_pretrained(model_name)
-
- prompt = "Give me a short introduction to large language model."
- messages = [
- {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
- {"role": "user", "content": prompt}
- ]
- text = tokenizer.apply_chat_template(
- messages,
- tokenize=False,
- add_generation_prompt=True
- )
- model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
-
- generated_ids = model.generate(
- **model_inputs,
- max_new_tokens=512
- )
- generated_ids = [
- output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
- ]
-
- response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
- ```
- # **Intended Use**
- 1. **Reasoning and Context Understanding**:
- Designed to assist with complex reasoning tasks, contextual understanding, and solving problems requiring logical deduction and critical thinking.
-
- 2. **Mathematical Problem-Solving**:
- Specialized for performing advanced mathematical reasoning and calculations, making it suitable for educational, scientific, and research-oriented applications.
-
- 3. **Code Generation and Debugging**:
- Offers robust support for coding tasks, including writing, debugging, and optimizing code in various programming languages, ideal for developers and software engineers.
-
- 4. **Structured Data Analysis**:
- Excels in processing and analyzing structured data, such as tables and JSON, and generating structured outputs, which is useful for data analysts and automation workflows.
-
- 5. **Multilingual Applications**:
- Supports over 29 languages, making it versatile for global applications like multilingual chatbots, content generation, and translations.
-
- 6. **Extended Content Generation**:
- Capable of generating long-form content (over 8K tokens), useful for writing reports, articles, and creating detailed instructional guides.
-
- # **Limitations**
- 1. **Hardware Requirements**:
- Due to its 20B parameter size and support for long-context inputs, running the model requires significant computational resources, including high-memory GPUs or TPUs.
-
- 2. **Potential Bias in Multilingual Outputs**:
- While it supports 29 languages, the quality and accuracy of outputs may vary depending on the language, especially for less-resourced languages.
-
- 3. **Inconsistent Outputs for Creative Tasks**:
- The model may occasionally produce inconsistent or repetitive results in creative writing, storytelling, or highly subjective tasks.
-
- 4. **Limited Real-World Awareness**:
- It lacks real-time knowledge of current events beyond its training cutoff, which may limit its ability to respond accurately to the latest information.
-
- 5. **Error Propagation in Long-Text Outputs**:
- In generating long texts, minor errors in early outputs can sometimes propagate, reducing the overall coherence and accuracy of the response.
-
- 6. **Dependency on High-Quality Prompts**:
- Performance may depend on the quality and specificity of the input prompt, requiring users to carefully design queries for optimal results.
- # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
- Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/prithivMLmods__Calcium-Opus-14B-Elite4-details)!
- Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=prithivMLmods%2FCalcium-Opus-14B-Elite4&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
-
- | Metric |Value (%)|
- |-------------------|--------:|
- |**Average** | 34.54|
- |IFEval (0-Shot) | 61.12|
- |BBH (3-Shot) | 45.21|
- |MATH Lvl 5 (4-Shot)| 23.04|
- |GPQA (0-shot) | 14.09|
- |MuSR (0-shot) | 17.69|
+ ---
+ license: apache-2.0
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ base_model:
+ - Qwen/Qwen2.5-14B-Instruct
+ pipeline_tag: text-generation
+ library_name: transformers
+ tags:
+ - qwen
+ - Calcium
+ - Opus
+ - 14B
+ - qwq
+ model-index:
+ - name: Calcium-Opus-14B-Elite4
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: wis-k/instruction-following-eval
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 61.12
+       name: averaged accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: SaylorTwift/bbh
+       split: test
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 45.21
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: lighteval/MATH-Hard
+       split: test
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 23.04
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 14.09
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 17.69
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite4
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 46.1
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite4
+       name: Open LLM Leaderboard
+ ---
+
+ ![e4.gif](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/XZ2mwzGAOdtV3gQ-DvEU_.gif)
+
+ # **Calcium-Opus-14B-Elite4**
+
+ Calcium-Opus-14B-Elite4 is based on the Qwen 2.5 14B architecture and is designed to enhance the reasoning capabilities of 14B-parameter models, a class that has proven effective at context understanding, reasoning, and mathematical problem-solving. It has been fine-tuned on specialized datasets distilled from a long chain-of-thought (CoT) reasoning model, with a focus on CoT reasoning for problem-solving. The model is optimized for tasks requiring logical reasoning, detailed explanations, and multi-step problem-solving, making it well suited to instruction following, text generation, and complex reasoning tasks.
+
+ Key improvements include:
+ 1. **Enhanced Knowledge and Expertise**: The model demonstrates significantly broader knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
+ 2. **Improved Instruction Following**: It shows significant advancements in following instructions, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and producing structured outputs, especially in JSON format.
+ 3. **Better Adaptability**: The model is more resilient to diverse system prompts, enabling enhanced role-playing implementations and condition-setting for chatbots.
+ 4. **Long-Context Support**: It supports contexts of up to 128K tokens and can generate up to 8K tokens in a single output.
+ 5. **Multilingual Proficiency**: The model supports over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
+
+ # **Quickstart with transformers**
+
+ The following code snippet shows how to load the tokenizer and model and generate content using `apply_chat_template`.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "prithivMLmods/Calcium-Opus-14B-Elite4"
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ prompt = "Give me a short introduction to large language models."
+ messages = [
+     {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
+     {"role": "user", "content": prompt}
+ ]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=512
+ )
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ ```
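Note that `model.generate` returns each prompt's tokens followed by the continuation, which is why the snippet above slices every output with `output_ids[len(input_ids):]` before decoding. A minimal sketch of that step on plain Python lists (with made-up token ids, no model required):

```python
# Hypothetical token ids: the model output echoes the prompt tokens.
input_ids = [[101, 7592, 102]]                    # encoded prompt (batch of 1)
generated = [[101, 7592, 102, 2023, 2003, 1996]]  # prompt + newly generated tokens

# Keep only the continuation, mirroring the slicing in the quickstart snippet.
new_tokens = [out[len(inp):] for inp, out in zip(input_ids, generated)]
print(new_tokens)  # [[2023, 2003, 1996]]
```

Without this step, `batch_decode` would return the prompt text prepended to the response.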
+ # **Intended Use**
+ 1. **Reasoning and Context Understanding**:
+ Designed to assist with complex reasoning tasks, contextual understanding, and solving problems requiring logical deduction and critical thinking.
+
+ 2. **Mathematical Problem-Solving**:
+ Specialized for performing advanced mathematical reasoning and calculations, making it suitable for educational, scientific, and research-oriented applications.
+
+ 3. **Code Generation and Debugging**:
+ Offers robust support for coding tasks, including writing, debugging, and optimizing code in various programming languages, ideal for developers and software engineers.
+
+ 4. **Structured Data Analysis**:
+ Excels in processing and analyzing structured data, such as tables and JSON, and generating structured outputs, which is useful for data analysts and automation workflows.
+
+ 5. **Multilingual Applications**:
+ Supports over 29 languages, making it versatile for global applications like multilingual chatbots, content generation, and translations.
+
+ 6. **Extended Content Generation**:
+ Capable of generating long-form content (over 8K tokens), useful for writing reports, articles, and creating detailed instructional guides.
+
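Because structured (e.g., JSON) outputs are still model-generated text, downstream code should parse and validate them rather than trust them blindly. A minimal sketch, using a hypothetical hard-coded reply in place of an actual model response:

```python
import json

# Hypothetical model reply; in practice this string would come from generation.
response = '{"name": "Calcium-Opus-14B-Elite4", "params_b": 14}'

try:
    data = json.loads(response)
except json.JSONDecodeError:
    data = None  # fall back or re-prompt when the model emits invalid JSON

print(data["params_b"])  # 14
```

The same pattern extends to schema validation (required keys, value types) before the parsed output is passed to an automation workflow.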
+ # **Limitations**
+ 1. **Hardware Requirements**:
+ Due to its 14B parameter size and support for long-context inputs, running the model requires significant computational resources, such as high-memory GPUs or TPUs.
+
+ 2. **Potential Bias in Multilingual Outputs**:
+ While it supports over 29 languages, the quality and accuracy of outputs may vary by language, especially for lower-resource languages.
+
+ 3. **Inconsistent Outputs for Creative Tasks**:
+ The model may occasionally produce inconsistent or repetitive results in creative writing, storytelling, or highly subjective tasks.
+
+ 4. **Limited Real-World Awareness**:
+ It lacks knowledge of events after its training cutoff, which may limit its ability to respond accurately to the latest information.
+
+ 5. **Error Propagation in Long-Text Outputs**:
+ When generating long texts, minor errors early in the output can propagate, reducing the overall coherence and accuracy of the response.
+
+ 6. **Dependency on High-Quality Prompts**:
+ Performance depends on the quality and specificity of the input prompt, so users should design queries carefully for optimal results.
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/prithivMLmods__Calcium-Opus-14B-Elite4-details)!
+ Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=prithivMLmods%2FCalcium-Opus-14B-Elite4&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+ | Metric |Value (%)|
+ |-------------------|--------:|
+ |**Average** | 34.54|
+ |IFEval (0-Shot) | 61.12|
+ |BBH (3-Shot) | 45.21|
+ |MATH Lvl 5 (4-Shot)| 23.04|
+ |GPQA (0-shot) | 14.09|
+ |MuSR (0-shot) | 17.69|
  |MMLU-PRO (5-shot) | 46.10|