Improve language tag

#3
by lbourdois - opened
Files changed (1)
  1. README.md +234 -228
README.md CHANGED
@@ -1,229 +1,235 @@
- ---
- license: apache-2.0
- language:
- - en
- library_name: transformers
- base_model:
- - Qwen/Qwen2.5-1.5B-Instruct
- pipeline_tag: text-generation
- model-index:
- - name: Bellatrix-1.5B-xElite
-   results:
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: IFEval (0-Shot)
-       type: wis-k/instruction-following-eval
-       split: train
-       args:
-         num_few_shot: 0
-     metrics:
-     - type: inst_level_strict_acc and prompt_level_strict_acc
-       value: 19.64
-       name: averaged accuracy
-     source:
-       url: >-
-         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FBellatrix-1.5B-xElite
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: BBH (3-Shot)
-       type: SaylorTwift/bbh
-       split: test
-       args:
-         num_few_shot: 3
-     metrics:
-     - type: acc_norm
-       value: 9.49
-       name: normalized accuracy
-     source:
-       url: >-
-         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FBellatrix-1.5B-xElite
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: MATH Lvl 5 (4-Shot)
-       type: lighteval/MATH-Hard
-       split: test
-       args:
-         num_few_shot: 4
-     metrics:
-     - type: exact_match
-       value: 12.61
-       name: exact match
-     source:
-       url: >-
-         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FBellatrix-1.5B-xElite
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: GPQA (0-shot)
-       type: Idavidrein/gpqa
-       split: train
-       args:
-         num_few_shot: 0
-     metrics:
-     - type: acc_norm
-       value: 3.8
-       name: acc_norm
-     source:
-       url: >-
-         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FBellatrix-1.5B-xElite
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: MuSR (0-shot)
-       type: TAUR-Lab/MuSR
-       args:
-         num_few_shot: 0
-     metrics:
-     - type: acc_norm
-       value: 4.44
-       name: acc_norm
-     source:
-       url: >-
-         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FBellatrix-1.5B-xElite
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: MMLU-PRO (5-shot)
-       type: TIGER-Lab/MMLU-Pro
-       config: main
-       split: test
-       args:
-         num_few_shot: 5
-     metrics:
-     - type: acc
-       value: 7.3
-       name: accuracy
-     source:
-       url: >-
-         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FBellatrix-1.5B-xElite
-       name: Open LLM Leaderboard
- tags:
- - qwen
- - qwq
- ---
- <pre align="center">
- ____ ____ __ __ __ ____ ____ ____ _ _
- ( _ \( ___)( ) ( ) /__\ (_ _)( _ \(_ _)( \/ )
- ) _ < )__) )(__ )(__ /(__)\ )( ) / _)(_ ) (
- (____/(____)(____)(____)(__)(__)(__) (_)\_)(____)(_/\_)
- </pre>
-
- # **Bellatrix-1.5B-xElite**
-
- Bellatrix-1.5B-xElite is based on a reasoning-based model designed for the QWQ synthetic dataset entries. The pipeline's instruction-tuned, text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. These models outperform many of the available open-source options. Bellatrix is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions utilize supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF).
-
- # **Quickstart with Transformers**
-
- Here provides a code snippet with `apply_chat_template` to show you how to load the tokenizer and model and how to generate contents.
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_name = "prithivMLmods/Bellatrix-1.5B-xElite"
-
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     torch_dtype="auto",
-     device_map="auto"
- )
- tokenizer = AutoTokenizer.from_pretrained(model_name)
-
- prompt = "Give me a short introduction to large language model."
- messages = [
-     {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
-     {"role": "user", "content": prompt}
- ]
- text = tokenizer.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True
- )
- model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
-
- generated_ids = model.generate(
-     **model_inputs,
-     max_new_tokens=512
- )
- generated_ids = [
-     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
- ]
-
- response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
- ```
-
- # **Intended Use:**
-
- 1. **Multilingual Dialogue Systems:**
-    - Designed for conversational AI applications, capable of handling dialogue across multiple languages.
-    - Useful in customer service, chatbots, and other dialogue-centric use cases.
-
- 2. **Reasoning and QWQ Dataset Applications:**
-    - Optimized for tasks requiring logical reasoning and contextual understanding, particularly in synthetic datasets like QWQ.
-
- 3. **Agentic Retrieval:**
-    - Supports retrieval-augmented generation tasks, helping systems fetch and synthesize information effectively.
-
- 4. **Summarization Tasks:**
-    - Excels in summarizing long or complex text while maintaining coherence and relevance.
-
- 5. **Instruction-Following Tasks:**
-    - Can execute tasks based on specific user instructions due to instruction-tuning during training.
-
- 6. **Language Generation:**
-    - Suitable for generating coherent and contextually relevant text in various domains and styles.
-
- # **Limitations:**
-
- 1. **Synthetic Dataset Bias:**
-    - Optimization for QWQ and similar datasets may make the model less effective on real-world or less structured data.
-
- 2. **Data Dependency:**
-    - Performance may degrade on tasks or languages not well-represented in the training dataset.
-
- 3. **Computational Requirements:**
-    - The optimized transformer architecture may demand significant computational resources, especially for fine-tuning or large-scale deployments.
-
- 4. **Potential Hallucinations:**
-    - Like most auto-regressive models, it may generate plausible-sounding but factually incorrect or nonsensical outputs.
-
- 5. **RLHF-Specific Biases:**
-    - Reinforcement Learning with Human Feedback (RLHF) can introduce biases based on the preferences of the annotators involved in the feedback process.
-
- 6. **Limited Domain Adaptability:**
-    - While effective in reasoning and dialogue tasks, it may struggle with highly specialized domains or out-of-distribution tasks.
-
- 7. **Multilingual Limitations:**
-    - Although optimized for multilingual use, certain low-resource languages may exhibit poorer performance compared to high-resource ones.
-
- 8. **Ethical Concerns:**
-    - May inadvertently generate inappropriate or harmful content if safeguards are not applied, particularly in sensitive applications.
-
- 9. **Real-Time Usability:**
-    - Latency in inference time could limit its effectiveness in real-time applications or when scaling to large user bases.
- # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
- Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/prithivMLmods__Bellatrix-1.5B-xElite-details)!
- Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=prithivMLmods%2FBellatrix-1.5B-xElite&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
-
- | Metric |Value (%)|
- |-------------------|--------:|
- |**Average** | 9.55|
- |IFEval (0-Shot) | 19.64|
- |BBH (3-Shot) | 9.49|
- |MATH Lvl 5 (4-Shot)| 12.61|
- |GPQA (0-shot) | 3.80|
- |MuSR (0-shot) | 4.44|
+ ---
+ license: apache-2.0
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ library_name: transformers
+ base_model:
+ - Qwen/Qwen2.5-1.5B-Instruct
+ pipeline_tag: text-generation
+ tags:
+ - qwen
+ - qwq
+ model-index:
+ - name: Bellatrix-1.5B-xElite
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: wis-k/instruction-following-eval
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 19.64
+       name: averaged accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FBellatrix-1.5B-xElite
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: SaylorTwift/bbh
+       split: test
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 9.49
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FBellatrix-1.5B-xElite
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: lighteval/MATH-Hard
+       split: test
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 12.61
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FBellatrix-1.5B-xElite
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       split: train
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 3.8
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FBellatrix-1.5B-xElite
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 4.44
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FBellatrix-1.5B-xElite
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 7.3
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FBellatrix-1.5B-xElite
+       name: Open LLM Leaderboard
+ ---
+ <pre align="center">
+ ____ ____ __ __ __ ____ ____ ____ _ _
+ ( _ \( ___)( ) ( ) /__\ (_ _)( _ \(_ _)( \/ )
+ ) _ < )__) )(__ )(__ /(__)\ )( ) / _)(_ ) (
+ (____/(____)(____)(____)(__)(__)(__) (_)\_)(____)(_/\_)
+ </pre>
+
+ # **Bellatrix-1.5B-xElite**
+
+ Bellatrix-1.5B-xElite is a reasoning-focused model built for QWQ synthetic dataset entries. Its instruction-tuned, text-only variants are optimized for multilingual dialogue use cases, including agentic retrieval and summarization, and compare favorably with many available open-source models. Bellatrix is an auto-regressive language model based on an optimized transformer architecture; the tuned versions use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF).
+
+ # **Quickstart with Transformers**
+
+ The following snippet shows how to load the tokenizer and model and generate a response using `apply_chat_template`:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "prithivMLmods/Bellatrix-1.5B-xElite"
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ prompt = "Give me a short introduction to large language model."
+ messages = [
+     {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
+     {"role": "user", "content": prompt}
+ ]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=512
+ )
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ ```
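The final list comprehension in the snippet strips the prompt tokens from each returned sequence, leaving only the newly generated ids (since `generate` echoes the input at the start of every output). The same slicing logic can be checked on plain Python lists, no model required; the token ids below are made-up stand-ins:

```python
# Toy stand-ins: each "generated" sequence begins with its prompt ids.
input_ids_batch = [[101, 7592], [101, 2054, 2003]]           # two prompts
generated_batch = [[101, 7592, 9999, 8888],                  # prompt + 2 new tokens
                   [101, 2054, 2003, 7777]]                  # prompt + 1 new token

# Same slicing as in the quickstart: drop len(prompt) ids from the front.
new_tokens = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(input_ids_batch, generated_batch)
]
print(new_tokens)  # [[9999, 8888], [7777]]
```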
+
+ # **Intended Use:**
+
+ 1. **Multilingual Dialogue Systems:**
+    - Designed for conversational AI applications, capable of handling dialogue across multiple languages.
+    - Useful in customer service, chatbots, and other dialogue-centric use cases.
+
+ 2. **Reasoning and QWQ Dataset Applications:**
+    - Optimized for tasks requiring logical reasoning and contextual understanding, particularly on synthetic datasets like QWQ.
+
+ 3. **Agentic Retrieval:**
+    - Supports retrieval-augmented generation tasks, helping systems fetch and synthesize information effectively.
+
+ 4. **Summarization Tasks:**
+    - Excels at summarizing long or complex text while maintaining coherence and relevance.
+
+ 5. **Instruction-Following Tasks:**
+    - Can execute tasks based on specific user instructions thanks to instruction tuning during training.
+
+ 6. **Language Generation:**
+    - Suitable for generating coherent and contextually relevant text across domains and styles.
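For the agentic-retrieval and summarization use cases above, retrieved passages can simply be packed into the chat `messages` list ahead of the user request, then passed to `apply_chat_template` as in the quickstart. A minimal sketch; the helper name and the sample passages are illustrative, and any retriever can supply the context:

```python
def build_summarization_messages(passages, question):
    """Pack retrieved passages into a chat prompt for a Qwen-style chat model."""
    # Number the passages so the model (and the reader) can reference them.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return [
        {"role": "system", "content": "You are a helpful assistant. Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nTask: {question}"},
    ]

messages = build_summarization_messages(
    ["Bellatrix is a 1.5B-parameter instruct model.", "It was tuned with SFT and RLHF."],
    "Summarize the model in one sentence.",
)
```

The resulting `messages` list drops straight into the quickstart's `tokenizer.apply_chat_template(...)` call.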
+
+ # **Limitations:**
+
+ 1. **Synthetic Dataset Bias:**
+    - Optimization for QWQ and similar datasets may make the model less effective on real-world or less structured data.
+
+ 2. **Data Dependency:**
+    - Performance may degrade on tasks or languages not well represented in the training data.
+
+ 3. **Computational Requirements:**
+    - The optimized transformer architecture may demand significant computational resources, especially for fine-tuning or large-scale deployment.
+
+ 4. **Potential Hallucinations:**
+    - Like most auto-regressive models, it may generate plausible-sounding but factually incorrect or nonsensical output.
+
+ 5. **RLHF-Specific Biases:**
+    - Reinforcement learning from human feedback can introduce biases reflecting the preferences of the annotators involved.
+
+ 6. **Limited Domain Adaptability:**
+    - While effective on reasoning and dialogue tasks, it may struggle with highly specialized domains or out-of-distribution tasks.
+
+ 7. **Multilingual Limitations:**
+    - Although optimized for multilingual use, low-resource languages may see weaker performance than high-resource ones.
+
+ 8. **Ethical Concerns:**
+    - May generate inappropriate or harmful content if safeguards are not applied, particularly in sensitive applications.
+
+ 9. **Real-Time Usability:**
+    - Inference latency could limit effectiveness in real-time applications or at large user scales.
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/prithivMLmods__Bellatrix-1.5B-xElite-details).
+ Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=prithivMLmods%2FBellatrix-1.5B-xElite&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc).
+
+ | Metric |Value (%)|
+ |-------------------|--------:|
+ |**Average** | 9.55|
+ |IFEval (0-Shot) | 19.64|
+ |BBH (3-Shot) | 9.49|
+ |MATH Lvl 5 (4-Shot)| 12.61|
+ |GPQA (0-shot) | 3.80|
+ |MuSR (0-shot) | 4.44|
  |MMLU-PRO (5-shot) | 7.30|
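The reported **Average** is the unweighted mean of the six benchmark scores in the table, which can be verified directly:

```python
# Benchmark scores from the leaderboard table above.
scores = {
    "IFEval (0-Shot)": 19.64,
    "BBH (3-Shot)": 9.49,
    "MATH Lvl 5 (4-Shot)": 12.61,
    "GPQA (0-shot)": 3.80,
    "MuSR (0-shot)": 4.44,
    "MMLU-PRO (5-shot)": 7.30,
}
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 9.55
```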