comoZ committed
Commit 391573b · verified · 1 parent: 8432c62

Update README.md

Files changed (1): README.md (+273, −149)
---
license: other
license_name: trillion
license_link: LICENSE
tags:
- finetuned
- chat
language:
- en
- ko
- ja
pipeline_tag: text-generation
library_name: transformers
extra_gated_prompt: >-
  **TRILLION LABS AI MODEL LICENSE AGREEMENT**

  Tri- Model Series Version Effective Date: February 1, 2025

  "**Agreement**" means the terms and conditions for use, reproduction, distribution and modification of the Trillion Labs AI Model series set forth herein.

  "**Documentation**" means the specifications, manuals and documentation accompanying the Tri- Model series distributed by Trillion Labs.

  "**Licensee**" or "**you**" means you, or your employer or any other person or entity (if you are entering into this Agreement on such person or entity's behalf), of the age required under applicable laws, rules or regulations to provide legal consent and that has legal authority to bind your employer or such other person or entity if you are entering in this Agreement on their behalf.

  "**Model**" means the artificial intelligence model series provided by Licensor ("Tri-" series), including software, algorithms, machine learning models, and related components provided by Licensor, including all updates, enhancements, improvements, bug fixes, patches, or other modifications.

  "**Trillion Labs**" or "**we**" means Trillion Labs, the owner, developer, and provider of the Model, holding all rights, title, and interest in the Model.

  By clicking "I Accept" below or by using or distributing any portion or element of the Model, you agree to be bound by this Agreement.

  1\. **License Grant and Redistribution**.

  a. Grant of Rights. You are granted a limited, non-exclusive, non-transferable, worldwide, revocable license under Trillion Labs' intellectual property or other rights to use, reproduce, distribute, and make modifications to the Model for research purposes.

  b. Redistribution and Use.

  i. If you distribute or make available the Model (or any derivative works thereof), or a product or service that contains any of them, you shall (A) provide a copy of this Agreement with any such Model; and (B) prominently display "Built with Tri-" on a related website, user interface, blogpost, about page, or product documentation. If you use the Model to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include "Tri-" followed by the original Model version at the beginning of any such AI model name.

  ii. You must retain in all copies of the Model that you distribute the following attribution notice within a "Notice" text file distributed as a part of such copies: "Tri- Model Series is licensed under the Trillion Labs AI Model License Agreement, Copyright © Trillion Labs. All Rights Reserved."

  iii. Your use of the Model must comply with applicable laws and regulations (including trade compliance laws and regulations).

  2\. **Additional Commercial Terms**. If the monthly active users of the products or services made available by or for Licensee, or Licensee's affiliates, is greater than 1 million monthly active users OR Annual Recurring Revenue is greater than $10 million USD, you must request a commercial license from Trillion Labs, and you are not authorized to exercise any commercial rights under this Agreement unless or until Trillion Labs otherwise expressly grants you such rights.

  3\. **Disclaimer of Warranty**. THE MODEL, DERIVATIVES, AND OUTPUT ARE PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND, AND TRILLION LABS DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.

  4\. **Limitation of Liability**. IN NO EVENT WILL TRILLION LABS BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES.

  5\. **Intellectual Property**.

  a. No trademark licenses are granted under this Agreement, and in connection with the Model, neither Trillion Labs nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates, except as required for reasonable and customary use in describing and redistributing the Model or as set forth in this Section 5(a).

  b. All rights, title, and interest in the Model, including modifications, Derivatives, and documentation, remain exclusively with Trillion Labs.

  6\. **Term and Termination**. The term of this Agreement will commence upon your acceptance of this Agreement or access to the Model and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Trillion Labs may terminate this Agreement if you are in breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete and cease use of the Model. Sections 3, 4 and 5 shall survive the termination of this Agreement.

  7\. **Governing Law and Jurisdiction**. This Agreement will be governed and construed under the laws of the State of California without regard to choice of law principles. The courts of California shall have exclusive jurisdiction of any dispute arising out of this Agreement.
extra_gated_fields:
  First Name: text
  Last Name: text
  Affiliation: text
  Job title:
    type: select
    options:
      - Student
      - Research Graduate
      - AI researcher
      - AI developer/engineer
      - Reporter
      - Other
  geo: ip_location
  By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Trillion Labs Privacy Policy: checkbox
extra_gated_description: >-
  The information you provide will be collected, stored, processed and shared in
  accordance with the Trillion Labs Privacy Policy.
extra_gated_button_content: Submit
extra_gated_heading: "Please be sure to provide your full legal name, date of birth, and full organization name with all corporate identifiers. Avoid the use of acronyms and special characters. Failure to follow these instructions may prevent you from accessing this model and others on Hugging Face. You will not have the ability to edit this form after submission, so please ensure all information is accurate."
---
<p align="center">
  <picture>
    <img src="https://raw.githubusercontent.com/trillion-labs/.github/main/Tri-21B.png" alt="Tri-21B" style="width: 80%;">
  </picture>
</p>
## Introduction

This repository hosts a 4-bit quantized version of **Tri-21B**.

We introduce **Tri-21B**, our flagship large language model that redefines the efficiency frontier in LLM training. By achieving state-of-the-art performance with only 2.3T training tokens, we demonstrate that exceptional capabilities don't require excessive computational resources.

<p align="center">
  <img src="https://raw.githubusercontent.com/trillion-labs/.github/main/pareto-2507.png" alt="Average Performance vs. Approximate Training FLOPs" style="width: 100%; max-width: 1400px;">
</p>
### Key Highlights

* **Unprecedented Training Efficiency**: Trained on just 2.3T tokens, significantly less than comparable models, while achieving 70.3% average accuracy across MMLU/KMMLU/Global MMLU benchmarks
* **Pushing the Pareto Frontier**: With only 2.95E+23 FLOPs, Tri-21B outperforms models requiring 2-10x more compute, setting a new standard for efficient scaling
* **Enhanced Reasoning**: Modified training dataset mixture specifically optimized for reasoning capabilities
* **Advanced Post-Training**: Significantly improved RL training pipeline focusing on mathematical reasoning and everyday usage
* **Multi-lingual**: Specially optimized for Korean, English, and Japanese

Our **Tri-21B** represents a paradigm shift in efficient model development. When comparing performance to training FLOPs, our model dramatically pushes the Pareto frontier, achieving performance comparable to or exceeding models like Qwen2.5-32B (74.6% at 3.46E+24 FLOPs) and Gemma 3 IT 27B (67.6% at 2.27E+24 FLOPs) while using approximately 8-12x fewer computational resources.
### Model Specifications

#### Tri-21B

- Type: Causal Language Model
- Training Stage: Pre-training & Post-training
- Architecture: Transformer Decoder with RoPE, SwiGLU, RMSNorm, and GQA
- Number of Parameters: 20.73B
- Number of Layers: 32
- Number of Attention Heads: 32 (Query) / 8 (Key, Value)
- Context Length: 8,192
- Number of Tokens Seen: 2.3T
- Vocab Size: 124,416
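The parameter count above implies a rough weight-memory budget, which is the main motivation for a 4-bit variant. A back-of-the-envelope estimate (weights only; the KV cache and activations add more, and 4-bit formats carry some metadata overhead), using GB = 10^9 bytes:

```python
# Approximate weight memory for 20.73B parameters.
params = 20.73e9

bytes_bf16 = params * 2    # bfloat16: 2 bytes per parameter
bytes_4bit = params * 0.5  # 4-bit: ~0.5 bytes per parameter (before overhead)

print(f"bf16 weights : {bytes_bf16 / 1e9:.1f} GB")   # ≈ 41.5 GB
print(f"4-bit weights: {bytes_4bit / 1e9:.1f} GB")   # ≈ 10.4 GB
```

So the bfloat16 checkpoint needs multiple high-memory GPUs, while the 4-bit variant fits comfortably on a single 24 GB card.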
## Training Efficiency Analysis

Our approach to training efficiency sets new benchmarks in the field. The following comparison demonstrates how Tri-21B achieves superior performance per FLOP compared to other state-of-the-art models of similar scale:

| Model | FLOPs | Avg. Accuracy¹ | Efficiency Ratio² |
|:------|:------|:---------------|:------------------|
| **Tri-21B** | **2.95E+23** | **70.3%** | **1.00x (baseline)** |
| Gemma2-9b | 4.42E+23 | 61.5% | 0.48x |
| Qwen2.5-7B | 8.22E+23 | 63.4% | 0.29x |
| Exaone-3.5-32B | 1.25E+24 | 58.5% | 0.19x |
| Gemma 3 IT 27B | 2.27E+24 | 67.6% | 0.11x |
| Qwen2.5-32B | 3.46E+24 | 74.6% | 0.10x |
| Qwen3-32B | 5.77E+24 | 73.5% | 0.06x |

¹ Average of MMLU / KMMLU / Global MMLU (ja)
² Performance per FLOP relative to Tri-21B
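The "performance per FLOP" definition in footnote ² can be recomputed directly from the table. The published ratios may use a slightly different accuracy aggregation or rounding, so treat this as an illustration of the definition rather than a reproduction of the exact figures:

```python
# (FLOPs, average accuracy) pairs taken from the table above.
models = {
    "Tri-21B":        (2.95e23, 70.3),
    "Gemma2-9b":      (4.42e23, 61.5),
    "Qwen2.5-7B":     (8.22e23, 63.4),
    "Exaone-3.5-32B": (1.25e24, 58.5),
    "Gemma 3 IT 27B": (2.27e24, 67.6),
    "Qwen2.5-32B":    (3.46e24, 74.6),
    "Qwen3-32B":      (5.77e24, 73.5),
}

# Baseline: Tri-21B's accuracy per training FLOP.
tri_flops, tri_acc = models["Tri-21B"]
baseline = tri_acc / tri_flops

for name, (flops, acc) in models.items():
    ratio = (acc / flops) / baseline        # performance per FLOP vs. Tri-21B
    flops_multiple = flops / tri_flops      # how much more compute was spent
    print(f"{name:<15} {ratio:5.2f}x perf/FLOP  ({flops_multiple:.1f}x the FLOPs)")
```

Under this definition Tri-21B has the highest accuracy per FLOP of the listed models, and the compute multiples (about 7.7x for Gemma 3 IT 27B and 11.7x for Qwen2.5-32B) are consistent with the "approximately 8-12x fewer computational resources" claim above.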
This efficiency breakthrough enables organizations to deploy state-of-the-art language models without the traditional computational barriers, democratizing access to advanced AI capabilities.
## Quickstart

Here is a code snippet with `apply_chat_template` that demonstrates how to load the tokenizer and model and generate text.

### Tri-21B Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "trillionlabs/Tri-21B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the concept of quantum computing in simple terms."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
### vLLM and SGLang Deployment

Tri-21B is also available with [vLLM](https://docs.vllm.ai/en/latest/) and [SGLang](https://docs.sglang.ai/)!

```bash
# vLLM
vllm serve trillionlabs/Tri-21B --dtype bfloat16 --max-model-len 8192

# vLLM with custom options
vllm serve trillionlabs/Tri-21B \
  --dtype bfloat16 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.95 \
  --port 8000
```

```bash
# SGLang
python3 -m sglang.launch_server --model-path trillionlabs/Tri-21B --dtype bfloat16

# SGLang with custom options
python3 -m sglang.launch_server \
  --model-path trillionlabs/Tri-21B \
  --dtype bfloat16 \
  --context-length 8192 \
  --port 30000 \
  --host 0.0.0.0
```
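Once one of the servers above is running, vLLM exposes an OpenAI-compatible REST API under `/v1`. A minimal stdlib client sketch, assuming the default vLLM port 8000 from the command above (adjust the base URL for SGLang or custom ports):

```python
import json
from urllib.request import Request, urlopen

def build_chat_request(prompt, base_url="http://localhost:8000/v1"):
    """Build an OpenAI-style chat-completions request for the vLLM server."""
    payload = {
        "model": "trillionlabs/Tri-21B",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    return Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def chat(prompt):
    """Send the request; requires the `vllm serve` process above to be running."""
    with urlopen(build_chat_request(prompt)) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Example (with the server up):
# print(chat("Explain the concept of quantum computing in simple terms."))
```

The official `openai` Python client works the same way by pointing its `base_url` at the server; the snippet above just avoids the extra dependency.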
## Evaluation
We evaluated Tri-21B across a comprehensive suite of benchmarks assessing general reasoning, knowledge recall, coding abilities, mathematical reasoning, and instruction-following capabilities. We compare our model against state-of-the-art models of similar scale, Gemma3-IT-27B and Qwen3-32B, to demonstrate its competitive performance.

<details>
<summary>Full evaluation settings</summary>

#### Benchmark Evaluation Settings

| Benchmark | Language | Evaluation Setting | Metric |
|:----------|:---------|:-------------------|:-------|
| **General Reasoning and Factuality** | | | |
| • HellaSwag | English | 0-shot | accuracy |
| • ARC:C | English | 0-shot | accuracy |
| • HAERAE | Korean | 3-shot | accuracy |
| • CLIcK | Korean | 0-shot | accuracy |
| • KoBEST | Korean | 5-shot | accuracy |
| **Knowledge and Reasoning** | | | |
| • KMMLU | Korean | 5-shot (0-shot, CoT) | accuracy (exact-match) |
| • MMLU | English | 5-shot (0-shot, CoT) | accuracy (exact-match) |
| • MMLU-Pro | English | 0-shot, CoT | exact-match |
| • Global-MMLU-Lite-ja | Japanese | 5-shot | accuracy |
| **Coding** | | | |
| • HumanEval | English | 0-shot | pass@1 |
| • MBPPPlus | English | 0-shot | pass@1 |
| **Mathematical Reasoning** | | | |
| • GSM8k | English | 0-shot, CoT | exact-match |
| • MATH | English | 0-shot, CoT | exact-match |
| • GPQA | English | 4-shot | accuracy |
| • GPQA Diamond | English | 0-shot, CoT | accuracy |
| • HRM8k | Korean | 0-shot, CoT | exact-match |
| **Instruction Following and Chat** | | | |
| • IFEval | English | 0-shot | strict-average |
| • koIFEval | Korean | 0-shot | strict-average |
| • MT-Bench | English | LLM-as-a-judge (gpt-4o) | LLM score |
| • KO-MT-Bench | Korean | LLM-as-a-judge (gpt-4o) | LLM score |
| • systemIFEval | English | 0-shot | strict-average |

- Note that koIFEval, systemIFEval, and KoRuler are our in-house evaluation benchmarks adapted for Korean to better assess model capabilities in Korean language tasks.
- Note that MT-Bench, KO-MT-Bench, and LogicKor use a 10-point scale.

</details>
### Benchmark Results

Models compared:

- **Tri-21B**: Our flagship 21B parameter model
- **Qwen3-32B**: Qwen's 32B parameter model
- **Gemma3-IT-27B**: Google's Gemma 3 instruction-tuned 27B model

### General Reasoning and Factuality

| Benchmark | Tri-21B | Qwen3-32B | Gemma3-IT-27B |
| --- | --- | --- | --- |
| HAERAE | 86.16 | 71.67 | 78.09 |
| KoBEST | 85.92 | 83.39 | 87.66 |
| CLIcK | 72.32 | 66.89 | 67.54 |
| KMMLU | 61.89 (69.90) | 61.73 (67.55) | 55.03 (60.61) |
| MMLU | 77.62 (85.02) | 81.86 (84.46) | 77.42 (84.09) |
| MMLU-Pro | 64.74 | 70.53 | 64.26 |
| Global-MMLU-Lite-ja | 70.25 | 77.00 | 72.00 |

### Coding

| Benchmark | Tri-21B | Qwen3-32B | Gemma3-IT-27B |
| --- | --- | --- | --- |
| HumanEval | 75.61 | 74.39 | 87.80 |
| MBPPPlus | 73.02 | 74.40 | 84.92 |

### Mathematical Reasoning

| Benchmark | Tri-21B | Qwen3-32B | Gemma3-IT-27B |
| --- | --- | --- | --- |
| GSM8k | 87.95 | 86.66 | 90.52 |
| MATH | 77.60 | 81.40 | 85.00 |
| GPQA | 39.73 | 41.07 | 37.95 |
| GPQA-Diamond | 44.95 | 54.04 | 44.44 |
| HRM8k | 56.70 | 66.24 | 63.90 |

### Instruction Following and Chat

| Benchmark | Tri-21B | Qwen3-32B | Gemma3-IT-27B |
| --- | --- | --- | --- |
| IFEval | 80.75 | 86.08 | 80.78 |
| koIFEval | 66.51 | 62.93 | 69.24 |
| MT-Bench | 8.21 | 8.52 | 8.53 |
| KO-MT-Bench | 7.79 | 8.47 | 8.46 |
| systemIFEval | 77.40 | 77.92 | 77.94 |

### Base Model Evaluation

The following table shows the performance of the Tri-21B base model (before instruction tuning) on key benchmarks:

| Benchmark | Tri-21B Base |
| --- | --- |
| MMLU | 76.99 |
| KMMLU | 62.37 |
| KoBEST | 85.07 |
| BBH | 77.19 |
| GSM8K | 70.36 |
| MBPPPlus | 75.40 |
## Limitations

- Language Support: The models are optimized for English, Korean, and Japanese. Usage with other languages may result in degraded performance.
- Knowledge Cutoff: The model's information is limited to data available up to February 2025.

## License

This model repository is licensed under the Trillion License.

## Contact

For inquiries, please contact: info@trillionlabs.co