omitakahiro committed
Commit 06bab66 · verified · 1 Parent(s): 8528699

Update README.md

Files changed (1): README.md (+14 -3)
README.md CHANGED
@@ -12,11 +12,11 @@ language:
 
 ## Model description
 
-**Stockmark-2-100B-Instruct** is a 100-billion-parameter large language model built from scratch, with a particular focus on Japanese. It was pre-trained on approximately 2.0 trillion tokens of data, consisting of 60% English, 30% Japanese, and 10% code. Following pretraining, the model underwent post-training (SFT and DPO) with synthetic data in Japanese to enhance its ability to follow instructions. Compared to the previous version ([Stockmark-2-100B-Instruct-beta](https://huggingface.co/stockmark/Stockmark-2-100B-Instruct-beta)), the instruction following ability is improved.
+**Stockmark-2-100B-Instruct** is a 100-billion-parameter large language model built from scratch, with a particular focus on Japanese. It was pre-trained on approximately 2.0 trillion tokens of data, consisting of 60% English, 30% Japanese, and 10% code. Following pretraining, the model underwent post-training (SFT and DPO) with synthetic data in Japanese to enhance its ability to follow instructions. Compared to the previous version ([Stockmark-2-100B-Instruct-beta](https://huggingface.co/stockmark/Stockmark-2-100B-Instruct-beta)), this version improves instruction-following ability and adds long-context (32k) support.
 
 This project was supported by [GENIAC](https://www.meti.go.jp/policy/mono_info_service/geniac/index.html).
 
-## Features
+### Features
 
 - Model Type: Causal Language Model
 - Number of Parameters: 96B
@@ -25,7 +25,14 @@ This project was supported by [GENIAC](https://www.meti.go.jp/policy/mono_info_s
 - Context Length: 32k
 - Supported Languages: Japanese and English
 
-## Evaluation
+## Model performance
+
+### Japanese MT-bench
+| Model | Average | coding | extraction | humanities | math | reasoning | roleplay | stem | writing |
+|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+| Stockmark-2-100B-Instruct | 7.87 | 7.07 | 8.35 | 8.73 | 7.57 | 5.45 | 8.65 | 8.33 | 8.83 |
+| Stockmark-2-100B-Instruct-beta | 7.71 | 6.73 | 8.23 | 8.63 | 7.01 | 5.85 | 8.54 | 8.07 | 8.61 |
+
 
 ## How to use
 
@@ -84,6 +91,10 @@ for output in outputs:
     print(generated_text)
 ```
 
+## Libraries used for training
+- Pretraining: [NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
+- Posttraining: [huggingface/trl](https://github.com/huggingface/trl)
+
 ## License
 
 [MIT](https://opensource.org/licenses/MIT)
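The "How to use" hunk shows only the tail of the README's generation snippet (`for output in outputs:` / `print(generated_text)`), a loop shape that matches vLLM's `generate`/`chat` output handling. A minimal sketch of that usage pattern, assuming vLLM; the prompt, sampling settings, and parallelism degree are illustrative and not taken from this commit:

```python
# Sketch of the vLLM-style generation loop implied by the diff context.
# Sampling settings and tensor_parallel_size are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="stockmark/Stockmark-2-100B-Instruct",
    tensor_parallel_size=8,  # a 96B-parameter model must be sharded across GPUs
)
sampling_params = SamplingParams(temperature=0.7, max_tokens=512)

# Chat-format the prompt with the model's template, then generate.
# The prompt asks (in Japanese) for three sightseeing spots in Japan.
outputs = llm.chat(
    [{"role": "user", "content": "日本の観光名所を3つ教えてください。"}],
    sampling_params,
)
for output in outputs:
    generated_text = output.outputs[0].text
    print(generated_text)
```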
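The new "Libraries used for training" section names [huggingface/trl](https://github.com/huggingface/trl) for post-training, and the model description mentions SFT and DPO on synthetic Japanese data. A rough sketch of what a trl-based DPO stage looks like; the checkpoint path, dataset file, and hyperparameters below are hypothetical placeholders, not Stockmark's actual recipe:

```python
# Hypothetical trl DPO post-training sketch; checkpoint, dataset, and
# hyperparameters are placeholders, not values from this commit.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "path/to/sft-checkpoint"  # model produced by the SFT stage
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# DPO trains on preference pairs: {"prompt", "chosen", "rejected"}.
train_dataset = load_dataset(
    "json", data_files="synthetic_ja_preferences.jsonl", split="train"
)

args = DPOConfig(output_dir="dpo-out", beta=0.1, per_device_train_batch_size=1)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # `tokenizer=` in older trl releases
)
trainer.train()
```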