Update README.md
README.md CHANGED

@@ -12,11 +12,11 @@ language:
 
 ## Model description
 
-**Stockmark-2-100B-Instruct** is a 100-billion-parameter large language model built from scratch, with a particular focus on Japanese. It was pre-trained on approximately 2.0 trillion tokens of data, consisting of 60% English, 30% Japanese, and 10% code. Following pretraining, the model underwent post-training (SFT and DPO) with synthetic data in Japanese to enhance its ability to follow instructions.
+**Stockmark-2-100B-Instruct** is a 100-billion-parameter large language model built from scratch, with a particular focus on Japanese. It was pre-trained on approximately 2.0 trillion tokens of data, consisting of 60% English, 30% Japanese, and 10% code. Following pretraining, the model underwent post-training (SFT and DPO) with synthetic data in Japanese to enhance its ability to follow instructions. Compared with the previous version ([Stockmark-2-100B-Instruct-beta](https://huggingface.co/stockmark/Stockmark-2-100B-Instruct-beta)), this version improves instruction-following ability and adds long-context (32k) support.
 
 This project was supported by [GENIAC](https://www.meti.go.jp/policy/mono_info_service/geniac/index.html).
 
-
+### Features
 
 - Model Type: Causal Language Model
 - Number of Parameters: 96B
@@ -25,7 +25,14 @@ This project was supported by [GENIAC](https://www.meti.go.jp/policy/mono_info_s
 - Context Length: 32k
 - Supported Languages: Japanese and English
 
-##
+## Model performance
+
+### Japanese MT-bench
+
+| Model | Average | coding | extraction | humanities | math | reasoning | roleplay | stem | writing |
+|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+| Stockmark-2-100B-Instruct | 7.87 | 7.07 | 8.35 | 8.73 | 7.57 | 5.45 | 8.65 | 8.33 | 8.83 |
+| Stockmark-2-100B-Instruct-beta | 7.71 | 6.73 | 8.23 | 8.63 | 7.01 | 5.85 | 8.54 | 8.07 | 8.61 |
 
 ## How to use
 
@@ -84,6 +91,10 @@ for output in outputs:
     print(generated_text)
 ```
 
+## Libraries used for training
+
+- Pretraining: [NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
+- Post-training: [huggingface/trl](https://github.com/huggingface/trl)
+
 ## License
 
 [MIT](https://opensource.org/licenses/MIT)
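
On the training side, the new `## Libraries used for training` section points to Megatron-LM for pretraining and trl for post-training. As a rough illustration of what an SFT stage looks like with trl's quickstart API; the dataset name, base checkpoint, and config values are placeholders, not Stockmark's actual recipe.

```python
# Illustrative post-training sketch with huggingface/trl's SFTTrainer.
# NOT Stockmark's actual recipe: the dataset name, base checkpoint, and
# config values are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder instruction dataset; the card says the real post-training
# data was synthetic Japanese.
dataset = load_dataset("org/japanese-instruction-data", split="train")

trainer = SFTTrainer(
    model="stockmark/Stockmark-2-100B-Instruct-beta",  # placeholder base checkpoint
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-output"),
)
trainer.train()
```

A subsequent DPO stage would be configured analogously with trl's `DPOTrainer` and a preference dataset.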