Update README.md
README.md CHANGED

@@ -12,11 +12,11 @@ language:
 
 ## Model description
 
-**Stockmark-2-100B-Instruct** is a 100-billion-parameter large language model built from scratch, with a particular focus on Japanese. It was pre-trained on approximately 2.0 trillion tokens of data, consisting of 60% English, 30% Japanese, and 10% code. Following pretraining, the model underwent post-training (SFT and DPO) with synthetic data in Japanese to enhance its ability to follow instructions.
+**Stockmark-2-100B-Instruct** is a 100-billion-parameter large language model built from scratch, with a particular focus on Japanese. It was pre-trained on approximately 2.0 trillion tokens of data, consisting of 60% English, 30% Japanese, and 10% code. Following pretraining, the model underwent post-training (SFT and DPO) with synthetic data in Japanese to enhance its ability to follow instructions. Compared with the previous version ([Stockmark-2-100B-Instruct-beta](https://huggingface.co/stockmark/Stockmark-2-100B-Instruct-beta)), this version improves instruction-following ability and adds long-context (32k) support.
 
 This project was supported by [GENIAC](https://www.meti.go.jp/policy/mono_info_service/geniac/index.html).
 
-
+### Features
 
 - Model Type: Causal Language Model
 - Number of Parameters: 96B
@@ -25,7 +25,14 @@ This project was supported by [GENIAC](https://www.meti.go.jp/policy/mono_info_s
 - Context Length: 32k
 - Supported Languages: Japanese and English
 
-##
+## Model performance
+
+### Japanese MT-bench
+
+| Model | Average | coding | extraction | humanities | math | reasoning | roleplay | stem | writing |
+|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+| Stockmark-2-100B-Instruct | 7.87 | 7.07 | 8.35 | 8.73 | 7.57 | 5.45 | 8.65 | 8.33 | 8.83 |
+| Stockmark-2-100B-Instruct-beta | 7.71 | 6.73 | 8.23 | 8.63 | 7.01 | 5.85 | 8.54 | 8.07 | 8.61 |
 
 ## How to use
 
@@ -84,6 +91,10 @@ for output in outputs:
     print(generated_text)
 ```
 
+## Libraries used for training
+
+- Pretraining: [NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
+- Post-training: [huggingface/trl](https://github.com/huggingface/trl)
+
 ## License
 
 [MIT](https://opensource.org/licenses/MIT)
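
On the training side, the new `## Libraries used for training` section points to Megatron-LM for pretraining and trl for post-training. As a rough illustration of what an SFT stage looks like with trl's quickstart API; the dataset name, base checkpoint, and config values are placeholders, not Stockmark's actual recipe.

```python
# Illustrative post-training sketch with huggingface/trl's SFTTrainer.
# NOT Stockmark's actual recipe: the dataset name, base checkpoint, and
# config values are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder instruction dataset; the card says the real post-training
# data was synthetic Japanese.
dataset = load_dataset("org/japanese-instruction-data", split="train")

trainer = SFTTrainer(
    model="stockmark/Stockmark-2-100B-Instruct-beta",  # placeholder base checkpoint
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-output"),
)
trainer.train()
```

A subsequent DPO stage would be configured analogously with trl's `DPOTrainer` and a preference dataset.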