stockmark/Stockmark-2-100B-Instruct


omitakahiro committed on Aug 20, 2025

Commit d89cd8d (verified) · 1 parent: ca47897

Update README.md

Files changed (1)

README.md +12 -1


@@ -12,10 +12,21 @@ language:
 ## Model description
-**Stockmark-2-100B-Instruct** is a 100-billion-parameter large language model built from scratch, with a particular focus on Japanese. It was pre-trained on approximately 2.0 trillion tokens of data, consisting of 60% English, 30% Japanese, and 10% code. Following pretraining, the model underwent post-training (SFT and DPO) with synthetic data in Japanese to enhance its ability to follow instructions.
 This project was supported by [GENIAC](https://www.meti.go.jp/policy/mono_info_service/geniac/index.html).
 ## How to use
 ### transformers

 ## Model description
+**Stockmark-2-100B-Instruct** is a 100-billion-parameter large language model built from scratch, with a particular focus on Japanese. It was pre-trained on approximately 2.0 trillion tokens of data, consisting of 60% English, 30% Japanese, and 10% code. Following pretraining, the model underwent post-training (SFT and DPO) with synthetic data in Japanese to enhance its ability to follow instructions. Compared to the previous version ([Stockmark-2-100B-Instruct-beta](https://huggingface.co/stockmark/Stockmark-2-100B-Instruct-beta)), the instruction following ability is improved.
 This project was supported by [GENIAC](https://www.meti.go.jp/policy/mono_info_service/geniac/index.html).
+## Features
+- Model Type: Causal Language Model
+- Number of Parameters: 96B
+- Number of Layers: 86
+- Number of Attention Heads (GQA): 72 for Q and 8 for KV
+- Context Length: 32k
+- Supported Languages: Japanese and English
+## Evaluation
 ## How to use
 ### transformers
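
The figures in the new Features list support some quick back-of-the-envelope arithmetic. The sketch below is illustrative only and is not taken from the model's configuration. It computes the GQA sharing ratio and the weight footprint from the listed numbers. The per-token KV-cache size also depends on the head dimension, which this diff does not state, so `head_dim` is left as a caller-supplied, hypothetical parameter. The 2-bytes-per-value default assumes a 16-bit dtype such as bfloat16, which is also an assumption.

```python
# Figures copied from the Features list added in this commit.
NUM_PARAMS = 96e9      # "Number of Parameters: 96B"
NUM_LAYERS = 86        # "Number of Layers: 86"
NUM_Q_HEADS = 72       # "72 for Q"
NUM_KV_HEADS = 8       # "8 for KV"


def gqa_group_size(num_q_heads: int, num_kv_heads: int) -> int:
    """Return how many query heads share one key/value head under GQA."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("query heads must divide evenly across KV heads")
    return num_q_heads // num_kv_heads


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight footprint in GB (10^9 bytes).

    bytes_per_param=2 assumes a 16-bit dtype; this is an assumption,
    not something stated in the diff.
    """
    return num_params * bytes_per_param / 1e9


def kv_cache_bytes_per_token(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,
) -> int:
    """Bytes of KV cache stored per token.

    head_dim is NOT given in the diff; the caller must supply it.
    The leading factor 2 accounts for storing both keys and values.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    print("query heads per KV head:", gqa_group_size(NUM_Q_HEADS, NUM_KV_HEADS))
    print("16-bit weights (GB):", weight_memory_gb(NUM_PARAMS))
    # head_dim=128 is a placeholder value chosen only for illustration.
    print(
        "KV cache bytes/token (head_dim=128, hypothetical):",
        kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, head_dim=128),
    )
```

With 72 query heads and 8 KV heads, nine query heads share each key/value head, so the KV cache is one ninth the size it would be if every query head kept its own keys and values.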