## Model description

**Stockmark-2-100B-Instruct** is a 100-billion-parameter large language model built from scratch, with a particular focus on Japanese. It was pre-trained on approximately 2.0 trillion tokens of data, consisting of 60% English, 30% Japanese, and 10% code. Following pretraining, the model underwent post-training (SFT and DPO) on synthetic Japanese data to enhance its ability to follow instructions. Compared to the previous version, [Stockmark-2-100B-Instruct-beta](https://huggingface.co/stockmark/Stockmark-2-100B-Instruct-beta), its instruction-following ability has been improved.

This project was supported by [GENIAC](https://www.meti.go.jp/policy/mono_info_service/geniac/index.html).

## Features

- Model Type: Causal Language Model
- Number of Parameters: 96B
- Number of Layers: 86
- Number of Attention Heads (GQA): 72 for Q and 8 for KV
- Context Length: 32k
- Supported Languages: Japanese and English
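
The GQA layout above means each group of 9 query heads (72 / 8) shares one key/value head, which shrinks the KV cache roughly 9x compared with full multi-head attention at the same head count. A minimal back-of-the-envelope sketch of that saving; note the head dimension is not listed in this card, so `head_dim = 128` below is an assumption for illustration only:

```python
# Rough KV-cache size estimate for the GQA configuration in the Features list.
num_layers = 86          # from the Features list
num_q_heads = 72         # query heads (GQA)
num_kv_heads = 8         # key/value heads (GQA)
head_dim = 128           # ASSUMED: head dimension is not stated in this card
context_len = 32 * 1024  # 32k context length
bytes_per_value = 2      # fp16 / bf16

def kv_cache_bytes(kv_heads: int) -> int:
    # Factor of 2 covers both the K and V caches, per layer, per position.
    return 2 * num_layers * kv_heads * head_dim * context_len * bytes_per_value

gqa = kv_cache_bytes(num_kv_heads)
mha = kv_cache_bytes(num_q_heads)  # hypothetical full multi-head attention

print(f"queries per KV head: {num_q_heads // num_kv_heads}")        # 9
print(f"KV cache at 32k with GQA (8 KV heads): {gqa / 2**30:.1f} GiB")   # ~10.8 GiB
print(f"KV cache at 32k with MHA (72 KV heads): {mha / 2**30:.1f} GiB")  # ~96.7 GiB
```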

## Evaluation

## How to use

### transformers
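
A minimal generation sketch using the `transformers` library. The repo id `stockmark/Stockmark-2-100B-Instruct` and the availability of a chat template are assumptions here (the model is instruction-tuned with SFT and DPO, so a chat template is expected); adjust the dtype and device mapping to your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; verify against the model page before use.
model_id = "stockmark/Stockmark-2-100B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",           # shard the 96B parameters across available GPUs
    torch_dtype=torch.bfloat16,  # ~192 GB of weights in bf16
)

# "What is natural language processing?" — chat template assumed to exist.
messages = [{"role": "user", "content": "自然言語処理とは何ですか？"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```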